Training Overview

Win-Vector LLC offers customized, instructor-led corporate training in statistics, machine learning, and data science, in both Python and R.

Our training is tuned to the needs of your team. Content is based on current literature, best practices, and our own consulting experience. We scope the course with you, then deliver on-site custom training, including programming and analysis exercises with real-world data sets. When possible, we can adapt your data to our exercises.

Data Science Intensive for Engineers

Data Science Intensive for Engineers is a two-week workshop that gives software engineers solid experience with current data science topics. Topics include:

  • A/B Testing (or controlled business experiments)
  • Machine Learning
  • The statistics and probability theory needed to understand the above.

This intensive differs from a traditional course in that the topics are deliberately limited, and the experience emphasizes working through concrete problems together in addition to lectures. The course is designed and taught by a team of working data science authors, practitioners, and consultants. The current material reflects refinement over 2 years, 12 successful cohorts, and more than 400 students.

Because of the increasing importance of data science in modern business decision making, it is critical that software engineers across the organization have some literacy in data science topics and methods. This enables them to interact more productively with data science teams, anticipate future needs, and even apply sophisticated data analysis techniques in their own software projects.

For the past 2 years, this course has been used as the sole "AI 200 for Engineers" training program at a GAFAM "Big Five" tech company (name available on request). The course has received very high reviews and has helped previous participants design and successfully execute projects. Some previous students have described the course as "transformative."

Data Science Intensive for Engineers is currently structured as eight three-quarter-day sessions. These were previously conducted in person; for now, they are held online.

The material is taught in Python, using:

  • SciKit-Learn and Pandas
  • Jupyter notebooks
  • TensorFlow and Keras

We selected these tools because they are among the easiest and most general-purpose tools to learn on, and the experience gained with them translates easily to production and big data environments. Each day's schedule includes both lecture periods and periods in which the instructor works through notebooks with the class, giving students hands-on experience with these tools.

The prerequisites are familiarity with Python and some comfort with basic statistics or probability.

Material covered:

  1. Using Python, Pandas, and Jupyter.
  2. Basic probability and statistics.
  3. Linear and logistic regression (predicting quantities and decisions).
  4. Meaningful classification and regression metrics.
  5. A/B testing (controlled experiments).
  6. Tree-based machine learning, such as XGBoost (a contest-winning method).
  7. A basic introduction to deep learning.

Topics can be dropped or added as desired. More detailed outlines can be discussed. The course can also be adapted to R.

Participants keep all course materials and are strongly encouraged to ask questions. The sessions are paced and designed to support a high degree of interaction.

For quotes, reference customers, or discussion please reach out to us at .

We also offer data science consulting and contracting.