New! Updated for Winter 2019 with extra content on feature engineering, regularization techniques, and tuning neural networks – as well as TensorFlow 2.0!
Machine Learning and artificial intelligence (AI) are everywhere; if you want to know how companies like Google, Amazon, and even Udemy extract meaning and insights from massive data sets, this data science course will give you the fundamentals you need. Data Scientists enjoy one of the top-paying jobs, with an average salary of $120,000 according to Glassdoor and Indeed. That’s just the average! And it’s not just about money – it’s interesting work too!
If you’ve got some programming or scripting experience, this course will teach you the techniques used by real data scientists and machine learning practitioners in the tech industry – and prepare you for a move into this hot career path. This comprehensive machine learning tutorial includes over 100 lectures spanning 14 hours of video, and most topics include hands-on Python code examples you can use for reference and for practice. I’ll draw on my 9 years of experience at Amazon and IMDb to guide you through what matters, and what doesn’t.
Each concept is introduced in plain English, avoiding confusing mathematical notation and jargon. It’s then demonstrated using Python code you can experiment with and build upon, along with notes you can keep for future reference. You won’t find academic, deeply mathematical coverage of these algorithms in this course – the focus is on practical understanding and application of them. At the end, you’ll be given a final project to apply what you’ve learned!
The topics in this course come from an analysis of real requirements in data scientist job listings from the biggest tech employers. We’ll cover the machine learning, AI, and data mining techniques real employers are looking for, including:

Deep Learning / Neural Networks (MLPs, CNNs, RNNs) with TensorFlow and Keras

Data Visualization in Python with Matplotlib and Seaborn

Transfer Learning

Sentiment analysis

Image recognition and classification

Regression analysis

K-Means Clustering

Principal Component Analysis

Train/Test and cross-validation

Bayesian Methods

Decision Trees and Random Forests

Multiple Regression

Multi-Level Models

Support Vector Machines

Reinforcement Learning

Collaborative Filtering

K-Nearest Neighbor

Bias/Variance Tradeoff

Ensemble Learning

Term Frequency / Inverse Document Frequency

Experimental Design and A/B Tests

Feature Engineering

Hyperparameter Tuning
…and much more! There’s also an entire section on machine learning with Apache Spark, which lets you scale up these techniques to “big data” analyzed on a computing cluster. And you’ll also get access to this course’s Facebook Group, where you can stay in touch with your classmates.
If you’re new to Python, don’t worry – the course starts with a crash course. If you’ve done some programming before, you should pick it up quickly. This course shows you how to get set up on Microsoft Windows-based PCs, Linux desktops, and Macs.
If you’re a programmer looking to switch into an exciting new career track, or a data analyst looking to make the transition into the tech industry – this course will teach you the basic techniques used by real-world industry data scientists. These are topics any successful technologist absolutely needs to know about, so what are you waiting for? Enroll now!

“I started doing your course in 2015… Eventually I got interested and never thought that I will be working for corporate before a friend offered me this job. I am learning a lot which was impossible to learn in academia and enjoying it thoroughly. To me, your course is the one that helped me understand how to work with corporate problems. How to think to be a success in corporate AI research. I find you the most impressive instructor in ML, simple yet convincing.” – Kanad Basu, PhD
Getting Started
What to expect in this course, who it's for, and the general format we'll follow.
In a crash course on Python and what's different about it, we'll cover the importance of whitespace in Python scripts, and how to import Python modules.
In part 2 of our Python crash course, we'll cover Python data structures including lists, tuples, and dictionaries.
In this lesson, we'll see how functions work in Python.
We'll wrap up our Python crash course covering Boolean expressions and looping constructs.
Pandas is a library we'll use throughout the course for loading, examining, and manipulating data. Let's see how it works with some examples, and you'll have an exercise at the end too.
Statistics and Probability Refresher, and Python Practice
We cover the differences between continuous and discrete numerical data, categorical data, and ordinal data.
A refresher on mean, median, and mode, and when it's appropriate to use each.
We'll use mean, median, and mode in some real Python code, and set you loose to write some code of your own.
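As a minimal sketch of these three measures, here's a version using Python's built-in `statistics` module rather than whatever library the lecture itself uses. The income figures are made up, with one deliberate outlier to show why the median is often the safer summary of a skewed distribution.

```python
import statistics

# Hypothetical incomes; the last value is a deliberate outlier.
incomes = [27000, 32000, 32000, 41000, 45000, 1000000]

mean_income = statistics.mean(incomes)      # dragged upward by the outlier
median_income = statistics.median(incomes)  # middle of the sorted values
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income, median_income, mode_income)
```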
We'll cover how to compute the variance and standard deviation of a data distribution, and how to do it using some examples in Python.
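Here's a short sketch of both quantities in plain Python; the `variance` and `std_dev` helpers and the data are hypothetical. Variance is the average squared deviation from the mean, and standard deviation is its square root, which puts it back in the data's original units.

```python
import math

def variance(values, sample=False):
    """Average squared deviation from the mean.

    With sample=True, divide by n - 1 instead of n (Bessel's correction),
    the usual choice when the data is a sample of a larger population.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

def std_dev(values, sample=False):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [1, 4, 5, 4, 8]
print(variance(data), std_dev(data))  # population variance and std dev
```

Python's `statistics.pvariance` and `statistics.variance` make the same population/sample distinction.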
Introducing the concepts of probability density functions (PDF's) and probability mass functions (PMF's).
We'll show examples of uniform, normal, exponential, binomial, and Poisson distributions using IPython.
We'll look at some examples of percentiles and quartiles in data distributions, and then move on to the concept of the first four moments of data sets.
An overview of different tricks in matplotlib for creating graphs of your data, using different graph types and styles.
The concepts of covariance and correlation used to look for relationships between different sets of attributes, and some examples in Python.
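Both measures can be sketched in plain Python; `covariance` and `correlation` are hypothetical helpers and the page-speed data is made up. Covariance's magnitude depends on the units of the data, while correlation rescales it so the result always falls between -1 and 1. (NumPy's `np.cov` and `np.corrcoef` compute the same quantities.)

```python
import math

def covariance(xs, ys):
    """Sample covariance: mean product of paired deviations, using n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: slower pages (seconds) paired with lower spend (dollars).
page_speeds = [1.0, 2.0, 3.0, 4.0, 5.0]
purchase_amounts = [50.0, 42.0, 35.0, 30.0, 20.0]
print(covariance(page_speeds, purchase_amounts), correlation(page_speeds, purchase_amounts))
```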
We cover the concepts and equations behind conditional probability, and use it to try and find a relationship between age and purchases in some fabricated data using Python.
Here we'll go over my solution to the exercise I challenged you with in the previous lecture: changing our fabricated data to have no real correlation between ages and purchases, and seeing if you can detect that using conditional probability.
An overview of Bayes' Theorem, and an example of using it to uncover misleading statistics surrounding the accuracy of drug testing.
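Here's the drug-test calculation as a short sketch. The figures (a test that's 99% sensitive and 99% specific, and a 0.3% base rate of users) are assumptions for illustration, as is the `posterior` helper. The point is that when a condition is rare, most positive results come from the much larger group of non-users.

```python
def posterior(prior, sensitivity, specificity):
    """P(user | positive test) via Bayes' theorem.

    prior        -- P(user), the base rate in the population
    sensitivity  -- P(positive | user)
    specificity  -- P(negative | non-user)
    """
    false_positive_rate = 1.0 - specificity
    # Total probability of a positive result, from users and non-users alike.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: a "99% accurate" test and a 0.3% base rate.
print(round(posterior(prior=0.003, sensitivity=0.99, specificity=0.99), 3))
```

Under these assumptions, only about 23% of people who test positive are actually users.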
Predictive Models
We introduce the concept of linear regression and how it works, and use it to fit a line to some sample data using Python.
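The least-squares fit behind that line can be sketched in a few lines of plain Python; `fit_line` and `r_squared` are hypothetical helpers, and the points are made up to lie exactly on y = 2x + 1. In SciPy, `scipy.stats.linregress(x, y)` returns the slope, intercept, and correlation coefficient (among other values) in one call.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical points on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```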
We cover the concepts of polynomial regression, and use it to fit a more complex page speed/purchase relationship in Python.
Multivariate models let us predict some value given more than one attribute. We cover the concept, then use it to build a model in Python to predict car prices based on their number of doors, mileage, and number of cylinders. We'll also get our first look at the statsmodels library in Python.
We'll just cover the concept of multilevel modeling, as it is a very advanced topic. But you'll get the ideas and challenges behind it.
Machine Learning with Python
The concepts of supervised and unsupervised machine learning, and how to evaluate the ability of a machine learning model to predict new values using the train/test technique.
We'll apply train/test to a real example using Python.
We'll introduce the concept of Naive Bayes and how we might apply it to the problem of building a spam classifier.
We'll actually write a working spam classifier, using real email training data and a surprisingly small amount of code!
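To show what a Naive Bayes spam classifier does under the hood, here's a hand-rolled multinomial version with add-one (Laplace) smoothing. The class name and the four training snippets are hypothetical, and the lecture's own code may rely on a library instead; in scikit-learn, `CountVectorizer` plus `MultinomialNB` covers the same ground.

```python
import math
from collections import Counter

class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, document):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Sum log probabilities so many small factors don't underflow to 0.
            score = math.log(self.doc_counts[c] / total_docs)
            for word in document.lower().split():
                smoothed = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Hypothetical training snippets.
emails = ["win free money now", "free money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
classifier = NaiveBayesClassifier().fit(emails, labels)
print(classifier.predict("free money"), classifier.predict("meeting notes"))
```

The "naive" part is treating each word as independent given the class, which is what lets the per-word probabilities simply be multiplied (added, in log space).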
K-Means is a way to identify things that are similar to each other. It's a case of unsupervised learning, which could result in clusters you never expected!
We'll apply K-Means clustering to find interesting groupings of people based on their age and income.
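Here's a minimal sketch of K-Means (Lloyd's algorithm) in plain Python rather than a library implementation; the `kmeans` function and the (age, income) pairs are hypothetical. It alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. Because income dwarfs age numerically, distances here are dominated by income; real code would usually scale the features first.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid alone
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical (age, income) pairs.
people = [(20, 20000), (22, 21000), (25, 22000), (60, 90000), (62, 95000), (65, 91000)]
centroids, clusters = kmeans(people, k=2)
print(centroids)
```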
Entropy is a measure of the disorder in a data set; we'll learn what that means, and how to compute it mathematically.
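Shannon entropy in bits is H = -sum(p * log2(p)), taken over the proportion p of each class in the data set. In this minimal sketch (the `entropy` helper and the labels are hypothetical), a set where every label is the same has entropy 0, and an even two-way split has entropy 1.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

print(entropy(["hire", "hire", "hire", "hire"]))  # perfectly ordered
print(entropy(["hire", "hire", "pass", "pass"]))  # maximally mixed for two classes
```

Decision-tree learners use this to pick the split that reduces entropy the most.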
Decision trees can automatically create a flow chart for making some decision, based on machine learning! Let's learn how they work.
We'll create a decision tree and an entire "random forest" to predict hiring decisions for job candidates.
Random forests are an example of ensemble learning; we'll cover other techniques for combining the results of many models to create a better result than any one could produce on its own.
XGBoost is perhaps the most powerful machine learning algorithm today, and it's really easy to use. We'll cover how it works, how to tune it, and run an example on the Iris data set showing how powerful XGBoost is.
Support Vector Machines are an advanced technique for classifying data that has multiple features. They treat those features as dimensions, and partition this higher-dimensional space with a boundary defined by the training points closest to it, the "support vectors."
We'll use scikit-learn to easily classify people using a C-Support Vector Classifier.
Recommender Systems
One way to recommend items is to look for other people similar to you based on their behavior, and recommend stuff they liked that you haven't seen yet.
The shortcomings of user-based collaborative filtering can be solved by flipping it on its head and looking at relationships between items instead of relationships between people.
We'll use the real-world MovieLens data set of movie ratings to take a first crack at finding movies that are similar to each other, which is the first step in item-based collaborative filtering.
Our initial results for movies similar to Star Wars weren't very good. Let's figure out why, and fix it.
We'll implement a complete item-based collaborative filtering system that uses real-world movie ratings data to recommend movies to any user.
As a student exercise, try some of my ideas, or some ideas of your own, to make the results of our item-based collaborative filter even better.
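Here's a compact sketch of item-based collaborative filtering on a toy ratings dictionary rather than MovieLens. Similarity is the Pearson correlation over users who rated both items, and `min_common` (an assumed threshold) discards pairs with too few shared raters, one common way to suppress spurious matches. A candidate movie's score sums its similarity to each movie the user has rated, weighted by that rating. All names and data are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: stars}
ratings = {
    "ann": {"Star Wars": 5, "Empire": 5, "Titanic": 1},
    "bob": {"Star Wars": 4, "Empire": 5, "Titanic": 2, "Jedi": 5},
    "cat": {"Star Wars": 1, "Empire": 2, "Titanic": 5, "Jedi": 1},
    "dan": {"Star Wars": 5, "Titanic": 1},
}

def item_similarity(ratings, item_a, item_b, min_common=2):
    """Pearson correlation between two items over users who rated both.

    Returns None when fewer than min_common users rated both items, or when
    either item's ratings have no variance among those users.
    """
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if len(common) < min_common:
        return None
    xs = [ratings[u][item_a] for u in common]
    ys = [ratings[u][item_b] for u in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return num / den if den else None

def recommend(ratings, user, min_common=2):
    """Rank items the user hasn't rated by similarity to items they have rated."""
    seen = ratings[user]
    candidates = {item for r in ratings.values() for item in r} - set(seen)
    scores = {}
    for candidate in candidates:
        score = 0.0
        for item, stars in seen.items():
            sim = item_similarity(ratings, item, candidate, min_common)
            if sim is not None:
                score += sim * stars  # movies the user rated highly count more
        scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)

print(recommend(ratings, "dan"))
```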
More Data Mining and Machine Learning Techniques
KNN is a very simple supervised machine learning technique; we'll quickly cover the concept here.
We'll use the simple KNN technique and apply it to a more complicated problem: finding the most similar movies to a given movie given only its genre and rating information, and then using those "nearest neighbors" to predict the movie's rating.
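A minimal sketch of the KNN rating prediction, assuming each movie is described by a hypothetical feature vector (two genre flags and a normalized popularity value) and using plain Euclidean distance; the lecture's own distance measure may combine genre and popularity differently. The prediction is just the average rating of the k closest movies.

```python
import math

def knn_predict(target, neighbors, k=3):
    """Predict a rating as the average rating of the k closest movies.

    target:    feature vector of the movie to rate
    neighbors: list of (feature_vector, rating) pairs for movies with known ratings
    """
    nearest = sorted(neighbors, key=lambda n: math.dist(target, n[0]))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)

# Hypothetical features: (is_action, is_romance, normalized_popularity)
movies = [
    ((1, 0, 0.90), 4.5),
    ((1, 0, 0.80), 4.0),
    ((0, 1, 0.20), 2.0),
    ((1, 0, 0.70), 4.2),
    ((0, 1, 0.30), 2.5),
]
print(knn_predict((1, 0, 0.85), movies, k=3))
```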
Data that includes many features or many different vectors can be thought of as having many dimensions. Often it's useful to reduce those dimensions down to something more easily visualized, for compression, or to just distill the most important information from a data set (that is, information that contributes the most to the data's variance). Principal Component Analysis and Singular Value Decomposition do that.
We'll use scikit-learn's built-in PCA system to reduce the 4-dimensional Iris data set down to 2 dimensions, while still preserving most of its variance.
Cloud-based data storage and analysis systems like Hadoop, Hive, Spark, and MapReduce are turning the field of data warehousing on its head. Instead of extracting, transforming, and then loading data into a data warehouse, the transformation step is now more efficiently done using a cluster after it's already been loaded. With computing and storage resources so cheap, this new approach now makes sense.
We'll describe the concept of reinforcement learning, including Markov Decision Processes, Q-Learning, and Dynamic Programming, all using a simple example of developing an intelligent Pac-Man.
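The core of Q-learning is a single update rule: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)). Here it is as a sketch; the state names, actions, reward, learning rate `alpha=0.1`, and discount `gamma=0.9` are all hypothetical values, not the lecture's Pac-Man setup.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table q and return the new value.

    Moves Q(state, action) a fraction alpha of the way toward the observed
    reward plus the discounted value of the best action from next_state.
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)  # every (state, action) value starts at 0.0
# Hypothetical transition: moving left from "s0" eats a pellet worth 10 points.
q_update(q, "s0", "left", 10.0, "s1")
print(q[("s0", "left")])
```

Repeating updates like this over many episodes lets good moves propagate their value backward through the states that lead to them.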
What's a confusion matrix, and how do I read it?
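A confusion matrix just tallies predicted labels against actual ones. This sketch (the `confusion_matrix` helper and the labels are hypothetical) treats "cat" as the positive class and prints the four counts in the usual 2x2 layout, with rows for the actual class and columns for the predicted class.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Count outcomes for a binary classifier, treating `positive` as the class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical labels from an image classifier.
actual    = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]
m = confusion_matrix(actual, predicted, positive="cat")

print("              predicted cat  predicted dog")
print(f"actual cat    {m['TP']:>13}  {m['FN']:>13}")
print(f"actual dog    {m['FP']:>13}  {m['TN']:>13}")
```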
Dealing with Real-World Data
Bias and Variance both contribute to overall error; understand these components of error and how they relate to each other.
We'll introduce the concept of K-Fold Cross-Validation to make train/test even more robust, and apply it to a real model.
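To make the mechanics concrete, here's a sketch of how K-fold splitting assigns indices; `k_fold_indices` is a hypothetical helper. Each sample is used for testing exactly once and for training k - 1 times, and you'd average the k test scores. In scikit-learn, `cross_val_score(model, X, y, cv=5)` handles the splitting and scoring for you.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds over range(n).

    Fold sizes differ by at most one. In practice, shuffle the data first so
    each fold is representative.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```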
Cleaning your raw input data is often the most important, and time-consuming, part of your job as a data scientist!
In this example, we'll try to find the top-viewed web pages on a website, and see how much data pollution makes that a very difficult task!
A brief reminder: some models require input data to be normalized, or scaled to the same range as each other. Always read the documentation for the techniques you are using.
A review of how outliers can affect your results, and how to identify and deal with them in a principled manner.
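Here's a small sketch of one common approach: keep values within two standard deviations of the median. The `reject_outliers` name, the threshold `m=2.0`, and the income figures are assumptions for illustration. Note that the standard deviation is itself inflated by extreme values, so this rule works best when outliers are rare and far from the rest of the data.

```python
import statistics

def reject_outliers(values, m=2.0):
    """Keep values within m population standard deviations of the median.

    Centering on the median rather than the mean keeps the cut-off from being
    dragged toward the very outliers we're trying to remove.
    """
    center = statistics.median(values)
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - center) < m * spread]

# Hypothetical incomes with one billionaire who would distort any average.
incomes = [27000, 32000, 35000, 41000, 45000, 38000, 1000000000]
filtered = reject_outliers(incomes)
print(len(filtered), statistics.mean(filtered))
```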
Apache Spark: Machine Learning on Big Data
We'll present an overview of the steps needed to install Apache Spark on your desktop in standalone mode, and get started by getting a Java Development Kit installed on your system.
We'll install Spark itself, along with all the associated environment variables and ancillary files and settings needed for it to function properly.
A high-level overview of Apache Spark, what it is, and how it works.
We'll go in more depth on the core of Spark: the RDD object, and what you can do with it.
A quick overview of MLlib's capabilities, and the new data types it introduces to Spark.
We'll walk through an example of coding up and running a decision tree using Apache Spark's MLlib! In this exercise, we try to predict if a job candidate will be hired based on their work and educational history, using a decision tree that can be distributed across an entire cluster with Spark.
We'll take the same example of clustering people by age and income from our earlier K-Means lecture, but solve it in Spark!
We'll introduce the concept of TF-IDF (Term Frequency / Inverse Document Frequency) and how it applies to search problems, in preparation for using it with MLlib.
Let's use TF-IDF, Spark, and MLlib to create a rudimentary search engine for real Wikipedia pages!
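As a rough sketch of the scoring behind such a search engine, here's TF-IDF in plain Python on three tiny hypothetical documents, using term frequency normalized by document length and IDF = log(N / document frequency). This is one common variant; Spark MLlib's `IDF` applies a smoothed version of the formula, so its numbers will differ. The `tf_idf` helper and the documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical mini-corpus standing in for Wikipedia pages.
docs = [
    "the gettysburg address was a speech",
    "the declaration of independence",
    "the constitution of the united states",
]
weights = tf_idf(docs)

# Rank documents for a one-word query by that word's TF-IDF weight.
query = "gettysburg"
ranking = sorted(range(len(docs)), key=lambda i: weights[i].get(query, 0.0), reverse=True)
print(ranking[0], docs[ranking[0]])
```

A word like "the" that appears in every document gets an IDF of log(1) = 0, so it contributes nothing to the ranking.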
Spark 2.0 made the DataFrame-based API the primary API for MLlib; we'll look at an example of using it to create and use a linear regression model.
Experimental Design / ML in the Real World
High-level thoughts on various ways to deploy your trained models to production systems including apps and websites.
Running controlled experiments on your website usually involves a technique called the A/B test. We'll learn how they work.
How to determine the significance of an A/B test's results, and measure the probability of the results being just from random chance, using t-tests, the t-statistic, and the p-value.
We'll fabricate A/B test data from several scenarios, and measure the t-statistic and p-value for each using Python.
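For intuition about the t-statistic, here's a sketch that computes Welch's version for two independent samples with the standard library; `welch_t_statistic` and the sample values are hypothetical. Turning t into a p-value needs the t distribution, which the standard library doesn't provide; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.

```python
import math
import statistics

def welch_t_statistic(a, b):
    """t-statistic for the difference in means of two independent samples.

    Uses Welch's form, which doesn't assume the two groups share a variance.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# Hypothetical per-user spend in the treatment and control groups.
treatment = [2, 3, 4, 5, 6]
control = [1, 2, 3, 4, 5]
print(welch_t_statistic(treatment, control))
```

A larger absolute t means the difference in means is large relative to the noise in the samples.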
Some A/B tests just don't affect customer behavior one way or another. How do you know how long to let an experiment run for before giving up?
There are many limitations associated with running short-term A/B tests: novelty effects, seasonal effects, and more can lead you to the wrong decisions. We'll discuss the forces that may result in misleading A/B test results so you can watch out for them.
Deep Learning and Neural Networks
If you skipped ahead, I'll show you where to get the course materials for just this section. And we'll cover some prerequisite concepts for understanding how neural networks operate: gradient descent, autodiff, and softmax.
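Of these three ideas, softmax is the easiest to show in a few lines. This sketch (the `softmax` function is ours, not a TensorFlow API) converts raw output scores into probabilities; subtracting the largest score first is a standard trick that avoids overflow in `math.exp` without changing the result.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shift = max(logits)  # exp(z - shift) stays <= 1, so large scores can't overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))
```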
We'll cover the evolution of artificial neural networks from 1943 to modern-day architectures, which is a great way to understand how they work.
Google's TensorFlow Playground lets you experiment with deep neural networks and understand them without writing a line of code!