In our Data Science group, a question came up about which courses a person could take to get started with Data Science. Some time ago I applied to an investment program where I had to present a list of courses for which I needed funding. When I created that list, I was aiming for a quick introduction to all the important aspects of the industry, followed by a deeper dive into the details. The approach is as follows: start with the tool set and concepts, move on to the mathematics and statistics background, and finish with advanced techniques. I have since extended the list with some additional courses, and it is now a kind of to-do list for me for the coming year. Below you can find the list with my comments on each course. Please take into account that I had not taken all of them at the time of writing. The courses I have taken, with some insights, you can find here.
Courses
Please read the annotation below the list.
Skills | Title | Teacher | Comments | Length | Price |
---|---|---|---|---|---|
Data Engineer: Python, NumPy, pandas, Matplotlib | DAT208x: Introduction to Python for Data Science | Microsoft | This course has everything you need at the beginning. It covers NumPy, pandas, visualization with Matplotlib, and the Python language. The materials are very good and well presented. It is done in collaboration with DataCamp, and you can try new things hands-on in their IPython shell. | 1 month | $100 |
Data Engineer: Azure Machine Learning, Predictive Models, Azure Data Factory | DAT228x: Developing Big Data Solutions with Azure Machine Learning | Microsoft | During the course you’ll build a pipeline to get data, train a model, and update the model with Azure services. I liked this course because it explains in detail what you need to do to build the pipeline, which services you need, and why. | 1 month | $100 |
Data Engineer: Hadoop, HBase, Storm, Spark, Azure HDInsight, Hive and Pig | Microsoft Azure HDInsight Big Data Analyst (X Series) | Microsoft | The series consists of three courses and covers a very current set of technologies. I partially took the first course of the series and found that it explains the tools in detail, enough to understand what is happening under the hood. | 3 months | $300 |
Data Engineer: HDFS, MapReduce and Spark RDD, Hive, Spark SQL, DataFrames and GraphFrames, Python | Big Data for Data Engineers Specialization | Yandex | The specialization consists of 5 courses, made in collaboration between Yandex, Odnoklassniki, and MIPT. All the teachers have strong expertise in the big data industry. According to some reviews, the lectures have well-structured materials and a clear way of storytelling. | 6 months | $300 |
Data Engineer: Supervised, Unsupervised, Reinforcement, and Deep Learning | Machine Learning Engineer | Kaggle | The course consists of 6 lessons, each 1 month long. Each lesson covers one of the main concepts of Data Science, so the student touches on all aspects of the industry. | 6 months | $1200 |
Data Engineer: Python, Jupyter, pandas, NumPy, Matplotlib | DSE200x: Python for Data Science | The University of California, San Diego | This course is part of the Data Science MicroMasters program. It covers important packages and techniques for working with data in Python and also introduces you to ML concepts. | 2 months | $350 |
Data Engineer: Python and algorithms | 6.00.1x: Introduction to Computer Science and Programming Using Python | MITx | This is a course for beginners. You’ll learn Python, algorithms, and data structures. | 2 months | $50 |
Data Scientist: R, ggplot2, swirl, etc. | Data Science Specialization | Johns Hopkins University | The specialization focuses heavily on the R language and data exploration techniques. Some reviews say its storytelling is too abstract. | 10 months | $500 |
Data Scientist: Python with packages for plotting and data analysis | Applied Data Science with Python Specialization | University of Michigan | The specialization consists of 5 courses. It lets you learn packages and techniques that are standard in the industry. The courses cover statistics, machine learning, visualization, text analysis, and social network analysis, using Python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx. | 5 months | $250 |
Data Scientist: supervised and unsupervised learning, bias/variance theory, neural networks | Machine Learning | Stanford University, Andrew Ng | The course is taught by Andrew Ng, a well-known figure in Data Science. It covers supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks), unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning), and bias/variance theory. This is just a very good course to get started with Data Science. | 3 months | $79 |
Data Scientist: Neural networks, Convolutional networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization | Deep Learning Specialization | deeplearning.ai, Andrew Ng | Andrew Ng and his colleagues created this specialization as a deeper introduction to Deep Learning and Neural Networks. Its five courses take you through theory and practice, with examples in Python and TensorFlow. | 5 months | $250 |
Data Scientist: NLP, reinforcement learning, deep learning, Bayesian methods | Advanced Machine Learning Specialization | Higher School of Economics | This specialization is built in collaboration with the Yandex School of Data Analysis. Seven courses take you through Kaggle challenges, text analysis, computer vision, and more. | 7 months | $350 |
Data Scientist: Neural networks, calculus, Python, Octave | Neural Networks for Machine Learning | University of Toronto, Geoffrey Hinton | I have partially finished this course. It’s not easy to grasp the ideas without a good background in math, but it’s possible if you study the related areas of math in parallel. Your overall pace will be slower, but it’s worth it. The course covers very important basics, though it might not be the most modern. It uses Octave, which is a bit weird for me personally. There is a review of this course here. Anyway, I would strongly recommend it because it brings clarity to one’s understanding of NNs. | 4 months | $50 |
Data Scientist: Python, probability, distributions, Monte Carlo simulations | 6.00.2x: Introduction to Computational Thinking and Data Science | MITx | This is purely for people who need to upgrade their skills in computation and data exploration. It’s good that Python is used in the labs. According to reviews, it presents information for beginners in a good, clean way. It’s worth taking after another MITx course, 6.00.1x. | 3 months | $50 |
Annotation
The MOOC platforms used are edX, Coursera, and Udacity.
Skills
In general, Data Science as an industry consists of two specializations: theoretical and hardware. The theoretical specialization is about analyzing data, finding insights, building features, and so on. A specialist in this area should have a good math background and might not know programming very well. The hardware specialization is about constructing a pipeline to gather raw data, prepare it for further analysis, and implement models in real applications. Such specialists may know something about data analysis, but it is not their major skill. As a result, the list contains both Data Scientists and Data Engineers. I tried to sort the courses according to the area they belong to more.
Length & Price
Lengths and prices are approximate. Some of the courses mentioned are available for free, and only certification costs money. Free access varies from platform to platform and from course to course. For example, Coursera may hide assessments behind a paywall while the materials remain available. On edX, only the certificate is paid.
Additional Courses
There are some other programs which may be of interest, but I’ve heard very little about them so far. I’m putting them here so that I don’t forget to check them later.
- Master of Computer Science in Data Science. A very expensive program from the University of Illinois. On completing it you’ll get a master’s degree in Data Science.
- Learning from Data. Free lectures and materials from the California Institute of Technology, taught by Professor Yaser Abu-Mostafa. I found good feedback about it, but didn’t include it in the list since the course was closed on edX at the time of writing.
- CS231n: Convolutional Neural Networks for Visual Recognition. This is a highly recommended course from Stanford University. I’m not sure at the moment how to take it online. I found feedback that the course requires good hardware, and you might need to create cloud VMs that satisfy the requirements.
- CS224n: Natural Language Processing with Deep Learning. Same situation as above.
- MSc in Statistical Science. It’s provided by the University of Oxford. It is a twelve-month full-time taught master’s degree running from October to September each academic year. The MSc has a particular focus on modern computationally-intensive theory and methods.
- MPhil in Machine Learning, Speech and Language Technology. It’s provided by the University of Cambridge. It is a twelve-month full-time MPhil programme offered by the Computational and Biological Learning Group, the Speech Group, and the Computer Vision and Robotics Group in the Cambridge University Department of Engineering, with a unique, joint emphasis on both machine learning and on speech and language technology.
Books
Here are some books I found while creating the list. This section may be moved to another article later.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. A free PDF version is available.