# Data Science self-assessment topic list

Running a specialized group requires understanding how much the group already knows about the subject they’re interested in. This knowledge makes it possible to build a relevant road map for learning the subject. Right now I’m trying to build an assessment that gives this understanding for the group I’m about to run. The purpose of the group is to get familiar with Data Science and all the important aspects of this discipline. To launch the group, the assessment should give a picture of the group’s profile. It should not be very detailed, because the goal is to find the average level of the group. At the same time, the assessment should cover all the important topics of the discipline. Here I summarize my findings and the topics the assessment will be based on.

# Discipline Key Notions

The heart of Data Science is mathematics. It underlies every part of the discipline, so a good education in this area is a key factor. I found one book that explains such an assessment in detail: Doing Data Science: Straight Talk from the Frontline. In the book, the authors suggest the following profile structure:

1. Computer science
2. Math
3. Statistics
4. Domain expertise
5. Communication and presentation skills
6. Data visualization

Another assessment is provided on edX: Data Science Readiness Assessment. It covers such aspects as calculus, linear algebra and programming. Other articles explain further important aspects, and one of them has quite good coverage of algebra for machine learning. Many other materials can be found on the internet. What they all have in common is algebra, statistics, programming and Data Science terminology. Bearing this in mind, I tried to identify the essential topics for building an initial profile.

# Only Key Factors

The initial assessment should cover only topics that show a person’s awareness of the subject. For example, there is no sense in including basic math topics: if a person knows about integrals, he or she surely also knows the formulas for abridged multiplication. I identified the following main topics for the assessment:

- Math
  - Multiplication of matrices
  - Inverse matrix
  - Derivatives of functions
- Statistics
  - Measure of center
    - Mean
    - Median
    - Mode
  - Measure of spread
    - Variance
    - Standard Deviation
    - Covariance
    - Correlation
  - Measure of error
    - Mean Absolute Error
    - Root Mean Squared Error
    - Relative Absolute Error
    - Relative Squared Error
    - The Coefficient of Determination
    - Confusion Matrix
- Concepts
  - Types
    - Classification
    - Regression
    - Clustering
  - Model training concepts
    - Supervised/Unsupervised learning
    - Reinforcement learning
  - Model evaluation
    - Overfitting
    - Training and Testing data
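To make the statistics part of the list more concrete, here is a minimal sketch of how several of these measures can be computed with NumPy. It assumes NumPy is installed, and the sample arrays are made-up illustration data. The relative errors are computed against a baseline that always predicts the mean of the actual values, which is one common convention rather than the only one; variance and covariance use the population form (`ddof=0`).

```python
from collections import Counter

import numpy as np

# Hypothetical sample data, for illustration only
actual = np.array([3.0, 5.0, 2.0, 7.0, 5.0])
predicted = np.array([2.5, 5.0, 3.0, 6.0, 4.5])

# Measures of center
mean = actual.mean()
median = np.median(actual)
mode = Counter(actual.tolist()).most_common(1)[0][0]  # most frequent value

# Measures of spread (population versions; pass ddof=1 for sample versions)
variance = actual.var()
std_dev = actual.std()
covariance = np.cov(actual, predicted, ddof=0)[0, 1]
correlation = np.corrcoef(actual, predicted)[0, 1]

# Measures of error
errors = predicted - actual
mae = np.abs(errors).mean()
rmse = np.sqrt((errors ** 2).mean())
# Relative errors compare the model with a baseline that always predicts the mean
rae = np.abs(errors).sum() / np.abs(actual - mean).sum()
rse = (errors ** 2).sum() / ((actual - mean) ** 2).sum()
r2 = 1.0 - rse  # coefficient of determination

# Confusion matrix for a binary classifier (rows = actual, columns = predicted)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1
accuracy = np.trace(confusion) / confusion.sum()

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std={std_dev:.3f}")
print(f"cov={covariance:.3f}, corr={correlation:.3f}")
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}, RAE={rae:.3f}, RSE={rse:.3f}, R2={r2:.3f}")
print(confusion, f"accuracy={accuracy}")
```

Libraries such as scikit-learn provide ready-made versions of most of these metrics; writing them out by hand simply shows what each one measures.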

## No Cloud Services

I didn’t include “Cloud services” or anything else related to infrastructure, because questions in this area would depend on your technology choices and don’t show a person’s readiness from a Data Science standpoint. Overall, the assessment is not meant to give a full and detailed overview, and it can certainly be improved a lot. My approach is to ask questions about the aspects a person should be aware of at some level, a level I consider good enough to be ready for the Data Science area.

## No Programming Languages

I didn’t include any programming language in the assessment, because a language is just a tool. Any statistical function can be implemented in any language: Python, R, C++, Java, C#, etc. Python seems to be the most popular language for Data Science solutions, because it’s simple and effective and has a lot of tools. But knowing it doesn’t really show a person’s awareness of Data Science.
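To illustrate the point that the language is only a tool, here is a statistical function, the Pearson correlation coefficient, written in plain Python with no Data Science libraries at all. The same few lines could be translated almost one-to-one into R, C++, Java or C#. The function name and the sample data are hypothetical, chosen only for this sketch.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (n times the population covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times the population variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied vs. test score
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 58, 61, 70, 74]
print(round(pearson_correlation(hours_studied, test_score), 3))
```

Understanding what the formula measures is the Data Science part; expressing it in a particular language is the easy part.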

# Wrapping Up

Building the assessment around these topics will take some time, and I’ll publish an example of it soon. It will certainly contain algebra exercises and some multiple-choice questions. Meanwhile, I need to verify the topic list and add or remove some items.
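As an illustration of what such algebra exercises might look like, here is a hypothetical set of three items covering the Math part of the topic list (matrix multiplication, inverse matrix, derivative), with the expected answers checked in NumPy. The matrices and the function are made-up examples, not taken from the actual assessment, and NumPy is assumed to be installed.

```python
import numpy as np

# Exercise 1: multiply the matrices A (2x3) and B (3x2); the result is 2x2
A = np.array([[1, 2, 0],
              [0, 1, 3]])
B = np.array([[2, 1],
              [0, 1],
              [1, 0]])
product = A @ B

# Exercise 2: find the inverse of M and check that M @ M^-1 equals the identity
M = np.array([[4.0, 7.0],
              [2.0, 6.0]])
M_inv = np.linalg.inv(M)
assert np.allclose(M @ M_inv, np.eye(2))


# Exercise 3: derivative of f(x) = 3x^2 + 2x at x = 1.
# Analytically f'(x) = 6x + 2, so f'(1) = 8; a central finite difference approximates it.
def f(x):
    return 3 * x ** 2 + 2 * x


h = 1e-5
numeric_derivative = (f(1 + h) - f(1 - h)) / (2 * h)

print(product)
print(M_inv)
print(round(numeric_derivative, 6))
```

In the assessment itself the participant would solve these by hand; the code only serves as an answer key.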
