In this article I collect everything I think is important to know when you start learning Data Science. It covers math, statistics, and DS concepts. I give examples where I can. The article is updated from time to time.
Math
Multiplication of matrices 

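If A is an n × m matrix and B is an m × p matrix, the matrix product C = AB is defined to be the n × p matrix whose entry c_{ij} is the sum over k = 1, ..., m of a_{ik} b_{kj}, that is, the dot product of row i of A and column j of B. The number of columns of A must equal the number of rows of B. More information here. 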
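As a minimal sketch of the definition above, the product can be computed with plain Python lists. The function name `matmul` and the sample matrices are illustrative assumptions, not part of any library; in practice a numerical library would normally be used instead.

```python
def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (both given as lists of rows)."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    # c[i][j] is the dot product of row i of A and column j of B
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2

C = matmul(A, B)     # 2 x 2
print(C)             # [[58, 64], [139, 154]]
```

Note that the result has as many rows as A and as many columns as B.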
Inverse Matrix 

The inverse of a square matrix A, sometimes called a reciprocal matrix, is a matrix A^{-1} such that AA^{-1} = I, where I is the identity matrix. More information here, here (rus). 
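To make the definition concrete, here is a small sketch for the 2 × 2 case only, using the standard closed-form formula based on the determinant. The helper names `inverse_2x2` and `matmul_2x2` are hypothetical and exist just for this illustration.

```python
def inverse_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the determinant formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the matrix is singular and has no inverse")
    return [[d / det, -b / det],
            [-c / det, a / det]]


def matmul_2x2(x, y):
    """Multiply two 2 x 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[4, 7],
     [2, 6]]
A_inv = inverse_2x2(A)
product = matmul_2x2(A, A_inv)  # should be the identity matrix, up to rounding
print(A_inv)
print(product)
```

The product of A and its inverse comes out as the identity matrix, apart from small floating-point rounding errors.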
Derivatives of Functions 

The derivative of a function represents the rate at which the function's value changes in response to an infinitesimal change in one of its variables. More information here, here (rus). 
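One way to get a feel for this is to approximate a derivative numerically. The sketch below uses a central difference with a small step `h`. The function `derivative`, the step size, and the sample function are assumptions chosen for illustration.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x ** 2


# The exact derivative of x^2 is 2x, so the slope at x = 3 should be close to 6.
slope = derivative(square, 3.0)
print(slope)
```

The smaller the step, the closer the ratio gets to the true derivative, until floating-point rounding starts to dominate.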
Statistics
Mean 

The mean is computed by summing all the scores in the distribution (ΣX) and dividing that sum by the total number of scores (N). The mean is the balance point in a distribution such that if you subtract each value in the distribution from the mean and sum all of these deviation scores, the result will be zero.

Median 

The median is the score that divides the distribution into halves; half of the scores are above the median and half are below it when the data are arranged in numerical order. The median is also referred to as the score at the 50th percentile in the distribution. The median location of N numbers can be found by the formula (N + 1) / 2. When N is an odd number, the formula yields an integer that gives the position of the median in the numerically ordered distribution. When N is even, the formula yields a value ending in .5, and the median is taken as the average of the two scores on either side of that location.

Mode 

The mode of a distribution is simply defined as the most frequent or common score in the distribution. The mode is the point or value of X that corresponds to the highest point on the distribution. If the highest frequency is shared by more than one value, the distribution is said to be multimodal. 
More information about mean, median and mode is here.
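The three measures can be tried out with Python's standard `statistics` module. The sample scores below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10, 12]   # already in numerical order, N = 7

mean = statistics.mean(scores)      # 42 / 7 = 6
median = statistics.median(scores)  # location (7 + 1) / 2 = 4, so the 4th score: 5
mode = statistics.mode(scores)      # 3 occurs most often
modes = statistics.multimode(scores)  # list of all most frequent values

# The deviations from the mean always sum to zero (the "balance point" property).
deviation_sum = sum(x - mean for x in scores)

print(mean, median, mode, modes, deviation_sum)
```

For a distribution with several equally frequent values, `statistics.multimode` returns all of them, which matches the idea of a multimodal distribution.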
Variance 

The variance is a measure based on the deviations of individual scores from the mean. As noted in the definition of the mean, however, simply summing the deviations will result in a value of 0. To get around this problem, the variance is based on squared deviations of scores about the mean. Squaring eliminates negative values while preserving the ordering of scores by their distance from the mean; the variance is then the average of these squared deviations. More information here and here. 
Standard Deviation 

The standard deviation is a measure of variability expressed in the same units as the data (a measure of how spread out the numbers are). It can be thought of as a kind of “average” of the deviations of scores from the mean. It is the square root of the variance. More information here. 
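The following sketch computes the variance and standard deviation step by step and checks the results against the standard `statistics` module. The data are invented for the example. Both the population version (divide by N) and the sample version (divide by N - 1) are shown, since the text does not say which one is meant.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

mean = sum(scores) / n                                  # 5.0
squared_deviations = [(x - mean) ** 2 for x in scores]  # sums to 32.0

population_variance = sum(squared_deviations) / n       # 32 / 8 = 4.0
sample_variance = sum(squared_deviations) / (n - 1)     # 32 / 7
population_sd = math.sqrt(population_variance)          # 2.0

print(population_variance, sample_variance, population_sd)
print(statistics.pvariance(scores), statistics.variance(scores), statistics.pstdev(scores))
```

The standard deviation of 2.0 is in the same units as the scores, while the variance of 4.0 is in squared units.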
Covariance 

Covariance provides a measure of how two random variables vary together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, the covariance is negative. More information here and here. 
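A short sketch of the sample covariance, written out by hand so the sign behaviour is visible. The function name `covariance` and the data are assumptions for illustration; the N - 1 divisor gives the sample covariance.

```python
def covariance(xs, ys):
    """Sample covariance: average product of paired deviations (divisor N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
ys_similar = [2, 4, 5, 4, 5]    # tends to grow with xs
ys_opposite = [5, 4, 5, 4, 2]   # tends to shrink as xs grows

cov_positive = covariance(xs, ys_similar)    # 1.5
cov_negative = covariance(xs, ys_opposite)   # -1.5
print(cov_positive, cov_negative)
```

The magnitude of the covariance depends on the units of the variables, which is why the correlation coefficient is often preferred for comparing strength of relationship.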
Correlation Coefficient 

It is used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient: Pearson’s correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear regression. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson’s. More information here and here.
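Pearson's correlation can be sketched as the covariance rescaled by the spread of each variable, which bounds it between -1 and 1. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation: co-deviation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)                               # 6 / sqrt(60), about 0.775
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])   # exact linear relation: 1.0
print(r, r_linear)
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.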

Mean Absolute Error 

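Absolute Error is the amount of error in your measurements. It is the difference between the measured value and the “true” value. The Mean Absolute Error is the average of these absolute errors over all n observations: MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - x_{i}|, where y_{i} is the predicted value and x_{i} is the actual value. More information here and here. 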
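A minimal sketch of the Mean Absolute Error, using the same naming as in the text (x for actual values, y for predicted values). The numbers are invented for the example.

```python
def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all observations."""
    return sum(abs(y - x) for x, y in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 2.5, 7.0]      # x_i
predicted = [2.5, 5.0, 4.0, 8.0]   # y_i

mae = mean_absolute_error(actual, predicted)   # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
print(mae)
```

The result is in the same units as the data, so 0.75 here means the predictions are off by 0.75 units on average.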
Root Mean Squared Error 

It is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results. It is computed as RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - x_{i})^{2}}, where y_{i} is the predicted value and x_{i} is the actual value. More information here and here. 
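The same idea in code, as a small sketch with invented data (x for actual values, y for predicted values). Because the errors are squared before averaging, large errors weigh more heavily than in the Mean Absolute Error.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference between predicted and actual."""
    squared_errors = [(y - x) ** 2 for x, y in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


actual = [3.0, 5.0, 2.5, 7.0]      # x_i
predicted = [2.5, 5.0, 4.0, 8.0]   # y_i

rmse = root_mean_squared_error(actual, predicted)   # sqrt(3.5 / 4), about 0.935
print(rmse)
```

For these values the RMSE is larger than the mean absolute error of 0.75, because the single error of 1.5 dominates once squared.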
Relative Absolute Error 

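The Relative Absolute Error is the total absolute error of the predictions divided by the total absolute error of a naive model that always predicts the mean of the actual values. A value below 1 means the model does better than that baseline. Because it is a ratio, it can be compared between models whose errors are measured in different units. More information here and here. 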
Relative Squared Error 

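The Relative Squared Error is the total squared error of the predictions divided by the total squared error of a naive model that always predicts the mean of the actual values. Like the Relative Absolute Error, it is a ratio, so it can be compared between models whose errors are measured in different units. More information here and here. 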
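Both relative errors can be sketched together, assuming the common definitions in which the baseline is a model that always predicts the mean of the actual values. The function names and the data are illustrative.

```python
def relative_absolute_error(actual, predicted):
    """Total absolute error divided by the total absolute deviation from the mean."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(y - x) for x, y in zip(actual, predicted))
    baseline_error = sum(abs(x - mean_actual) for x in actual)
    return model_error / baseline_error


def relative_squared_error(actual, predicted):
    """Total squared error divided by the total squared deviation from the mean."""
    mean_actual = sum(actual) / len(actual)
    model_error = sum((y - x) ** 2 for x, y in zip(actual, predicted))
    baseline_error = sum((x - mean_actual) ** 2 for x in actual)
    return model_error / baseline_error


actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

rae = relative_absolute_error(actual, predicted)   # 3.0 / 6.5
rse = relative_squared_error(actual, predicted)    # 3.5 / 12.6875
print(rae, rse)
```

Both values are below 1 here, so this made-up model beats the predict-the-mean baseline on both measures.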
Coefficient of Determination 

The coefficient of determination shows how well the forecasted values replicate the observed values. In simple linear regression it equals the square of the Pearson correlation coefficient. It is used to explain how much of the variability of one factor can be accounted for by its relationship to another factor. More information here and here. 
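A sketch of the coefficient of determination, assuming the usual definition R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The function name and data are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    ss_tot = sum((x - mean_actual) ** 2 for x in actual)
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

r2 = r_squared(actual, predicted)      # 1 - 3.5 / 12.6875, about 0.724
r2_perfect = r_squared(actual, actual) # perfect predictions give 1.0
print(r2, r2_perfect)
```

Under this definition the value is 1 for perfect predictions and 0 for a model that does no better than always predicting the mean.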
Confusion Matrix 

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It is a visual representation of classification errors. The ROC curve shows the tradeoff between the True Positive Rate (positive cases correctly classified) and the False Positive Rate (negative cases incorrectly classified as positive). A curve that lies above and to the left of the 45-degree diagonal indicates that the classifier is more effective than simply guessing. Recall is the fraction of actual positive cases that are correctly classified. In the diabetes example referred to here, recall is only 0.585, which means the classifier misclassifies more than 4 of 10 diabetic patients as nondiabetic. Precision is the fraction of cases predicted as positive that are actually positive. More information here and here. 
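The sketch below builds the four cells of a binary confusion matrix from a small set of invented labels (1 = positive, 0 = negative) and derives recall, precision, and the false positive rate from them. The data and variable names are assumptions for illustration and are not the diabetes example mentioned above.

```python
from collections import Counter

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Count each (actual, predicted) pair to fill the confusion matrix.
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]   # true positives: positive cases predicted positive
fn = counts[(1, 0)]   # false negatives: positive cases predicted negative
fp = counts[(0, 1)]   # false positives: negative cases predicted positive
tn = counts[(0, 0)]   # true negatives: negative cases predicted negative

recall = tp / (tp + fn)                 # true positive rate
precision = tp / (tp + fp)              # share of predicted positives that are correct
false_positive_rate = fp / (fp + tn)
accuracy = (tp + tn) / len(actual)

print([[tp, fn], [fp, tn]])
print(recall, precision, false_positive_rate, accuracy)
```

Each point on a ROC curve corresponds to one pair of true positive rate and false positive rate, obtained at one decision threshold of the classifier.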