Speeding up calculation of an average in continuous processes

Processing big sets of data can cause a low performance due to inefficient calculations. In order to overcome this issue we should rewrite calculations in a way which allows getting results using the latest data about sequence. This text describes an algorithm of  sequential average.

Instead of using the sequence you can use only three values which can be retained at each step of calculation. They are #1 a current average, #2 number of elements with the new one and #3 the new element.
M_n=\frac{1}{n}\sum_{n}^{}{a_i}
M_{n+1}=M_n-\frac{M_n}{n+1}+\frac{a_{n+1}}{n+1}
This approach is the best one for systems with continuous calculations. The code below shows implementation on C#.
function double GetSequentialAverage(
    double average, // current average
    double value, // new value
    int length // current length of a sequence
){
    return average - (average + value)/(length + 1);
}

The method calculates a new average in about 0.0000016 sec.

At the same time, when you need to calculate an average for a whole sequence the best one is the simplest one. Below you can find a benchmark of three average functions:

  • Linq extension .Average()
  • the sequential average
  • and the simple one
The table below shows elapsed time in seconds for different sequence sizes. The smaller the better.
Items Number Linq Sequential Simple
10 0.0022526 0.0009596 0.0004526
100 0.0024495 0.0009739 0.0004636
1000 0.0023505 0.0010526 0.0004658
10000 0.0026887 0.0013248 0.0005741
100000 0.0046928 0.0044321 0.0013848
1000000 0.0231775 0.0358475 0.0078798
10000000 0.2114584 0.3422293 0.0735353
The charts below represent the same values.

[visualizer id=”70″]
[visualizer id=”71″]

Thus the sequential average fits best to a special use case of continuous calculations, when you’re able to retain information about the sequence.

 

Tags:

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Related Post

%d bloggers like this: