Some time ago I wrote a post on calculating a selection of statistics from a list of numbers. I got some criticism from people saying that for some of the statistics I should have used Python's built-in functions or functions from the Python Standard Library's statistics module.
Doing so, however, would mean each of those functions makes its own pass over the entire dataset. If you want to calculate several different statistics in one go, you can increase efficiency considerably by accumulating all of them in a single iteration.
I started writing a simple experiment to calculate the minimum, maximum, sum, mean and standard deviation of a list of numbers using Python's own functions, calculating them again using a single loop, and then comparing the performance.
I then decided to expand the experiment somewhat, firstly by running the plain Python code with PyPy instead of CPython, and then by rewriting the Python code in Cython. This article explores these experiments and presents the results.
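To make the comparison concrete, here is a minimal sketch of the two approaches described above. It is not the code from the original post: the function names `stats_builtin` and `stats_single_loop` are hypothetical, and I am assuming the population standard deviation (`statistics.pstdev`) and a non-empty list. The single-loop version uses the sum-of-squares shortcut for the variance, which is simple but can lose precision on some datasets.

```python
import math
import statistics


def stats_builtin(data):
    """Use Python's own functions; each call makes its own pass over data."""
    return {
        "min": min(data),
        "max": max(data),
        "sum": sum(data),
        "mean": statistics.mean(data),
        # Assumption: population standard deviation, not sample.
        "stdev": statistics.pstdev(data),
    }


def stats_single_loop(data):
    """Accumulate everything needed in one pass over a non-empty list."""
    minimum = maximum = data[0]
    total = 0.0
    total_sq = 0.0

    for x in data:
        if x < minimum:
            minimum = x
        if x > maximum:
            maximum = x
        total += x
        total_sq += x * x

    n = len(data)
    mean = total / n
    # Population variance via E[x^2] - (E[x])^2; clamp tiny negative
    # values that can arise from floating-point rounding.
    variance = max(total_sq / n - mean * mean, 0.0)

    return {
        "min": minimum,
        "max": maximum,
        "sum": total,
        "mean": mean,
        "stdev": math.sqrt(variance),
    }
```

Both functions return a dictionary with the same keys, so their results can be compared directly and each can be timed separately, for example with `timeit`.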