Speed Experiments

Some time ago I wrote a post on calculating a selection of statistics from a list on numbers. I got some criticism from people saying that for some of the statistics I should have used Python built-in functions or functions from the Python Standard Library statistics module.

Doing so, however, would cause each of those functions to iterate over the entire dataset. If you want to calculate a number of different statistics in one go you can increase efficiency considerably with just one iteration.

I started writing a simple experiment to calculate the minimum, maximum, sum, mean and standard deviation of a list of numbers using Python's own functions, calculating them again using a single loop, and then comparing the performance.

I then decided to expand the experiment somewhat, firstly by running the plain Python code with PyPy instead of CPython, and then re-writing the Python as Cython. This article explores these experiments and presents the results.

Continue reading

Tkinter Pillow Application 0.1

In my post An Introduction to Image Manipulation with Pillow I commented that "You could in principle use it [Pillow] as the basis of a sort of lightweight Photoshop type application using perhaps Tkinter or PyQT". At the time I wasn't actually intending to do so but recently the idea has started to appeal to me so I thought I'd give it a go.

Although I'm not attempting to compete with Photoshop this is still a fairly ambitious project which will spread over a number of posts, and to start with I'll just get something very basic up and running.

Continue reading

Percentile Ranks

I recently wrote an article on Z-Scores which help with the problem of comparing two or more sets of exam grades when each set may have differing averages and spreads.

Even if you just have one exam grade you still have the problem of understanding how it compares to the rest of the grades, and in this post I'll write a Python implementation of percentile ranks which offer one possible solution to this problem.

Continue reading

Benford’s Law

I recently posted an article on Zipf's Law and the application of the Zipfian Distribution to word frequencies in a piece of text. A closely related concept is Benford's Law which describes the distribution of the first* digits of many, if not most, sets of numeric data. In fact the two are so closely related that the Benford Distribution can be considered as special case of the Zipfian Distribution.

Continue reading

Zipf’s Law

Zipf's Law describes a probability distribution where each frequency is the reciprocal of its rank multiplied by the highest frequency. Therefore the second highest frequency is the highest multiplied by 1/2, the third highest is the highest multiplied by 1/3 and so on.

This is best illustrated with a graph.

In this post I will write a project in Python to apply Zipf's Law to what is probably it's best known use, that of analysing word frequencies in a piece of text.

Continue reading

Exporting PostgreSQL Data to Excel

There is a baffling selection of reporting software out there with very sophisticated functionality and users can put together reports impressive enough to satisfy any manager or board.

However, many people put pragmatics over aesthetics and will say "can't I just get the data in a spreadsheet?"

In this post I will put together a very simple solution to the problem of exporting data from PostgreSQL to an Excel spreadsheet using psycopg2 for the database access and openpyxl for the spreadsheet creation.

Continue reading

Moving Averages

Everyone understands averages, both their meaning and how to calculate them. However, there are situations, particularly when dealing with real-time data, when a conventional average is of little use because it includes old values which are no longer relevant and merely give a misleading impression of the current situation.

The solution to this problem is to use moving averages, ie. the average of the most recent values rather than all values, which is the subject of this post.

Continue reading

Z-Scores

So, your child gets 78% in both physics and history. Both pretty good grades but as the reader of geeky blogs like this you believe the sciences are more important than the humanities and would have preferred your child to do better in physics than history.

However, we are not necessarily comparing like with like here: 78% in one subject is probably not equivalent to 78% in another. Rather than the absolute percentages we need to calculate and compare the Z-Scores which take into account the averages and ranges of the entire set of scores.

Continue reading