Graph Theory

Graph Theory is a vast area of study based around the simple idea of individual points - known as vertices - connected by lines known as edges, each of which may have an associated numeric values called a weight and perhaps also a direction.

These simple sets of vertices, edges, weights and directions are known as graphs (not to be confused with the more familiar graph of data or a mathematical function) and can be used to represent many real-world situations or abstract concepts. Once you have a graph stored in some kind of data structure there are many algorithms which can be applied to solve problems, especially those of optimisation.

This post is the first of potentially many, and in it I will develop a data structure to store a graph. In future posts I will implement some of the algorithms which operate on graphs.

Continue reading

Tkinter Pillow Application 0.2

This is Version 0.2 of my Tkinter Pillow Application, Version 0.1 being here.

This is an ongoing project to develop an image editing application using Tkinter for the UI (with other GUI toolkits as a long-term objective) and the Pillow library for image editing functionality.

Version 0.1 established the overall architecture of the solution and enabled users to open and display an image. For Version 0.2 I will add a number of essential improvements to the user interface.

Continue reading

Exploring Matrices

If you need to carry out any serious work with matrices in Python then your best option will usually be to use the NumPy package. NumPy provides an industrial strength array class called ndarray (n-dimensional array) which can be used to represent a matrix, and the usual arithmetic operators can be used. Additionally, you can use the @ operator to multiply two matrices together.

However, if you are trying to learn matrix arithmetic then NumPy isn't going to help much as matrix operations aren't very intuitive, especially matrix multiplication which is rather convoluted. The best way to learn is probably to start with a sheet of paper and a pen, working through some examples, before moving on to writing code from scratch to perform the various arithmetic functions.

In this post I will write a very quick whistlestop tour of matrix arithmetic using NumPy before moving on to the main purpose of this post: writing a simple matrix class from scratch.

Continue reading

Speed Experiments

Some time ago I wrote a post on calculating a selection of statistics from a list on numbers. I got some criticism from people saying that for some of the statistics I should have used Python built-in functions or functions from the Python Standard Library statistics module.

Doing so, however, would cause each of those functions to iterate over the entire dataset. If you want to calculate a number of different statistics in one go you can increase efficiency considerably with just one iteration.

I started writing a simple experiment to calculate the minimum, maximum, sum, mean and standard deviation of a list of numbers using Python's own functions, calculating them again using a single loop, and then comparing the performance.

I then decided to expand the experiment somewhat, firstly by running the plain Python code with PyPy instead of CPython, and then re-writing the Python as Cython. This article explores these experiments and presents the results.

Continue reading

Tkinter Pillow Application 0.1

In my post An Introduction to Image Manipulation with Pillow I commented that "You could in principle use it [Pillow] as the basis of a sort of lightweight Photoshop type application using perhaps Tkinter or PyQT". At the time I wasn't actually intending to do so but recently the idea has started to appeal to me so I thought I'd give it a go.

Although I'm not attempting to compete with Photoshop this is still a fairly ambitious project which will spread over a number of posts, and to start with I'll just get something very basic up and running.

Continue reading

Percentile Ranks

I recently wrote an article on Z-Scores which help with the problem of comparing two or more sets of exam grades when each set may have differing averages and spreads.

Even if you just have one exam grade you still have the problem of understanding how it compares to the rest of the grades, and in this post I'll write a Python implementation of percentile ranks which offer one possible solution to this problem.

Continue reading

Benford’s Law

I recently posted an article on Zipf's Law and the application of the Zipfian Distribution to word frequencies in a piece of text. A closely related concept is Benford's Law which describes the distribution of the first* digits of many, if not most, sets of numeric data. In fact the two are so closely related that the Benford Distribution can be considered as special case of the Zipfian Distribution.

Continue reading

Zipf’s Law

Zipf's Law describes a probability distribution where each frequency is the reciprocal of its rank multiplied by the highest frequency. Therefore the second highest frequency is the highest multiplied by 1/2, the third highest is the highest multiplied by 1/3 and so on.

This is best illustrated with a graph.

In this post I will write a project in Python to apply Zipf's Law to what is probably it's best known use, that of analysing word frequencies in a piece of text.

Continue reading