Blog Posts by Andrew B. Collier / @datawookie


Comrades Marathon Medal Predictions

With only a few days to go until race day, most Comrades Marathon athletes will be focusing on resting, getting enough sleep, hydrating, eating and giving a wide berth to anybody who looks even remotely ill.

They will probably also be thinking a lot about Sunday’s race. What will the weather be like? Will it be cold at the start? (Unlikely since it’s been so warm in Durban.) How will they feel on the day? Will they manage to find their seconds along the route?

Read More →

R Recipe: Aligning Axes in ggplot2

Faceted plots in ggplot2 are phenomenal. They give you a simple way to break down an array of plots according to the values of one or more categorical variables. But what if you want to stack plots of different variables? Not quite so simple. But certainly possible. I gathered together this solution from a variety of sources on Stack Overflow, notably this one and this other one. A similar issue for vertical alignment is addressed here.

Read More →

R Recipe: Making a Chord Diagram

With the circlize package, putting together a Chord Diagram is simple.

    library(circlize)
    library(RColorBrewer)

    # Create a random adjacency matrix
    #
    adj = matrix(sample(c(1, 0), 26**2, replace = TRUE, prob = c(1, 9)),
                 nrow = 26, dimnames = list(LETTERS, LETTERS))
    adj = ifelse(adj == 1, runif(26**2), 0)

    chordDiagram(adj, transparency = 0.4, grid.col = "midnightblue",
                 col = colorRamp2(seq(0, 1, 0.2), brewer.pal(6, "Blues")))

Read More →

Comrades Marathon Finish Predictions

There are various approaches to predicting Comrades Marathon finishing times. Lindsey Parry, for example, suggests that you use two and a half times your recent marathon time. Sports Digest provides a calculator which predicts finishing time using recent times over three distances. I understand that this calculator is based on the work of Norrie Williamson. Let’s give them a test. I finished the 2013 Comrades Marathon in 09:41. Based on my marathon time from February that year, which was 03:38, Parry’s formula suggests that I should have finished at around 09:07.

Read More →
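
As a rough check of the two-and-a-half-times rule quoted above, here is a minimal R sketch. The helpers to_minutes() and to_hhmm() are hypothetical names introduced for illustration, not taken from the original post, and the times are used exactly as rounded in the teaser (03:38 and 09:41). With those rounded inputs the rule gives 09:05; the "around 09:07" quoted above presumably reflects seconds that are not shown here, which is an assumption.

```r
# Hypothetical helper: convert an "HH:MM" string to minutes.
to_minutes <- function(hhmm) {
  parts <- as.integer(strsplit(hhmm, ":")[[1]])
  parts[1] * 60 + parts[2]
}

# Hypothetical helper: convert minutes back to "HH:MM" (nearest minute).
to_hhmm <- function(minutes) {
  minutes <- as.integer(round(minutes))
  sprintf("%02d:%02d", minutes %/% 60L, minutes %% 60L)
}

marathon  <- to_minutes("03:38")  # recent marathon time, as quoted
actual    <- to_minutes("09:41")  # 2013 Comrades finish, as quoted

# Rule of thumb attributed to Lindsey Parry: 2.5 x marathon time.
predicted <- 2.5 * marathon

to_hhmm(predicted)            # "09:05" with the rounded inputs
to_hhmm(actual - predicted)   # "00:36" slower than the prediction
```

Either way, the actual finish was more than half an hour slower than this rule of thumb predicts.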

The Price of Fuel: How Bad Could It Get?

The cost of fuel in South Africa (and I imagine pretty much everywhere else) is a contentious topic. It varies from month to month and, although it is clearly related to the price of crude oil and the exchange rate, various other forces play an influential role.

Read More →

Encyclopaedia: SANAE IV

A contribution which I wrote for Antarctica and the Arctic Circle: A Geographic Encyclopedia of the Earth’s Polar Regions. South Africa is one of the founding signatories of the Antarctic Treaty of 1959. In 1960, the first South African National Antarctic Expedition (SANAE) team overwintered at the Norwegian base on the Fimbul Ice Shelf. A new base, SANAE I, was constructed nearby (70° 18′S 2° 22′W) and opened in 1962. Later bases, SANAE II and SANAE III, were built on the same location (72° 40′ 22″S 2° 50′ 26″W) and commissioned in 1971 and 1979 respectively.

Read More →

Dealing with a Byte Order Mark (BOM)

I have just been trying to import some data into R. The data were exported from a SQL Server client in tab-separated value (TSV) format. However, reading the data into R the “usual” way produced unexpected results:

Read More →

Graph Databases

The book Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem gives an engaging overview of Graph Databases, describing typical use cases and illustrating the syntax used to construct and query them. Graph Databases are a form of NoSQL database and, as such, differ significantly from the ubiquitous Relational Databases. The authors discuss a variety of scenarios where a Graph Database would be a better fit than a Relational Database, showing how they are particularly well suited to data which describe relationships between entities.

Read More →

R for Business Analytics

The book R for Business Analytics by Ajay Ohri sets out to look at “some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages.” In my opinion it succeeds in covering an extensive range of topics but fails to provide anything of substantial use to its intended audience. At least, not anything that could not be uncovered by a brief internet search.

Read More →