# Blog Posts by Andrew B. Collier / @datawookie

## R Recipe: Making a Chord Diagram

With the circlize package, putting together a Chord Diagram is simple. library(circlize) library(RColorBrewer) # Create a random adjacency matrix # adj = matrix(sample(c(1, 0), 26**2, replace = TRUE, prob = c(1, 9)), nrow = 26, dimnames = list(LETTERS, LETTERS)) adj = ifelse(adj == 1, runif(26**2), 0) chordDiagram(adj, transparency = 0.4, grid.col = "midnightblue", col = colorRamp2(seq(0, 1, 0.2), brewer.pal(6, "Blues"))) Read More →

There are various approaches to predicting Comrades Marathon finishing times. Lindsey Parry, for example, suggests that you use two and a half times your recent marathon time. Sports Digest provides a calculator which predicts finishing time using recent times over three distances. I understand that this calculator is based on the work of Norrie Williamson. Let’s give them a test. I finished the 2013 Comrades Marathon in 09:41. Based on my marathon time from February that year, which was 03:38, Parry’s formula suggests that I should have finished at around 09:07. Read More →

## Comrades Runners Disqualified: I’m Not Convinced

Apparently 14 runners have been disqualified for failing to complete the full distance at the 2012 and 2013 Comrades Marathons.

I’m not convinced.

## A Sankey Plot with Uniform Coloured Edges

Following up on my previous post about generating Sankey plots with the riverplot package. It’s also possible to generate plots which have constant coloured edges.

## Comrades Marathon Pacing Chart: Up Run

I’ve updated my Comrades Marathon pacing chart to include both the Up and Down runs. You can grab it here. The data for this year’s race are not yet finalised (I think we will be running 87 km), but you can make changes when it’s all confirmed.

## The Price of Fuel: How Bad Could It Get?

The cost of fuel in South Africa (and I imagine pretty much everywhere else) is a contentious topic. It varies from month to month and, although it is clearly related to the price of crude oil and the exchange rate, various other forces play an influential role.

## Encyclopaedia: SANAE IV

A contribution which I wrote for Antarctica and the Arctic Circle: A Geographic Encyclopedia of the Earth’s Polar Regions. South Africa is one of the founding signatories of the Antarctic Treaty of 1959. In 1960, the first South African National Antarctic Expedition (SANAE) team overwintered at the Norwegian base on the Fimbul Ice Shelf. A new base, SANAE I, was constructed nearby (70° 18′S 2° 22′W) and opened in 1962. Later bases, SANAE II and SANAE III, were built on the same location (72° 40′ 22″S 2° 50′ 26″W) and commissioned in 1971 and 1979 respectively. Read More →

## Dealing with a Byte Order Mark (BOM)

I have just been trying to import some data into R. The data were exported from a SQL Server client in tab-separated value (TSV) format. However, reading the data into R the “usual” way produced unexpected results:

## Graph Databases

The book Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem gives an engaging overview of Graph Databases, describing typical use cases and illustrating the syntax used to construct and query them. Graph Databases are a form of NoSQL database and, as such, differ significantly from the ubiquitous Relational Databases. The authors discuss a variety of scenarios where a Graph Database would be a better fit than a Relational Database, showing how they are particularly well suited to data which describe relationships between entities. Read More →

The book R for Business Analytics by Ajay Ohri sets out to look at “some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages.” In my opinion it succeeds in covering an extensive range of topics but fails to provide anything of substantial use to its intended audience. At least, not anything that could not be uncovered by a brief internet search. Read More →