Blog Posts by Andrew B. Collier / @datawookie


R Recipe: Making a Chord Diagram

With the circlize package, putting together a Chord Diagram is simple. library(circlize) library(RColorBrewer) # Create a random adjacency matrix # adj = matrix(sample(c(1, 0), 26**2, replace = TRUE, prob = c(1, 9)), nrow = 26, dimnames = list(LETTERS, LETTERS)) adj = ifelse(adj == 1, runif(26**2), 0) chordDiagram(adj, transparency = 0.4, grid.col = "midnightblue", col = colorRamp2(seq(0, 1, 0.2), brewer.pal(6, "Blues"))) Read More →

Comrades Marathon Finish Predictions

There are various approaches to predicting Comrades Marathon finishing times. Lindsey Parry, for example, suggests that you use two and a half times your recent marathon time. Sports Digest provides a calculator which predicts finishing time using recent times over three distances. I understand that this calculator is based on the work of Norrie Williamson. Let’s give them a test. I finished the 2013 Comrades Marathon in 09:41. Based on my marathon time from February that year, which was 03:38, Parry’s formula suggests that I should have finished at around 09:07. Read More →

The Price of Fuel: How Bad Could It Get?

The cost of fuel in South Africa (and I imagine pretty much everywhere else) is a contentious topic. It varies from month to month and, although it is clearly related to the price of crude oil and the exchange rate, various other forces play an influential role.

Read More →

Encyclopaedia: SANAE IV

A contribution which I wrote for Antarctica and the Arctic Circle: A Geographic Encyclopedia of the Earth’s Polar Regions. South Africa is one of the founding signatories of the Antarctic Treaty of 1959. In 1960, the first South African National Antarctic Expedition (SANAE) team overwintered at the Norwegian base on the Fimbul Ice Shelf. A new base, SANAE I, was constructed nearby (70° 18′S 2° 22′W) and opened in 1962. Later bases, SANAE II and SANAE III, were built on the same location (72° 40′ 22″S 2° 50′ 26″W) and commissioned in 1971 and 1979 respectively. Read More →

Dealing with a Byte Order Mark (BOM)

I have just been trying to import some data into R. The data were exported from a SQL Server client in tab-separated value (TSV) format. However, reading the data into R the “usual” way produced unexpected results:

Read More →

Graph Databases

The book Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem gives an engaging overview of Graph Databases, describing typical use cases and illustrating the syntax used to construct and query them. Graph Databases are a form of NoSQL database and, as such, differ significantly from the ubiquitous Relational Databases. The authors discuss a variety of scenarios where a Graph Database would be a better fit than a Relational Database, showing how they are particularly well suited to data which describe relationships between entities. Read More →

R for Business Analytics

The book R for Business Analytics by Ajay Ohri sets out to look at “some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages.” In my opinion it succeeds in covering an extensive range of topics but fails to provide anything of substantial use to its intended audience. At least, not anything that could not be uncovered by a brief internet search. Read More →

Simulating Intricate Branching Patterns with DLA

Manfred Schroeder’s book Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise is a fruitful source of interesting topics and projects. He gives a thorough description of Diffusion-Limited Aggregation (DLA) as a technique for simulating physical processes which produce intricate branching structures. Examples, as illustrated below, include Lichtenberg Figures, dielectric breakdown, electrodeposition and Hele-Shaw flow.

Read More →