Andrew B. Collier / @datawookie

Social links and a link to my CV.

Public datasets:

British Canoeing Results

Filtering Data with L2 Regularisation

2014-03-25 R regularisation

I have just finished reading Momentum Strategies with L1 Filter by Tung-Lam Dao. The smoothing results presented in this paper are interesting and I thought it would be cool to implement the L1 and L2 filtering schemes in R. We’ll start with the L2 scheme here because it has an exact solution and I will follow up with the L1 scheme later on.

How Long to Conceive?

2014-01-12 R

This morning my wife presented me with a rather interesting statistic: a healthy couple has a 25% chance of conception every month [1], and that this should result in a 75% to 85% chance of conception after a year. This sounded rather interesting and it occurred to me that it really can’t be that simple. A lot of variables influence this probability. Certainly age should be a factor and, after a short search, I found some more age-specific information which indicated that for a woman in her thirties, the probability is only around 15% [2,3].

Processing EXIF Data

2013-12-16 R

I got quite inspired by the EXIF with R post on the Timely Portfolio blog and decided to do a similar analysis with my photographic database.

Updated Comrades Winners Chart

2013-12-14 running

On Friday I received my copy of The Official Results Brochure for the 2013 Comrades Marathon. Always makes for a diverting half an hour’s reading. And the tables at the front provide some very interesting statistics. Seemed like a good opportunity to update my Chart of Comrades Winners.

Contour and Density Layers with ggmap

2013-12-14 R

I am busy working on a project which uses data from the World Wide Lightning Location Network (WWLLN). Specifically, I am trying to reproduce some of the results from Orville, Richard E, Gary R. Huffines, John Nielsen-Gammon, Renyi Zhang, Brandon Ely, Scott Steiger, Stephen Phillips, Steve Allen, and William Read. 2001. “Enhancement of Cloud-to-Ground Lightning over Houston, Texas”. Geophysical Research Letters 28 (13): 2597–2600.

Amy Cuddy: Your body language shapes who you are

2013-11-27 TED Talk

Amy Cuddy gives a great talk. Provided me with lots to think about and I will happily confess that I have struck a few power poses (but only after ensuring that I am quite alone)!

Deriving a Priority Queue from a Plain Vanilla Queue

2013-11-26 R

Following up on my recent post about implementing a queue as a reference class, I am going to derive a Priority Queue class.

Implementing a Queue as a Reference Class

2013-11-24 R

I am working on a simulation for an Automatic Repeat-reQuest (ARQ) algorithm. After trying various options, I concluded that I would need an implementation of a queue to make this problem tractable. R does not have a native queue data structure, so this seemed like a good opportunity to implement one and learn something about Reference Classes in the process.

The Implementation

We use setRefClass() to create a generator function which will create objects of the Queue class.

Iterators in R

2013-11-14 R

According to Wikipedia, an iterator is “an object that enables a programmer to traverse a container”. A collection of items (stashed in a container) can be thought of as being “iterable” if there is a logical progression from one element to the next (so a list is iterable, while a set is not). An iterator is then an object for moving through the container, one item at a time.

Iterators are a fundamental part of contemporary Python programming, where they form the basis for loops, list comprehensions and generator expressions. Iterators facilitate navigation of container classes in the C++ Standard Template Library (STL). They are also to be found in C#, Java, Scala, Matlab, PHP and Ruby.

Introduction to Fractals

2013-11-04 R

A short while ago I was contracted to write a short piece entitled “Introduction to Fractals”. Admittedly it is hard to do justice to the topic in less than 1000 words.

Percolation Threshold: Including Next-Nearest Neighbours

2013-11-01 R

Percolation through a larger lattice at the percolation threshold.

In my previous post about estimating the Percolation Threshold on a square lattice, I only considered flow from a given cell to its four nearest neighbours. It is a relatively simple matter to extend the recursive flow algorithm to include other configurations as well.

Malarz and Galam (2005) considered the problem of percolation on a square lattice for various ranges of neighbor links. Below is their illustration of (a) nearest neighbour “NN” and (b) next-nearest neighbour “NNN” links.

Percolation Threshold on a Square Lattice

2013-10-30 R

Manfred Schroeder touches on the topic of percolation a number of times in his encyclopaedic book on fractals (Schroeder, M. (1991) Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. Percolation has numerous practical applications, the most interesting of which (from my perspective) is the flow of hot water through ground coffee!

Plotting Times of Discrete Events

2013-10-19 R

I recently enjoyed reading O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122. doi:10.1111/j.2041-210X.2010.00021.x.

Mounting a sshfs volume via the crontab

2013-10-06 Linux

I need to mount a directory from my laptop on my desktop machine using sshfs. At first I was not making the mount terribly regularly, so I did it manually each time that I needed it. However, the frequency increased over time and I was eventually mounting it every day (or multiple times during the course of a day!). This was a perfect opportunity to employ some automation.

Top 250 Movies at IMDb

2013-10-03 R web scraping

Some years ago I allowed myself to accept a challenge to read the Top 100 Novels of All Time (complete list here). This list was put together by Richard Lacayo and Lev Grossman at Time Magazine.

To start with I could tick off a number of books that I had already read. That left me with around 75 books outstanding. I knuckled down. The Lord of the Rings had been on my reading list for a number of years, so this was my first project. A little unfair for this trilogy to count as just one book… but I consumed it with gusto! One down. Other books followed. They were also great reads. And then I hit a couple of books that were just, well, to put it plainly, heavy going. I am sure that they were great books and my lack of enjoyment was entirely a reflection on me and not the quality of the books. No doubt I learned a lot from reading them. But it was hard work! At this stage it occurred to me that the book list was constructed from a rather specific perspective of what constituted a great book. A perspective which is quite different from my own. I had to admit defeat: my literary tastes will have to mature a bit before I attack this list again!

Flushing Live MetaTrader Logs to Disk

2013-09-18

The logs generated by expert advisors and indicators when running live on MetaTrader are displayed in the Experts tab at the bottom of the terminal window. Sometimes it is more convenient to analyse these logs offline (especially since the order of the records in the terminal runs in a rather counter-intuitive bottom-to-top order!). However, because writing to the log files is buffered, there can be a delay before what you see in the terminal is actually written to disk.

Clustering the Words of William Shakespeare

2013-09-10 R

In my previous post I used the tm package to do some simple text mining on the Complete Works of William Shakespeare. Today I am taking some of those results and using them to generate word clusters.

MetaTrader Time Zones

2013-09-09

Time zones on MetaTrader can be slightly confusing. Two specific time zones are important:

the time zone of the broker’s server and
your local time zone.

And these need not be the same.

Text Mining the Complete Works of William Shakespeare

2013-09-05 R

I am starting a new project that will require some serious text mining. In the interests of bringing myself up to speed on the {tm} package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out.

The first order of business was getting my hands on all that text. Fortunately it is available from a number of sources. I chose to use Project Gutenberg.

Presenting Conformance Statistics

2013-08-27 R

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points.

The Data

The data came as a spreadsheet with multiple sheets. Each of the sheets had a slightly different format, so the easiest thing to do was to save each one as a CSV file and then import them individually into R.

1
2
3
27
28
29
30