Blog Posts by Andrew B. Collier / @datawookie


Contour and Density Layers with ggmap

I am busy working on a project which uses data from the World Wide Lightning Location Network (WWLLN). Specifically, I am trying to reproduce some of the results from Orville, Richard E., Gary R. Huffines, John Nielsen-Gammon, Renyi Zhang, Brandon Ely, Scott Steiger, Stephen Phillips, Steve Allen, and William Read. 2001. “Enhancement of Cloud-to-Ground Lightning over Houston, Texas.” Geophysical Research Letters 28 (13): 2597–2600.

Read More →

Implementing a Queue as a Reference Class

I am working on a simulation of an Automatic Repeat-reQuest (ARQ) algorithm. After trying various options, I concluded that I would need an implementation of a queue to make this problem tractable. R does not have a native queue data structure, so this seemed like a good opportunity to implement one and learn something about Reference Classes in the process.

The Implementation

We use setRefClass() to create a generator function which will create objects of the Queue class. Read More →
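A minimal sketch of such a queue, assuming a list-backed field called items and hypothetical method names push(), pop() and size() (illustrative choices, not necessarily those used in the full post). Because Reference Class objects are mutable, the methods use <<- to modify the field in place rather than returning a modified copy.

```r
library(methods)

# A first-in, first-out queue backed by a list field.
# The field and method names are hypothetical, chosen for illustration.
Queue <- setRefClass("Queue",
  fields = list(items = "list"),
  methods = list(
    push = function(x) {
      "Add an element to the back of the queue."
      items[[length(items) + 1]] <<- x
    },
    pop = function() {
      "Remove and return the element at the front of the queue."
      if (length(items) == 0) stop("queue is empty")
      value <- items[[1]]
      items[[1]] <<- NULL  # assigning NULL deletes the first element
      value
    },
    size = function() {
      "Number of elements currently in the queue."
      length(items)
    }
  )
)

q <- Queue$new()
q$push(1)
q$push("two")
q$pop()   # returns 1
q$size()  # returns 1
```

One consequence of the list-based design: pushing NULL is silently ignored, since assigning NULL to a list element removes it rather than storing it.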

Iterators in R

According to Wikipedia, an iterator is “an object that enables a programmer to traverse a container”. A collection of items (stashed in a container) can be thought of as “iterable” if there is a logical progression from one element to the next (so a list, with its fixed order, is naturally iterable, whereas an unordered set has no such natural progression). An iterator is then an object for moving through the container one item at a time. Iterators are a fundamental part of contemporary Python programming, where they underpin for loops, list comprehensions and generator expressions. Read More →
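Base R has no built-in iterator protocol, but the idea can be sketched with a closure: a pair of functions sharing a private index into the container. The names make_iterator(), has_next() and next_value() are hypothetical and not part of any package (the CRAN iterators package offers a fuller implementation built around iter() and nextElem()).

```r
# Hypothetical closure-based iterator. The two returned functions share
# `index`, which records how far through `container` we have moved.
make_iterator <- function(container) {
  index <- 0
  list(
    has_next = function() index < length(container),
    next_value = function() {
      if (index >= length(container)) stop("iterator exhausted")
      index <<- index + 1
      container[[index]]
    }
  )
}

it <- make_iterator(list("a", "b", "c"))
while (it$has_next()) print(it$next_value())
```

The iterator only ever moves forward and holds a single piece of state, which is the essential difference from indexing into the container directly.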

Introduction to Fractals

A while ago I was commissioned to write a short piece entitled “Introduction to Fractals”. Admittedly, it is hard to do justice to the topic in fewer than 1,000 words.

Read More →

Percolation Threshold: Including Next-Nearest Neighbours

[Figure: Percolation through a larger lattice at the percolation threshold.]
In my previous post about estimating the percolation threshold on a square lattice, I only considered flow from a given cell to its four nearest neighbours. It is a relatively simple matter to extend the recursive flow algorithm to other neighbourhood configurations as well. Malarz and Galam (2005) considered percolation on a square lattice for various ranges of neighbour links, illustrating (a) nearest-neighbour (“NN”) and (b) next-nearest-neighbour (“NNN”) links. Read More →
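A sketch of how the neighbourhood can be made a parameter, assuming site percolation with flow entering the open cells of the top row and percolation meaning that the bottom row is reached. The lattice is a logical matrix (TRUE for an open cell) and the neighbourhood is just a matrix of (row, column) offsets, so moving from NN to NN plus NNN means appending four diagonal rows. This version deliberately swaps recursion for an explicit work list, because deep recursion in R can exhaust the stack on large lattices. The names percolates() and crossing_fraction() are hypothetical.

```r
# Neighbour offsets as (row, column) displacements.
NN  <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))    # nearest neighbours
NNN <- rbind(c(-1, -1), c(-1, 1), c(1, -1), c(1, 1))  # next-nearest (diagonal)

# TRUE if flow entering the open cells of the top row reaches the bottom row.
percolates <- function(lattice, offsets) {
  n <- nrow(lattice)
  m <- ncol(lattice)
  wet <- matrix(FALSE, n, m)
  top <- which(lattice[1, ])
  # Work list of cells still to be flooded: one (row, column) pair per row.
  pending <- matrix(c(rep(1L, length(top)), top), ncol = 2)
  while (nrow(pending) > 0) {
    i <- pending[1, 1]
    j <- pending[1, 2]
    pending <- pending[-1, , drop = FALSE]
    if (wet[i, j]) next
    wet[i, j] <- TRUE
    if (i == n) return(TRUE)  # flow has reached the bottom row
    for (k in seq_len(nrow(offsets))) {
      ii <- i + offsets[k, 1]
      jj <- j + offsets[k, 2]
      if (ii >= 1 && ii <= n && jj >= 1 && jj <= m &&
          lattice[ii, jj] && !wet[ii, jj]) {
        pending <- rbind(pending, c(ii, jj))
      }
    }
  }
  FALSE
}

# Fraction of random size x size lattices (each cell open with probability p)
# that percolate.
crossing_fraction <- function(size, p, offsets, trials = 50) {
  mean(replicate(trials,
    percolates(matrix(runif(size^2) < p, size, size), offsets)))
}

set.seed(1)
crossing_fraction(30, 0.5, NN)
crossing_fraction(30, 0.5, rbind(NN, NNN))
```

Adding the diagonal links can only create extra paths, so at a given occupation probability the NN plus NNN lattice percolates at least as often, which is why its threshold is lower.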

Plotting Times of Discrete Events

I recently enjoyed reading O’Hara, R. B., & Kotze, D. J. (2010). Do not log-transform count data. Methods in Ecology and Evolution, 1(2), 118–122. doi:10.1111/j.2041-210X.2010.00021.x.

Read More →

Mounting an sshfs volume via the crontab

I need to mount a directory from my laptop on my desktop machine using sshfs. At first I did not need the mount very often, so I made it manually each time. However, I needed it more and more frequently, until I was mounting it every day (or several times a day!). This was a perfect opportunity for some automation.

Read More →

Top 250 Movies at IMDb

Some years ago I accepted a challenge to read the Top 100 Novels of All Time, a list put together by Richard Lacayo and Lev Grossman at Time magazine. I could immediately tick off a number of books that I had already read, which left around 75 outstanding. So I knuckled down. The Lord of the Rings had been on my reading list for a number of years, so it was my first project. Read More →

Flushing Live MetaTrader Logs to Disk

The logs generated by expert advisors and indicators running live on MetaTrader are displayed in the Experts tab at the bottom of the terminal window. Sometimes it is more convenient to analyse these logs offline (especially since the terminal lists the records in a rather counter-intuitive bottom-to-top order!). However, because writing to the log files is buffered, there can be a delay before what you see in the terminal is actually written to disk.

Read More →

MetaTrader Time Zones

Time zones on MetaTrader can be slightly confusing. There are two important time zones:

  • the time zone of the broker’s server and
  • your local time zone.

And these need not be the same.

Read More →

Text Mining the Complete Works of William Shakespeare

I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the {tm} package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of business was getting my hands on all that text. Fortunately it is available from a number of sources. I chose to use Project Gutenberg. Read More →

Presenting Conformance Statistics

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet, so I looked at a couple of ways of presenting it that would bring out the important points.

The Data

The data came as a spreadsheet with multiple sheets. Each sheet had a slightly different format, so the easiest approach was to save each one as a CSV file and then import them individually into R. Read More →
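The import step might look something like the sketch below, which reads every exported CSV file into a named list of data frames so that each can then be tidied according to its own layout. The two sheets written here, their file names and their columns are stand-in data invented for illustration; the client's actual data are not shown in the post excerpt.

```r
# Stand-in data: two hypothetical sheets with different layouts, written to a
# temporary directory as if they had been exported from the spreadsheet.
dir <- file.path(tempdir(), "conformance")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(batch = 1:3, conforming = c(TRUE, FALSE, TRUE)),
          file.path(dir, "sheet1.csv"), row.names = FALSE)
write.csv(data.frame(Batch = 4:5, Result = c("pass", "fail")),
          file.path(dir, "sheet2.csv"), row.names = FALSE)

# Import each CSV file individually, keeping them in a list named by file.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
sheets <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(sheets) <- tools::file_path_sans_ext(basename(files))

str(sheets)
```

Keeping the sheets separate at this stage, rather than binding them immediately, leaves room to harmonise column names and types sheet by sheet.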

The Wonders of {foreach}

Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago but has really come to fruition over the last few weeks.

First we need a reproducible example, preferably something numerically intensive.

```r
max.eig <- function(N, sigma) {
  # N x N matrix of normally distributed random numbers.
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  # Eigenvalues (complex in general), sorted by decreasing modulus.
  E <- eigen(d)$values
  abs(E)[1]
}
```

This function generates a square matrix of normally distributed random numbers, finds the corresponding (complex) eigenvalues and then returns the largest of their moduli.

Read More →
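As a sketch of where foreach comes in, the loop below evaluates max.eig() for several matrix sizes and collects the results into a vector via .combine = c. It assumes the foreach package is installed, and the function is repeated so that the snippet runs on its own. It uses %do%, which runs sequentially; switching to %dopar% after registering a parallel backend (for example with registerDoParallel() from the doParallel package) distributes the iterations across cores.

```r
library(foreach)

# max.eig() as in the excerpt, repeated so this snippet is self-contained.
max.eig <- function(N, sigma) {
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  E <- eigen(d)$values
  abs(E)[1]
}

set.seed(1)
sizes <- c(10, 20, 40, 80)
# One iteration per matrix size; .combine = c gathers the results in a vector.
result <- foreach(n = sizes, .combine = c) %do% max.eig(n, sigma = 1)
result
```

Each iteration is independent of the others, which is exactly the property that makes the switch from %do% to %dopar% safe.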

Fitting a Model by Maximum Likelihood

Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It sets out to answer the question: which parameter values make the observed data most probable? First you need to select a model for the data, and the model must have one or more (unknown) parameters. As the name implies, MLE then maximises a likelihood function (the probability, or probability density, of the observed data viewed as a function of the parameters), which in turn maximises the agreement between the model and the data.
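To make this concrete, here is a minimal sketch under an assumed normal model fitted to simulated data (the true mean of 5 and standard deviation of 2 are arbitrary choices). In practice one minimises the negative log-likelihood with optim(), which is equivalent to maximising the likelihood but numerically better behaved. The standard deviation is optimised on the log scale so that it stays positive.

```r
set.seed(42)
# Simulated data from an assumed normal model with known parameters.
x <- rnorm(1000, mean = 5, sd = 2)

# Negative log-likelihood: par[1] is the mean, par[2] is log(sigma).
nll <- function(par) {
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), nll, method = "BFGS")
mu.hat <- fit$par[1]
sigma.hat <- exp(fit$par[2])
c(mu.hat, sigma.hat)
```

For the normal model the maximum-likelihood estimates also have closed forms (the sample mean, and the standard deviation computed with divisor n rather than n - 1), which gives a convenient check on the numerical fit.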

Read More →