Blog Posts by Andrew B. Collier / @datawookie


A Timeline History of R

A record of some more or less important events in the history of R.

This is a work in progress. The information is cobbled together from a range of sources. If you have pertinent items to add, please let me know via the comments.

Read More →

Adding Users to an EC2 Ubuntu Instance

By default an EC2 instance has only a single user other than root. For example, on an Ubuntu instance, that user is ubuntu. If multiple people will be accessing the instance then it’s generally necessary for each of them to have their own account. Setting this up is pretty simple: it just requires sorting out some authentication details.

Read More →

Docker: Persisting User Data

I’m busy putting together a Docker image for a multi-user Jupyter Notebook installation. I aim to have an independent login for each of the users, and each of them should also have their own storage space. That space should exist outside the container, though, so that even if the container stops, the data lives on. This should mitigate user rage.

Read More →

RStudio Environment on DigitalOcean with Docker

I’ll be running a training course in a few weeks which will use RStudio as the main computational tool. Since it’s a short course I don’t want to spend a lot of time sorting out technical issues. And with multiple operating systems (and versions) these issues can be numerous and pervasive. Setting up an RStudio server which everyone can access (and that requires no individual configuration!) makes a lot of sense.

Read More →

Increasing MySQL Packet Maximum Size

In the process of uploading a massive CSV file to my Django application my session data are getting pretty big. As a result I’m getting these errors:

  • (1153, "Got a packet bigger than 'max_allowed_packet' bytes") and
  • (2006, 'MySQL server has gone away').

The second error is potentially unrelated.

After some research it became apparent that the source of the problem is my max_allowed_packet setting.

Read More →

Bayesian Marathon Predictions

There are a variety of ways to predict running times over the standard marathon distance (42.2 km). You could dust off your copy of The Lore of Running (Tim Noakes). My treasured Third Edition discusses predicting likely marathon times on p. 366, referring to tables published by other authors to actually make predictions. There’s also a variety of online services, for example:

Of these I particularly like the offering from Running for Fitness which produces a neatly tabulated set of predicted times over an extensive range of distances using a selection of techniques including Riegel’s Formula and Cameron’s Model.
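For a rough idea of what Riegel’s Formula does, here is a minimal Python sketch. It assumes the commonly quoted form T2 = T1 × (D2 / D1)^1.06; the function names and the 1:45:00 half marathon used as input are purely illustrative and are not taken from any of the services mentioned above.

```python
def riegel_predict(known_time_s, known_dist_km, target_dist_km, exponent=1.06):
    """Scale a known race time to another distance using Riegel's power law.

    The exponent 1.06 is the commonly quoted default (an assumption here).
    """
    return known_time_s * (target_dist_km / known_dist_km) ** exponent


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical input: a 1:45:00 half marathon (21.1 km).
half_marathon_s = 1 * 3600 + 45 * 60
marathon_s = riegel_predict(half_marathon_s, 21.1, 42.2)
print(format_hms(marathon_s))
```

Because the exponent is greater than 1, doubling the distance more than doubles the predicted time, which is how the formula accounts for fatigue over longer races.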

Read More →

Simple School Maths Problem

A simple problem sent through to me by one of my running friends:

There are 6 red cards and 1 black card in a box. Busi and Khanya take turns to draw a card at random from the box, with Busi being the first one to draw. The first person who draws the black card will win the game (assume that the game can go on indefinitely). If the cards are drawn with replacement, determine the probability that Khanya will win, showing all working.
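One possible way to check an answer to this problem is sketched below in Python. It assumes each draw finds the black card with probability 1/7 (drawing with replacement) and that Busi draws first. If q = 6/7 is the chance of a red card, then Busi wins with probability p / (1 - q²), and Khanya wins only if Busi’s first draw is red, after which Khanya is in the first player’s position. The function names are made up for illustration.

```python
from fractions import Fraction
import random


def second_player_win_probability(p):
    """Exact chance that the second player draws the winning card first.

    p is the chance of success on any single draw (with replacement).
    First player: P1 = p + q^2 * P1, so P1 = p / (1 - q^2).
    Second player: P2 = q * P1.
    """
    q = 1 - p
    return q * p / (1 - q * q)


def simulate_second_player(trials, seed=0):
    """Estimate the same probability by playing the game many times."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        failed_draws = 0
        # Card 0 stands for the black card; keep drawing until it appears.
        while rng.randrange(7) != 0:
            failed_draws += 1
        # Busi takes draws 0, 2, 4, ...; Khanya takes draws 1, 3, 5, ...
        if failed_draws % 2 == 1:
            wins += 1
    return wins / trials


khanya_exact = second_player_win_probability(Fraction(1, 7))
khanya_estimate = simulate_second_player(100_000)
print(khanya_exact, float(khanya_exact), khanya_estimate)
```

Using Fraction keeps the exact result as a ratio rather than a rounded decimal, and the simulation gives an independent sanity check on the algebra.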

Read More →