Andrew B. Collier / @datawookie





Working with Fairly Wide Data


The concept of “wide data” is relative. In some domains 100 columns is considered “wide”, while in others that’s perfectly normal and you’d need to have thousands (or tens of thousands!) of columns for it to be considered even remotely “wide”. The data that we work with at Fathom Data generally lies in the first domain, but from time to time we do work on data that is considerably wider.

Read More →

Medusa: A Multi-Headed Tor Proxy


At Fathom Data we have a few projects which require us to send HTTP requests from an evolving selection of IP addresses. This post details the Medusa proxy docker image which uses Tor (The Onion Router) as a proxy.

What is a Proxy Server?

A proxy server acts as an intermediary between a client and a server. When a request goes through a proxy server there is no direct connection between the client and the server. The client connects to the proxy and the proxy then connects to the server. Requests and responses pass through the proxy.
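The flow described above can be sketched as a toy model in which each party is a plain R function, so nothing touches the network. The names `server`, `proxy`, and the request fields are hypothetical and exist only for illustration; this is not how Medusa or Tor is implemented.

```r
# Toy model: each party is a function, so no network access is needed.

# The server answers a request and records who it heard from.
server <- function(request) {
  list(
    status = 200L,
    body = paste("You asked for", request$path),
    seen_from = request$from
  )
}

# The proxy forwards the request under its own identity and
# passes the server's response back unchanged.
proxy <- function(request) {
  forwarded <- modifyList(request, list(from = "proxy"))
  server(forwarded)
}

# The client only ever talks to the proxy.
response <- proxy(list(from = "client", path = "/index.html"))
response$seen_from
```

The server only ever sees the proxy as the origin of the request, which is why rotating proxies changes the IP address that the server observes.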

Read More →

{emayili} Managing CSS


I love the clean simplicity of an R Markdown document. But sometimes it can feel a little bare and utilitarian. This is especially the case if it’s rendered into the body of an email. How about injecting a little more pizzazz?

Read More →

{emayili} Rendering Plain Markdown


We’ve been able to attach text and HTML content to messages with {emayili}. But something that I’ve really been wanting to do is render Markdown directly into an email.

In version 0.4.19 I’ve added the ability to render plain Markdown directly into a message. That version is not on CRAN, so you’ll need to install it from GitHub.

Read More →

{clockify} Time Tracking from R


At Fathom Data we use Clockify to keep detailed records of the time that we spend working on our clients’ projects. Up until fairly recently we manually generated timesheets at the end of each month that were sent through to the clients along with their invoices. Our experience has been that providing detailed timesheets helps foster trust and transparency. However, with a growing team and an expanding clientele, generating these timesheets has become progressively more laborious. Time to automate!

Read More →

Setting up a Tiny HTTP Proxy


It’s often handy to have access to an HTTP proxy. I use this recipe from time to time to quickly fling together a proxy server which I can use to relay HTTP requests from a different origin.

Read More →

Pre-Commit Hook for Processing README.Rmd


When writing an R package I usually create a README.Rmd file that I render to README.md. I use {pkgdown} to then create documentation. I run the last step via CI, so once it’s set up I never need to think about it again.

The problem is that I regularly forget to process the README.Rmd file, which means that although README.Rmd itself is up to date, README.md and the documentation built from it lag behind.

What if I automated the process? I created a simple pre-commit hook which processes README.Rmd whenever I make a commit and automatically adds any changes to the commit.
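A minimal sketch of such a hook, written as an Rscript, might look like the code below. It is an assumption about how this could be done rather than the hook from the full post: the function name `render_readme()` is hypothetical, and it assumes that {rmarkdown} is installed and that README.Rmd renders to README.md as a `github_document`.

```r
#!/usr/bin/env Rscript
# Sketch of a .git/hooks/pre-commit script (assumed layout, not the
# post's actual hook).

render_readme <- function(rmd = "README.Rmd", md = "README.md") {
  # Nothing to do if the repository has no README.Rmd.
  if (!file.exists(rmd)) return(invisible(FALSE))

  # Knit README.Rmd to README.md (requires the {rmarkdown} package).
  rmarkdown::render(rmd, output_format = "github_document", quiet = TRUE)

  # Stage the regenerated file so that it becomes part of this commit.
  status <- system2("git", c("add", md))
  if (status != 0) stop("Failed to stage ", md, call. = FALSE)

  invisible(TRUE)
}

render_readme()
```

To try it, save the script as `.git/hooks/pre-commit` and make it executable. If rendering or staging fails, the script exits with a non-zero status and Git aborts the commit.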

Read More →

{emayili} Rudimentary Email Address Validation


A recent issue on the {emayili} GitHub repository prompted me to think a bit more about email address validation. When I started looking into this I was somewhat surprised to learn that it’s such a complicated problem. Who would have thought that something as apparently simple as an email address could be linked with such complexity?

Read More →

Old ‘Hood, New ‘Hood

Image adapted from the cover of 'Old Hat New Hat' by Dr Seuss.

I recently moved from suburban South Africa to rural England. I’m figuring out my new environment. Making some maps seemed to be a good way to get familiar with the surroundings.

In the process I wanted to figure out two things:

  • how to get maps with a consistent aspect ratio at different latitudes; and
  • how to overlay a partially transparent map layer.

To make things more interesting I’ll create maps of both my old and new locations.
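One way to approach the first question, sketched here as an assumption rather than the method used in the full post, is to define each map by its physical size in kilometres and convert that to degrees. A degree of longitude shrinks by a factor of cos(latitude) relative to a degree of latitude, so the longitude span has to grow as you move away from the equator. The function name `bbox_around()` and the coordinates below are purely illustrative.

```r
# Approximate length of one degree of latitude in kilometres.
km_per_degree <- 111.32

# Bounding box of a given physical size centred on (lat, lon).
bbox_around <- function(lat, lon, width_km, height_km) {
  half_lat <- (height_km / km_per_degree) / 2
  # A degree of longitude covers cos(lat) times the distance of a
  # degree of latitude, so divide by that factor.
  half_lon <- (width_km / (km_per_degree * cos(lat * pi / 180))) / 2
  c(
    left = lon - half_lon,
    bottom = lat - half_lat,
    right = lon + half_lon,
    top = lat + half_lat
  )
}

# Illustrative coordinates only, not the locations from the post.
south <- bbox_around(lat = -30, lon = 30, width_km = 20, height_km = 15)
north <- bbox_around(lat = 52, lon = -1, width_km = 20, height_km = 15)
```

Both boxes cover the same area on the ground, even though the northern one spans more degrees of longitude.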

Read More →

Websockify & noVNC: Adding SSL


If you’re going to be exposing noVNC on the (public) internet, then it’s vital that you take some security measures. You should install a suitable SSL certificate and serve noVNC via HTTPS rather than HTTP. Getting that all up and running can be moderately tricky. Here’s a quick recipe to get a minimal setup working.

Read More →

Websockify & noVNC behind an NGINX Proxy


At Fathom Data we are developing a framework which will enable remote access to Linux desktops via a browser. There’s nothing new to this idea. However, we have a very specific application in mind, so we need to roll our own solution. Importantly, there need to be multiple independent connections catering for a group of users. In this post I’ll show how we used the following tools to make this possible:

Read More →

TomTom Routing

A highway stretching into the distance.

While working with the Google Mobility Data I stumbled upon the TomTom Traffic Index. I then learned that TomTom has a public API which exposes a bunch of useful and interesting data.

It seemed like another opportunity to create a small R package. Enter {tomtom}.

{tomtom} Package

The {tomtom} package can be found here.

Install the package.

remotes::install_github("datawookie/tomtom")

Load the package.

library(tomtom)

API Key

Getting a key for the API is quick and painless. I stored mine in the environment then retrieved it with Sys.getenv().
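A minimal sketch of that pattern is shown below. The variable name `TOMTOM_API_KEY` and the helper `get_api_key()` are assumptions made for illustration; the excerpt does not say which name was used, and the helper is not part of {tomtom}.

```r
# Hypothetical variable name; the post does not say which name was used.
key_name <- "TOMTOM_API_KEY"

# Read the key from the environment and fail loudly if it is missing.
get_api_key <- function(name = key_name) {
  key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  key
}
```

Setting the variable in `~/.Renviron` keeps the key out of scripts and version control, and `get_api_key()` then returns it in any new R session.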

Read More →