Andrew B. Collier / @datawookie

{pagedown} Page Size & Margins

2022-10-22 {pagedown} R

At Fathom Data we have been doing a lot of automated documentation and automated reporting. Although many of these documents are rendered to HTML, there’s an increasing demand for PDF documents. We’ve had to raise our game in that department. The {pagedown} package has become invaluable. This is a short note showing how we tweak the page size and margins for PDF documents.

Scaling Density Plots

2022-10-08 R {ggplot2} {ggridges}

I’m a density plot devotee. And, using geom_density() from {ggplot2} these plots are effortless to produce. However, sometimes the results of geom_density() are not exactly what I’m after. Here’s how I tweak them to give me precisely what I need.

Handling Empty Paragraphs from R Markdown

2022-09-29 R Markdown HTML CSS

From time to time I find empty paragraph tags (<p></p>) inserted into my HTML when knitting an R Markdown document.

Vertically Align Image & Text

2022-09-24 HTML CSS

I’m not a web developer. However, I do regularly use HTML and CSS to lay out pages which are then transformed into PDF documents. One of the requirements that I encounter fairly often is vertically aligning images and text. Fairly often, but not often enough to remember the solution. I inevitably have to rediscover the solution (thanks StackOverflow!). Jotting this down for posterity.

Enforcing Style in an R Project

2022-09-20 Git R lint pre-commit

In the previous post we looked at how to apply a linter and styler to a Python Project. Now we’re going to do the same for an R project. We’ll use the {precommit} R package to make the setup a breeze.

Enforcing Style in a Python Project

2022-09-19 Git Python lint pre-commit

A linter and a styler can help you to write cleaner and more consistent code. In this post we’ll look at how to set up both for a Python project.

Squares & Spirals

2022-09-18 R

While trolling the internet aimlessly this morning, this TikTok video caught my attention.

Calculating the Fire Danger Index (FDI)

2022-09-08 R

In a previous post I took a look at some granular weather data that I acquired via the Weather API. One interesting application of these data is calculating the Fire Danger Index (FDI), which measures the degree of fire danger using information on dryness, wind speed, temperature and humidity.

Using Shiny Server in Docker

2022-09-07 R Shiny Docker

A quick note on how to use the Shiny Server Docker image, rocker/shiny.

I’m a big believer in starting with the simplest possible setup, getting that to work and then adding complexity in layers. We’ll start with a simple Shiny application in app.R.

Postboxes & Postal Codes

2022-08-13 {blimey} R

I’ve just added some more data to the {blimey} package:

postal codes and
postbox locations.

Schools in England

2022-08-11 {blimey} R

I’ve just added data on schools in England to the {blimey} package. The raw data were obtained from gov.uk.

Linux Packages for R

2022-08-09 R Linux

Getting R set up on Linux can be somewhat frustrating. Many of the fundamental packages (like {devtools} or {remotes}) have implicit system dependencies. Installing these packages can involve numerous iterations back and forth between R and the shell while you figure out what those dependencies are and get them all installed.

I’ve been through this process many times now and finally just created a quick script that will get most of it done quickly and easily.

Historical Weather Data

2022-08-07 weather R Python

I’m building a model which requires historical weather data from a selection of locations in South Africa. In this post I demonstrate the process of acquiring the data and doing some simple processing.

Persisting Data with Pickle & S3

2022-07-28 Python S3

I occasionally write scripts where I need to persist some information between runs. These scripts are often wrapped in a Docker image and deployed on Amazon ECS. This means that there is no persistent storage. I could use a database, but this would be overkill for the volume of data involved. This post describes a simple approach to storing these data on S3 using a pickle file.
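
As a rough illustration of the idea (not the code from the post), the sketch below pickles a small state dictionary to bytes and hands it to an object store. The `InMemoryStore` class, the `load_state` and `save_state` helpers and the `state.pkl` key are all hypothetical names; the in-memory store simply stands in for an S3 bucket so that the example runs without AWS credentials.

```python
import pickle


class InMemoryStore:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        # Returns None when the key does not exist (e.g. on the first run).
        return self._objects.get(key)


def load_state(store, key, default):
    """Fetch and unpickle the state, falling back to a default."""
    body = store.get(key)
    if body is None:
        return default
    return pickle.loads(body)


def save_state(store, key, state):
    """Pickle the state and write it back to the store."""
    store.put(key, pickle.dumps(state))


store = InMemoryStore()

# One "run" of the script: load, update, persist.
state = load_state(store, "state.pkl", default={"runs": 0})
state["runs"] += 1
save_state(store, "state.pkl", state)
```

With boto3, `put` and `get` would presumably wrap the S3 client's `put_object` and `get_object` calls, with handling for a missing key on the first run. As always with pickle, only unpickle data from a location you control.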

Great Britain Railway Network

2022-07-16 spatial {blimey} R

Introducing the nascent R package {blimey} (repository). At this stage it contains only the following data:

railways — latitude and longitude segments along railway lines (wide format);
railways_pivot — latitude and longitude segments along railway lines (long format); and
railway_stations — codes, names and locations of railway stations.

Interactive Brokers API: Connecting from MATLAB

2022-05-10 trading MATLAB

In this post I show how to connect to the Interactive Brokers API from MATLAB. The “obvious” way generates an SSL error. This is an alternative approach using the system curl command. It’s a bit of a hack but it does the job.

Interactive Brokers Client Portal API Gateway

2022-05-09 trading

The Interactive Brokers Client Portal API Gateway provides access to a REST API for IBKR accounts. It makes it possible to retrieve account information, manage portfolios, execute trades, and access market data via standard HTTP requests.

Interactive Brokers: Gateway Automation

2022-04-21 trading {IBrokers}

In a previous post I documented my local setup for accessing the Interactive Brokers API via their Gateway application. I’m now in a position where I need to deploy my code onto a VM. My local setup will no longer suffice.

Interactive Brokers: Gateway Install & Setup

2022-04-20 trading

A brief tutorial on setting up the Interactive Brokers Gateway on Linux.

{emayili} Encrypted Email with Mailfence

2022-04-04 GPG {emayili} R

In the previous post I ran through the process of setting up a Mailfence account for encrypting emails using asymmetric encryption. In this post I show how Mailfence can be used with the {emayili} package for sending encrypted email from R.

Mailfence Setup

2022-04-03 encryption email

If you’re thinking about using encrypted email, then Mailfence appears to be a pretty good option for getting started in a relatively painless way.

Pre-Registered GitLab Runner in a Container

2022-03-23 GitLab Docker

In a previous post I described a recipe for setting up GitLab Runner using a Docker container. With that setup it was possible to register multiple runners on a single container. However, each runner needed to be registered manually. This setup makes complete sense if the container will be around for a while. But what if you’re spinning up a GitLab Runner container for only a short duration? In this case it might be preferable to have the container pre-configured (or at least easily configured) to provide a runner to a specific project or group. Setting that up is the goal of this post.

Scheduling Refresh of a Materialised View

2022-03-22 PostgreSQL SQL AWS RDS

Materialised views are a great alternative to views if the underlying query takes a long time to run. However, the principal problem with materialised views is that their content gets stale… and if the database is active, then it gets out of date rather quickly. Sure, you can manually refresh a materialised view, but who has the discipline or time to do that? Better to automate the process. Then you can safely forget about it, secure in the knowledge that the data in the materialised view will remain current.

Firing Up Firestore

2022-03-20 NoSQL Python Firestore Votela

I’ve just started collaborating on a new project, Votela, with Luke. We’re going to be using Firestore for stashing our data. I’ve never worked with Firestore before, so one of my first tasks was just figuring out how to get connected and how to shift some data to and from the database.

Making Sense of Drug Prices

2022-03-09

Drug pricing is complicated. In this post I take a look at reconciling ASP and WAC prices, focusing on normalising the WAC price per billing unit to achieve a price which is comparable to ASP. This post includes a number of case studies with the objective of laying out and testing a methodology for understanding and dealing with these data. There’s a fair amount of repetition, but I wanted to test the approach across a number of drugs.

{emayili} Updated Gmail Authentication

2022-03-08 {emayili} R

A recent announcement from Google stated that from 30 May 2022 they will no longer support login via username and password (this is the “less secure” option). The change will have an impact for people using the {emayili} package to send email from R, but will also affect many others who use this form of authentication to access their emails via desktop email clients. In this short post I detail how to work around this by using an application password.

Creating Git Commits in CI

2022-03-04 Git CI GitLab GitHub

I use Continuous Integration (CI) extensively across almost all of my remote Git repositories. These are the typical jobs which it’s used for:

running tests
building documentation and
acquiring data.

This post addresses the last item, acquiring data.

Adding Timestamp Columns

2022-02-23 PostgreSQL SQL

If you have database tables in which you are frequently adding or updating data, then it can be useful to have columns which indicate precisely when a specific record was created and updated.
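
A minimal sketch of the pattern, assuming a hypothetical `item` table. The post itself targets PostgreSQL; to keep the example self-contained this uses SQLite via Python's `sqlite3` module instead, so the trigger syntax differs from what you would write in PostgreSQL. The table, column and trigger names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- Both timestamps default to the time of insertion (UTC).
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Refresh updated_at whenever the name column is changed.
CREATE TRIGGER item_set_updated_at
AFTER UPDATE OF name ON item
BEGIN
    UPDATE item SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO item (name) VALUES (?)", ("widget",))
conn.execute("UPDATE item SET name = ? WHERE id = ?", ("gadget", 1))

row = conn.execute(
    "SELECT name, created_at, updated_at FROM item WHERE id = 1"
).fetchone()
print(row)
```

SQLite's `CURRENT_TIMESTAMP` has one-second resolution, so in a quick run like this the two timestamps will often be identical.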

Pushing Docker Images to AWS ECR

2022-02-18 GitLab ECR Docker

I’ve been using the image registry on GitLab for quite a while now and loved the convenience of having my images living in the same place as my code. However, recently GitLab introduced a soft limit on transfers and that’s cramping my style. I’m moving a lot of my images onto Amazon Elastic Container Registry (ECR). In this post I look at how to get this set up.

How to Harvest RSS Feeds

2022-01-31 RSS web scraping

At Fathom Data we tend to do quite a lot of web scraping. At the moment I’m working on a small project which requires assembling a large selection of RSS feeds. Aggregator sites (like Feedspot and Feedly) have extensive, carefully curated lists of RSS feeds. As we’ll see below, the underlying lists are not entirely trivial to access.

A Recipe for Upgrading R

2022-01-25 R

This is the recipe I use to upgrade R on a Linux box. It’s something that I do fairly frequently on fresh EC2 instances.

{emayili} Message Templates

2022-01-21 {emayili} R

Services like Mailchimp and MailerLite make it easy to create stylish email campaigns. Their templating tools allow you to create elegant HTML messages which are personalised to the recipient.

Wouldn’t it be cool if you could do something similar when sending emails from R? Well, with the latest version of {emayili}, that’s now possible (although this feature is definitely in its infancy!).

{emayili} Sending Email from Shiny

2022-01-20 Shiny {emayili} R

The {emayili} package for sending emails from R works well within a Shiny app. You just need to set it up right.

{emayili} HTML Messages with Images

2022-01-12 {emayili} R

No two email clients are equal. Nowhere is this more true than in the way that they treat images in HTML messages.

Building GPXSee

2022-01-05 spatial

I’m using the source for GPXSee to figure out some details of the QCT format. The build instructions in the README are somewhat terse. Here are more details.

Translating QCT (Quick Chart) Map Files

2021-12-30 spatial

I’ve got a stash of old (2004 vintage) UK Ordnance Survey maps. They are really works of art and the folk at the Ordnance Survey should be commended on the level of detail embedded in these maps. There’s just one small snag: the maps are in a rather obscure format. The proprietary Quick Chart (.qct extension) format is intended for use with Memory Map navigation software. If you want to use these maps for other purposes then you are stuck.

{emayili} Sending Encrypted Email

2021-12-07 {emayili} R

In a previous post I documented what I had learned while trying to understand the structure of encrypted emails. I then took an informal Twitter poll to gauge how many people are using encrypted email messages.

{emayili} Understanding Encrypted Email

2021-11-26 {emayili} email encryption GPG

I’m adding encrypted message support to the {emayili} package for sending emails from R.

{filebin} Quick & Easy File Sharing

2021-11-18 {filebin} R

At Fathom Data we have a number of workflows that require us to share various bits of data for a short time. The data are not sensitive, so we can freely share them. We have been doing this manually via platforms like Google Drive, Box or Dropbox. However, we need to remember to go back and delete the file some time later. This is not ideal. What we needed was a simple “fire and forget” solution which would allow us to share files that then disappear automatically after some time. Well, this is precisely what Filebin does.

{binance} P2P Trades

2021-11-10 {binance} R

Peer-to-Peer (P2P) cryptocurrency trades occur directly between two parties without a central authority (like an exchange) being involved.

Shared Memory & Docker

2021-11-09 Selenium Linux Docker

The shared memory device, /dev/shm, provides a temporary file storage filesystem using RAM for storing files. It’s not mandatory to have /dev/shm, although it’s probably desirable since it facilitates inter-process communication (IPC).

{binance} Spot Trading: Liquidity

2021-11-08 {binance} R

In previous posts we looked at creating market orders and limit orders with {binance}. We saw a couple of successful trades. However, sometimes trades are not successful and the orders are not filled. Let’s try to understand why.

Accessing Virtual Memory from a Docker Container

2021-11-06 Docker

Memory is something I generally don’t worry about when working with Docker. It just works. This is great… but what happens when it doesn’t?

{binance} Spot Trading: Limit Orders

2021-11-05 {binance} R

In the previous post we looked at creating market orders on Binance using the {binance} package. Today we’re going to dig into limit orders.

{binance} Spot Trading: Market Orders

2021-11-01 {binance} R

Functionality for working with spot trades is now available in {binance}. In this post we’ll establish some background on spot trading and then explore some related functions.

{binance} Dealing with Dust

2021-10-27 {binance} R

Dust refers to the fragments of coins which are too small to use for transactions. In the fiat world the equivalent would be those worthless coins with too little value to actually buy anything, that take up space in your wallet and end up scattered across parking areas.

Binance allows you to convert dust into BNB. In this post I discuss the functions in {binance} which support this operation.

I’ve got a bit of dust in my wallet.

{binance} Tracking Total Account Balance

2021-10-26 {binance} R

I started dabbling in Crypto trading on Binance at the beginning of September 2021. I am really impressed with the interface, which is smooth and full featured (if perhaps a little complicated and confusing!). One of the things that has frustrated me though is not being able to get an idea of whether I’m making progress. There’s no view which shows me the overall status of my account and how this has evolved over time.

HCRIS Field Labels

2021-10-19 {pdftools} R

Fathom Data has been doing a lot of work with the HCRIS (Healthcare Cost Report Information System) data. The underlying reports are submitted as a spreadsheet with multiple sheets. The data are then extracted and recorded in a simple tabular format, with each field linked to a worksheet code (wksht_cd), column number (clmn_num) and line number (line_num). These three keys are then mapped to a single compound key. The resulting data look something like this:

{emayili} Message Threads

2021-10-18 {emayili} R

Being able to view related messages as threads is really useful. To make this possible, messages must use either the In-Reply-To or References header field to link to the Message-ID from another message.

This is now possible in {emayili}.
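
To make the header mechanics concrete, here is a small sketch using Python's standard `email` library rather than {emayili} (whose interface is not shown here). The addresses, subjects and the `example.com` domain are made up for illustration. The reply points back at the original message by copying its Message-ID into both the In-Reply-To and References fields.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# The original message gets a unique Message-ID.
original_id = make_msgid(domain="example.com")

original = EmailMessage()
original["From"] = "alice@example.com"
original["To"] = "bob@example.com"
original["Subject"] = "Quarterly report"
original["Message-ID"] = original_id
original.set_content("Here is the report.")

# The reply has its own Message-ID and refers back to the original,
# which is what lets a mail client group the two into one thread.
reply = EmailMessage()
reply["From"] = "bob@example.com"
reply["To"] = "alice@example.com"
reply["Subject"] = "Re: Quarterly report"
reply["Message-ID"] = make_msgid(domain="example.com")
reply["In-Reply-To"] = original_id
reply["References"] = original_id
reply.set_content("Thanks, received.")

print(reply)
```

For longer threads, References conventionally accumulates the Message-IDs of the earlier messages in the conversation, while In-Reply-To names only the message being answered.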

{emayili} Support for Gmail, SendGrid & Mailgun

2021-10-15 {emayili} R

The {emayili} package supports configuring a generic SMTP server via the server() function. In the most recent version, v0.6.5, we add three new functions, gmail(), sendgrid() and mailgun(), which provide specific support for Gmail, SendGrid and Mailgun.
