Blog Posts by Andrew B. Collier / @datawookie


Making Sense of Drug Prices

Drug pricing is complicated. In this post I take a look at reconciling ASP (Average Sales Price) and WAC (Wholesale Acquisition Cost) prices, focusing on normalising the WAC price per billing unit to get a price which is comparable to ASP. The post works through a number of case studies with the objective of laying out and testing a methodology for understanding and dealing with these data. There’s a fair amount of repetition, but I wanted to test the approach across a number of drugs.

Read More →

{emayili} Updated Gmail Authentication

A recent announcement from Google stated that from 30 May 2022 they will no longer support login via username and password (this is the “less secure” option). The change will affect people using the {emayili} package to send email from R, as well as many others who use this form of authentication to access their email via desktop email clients. In this short post I detail how to work around this by using an application password.

Read More →

Creating Git Commits in CI

I use Continuous Integration (CI) extensively across almost all of my remote Git repositories. These are the typical jobs which it’s used for:

  • running tests
  • building documentation and
  • acquiring data.

This post addresses the last item, acquiring data.

Read More →

Adding Timestamp Columns

If you have database tables to which you are frequently adding data or in which you are frequently updating data, then it can be useful to have columns which indicate precisely when a specific record was created and when it was last updated.
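
To give a flavour of the idea, here is a minimal sketch using Python's built-in `sqlite3` module. It is an illustration under assumptions of my own, not necessarily the approach taken in the post: the table `reading`, the trigger `reading_touch` and the helper `demo()` are all hypothetical names, and other databases have their own syntax for defaults and triggers.

```python
import sqlite3
import time

# Hypothetical schema: both timestamp columns default to the current UTC time,
# and a trigger refreshes updated_at whenever the value column changes.
SCHEMA = """
CREATE TABLE reading (
    id         INTEGER PRIMARY KEY,
    value      REAL,
    created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now')),
    updated_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now'))
);

CREATE TRIGGER reading_touch
AFTER UPDATE OF value ON reading
FOR EACH ROW
BEGIN
    UPDATE reading
    SET updated_at = strftime('%Y-%m-%d %H:%M:%f', 'now')
    WHERE id = OLD.id;
END;
"""


def demo():
    """Insert a row, update it, and return the timestamps before and after."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    query = "SELECT created_at, updated_at FROM reading WHERE id = 1"

    # The insert does not mention the timestamp columns: the defaults fill them.
    conn.execute("INSERT INTO reading (id, value) VALUES (1, 10.0)")
    before = conn.execute(query).fetchone()

    time.sleep(0.05)  # make sure the clock has moved on before the update

    # The update only touches value: the trigger refreshes updated_at.
    conn.execute("UPDATE reading SET value = 12.5 WHERE id = 1")
    after = conn.execute(query).fetchone()

    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before update:", before)
    print("after update: ", after)
```

The creation timestamp stays fixed while the update timestamp moves forward, and the application code never has to set either column explicitly.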

Read More →

Pushing Docker Images to AWS ECR

I’ve been using the image registry on GitLab for quite a while now and loved the convenience of having my images living in the same place as my code. However, recently GitLab introduced a soft limit on transfers and that’s cramping my style. I’m moving a lot of my images onto Amazon Elastic Container Registry (ECR). In this post I look at how to get this set up.

Read More →

How to Harvest RSS Feeds

At Fathom Data we tend to do quite a lot of web scraping. At the moment I’m working on a small project which requires assembling a large selection of RSS feeds. Aggregator sites (like Feedspot and Feedly) have extensive, carefully curated lists of RSS feeds. As we’ll see below, the underlying lists are not entirely trivial to access.

Read More →

{emayili} Message Templates

Services like Mailchimp and MailerLite make it easy to create stylish email campaigns. Their templating tools allow you to create elegant HTML messages which are personalised to the recipient.

Wouldn’t it be cool if you could do something similar when sending emails from R? Well, with the latest version of {emayili}, that’s now possible (although this feature is definitely in its infancy!).

Read More →

Translating QCT (Quick Chart) Map Files

I’ve got a stash of old (2004 vintage) UK Ordnance Survey maps. They really are works of art and the folk at the Ordnance Survey should be commended on the level of detail embedded in these maps. There’s just one small snag: the maps are in a rather obscure format. The proprietary Quick Chart (.qct extension) format is intended for use with Memory Map navigation software. If you want to use these maps for other purposes then you are stuck.

Read More →

{filebin} Quick & Easy File Sharing

At Fathom Data we have a number of workflows that require us to share various bits of data for a short time. The data are not sensitive, so we can freely share them. We have been doing this manually via platforms like Google Drive, Box or Dropbox. However, we then need to remember to go back and delete the files some time later. This is not ideal. What we needed was a simple “fire and forget” solution which would allow us to share files that then disappear automatically after some time. Well, this is precisely what Filebin does.

Read More →