Andrew B. Collier / @datawookie




AWS EC2: Setting up a Load Balancer

An Application Load Balancer receives requests and distributes them across a selection of processing resources. These processing resources are divided into Target Groups (see previous post for how to set one up).

Creating an Application Load Balancer

We’re setting up a Flask API which is deployed as a Docker image and running on ECS. We’re going to create a load balancer which will accept requests on port 80 and route them to port 5000 on the API container.
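As a rough illustration of that routing, here is a minimal sketch in Python that builds the request parameters for a target group on port 5000 and a listener on port 80. The key names follow the `create_target_group` and `create_listener` calls in the boto3 `elbv2` client. The resource name, the VPC ID, the ARNs, the `ip` target type and the `/` health check path are all assumptions made for the example, not values from the post.

```python
from typing import Any, Dict


def target_group_params(name: str, vpc_id: str, port: int = 5000) -> Dict[str, Any]:
    """Parameters for a target group that receives traffic on the container port."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,              # port the Flask API listens on in the container
        "VpcId": vpc_id,
        "TargetType": "ip",        # assumption: tasks use awsvpc networking
        "HealthCheckPath": "/",    # assumption: the API answers on the root path
    }


def listener_params(load_balancer_arn: str, target_group_arn: str, port: int = 80) -> Dict[str, Any]:
    """Parameters for a listener that forwards all requests to the target group."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": port,              # public port on the load balancer
        "DefaultActions": [
            {"Type": "forward", "TargetGroupArn": target_group_arn},
        ],
    }


# Hypothetical identifiers, for illustration only.
tg = target_group_params("flask-api", "vpc-0123456789abcdef0")
listener = listener_params("arn:example:load-balancer", "arn:example:target-group")
print(tg["Port"], listener["Port"])
```

With valid AWS credentials and real ARNs, these dictionaries could be unpacked into `boto3.client("elbv2").create_target_group(**tg)` and `create_listener(**listener)`. The sketch stops short of making those calls so that it runs without an AWS account.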

Read More →

AWS EC2: Creating a Target Group

If we want to have an ECS service which is visible to the public, then we need to set up an Application Load Balancer. There are a couple of steps to this process, the first of which is creating a Target Group.

Read More →

AWS Containers #4: Dependencies

We saw in a previous post that it’s important to ensure that the Selenium container is running and accepting requests before the crawler actually gets started, because the crawler depends on Selenium being available. We can use ECS task dependencies to enforce this ordering.

Read More →

AWS Containers #5: Health Checks

Can we create a health check that tests whether the Selenium service is available? Yes! We will need to do two things:

  • tell the crawler container to wait for the Selenium container to be HEALTHY and
  • add a health check to the Selenium container.

Let’s do it!
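The two steps above can be sketched as a pair of ECS container definitions, written here as Python dictionaries. The `healthCheck` and `dependsOn` keys are the ones used in ECS task definitions. The image names, the Selenium port (4444), the `/status` endpoint, the timing values and the presence of `curl` in the Selenium image are assumptions for illustration.

```python
import json

SELENIUM_PORT = 4444  # assumption: default port of the Selenium standalone server

# Container that must report HEALTHY before the crawler starts.
selenium = {
    "name": "selenium",
    "image": "selenium/standalone-chrome",  # assumed image
    "essential": True,
    "healthCheck": {
        # Runs inside the container; a non-zero exit code marks the check as failed.
        "command": [
            "CMD-SHELL",
            f"curl -f http://localhost:{SELENIUM_PORT}/status || exit 1",
        ],
        "interval": 10,     # seconds between checks (illustrative value)
        "timeout": 5,       # seconds before a check counts as failed
        "retries": 3,       # consecutive failures before UNHEALTHY
        "startPeriod": 15,  # grace period while Selenium boots
    },
}

# Container that waits for the Selenium container to become HEALTHY.
crawler = {
    "name": "crawler",
    "image": "crawler:latest",  # hypothetical image name
    "essential": True,
    "dependsOn": [
        {"containerName": "selenium", "condition": "HEALTHY"},
    ],
}

container_definitions = [selenium, crawler]
print(json.dumps(container_definitions, indent=2))
```

The printed JSON has the shape of the `containerDefinitions` section of a task definition. A real definition would need further fields, such as memory and logging settings.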

Read More →

{hagr} Linnaean Classification

I’ve taken another look at the {hagr} data, which I wrote about previously. This time I’m focusing on the hierarchy of creatures.

Taxonomic Rank

The Linnaean Taxonomy is a hierarchical classification system for organisms devised by Carl Linnaeus. An organism is assigned to the following levels in the hierarchy (in increasing order of granularity):

  • domain
  • kingdom
  • phylum
  • class
  • order
  • family
  • genus and
  • species.

The relative level of a group of organisms in this hierarchy determines its taxonomic rank.
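To make the idea of rank concrete, here is a small Python sketch that treats the levels listed above as an ordered sequence and compares two ranks by position. It does not use the {hagr} data. The function names are made up for this example, and the classification of humans is included purely as an illustration.

```python
# Levels of the hierarchy, from coarsest to finest.
RANKS = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]


def rank_level(rank: str) -> int:
    """Position of a rank in the hierarchy (0 is the coarsest level)."""
    return RANKS.index(rank.lower())


def is_finer(rank_a: str, rank_b: str) -> bool:
    """True if rank_a is a more granular level than rank_b."""
    return rank_level(rank_a) > rank_level(rank_b)


def lineage(classification: dict) -> str:
    """Join a classification into a single string, ordered from domain to species."""
    return " > ".join(classification[rank] for rank in RANKS)


# Illustrative classification of humans.
HUMAN = {
    "domain": "Eukaryota",
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Mammalia",
    "order": "Primates",
    "family": "Hominidae",
    "genus": "Homo",
    "species": "Homo sapiens",
}

print(rank_level("genus"))
print(is_finer("species", "family"))
print(lineage(HUMAN))
```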

Read More →

{hagr} Database of Animal Ageing and Longevity

I came across the Human Ageing Genomic Resources. They are doing some fascinating work and expose some engrossing data. I wanted to make the data easier for me to work with, and an R package seemed to be the natural vehicle to do this.

For more information on these data, take a look at this article: Tacutu, Craig, Budovsky, Wuttke, Lehmann, Taranukha, Costa, Fraifeld and de Magalhaes, “Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing,” Nucleic Acids Research 41(D1):D1027-D1033, 2013.

Read More →

Making the Most of Mobility

The Google Mobility Data (or Community Mobility Reports) refers to the datasets provided by Google which track how people move and congregate in various locations during specific time periods. The data is based on anonymised location information from users who have opted into Location History on their Google accounts.

Read More →

Flexible Environment Variables for a Docker Image

I’ve been following an excellent tutorial for deploying a Docker image on an EC2 instance via GitLab CI/CD. It covers every step in the process in great detail. If you follow the steps then you’ll definitely end up with a working pipeline.

However, I still wasn’t quite sure how to handle the environment variables and credentials that I wanted to bake into the image, and which varied between my local development environment and the final deployed image.

Read More →

Install GitLab Runner with Docker

📢 An updated version of this post reflecting recent changes in GitLab Runner can be found here.

I’ve got a project which takes a long time to build, and I rebuild it regularly. I’ve been using the shared runners on GitLab, but the constraint on total build time has become a limitation. I’m going to install GitLab Runner as a Docker service on an underutilised EC2 instance.

Read More →