Blog Posts by Andrew B. Collier / @datawookie


GitLab CI: Services

I needed to have a Redis server available as part of the GitLab CI pipeline for this blog (simply because I wanted to use the {rredis} package). After fiddling around for some time trying to install the redis-server package using apt I discovered that GitLab CI actually provides Redis as a service, which makes the process remarkably easy.

Some details of the “standard” services (Redis, PostgreSQL and MySQL) supported by GitLab CI can be found here:

Read More →

Scrapy Ban Policies with Rotating Proxies

The scrapy-rotating-proxies package makes it simple to use rotating proxies with Scrapy.

One issue that I’ve run into though is that pages which return a 404 error are retried (and the corresponding proxy is marked as dead). This does not make sense to me since if a server returns a 404 error this generally means that the requested page is just not available. It’s not a proxy problem; it’s a URL problem.

Read More →

Uploading CSV to MySQL

I occasionally need to upload the contents of a CSV file to a MySQL database. It happens sufficiently infrequently that I need to remind myself how it works each time. Hopefully this will make it easier next time around.

The Data.

Suppose that you have a file, prices.csv, that looks like this:

"time","product","price"
2020-06-03 22:33:39,"Basic T-Shirt",299
2020-07-22 21:32:21,"Pique Polo",429
2020-04-07 05:38:17,"COUNTRY ROAD Slub Frill T-Shirt",299
2020-04-23 03:54:09,"Caribbean Tan Mousse Gradual A 150ml",95.95
2020-04-01 05:01:29,"Pulled Pork Shoulder 500g",79.99
2020-05-15 12:26:48,"Back To Work Blazer",2299
2020-07-13 06:28:27,"Funnel Neck Cardigan",1499
2020-06-03 17:07:50,"Extra Depth 180TC Cotton Blend Fitted Sheet",279
2020-07-28 02:00:29,"Clover Seal Full Cream Fresh Milk 1l",17.99

Create Table

First we need to create a table.

Read More →

Resizing a Volume on an EC2 Linux Instance

Resizing a Volume on an EC2 Linux Instance.

From time to time you might need to resize one of the volumes attached to an EC2 instance. Perhaps it’s too big and you’re wanting to downsize? Or maybe it’s too small and you’re wanting to upscale? You only have the option of increasing the size of a volume. If you want a smaller one, then you’ll need to create a new volume and migrate the data across. However, if you’re making it bigger, then everything you need to know is in this post.

Read More →

Shiny App in Docker with HTTP Authentication

A banner image featuring the NGINX logo and the name of the NGINX package.

Suppose you have an app running on a Shiny server and you want to add HTTP authentication so that it’s only accessible via a username and password. This can be done using NGINX.

Test Shiny Server

The Shiny server should be accessible at http://localhost:3838/ (assuming you’re running Shiny server on localhost). 📢 Substitute another IP address or DNS entry if you’re running on another machine.

Read More →

Retail Data: R Package

Have you ever noticed how things seem to get really expensive at specific times of the year? Like Mother’s Day and Valentine’s Day? Have you ever felt a bit ripped off when buying an over-priced bouquet of flowers or box of chocolates? Have you ever wondered just how much those prices have been inflated?

Of course you have!

But it’s always been a niggling suspicion, never a fact. Where’s the evidence?

Read More →

Private Security and the Pareto Principle

Private Security is a big industry in South Africa. Most Private Security companies promise to provide a rapid response to every callout generated by any of their customers. There is a delicate balance between the number of response vehicles and the number of customers (and the frequency of their callouts!), which determines whether or not they are able to honour this promise.

On the one hand, more response vehicles result in lower response times. However, these vehicles are expensive to maintain and staff. Fewer vehicles are more cost effective, but make it difficult to maintain a high level of service.

Read More →

Tweaking Linux for Pernickety Projectors

Linux has really come a long way. I used to arrive at the podium and hook up my (Linux) laptop with the resigned expectation that there would be some tweaking involved to get it to speak to the projector. However the support for video hardware has evolved massive and nowadays I don’t ever think about this: it just works.

Until it doesn’t.

This week I was speaking at a conference where the video setup was extremely pernickety. It required a resolution of 1280 by 720 at a frequency of 50 Hz. Try and setup that up using the desktop display configuration tools in Ubuntu… it just doesn’t seem to be possible.

Read More →

MySQL Backups

Your data are valuable. If, God forbid, some disaster befalls your database then you should have a plan in place for how to recover your data. In this post I describe a simple strategy for backing up a MySQL database. This might not be the best approach, but it has worked for me.

Read More →

R, Docker and Checkpoint: A Route to Reproducibility

I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?

Only one reasonable way to find out: give it a try.

Below is the simple Dockerfile I used. Here are the fundamental components of what it does:

Read More →