Andrew B. Collier / @datawookie

Social links and a link to my CV.

Public datasets:

British Canoeing Results

{emayili} Support for Mailtrap

2024-04-23 {emayili} R

The {emayili} package has adapters which make it simple to send email via a variety of services. For example, it caters specifically for ZeptoMail, MailerSend, Mailfence and Sendinblue. The latest version of {emayili}, 0.8.0 published on 23 April 2024, adds an adapter for Mailtrap.

Backtesting

2024-04-21 R quant GARCH

A farmer hoeing his field in the style of John Constable.

The key to successful backtesting is to ensure that you only use the data that were available at the time of the prediction. No “future” data can be included in the model training set, otherwise the model will suffer from look-ahead bias (having unrealistic access to future data).
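As a rough illustration of that rule, here is a minimal walk-forward sketch in Python (the post itself works in R). The function name walk_forward_splits, its parameters and the sample returns are all hypothetical. The only point is that every training window ends before its test window begins.

```python
def walk_forward_splits(n_obs, min_train, horizon=1):
    """Yield (train, test) index ranges for an expanding training window."""
    start = min_train
    while start + horizon <= n_obs:
        train = range(0, start)               # everything known so far
        test = range(start, start + horizon)  # the period being predicted
        yield train, test
        start += horizon


# Hypothetical daily returns, oldest first.
returns = [0.010, -0.020, 0.005, 0.013, -0.007, 0.002, 0.009, -0.011]

splits = list(walk_forward_splits(len(returns), min_train=5))

for train, test in splits:
    # No "future" observation may appear in the training set.
    assert max(train) < min(test)
    print(f"train on 0..{max(train)}, predict {list(test)}")
```

A model would be refitted on each training range and scored only on the matching test range, so no prediction ever sees data from after its own date.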

Docker-in-Docker with GitHub Actions

2024-04-21 Docker GitHub

A fanciful depiction of a Docker daemon in the style of Bernardo Patentino.

Are you trying to build and push a Docker image from GitHub Actions?

Asset Allocation

2024-04-19 R quant GARCH

A farmer, his sheep and equipment in the style of John Constable.

The Two-Fund Separation Theorem introduced by James Tobin, a Nobel Prize-winning economist, is a fundamental concept in investment theory. It addresses how investors can optimally allocate their assets. In an efficient market an optimal portfolio is a combination of a risk-free asset and a market portfolio.
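A minimal numeric sketch of that combination, written in Python rather than the R used in the post. The risk-free rate, market return, market volatility and the function name combine are illustrative assumptions, not values from the post.

```python
def combine(weight, risk_free_rate, market_return, market_volatility):
    """Mix the risk-free asset with the market portfolio.

    weight is the fraction of wealth held in the market portfolio.
    """
    expected_return = weight * market_return + (1 - weight) * risk_free_rate
    # The risk-free asset has zero variance, so only the market part adds risk.
    volatility = weight * market_volatility
    return expected_return, volatility


# Assumed (hypothetical) annual figures.
RISK_FREE = 0.03
MARKET_RETURN = 0.08
MARKET_VOL = 0.15

portfolios = {
    w: combine(w, RISK_FREE, MARKET_RETURN, MARKET_VOL) for w in (0.25, 0.5, 1.0)
}

# Excess return per unit of risk for each mix.
sharpe = {w: (r - RISK_FREE) / v for w, (r, v) in portfolios.items()}

for w, (r, v) in portfolios.items():
    print(f"weight={w:.2f} return={r:.4f} volatility={v:.4f} sharpe={sharpe[w]:.4f}")
```

Under these assumptions every mix has the same ratio of excess return to volatility. Changing the weight changes how much risk is taken, not the reward per unit of risk.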

Logging like a Lumberjack

2024-04-18 Python logging

Cut logs floating down a river in the Amazon.

Sprinkling status messages across your code using print() statements can be a good temporary fix for tracking down issues.

But it can get messy and there’s no way for you to selectively disable them. Sure, you can redirect output to a file or /dev/null, but this is an all-or-nothing solution. How about disabling some messages and retaining others?

This is where the logging module comes into its own.
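A small sketch of what selective disabling can look like with the standard library. The logger name and messages are made up, and the output goes to an in-memory stream only so that the result is easy to inspect. A real script would more likely log to the console or a file.

```python
import io
import logging

# Capture output in memory (an assumption made for this demo).
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("lumberjack_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep messages away from the root logger

# Show INFO and above; DEBUG messages are dropped.
logger.setLevel(logging.INFO)
logger.debug("Detailed diagnostic output.")
logger.info("Processing started.")
logger.warning("Disk space is low.")

# Raise the threshold: INFO is now dropped too.
logger.setLevel(logging.WARNING)
logger.info("Processing finished.")
logger.error("Could not write file.")

captured = stream.getvalue()
print(captured, end="")
```

The calls stay in the code throughout. The level set on the logger decides which of them produce output.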

Parameter Constraints & Significance

2024-04-16 R quant GARCH

Cows in a field bordered by a dry stone wall. In the style of Joseph Conrad.

Setting the values of one or more parameters for a GARCH model or applying constraints to the range of permissible values can be useful.

Risk/Reward Tradeoff

2024-04-15 R quant GARCH

A painting of a river valley. On the left the countryside is verdant and green. On the right it's dry and brown.

The two quantities we have been modelling (the time-dependent average and standard deviation of the returns) represent respectively the (potential) reward and risk associated with an asset. The relationship between these two quantities is implicit in the GARCH model. However, sometimes the return depends directly on the risk. A variant of the GARCH model can take this explicit relationship into account.

Docker Image from Scratch

2024-04-14 Docker Alpine

A minimal image evocative of a whale.

Most often when you are creating a new Docker image it will be based on one of the standard Docker base images like ubuntu, alpine, python or nginx. But sometimes you might want to truly roll your own image. Starting with literally nothing. From scratch. Tabula rasa.

Model Validation

2024-04-14 R quant GARCH

A farmer inspecting a cow. Image in style of John Constable.

Is this a “good” model? This post looks at how to validate a model and determine whether it’s a good representation of the training data and likely to produce robust and reliable predictions.

Parameter Significance & Parsimonious Models

2024-04-13 R quant GARCH

A landscape in the style of John Constable.

In general a parsimonious model is a good model. A model with too many parameters is likely to overfit the data. How do we determine when a model is “complex enough” but not “too complex”?

Leverage Effect

2024-04-12 R quant GARCH

Two ladies on a seesaw in a field. In style of John Constable.

The models we have been looking at do not differentiate between positive and negative residuals: both errors are treated the same. However, this does not align with reality, where the volatility resulting from a large negative return is higher than that for the corresponding positive return.

Skewed Returns

2024-04-11 R quant GARCH

A house tilted to the side in the middle of a river. Painting in the style of John Constable.

In the previous post we assumed that returns had a normal distribution. This assumption implied that the distribution was symmetric and a positive return was as likely as the corresponding negative return. In reality this assumption is just not true and returns are asymmetrically distributed.

What is a GARCH Model?

2024-04-10 R quant GARCH

A landscape in the style of John Constable.

A GARCH (Generalised Autoregressive Conditional Heteroskedasticity) model is a statistical tool used to forecast volatility by analysing patterns in past price movements and volatility.
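To make "patterns in past price movements and volatility" concrete, here is a sketch of the variance recursion for the common GARCH(1,1) case, in plain Python rather than R. The parameter values and residuals are invented for illustration. In practice the parameters would be estimated from data by a fitting routine, which this sketch does not attempt.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from a GARCH(1,1) recursion.

    variance[t] = omega + alpha * residuals[t-1]**2 + beta * variance[t-1]
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    if not residuals:
        return []
    # Assumption: start the recursion at the long-run (unconditional) variance.
    variance = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        variance.append(omega + alpha * eps ** 2 + beta * variance[-1])
    return variance


# Hypothetical residuals (return minus its mean) and parameters.
residuals = [0.01, -0.03, 0.02, 0.0]
variance = garch11_variance(residuals, omega=1e-5, alpha=0.1, beta=0.8)
volatility = [v ** 0.5 for v in variance]

for t, vol in enumerate(volatility):
    print(f"t={t} conditional volatility={vol:.5f}")
```

Each variance depends on the previous squared residual (the "ARCH" part) and the previous variance (the "GARCH" part), so a large shock raises the forecast volatility for the following periods.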

Rolling Volatility & Returns

2024-04-09 R quant

An image of barrels being loaded onto carts in a style similar to that of John Constable.

In the previous post we loaded stock data into R and then calculated return volatility, both for the entire time series and shorter intervals. We saw that volatility is not constant but can change appreciably with time. One way to get a clear view of changes in volatility is by calculating them using a moving (or “rolling”) window.
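The rolling-window idea can be sketched in a few lines of Python using only the standard library (the post does this in R). The function name, window length and returns below are hypothetical.

```python
import statistics


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        statistics.stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]


# Hypothetical daily returns: a calm stretch followed by a turbulent one.
returns = [0.01, -0.01, 0.01, -0.01, 0.05, -0.05]
vol = rolling_volatility(returns, window=4)

for i, v in enumerate(vol):
    print(f"window ending at index {i + 3}: volatility={v:.5f}")
```

A series of n returns yields n - window + 1 estimates, each using only the most recent observations, so the estimate rises as the turbulent returns enter the window.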

Loading Financial Time Series

2024-04-08 R quant

An image of farm workers loading hay onto the back of a wagon in a style similar to that of John Constable.

I’m going to be writing a series of posts which will look at some applications of R (and perhaps Python) to financial modelling. We’ll start here by pulling some stock data into R, calculating the daily returns and then looking at correlations and simple volatility estimates.

PyInstaller, boto3 and configparser

2024-04-03 Python

The current version of PyInstaller (6.5.0) doesn’t play nicely with the boto3 package. Here’s how to fix it.

Python Packages from GitHub

2024-03-10 Python GitHub Git

Intellectuals at work in a Baroque style.

I’ve hit my head against this issue from time to time, so it seems like I need to document the solution somewhere for easy reference.

Read by Frank Collier

2024-03-08

A collection of books read by my father, Frank Collier, for Tape Aids for the Blind. Dad was always an enthusiastic and patient reader. One of my earliest memories is of him reading to my sister and me in bed each morning. In retirement he devoted many hours to reading and editing books for Tape Aids for the Blind.

Host & Port: Where is it?

2024-03-02 Traefik

Traffic in a LEGO landscape.

In the previous post Traefik was compared to NGINX. Now let’s take a look at a few simple Traefik setups. We’ll focus on specifying the host and port.

What is Traefik?

2024-03-01 Traefik NGINX

Traffic in a LEGO landscape.

I’ve come across Traefik in a number of questions on Stack Overflow recently. I regularly use NGINX as a reverse proxy and sometimes find it to be a little obscure. Having an alternative would be helpful.

Standalone Next.js Application in Docker

2024-02-23 Docker Next.js

An image of a whale.

I have seen a few questions on Stack Overflow relating to building a simple standalone Next.js app in a Docker image. Here’s one way to do it.

Testing CSS & XPath

2024-02-08 web scraping CSS XPath

A colourful image of people working in an impressionist style.

There are many tools for generating CSS selectors and XPath expressions. However, short of using them in your code, how can you quickly test them? In this post I’ll show how you can use your browser’s Developer Tools to establish that your CSS or XPath is doing what you intend.

Parsing the DOM

2024-02-07 Gatsby React

A mobster in Art Deco style.

The parse() function from the html-react-parser package converts HTML strings into React elements. It allows you to take HTML and render it as if it were JSX. This can be particularly useful when you’re working with content that comes as HTML from external sources (such as a CMS) and you want to include that content in your React components. It can also be used to filter and modify the React elements.

Gatsby Content from RSS

2024-02-06 Gatsby RSS Month of Gatsby

An elaborate theatre in art deco style.

In a previous post we looked at how to import content from Medium. Another potential source of content is an RSS feed. In this post we’ll see how RSS content can be imported into a Gatsby site.

Gatsby Content from MDX

2024-02-02 Gatsby TypeScript MDX Month of Gatsby

An elaborate theatre in art deco style.

In a previous post we looked at how to use AsciiDoc Markdown to author content for a Gatsby site. Another approach to handling Markdown content is MDX, which is “Markdown for the component era”. In this post we’ll see how to integrate MDX into a Gatsby site.

Dynamic User Pages

2024-02-01 Gatsby Month of Gatsby

People socialising in an art deco style.

Suppose you want to redirect paths beginning with @ to a specific user page. For example, the @datawookie path would take you to the user page for handle datawookie.

There are probably a few ways to do this, but one approach would be to use dynamic routing.

🚀 TL;DR Show me the code. Look at the 27-dynamic-users branch. This site is deployed here.

First let’s set up the user page at src/pages/user.jsx.

Python Security Audit

2024-01-23 Git Python pre-commit

A Roman centurion guarding a cage of snakes.

Is my code secure? This is something that we should all be thinking (if not worrying) about. A thorough security audit would be the ideal, but what if you don’t have the skills or resources for that? Well, there are some tools that will at least get you part way there.

ChromeDriver in GitLab CI Pipeline

2024-01-22 Selenium ChromeDriver GitLab CI

You might need to run a Selenium crawler in a GitLab CI pipeline. Here’s how to get that set up.

Gatsby Content from Medium

2024-01-20 Medium Gatsby Month of Gatsby

A medium conducting a seance in art deco style.

In a previous post we used WordPress as a CMS for a Gatsby site. We can do something similar with Medium.

Gatsby, Tailwind & Docker

2024-01-19 Docker Tailwind Gatsby

A whale in art deco style.

Gatsby and Tailwind are a formidable combination for putting together a robust and attractive site. Throw Docker into the mix and you also have robust and reliable deployments. Here’s how to set that up for a minimal site.

Next.js, Tailwind & Docker

2024-01-18 Docker Tailwind

Three columns with a futuristic city background.

Want to use Tailwind CSS with your Next.js site? Here’s how to get that set up. Also how to wrap the whole project in a Docker image.

Web Scraping with Class Name Mangling

2024-01-16 web scraping

An old fashioned clothes mangle in a room with a wooden floor and stone walls.

Class name mangling (or hashing) is becoming increasingly prevalent. There’s no need to let it slow you down though. This is how you can deal with it.

.NET and MySQL in Docker

2024-01-15 .NET MySQL Docker

In the interests of full disclosure, I know very little (very little indeed!) about .NET. But I do enjoy figuring things out. In this post I’ve documented what I learned when trying to connect a simple .NET application to MySQL using Docker Compose.

We’re going to try to do this using Docker as far as possible, which will allow me to avoid having to set up .NET on my local machine.

WordPress Headless CMS

2024-01-06 Gatsby Month of Gatsby

An art deco style image of a garden party with an imposing house in the background.

Not everybody is comfortable crafting web pages directly in JavaScript, HTML or even Markdown. Often content writers are more productive in an environment like WordPress. What if you want to develop your site using Gatsby but allow content writers to still craft their content in WordPress? No problem! You can use WordPress simply as a Content Management System (CMS), then pull the content through into your Gatsby site.

In this post we’ll look at how to set up a Headless WordPress CMS as a source of content for Gatsby.

Humble Head

2024-01-05 Gatsby Month of Gatsby

A picture of a barber in Art Deco style, showing some men getting their hair cut and faces shaved.

Populating the <head> tag might not be the most scintillating component of building a web site, but it’s an important one to get right. Gatsby’s Head API provides a flexible mechanism for doing this and supersedes the functionality from React Helmet.

Minecraft Plugin: Discord for Voice & Text Chat

2023-12-31 plugin Ubuntu Discord Minecraft

A Minecraft character wearing headphones, striding towards the viewer with an erupting volcano in the background.

It’d be cool to be able to voice chat to other players on the Minecraft server. There are a few ways to implement this, one of which involves setting up and connecting a Discord server.

Minecraft Paper Server

2023-12-30 Ubuntu Minecraft

A Minecraft character wearing glasses. The landscape and clothing of the character are patterned with newspaper.

The original Java Edition of the Minecraft Server that we installed previously implements all of the basic server functionality required for multiplayer Minecraft. But perhaps this is not enough. What if you want to customise the server by installing plugins? In that case you need to install a more sophisticated server forked off the original. The PaperMC Minecraft Server provides a lot of bells and whistles not present in the original.

Weekly Digest & Annual Review

2023-12-29 Keras SSH Julia Firefox Weekly Digest

A large library with vaulted ceiling.

A quick review of the year.

I published 55 posts (including this one).
I spent a lot of time working with GatsbyJS for one of my clients. At first I was quite out of my depth, but I slowly figured out more or less how it works. I documented some of my learning in a series of posts.
My most popular post is still about Shared Memory & Docker. The runner up looks at how to Install GitLab Runner with Docker.
I spent some time compiling data on kayak specifications in the hope of producing a definitive table. It’s a work in progress but it’s already getting quite a lot of interest.

Now onto a few interesting articles from this week, mostly announcements of new versions.

Firefox 121.0
OpenSSH 9.6
Julia 1.10
Keras 3.0.2 and
Love Songs.

Chrome & ChromeDriver in Docker

2023-12-19 Docker ChromeDriver Selenium

A whale leaping out of the ocean in the style of Vincent van Gogh.

When I containerised Selenium crawlers in the past I normally used a remote driver connection from the crawler to Selenium, running a separate Docker image with Selenium and accessing it via port 4444. This has proven to be a robust design. However, it does mean two containers rather than just one, leading to a higher maintenance burden and elevated resource requirements.

What about simply embedding Chrome and ChromeDriver directly into the crawler image? It requires a bit more work, but it’s worth it. The critical point is ensuring compatible versions of Chrome and ChromeDriver.

SSH Tunnel: Dynamic Port Forwarding

2023-12-19 SSH Linux

With a local or remote SSH tunnel the ports on both the local and remote machines must be specified at the time of creating the tunnel. But what if you need something more flexible? That’s where Dynamic Port Forwarding comes into play.

Weekly Digest

2023-12-15 Weekly Digest

An image of a library.

Interesting articles from the week that was:

Python 3.12 on AWS Lambda
Python 3.12.1
Best Novels Visualised and
Space Typography.

SSH Tunnel: Remote Port Forwarding

2023-12-13 SSH Linux

A tunnel with large yellow earth-moving equipment.

Local and remote SSH tunnels serve the same fundamental purpose: they make it possible to securely send data across an unsecured network. The implementation details are subtly different though. A local SSH tunnel acts like a secure bridge from a local machine to a remote server. It’s ideal for accessing services on the remote server which aren’t publicly exposed. Conversely, a remote SSH tunnel reverses this direction, forwarding traffic from the remote server back to a local machine (or another machine).

The critical distinction between the two is the direction of the connection between the remote and local machines.

Middleware Redirects on Vercel

2023-12-12 middleware redirect Vercel Gatsby Month of Gatsby

Planes in an art deco style.

In the previous post we looked at how to set up a collection of static redirects via the vercel.json configuration file. Now we’re going to explore a more flexible and dynamic alternative using Edge Middleware.

Minecraft Client on Ubuntu

2023-12-11 Ubuntu Minecraft

A Minecraft scene with a river and trees.

In the previous post we set up a Minecraft server on Ubuntu. Now we’re going to install the Minecraft client and connect to that server.

Minecraft Server on Ubuntu

2023-12-10 Ubuntu Minecraft

A Minecraft scene with a large character in the foreground.

I’m not a gamer, but I have an offspring who is deeply obsessed with Minecraft. I set up a Minecraft server for her so that she can play with her friends online in a safe environment.

Weekly Digest

2023-12-08 LLM Docker Selenium Weekly Digest

An image of a library.

A few things that caught my attention this week:

Gemini
Docker’s Generative AI & Machine Learning Stack
Setting up a Minecraft Server on EC2
Selenium 4.16 and
Typical Airliner Seating Chart.

SSH Tunnel: Local Port Forwarding

2023-12-05 SSH Linux

A tunnel with large yellow earth-moving equipment.

SSH tunnels are a powerful and secure method for transmitting data over potentially unsecured networks. They allow users to establish an encrypted connection between their local machine and a remote server, providing a secure and private pathway for data. An SSH tunnel will allow a service running on a remote machine to appear as if it is running on a local machine. This is also known as port forwarding.

Static Redirects on Vercel

2023-12-05 Vercel Gatsby Month of Gatsby

Moored boats in an art deco style.

A redirect is a rule which sends users to a different URL than the one they requested. They are most commonly used to ensure that browsers still get to the correct page after it has been moved to a new URL.

If you have a relatively small number of redirects and don’t need to do anything too fancy then static (or “configuration”) redirects are a good option. Static redirects are configured on Vercel by adding entries to the vercel.json configuration file. There’s just one major snag: you can only create 1024 redirects using this mechanism.

Batch Resolving Merge Conflicts

2023-12-01 Git

A surrealistic image of the confluence between two rivers.

Sometimes when you run git merge you will be confronted with a huge load of merge conflicts. However, if you are lucky there might be a clear rule which you can apply to each of those conflicts, either

accept current change (change on current branch or ours) or
accept incoming change (incoming change from other branch or theirs).

In this case you can save yourself a lot of time and effort by specifying a particular merge strategy option.

Weekly Digest

2023-12-01 Spark Keras R Vercel Weekly Digest

An image of a library.

Some things that got my attention this week:

Titan Image Generator in AWS Bedrock
AWS Transcribe Supports 100+ Languages
cron Jobs in Vercel
R 4.3.2
Spark 3.4.2
Keras 3.0.0 and
Oceanography Gift.
