Andrew B. Collier / @datawookie


Social links and a link to my CV.



{emayili} Support for Mailtrap

The {emayili} package has adapters which make it simple to send email via a variety of services. For example, it caters specifically for ZeptoMail, MailerSend, Mailfence and Sendinblue. The latest version of {emayili}, 0.8.0 published on 23 April 2024, adds an adapter for Mailtrap.

Read More →

Backtesting

A farmer hoeing his field in the style of John Constable.

The key to successful backtesting is to ensure that you only use the data that were available at the time of the prediction. No “future” data can be included in the model training set, otherwise the model will suffer from look-ahead bias (having unrealistic access to future data).
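
One way to enforce this is a walk-forward split, where every training window ends strictly before its test window begins. The sketch below is only an illustration: the function name `walk_forward_splits` and its parameters are hypothetical, and it works on bare observation indices rather than real price data.

```python
def walk_forward_splits(n_obs, initial_train, horizon):
    """Yield (train, test) index ranges in chronological order.

    Each training range ends strictly before its test range begins,
    so no "future" observation can leak into the training set.
    """
    start = initial_train
    while start + horizon <= n_obs:
        train = range(0, start)                # everything known so far
        test = range(start, start + horizon)   # the next unseen block
        yield train, test
        start += horizon


splits = list(walk_forward_splits(n_obs=10, initial_train=4, horizon=2))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

The training window grows with each step (an expanding window). A fixed-length window would work too, provided it still ends before the test block starts.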

Read More →

Asset Allocation

A farmer, his sheep and equipment in the style of John Constable.

The Two-Fund Separation Theorem, introduced by Nobel Prize-winning economist James Tobin, is a fundamental concept in investment theory. It addresses how investors can optimally allocate their assets. In an efficient market the optimal portfolio is a combination of a risk-free asset and a market portfolio.

Read More →

Logging like a Lumberjack

Cut logs floating down a river in the Amazon.

Sprinkling status messages across your code using print() statements can be a good temporary fix for tracking down issues.

But it can get messy and there’s no way for you to selectively disable them. Sure, you can redirect output to a file or /dev/null, but this is an all-or-nothing solution. How about disabling some messages and retaining others?

This is where the logging module comes into its own.
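
As a small sketch of the idea, the snippet below uses log levels to keep some messages and drop others. The logger name "demo" and the message text are made up for illustration.

```python
import logging

# Show INFO and above, prefixed with the severity of each message.
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger("demo")

logger.debug("Detailed diagnostics.")   # below INFO, so suppressed
logger.info("Processing started.")      # emitted
logger.warning("Something looks odd.")  # emitted

# Raise the threshold on this logger: INFO is now suppressed too.
logger.setLevel(logging.WARNING)
logger.info("You will not see this.")
logger.warning("But you will see this.")
```

Changing a single level setting switches a whole class of messages on or off, which is exactly what a pile of print() statements cannot do.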

Read More →

Risk/Reward Tradeoff

A painting of a river valley. On the left the countryside is verdant and green. On the right it's dry and brown.

The two quantities we have been modelling (the time-dependent average and standard deviation of the returns) represent respectively the (potential) reward and risk associated with an asset. The relationship between these two quantities is implicit in the GARCH model. However, sometimes the return depends directly on the risk. A variant of the GARCH model can take this explicit relationship into account.

Read More →

Docker Image from Scratch

A minimal image evocative of a whale.

Most often when you are creating a new Docker image it will be based on one of the standard Docker base images like ubuntu, alpine, python or nginx. But sometimes you might want to truly roll your own image. Starting with literally nothing. From scratch. Tabula rasa.

Read More →

Model Validation

A farmer inspecting a cow. Image in style of John Constable.

Is this a “good” model? This post looks at how to validate a model and determine whether it’s a good representation of the training data and likely to produce robust and reliable predictions.

Read More →

Leverage Effect

Two ladies on a seesaw in a field. In style of John Constable.

The models we have been looking at do not differentiate between positive and negative residuals: both errors are treated the same. However, this does not align with reality, where the volatility resulting from a large negative return is higher than that for the corresponding positive return.

Read More →

Skewed Returns

A house tilted to the side in the middle of a river. Painting in the style of John Constable.

In the previous post we assumed that returns had a normal distribution. This assumption implied that the distribution was symmetric and a positive return was as likely as the corresponding negative return. In reality this assumption is just not true and returns are asymmetrically distributed.

Read More →

What is a GARCH Model?

A landscape in the style of John Constable.

A GARCH (Generalised Autoregressive Conditional Heteroskedasticity) model is a statistical tool used to forecast volatility by analysing patterns in past price movements and volatility.
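
To make that concrete, here is a rough simulation of the simplest common case, a GARCH(1,1) process, where today's variance is a constant plus a share of yesterday's squared shock plus a share of yesterday's variance. The choice of GARCH(1,1), the function name `simulate_garch` and the parameter values are all assumptions made for illustration, not estimates from real data.

```python
import random


def simulate_garch(n, omega=0.1, alpha=0.1, beta=0.8, seed=42):
    """Simulate n returns from a GARCH(1,1) process with normal shocks.

    variance[t] = omega + alpha * return[t-1]**2 + beta * variance[t-1]
    """
    rng = random.Random(seed)
    # Start from the long-run (unconditional) variance.
    variance = omega / (1 - alpha - beta)
    returns, variances = [], []
    for _ in range(n):
        shock = rng.gauss(0, 1) * variance ** 0.5
        returns.append(shock)
        variances.append(variance)
        # Large shocks and high past variance both raise the next variance.
        variance = omega + alpha * shock ** 2 + beta * variance
    return returns, variances


returns, variances = simulate_garch(500)
print(f"lowest variance: {min(variances):.3f}, highest: {max(variances):.3f}")
```

Because the variance feeds on past shocks and on its own past values, calm and turbulent periods tend to cluster, which is the pattern a GARCH model is designed to capture.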

Read More →

Rolling Volatility & Returns

An image of barrels being loaded onto carts in a style similar to that of John Constable.

In the previous post we loaded stock data into R and then calculated return volatility, both for the entire time series and shorter intervals. We saw that volatility is not constant but can change appreciably with time. One way to get a clear view of changes in volatility is by calculating them using a moving (or “rolling”) window.
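
The post itself works in R. Purely as an illustration of the rolling-window idea, here is a minimal Python sketch in which the function name `rolling_volatility` and the return values are invented.

```python
from statistics import stdev


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Hypothetical daily returns.
returns = [0.01, -0.02, 0.015, 0.03, -0.025, 0.005]
volatility = rolling_volatility(returns, window=3)
print([round(v, 4) for v in volatility])
```

A series of n returns gives n - window + 1 estimates, one for each position of the window.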

Read More →

Loading Financial Time Series

An image of farm workers loading hay onto the back of a wagon in a style similar to that of John Constable.

I’m going to be writing a series of posts which will look at some applications of R (and perhaps Python) to financial modelling. We’ll start here by pulling some stock data into R, calculating the daily returns and then looking at correlations and simple volatility estimates.
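
The series mostly uses R, but the daily return calculation is simple enough to sketch in a few lines of Python. The prices below are invented and `daily_returns` is a hypothetical helper, not a function from any package.

```python
def daily_returns(prices):
    """Simple returns: today's price relative to yesterday's, minus one."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices on four consecutive days.
prices = [100.0, 102.0, 99.96, 101.9592]
simple = daily_returns(prices)
print([round(r, 4) for r in simple])
```

Each return needs the previous day's price, so n prices yield n - 1 returns.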

Read More →

Read by Frank Collier

A collection of books read by my father, Frank Collier, for Tape Aids for the Blind. Dad was always an enthusiastic and patient reader. One of my earliest memories is of him reading to my sister and me in bed each morning. In retirement he devoted many hours to reading and editing books for Tape Aids for the Blind.

Read More →

What is Traefik?

Traffic in a LEGO landscape.

I’ve come across Traefik in a number of questions on Stack Overflow recently. I regularly use NGINX as a reverse proxy and sometimes find it to be a little obscure. Having an alternative would be helpful.

Read More →

Testing CSS & XPath

A colourful image of people working in an impressionist style.

There are many tools for generating CSS selectors and XPath expressions. However, short of using them in your code, how can you quickly test them? In this post I’ll show how you can use your browser’s Developer Tools to establish that your CSS or XPath is doing what you intend.

Read More →

Parsing the DOM

A mobster in Art Deco style.

The parse() function from the html-react-parser package converts HTML strings into React elements. It allows you to take HTML and render it as if it were JSX. This can be particularly useful when you’re working with content that comes as HTML from external sources (such as a CMS) and you want to include that content in your React components. It can also be used to filter and modify the React elements.

Read More →

Dynamic User Pages

Month of Gatsby
People socialising in an art deco style.

Suppose you want to redirect paths beginning with @ to a specific user page. For example, the @datawookie path would take you to the user page for handle datawookie.

There are probably a few ways to do this, but one approach would be to use dynamic routing.

🚀 TL;DR Show me the code. Look at the 27-dynamic-users branch. This site is deployed here.

First let’s set up the user page at src/pages/user.jsx.

Read More →

Python Security Audit

A Roman centurion guarding a cage of snakes.

Is my code secure? This is something that we should all be thinking (if not worrying) about. A thorough security audit would be the ideal, but what if you don’t have the skills or resources for that? Well, there are some tools that will at least get you part way there.

Read More →

Gatsby, Tailwind & Docker

A whale in art deco style.

Gatsby and Tailwind are a formidable combination for putting together a robust and attractive site. Throw Docker into the mix and you also have robust and reliable deployments. Here’s how to set that up for a minimal site.

Read More →

.NET and MySQL in Docker

In the interests of full disclosure, I know very little (very little indeed!) about .NET. But I do enjoy figuring things out. In this post I’ve documented what I learned when trying to connect a simple .NET application to MySQL using Docker Compose.

We’re going to try to do this using Docker as far as possible, which will allow me to avoid having to set up .NET on my local machine.

Read More →

WordPress Headless CMS

Month of Gatsby
An art deco style image of a garden party with an imposing house in the background.

Not everybody is comfortable crafting web pages directly in JavaScript, HTML or even Markdown. Often content writers are more productive in an environment like WordPress. What if you want to develop your site using Gatsby but allow content writers to still craft their content in WordPress? No problem! You can use WordPress simply as a Content Management System (CMS), then pull the content through into your Gatsby site.

In this post we’ll look at how to set up a Headless WordPress CMS as a source of content for Gatsby.

Read More →

Minecraft Paper Server

A Minecraft character wearing glasses. The landscape and clothing of the character are patterned with newspaper.

The original Java Edition of the Minecraft Server that we installed previously implements all of the basic server functionality required for multiplayer Minecraft. But perhaps this is not enough. What if you want to customise the server by installing plugins? In that case you need to install a more sophisticated server forked off the original. The PaperMC Minecraft Server provides a lot of bells and whistles not present in the original.

Read More →

Weekly Digest & Annual Review

A large library with vaulted ceiling.

A quick review of the year.

  • I published 55 posts (including this one).
  • I spent a lot of time working with GatsbyJS for one of my clients. At first I was quite out of my depth, but I slowly figured out more or less how it works. I documented some of my learning in a series of posts.
  • My most popular post is still about Shared Memory & Docker. The runner-up looks at how to Install GitLab Runner with Docker.
  • I spent some time compiling data on kayak specifications in the hope of producing a definitive table. It’s a work in progress but it’s already getting quite a lot of interest.

Now onto a few interesting articles from this week, mostly announcements of new versions.

Read More →

Chrome & ChromeDriver in Docker

A whale leaping out of the ocean in the style of Vincent van Gogh.

When I containerised Selenium crawlers in the past I normally used a remote driver connection from the crawler to Selenium, running a separate Docker image with Selenium and accessing it via port 4444. This has proven to be a robust design. However, it does mean two containers rather than just one, leading to a higher maintenance burden and elevated resource requirements.

What about simply embedding Chrome and ChromeDriver directly into the crawler image? It requires a bit more work, but it’s worth it. The critical point is ensuring compatible versions of Chrome and ChromeDriver.

Read More →

SSH Tunnel: Dynamic Port Forwarding

With a local or remote SSH tunnel the ports on both the local and remote machines must be specified at the time of creating the tunnel. But what if you need something more flexible? That’s where Dynamic Port Forwarding comes into play.

Read More →

SSH Tunnel: Remote Port Forwarding

A tunnel with large yellow earth-moving equipment.

Local and remote SSH tunnels serve the same fundamental purpose: they make it possible to securely send data across an unsecured network. The implementation details are subtly different though. A local SSH tunnel acts like a secure bridge from a local machine to a remote server. It’s ideal for accessing services on the remote server which aren’t publicly exposed. Conversely, a remote SSH tunnel reverses this direction, forwarding traffic from the remote server back to a local machine (or another machine).

The critical distinction between the two is the direction of the connection between the remote and local machines.

Read More →

SSH Tunnel: Local Port Forwarding

A tunnel with large yellow earth-moving equipment.

SSH tunnels are a powerful and secure method for transmitting data over potentially unsecured networks. They allow users to establish an encrypted connection between their local machine and a remote server, providing a secure and private pathway for data. An SSH tunnel will allow a service running on a remote machine to appear as if it is running on a local machine. This is also known as port forwarding.

Read More →

Static Redirects on Vercel

Moored boats in an art deco style.

A redirect is a rule which sends users to a different URL than the one they requested. Redirects are most commonly used to ensure that browsers still get to the correct page after it has been moved to a new URL.

If you have a relatively small number of redirects and don’t need to do anything too fancy then static (or “configuration”) redirects are a good option. Static redirects are configured on Vercel by adding entries to the vercel.json configuration file. There’s just one major snag: you can only create 1024 redirects using this mechanism.
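
As a rough sketch, the script below generates the redirects section of a vercel.json file from a mapping of old to new paths and refuses to exceed the limit mentioned above. The paths and the helper `build_vercel_config` are hypothetical. The `source`, `destination` and `permanent` keys are the fields Vercel uses for static redirect entries.

```python
import json

MAX_STATIC_REDIRECTS = 1024  # the limit mentioned above


def build_vercel_config(mapping, permanent=True):
    """Build a vercel.json structure from an {old_path: new_path} mapping."""
    if len(mapping) > MAX_STATIC_REDIRECTS:
        raise ValueError(
            f"{len(mapping)} redirects exceeds the limit of {MAX_STATIC_REDIRECTS}"
        )
    return {
        "redirects": [
            {"source": source, "destination": destination, "permanent": permanent}
            for source, destination in mapping.items()
        ]
    }


# Hypothetical paths, purely for illustration.
config = build_vercel_config({"/old-post": "/blog/new-post", "/about-me": "/about"})
print(json.dumps(config, indent=2))
```

Generating the file from a mapping also means you find out at build time, rather than at deploy time, when you are about to run into the limit.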

Read More →

Batch Resolving Merge Conflicts

A surrealistic image of the confluence between two rivers.

Sometimes when you run git merge you will be confronted with a huge load of merge conflicts. However, if you are lucky there might be a clear rule which you can apply to each of those conflicts, either

  • accept current change (change on current branch or ours) or
  • accept incoming change (incoming change from other branch or theirs).

In this case you can save yourself a lot of time and effort by specifying a particular merge strategy option.

Read More →