Andrew B. Collier / @datawookie



Custom 404 Page

Month of Gatsby

Setting up a custom 404 page can add something special to your site. It provides you with the opportunity to do something memorable in the unfortunate event that a user asks for an unknown page.

Read More →

Cookies & Headers from Selenium

One of my standard approaches to scraping content from a dynamic website is to diagnose the API behind the site and then use it to retrieve data directly. This means that I can make efficient HTTP requests using the requests package and I don’t need to worry about all of the complexity around scraping with Selenium. However, it’s often the case that the API requests require a collection of cookies and headers, and those need to be gathered using Selenium.
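As a rough sketch of that hand-off, assume the cookies arrive as the list of dictionaries that Selenium's `driver.get_cookies()` returns. The helper names `cookies_for_requests` and `cookie_header`, the sample cookie values and the example URL are all hypothetical and only there for illustration.

```python
def cookies_for_requests(selenium_cookies):
    """Convert Selenium cookie dicts into a simple name -> value mapping."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def cookie_header(selenium_cookies):
    """Build the value of a Cookie request header from Selenium cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


# Hypothetical sample in the shape returned by driver.get_cookies().
sample = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

cookies = cookies_for_requests(sample)
header = cookie_header(sample)

# With the requests package installed, the mapping could then be used like this
# (the URL is a placeholder, not a real API):
#   requests.get("https://example.com/api/items", cookies=cookies)
```

This simplification ignores each cookie's domain and path scoping, which is usually fine when all requests go to a single site. Any headers the API needs would have to be gathered separately.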

Read More →

Adding robots.txt to a Gatsby Site

There are a couple of files which can have an impact on the SEO performance of a site: (1) a sitemap and (2) a robots.txt file. In a previous post we set up a sitemap which includes only the canonical pages on the site. In this post we’ll add a robots.txt file.

A Gatsby site will not have a robots.txt file by default, but there’s a handy package which makes it simple to add one. We’ll take a look at how to add it to the site and a couple of ways to configure it too.
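For orientation, here is a sketch of what a simple robots.txt might contain and how a crawler would interpret it. It uses Python's standard `urllib.robotparser` module rather than Gatsby, and the rules, paths and sitemap URL are made-up examples, not the configuration from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: allow everything except /admin/ and
# point crawlers at the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap-index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a generic crawler may fetch two example URLs.
blog_allowed = parser.can_fetch("*", "https://example.com/blog/")
admin_allowed = parser.can_fetch("*", "https://example.com/admin/settings")

# Sitemap URLs declared in the file (site_maps() needs Python 3.8 or later).
sitemaps = parser.site_maps()

print(blog_allowed, admin_allowed, sitemaps)
```

Parsing the file this way is a convenient check that the rules do what you intend before the site is deployed.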

Read More →

Update Sitemap for Canonical Pages

The principal purpose of a sitemap file is to inform search engines about the pages on a website that are available for crawling. It provides a list of URLs along with additional metadata about each URL to help search engines more intelligently crawl the site. If there are multiple page versions on a site then the sitemap should include only the canonical versions of those pages.
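To make the idea concrete, the sketch below builds a small sitemap from a list of pages and keeps only those marked as canonical. It is written in plain Python rather than with Gatsby's tooling, and the page records, the `canonical` flag and the `build_sitemap` helper are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML listing only the canonical pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not page["canonical"]:
            continue  # Skip alternative versions of a page.
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: two versions of the same document, one canonical.
pages = [
    {"loc": "https://example.com/docs/", "lastmod": "2023-10-01", "canonical": True},
    {"loc": "https://example.com/docs/v1/", "lastmod": "2023-06-01", "canonical": False},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Here `lastmod` stands in for the additional per-URL metadata that a sitemap can carry.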

Read More →

Gatsby Site Versions

Month of Gatsby

We’re now going to bring together what we have been building in the previous two blog posts. First we added the raw AsciiDoc source to the GraphQL schema. Next we used AsciiDoc preprocessor directives to conditionally include content in the rendered pages, depending on the value of a version attribute which was dynamically inserted into the raw AsciiDoc front matter. Now we are going to set up a URL structure which includes a version number and list the available documentation versions on the landing page.

Suppose that you have a product which is undergoing rapid development. Each new release of the product is assigned a unique version number. The product documentation is diligently updated in line with the evolving product. Ideally the documentation should be consistent with the latest release of the product. However, not all of your users will be using the latest version, so they should also be able to access older versions of the documentation.

Read More →

Gatsby Page Ordering

It’s often the case that we want pages on a site to be presented in a specific order. It’s possible to do this systematically by sorting on some existing aspect of the content (for example, sort alphabetically by page title) or by introducing a page attribute that’s specifically intended for sorting.
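The two approaches can be illustrated with a small, framework-independent sketch in Python. The page records and the `order` attribute are hypothetical, and in a Gatsby site the equivalent sorting would more likely happen in a GraphQL query.

```python
# Hypothetical pages, each with a title and a dedicated sorting attribute.
pages = [
    {"title": "Installation", "order": 2},
    {"title": "Introduction", "order": 1},
    {"title": "Configuration", "order": 3},
]

# Option 1: sort on an existing aspect of the content (alphabetically by title).
by_title = sorted(pages, key=lambda page: page["title"].lower())

# Option 2: sort on an attribute introduced specifically for ordering.
by_order = sorted(pages, key=lambda page: page["order"])

print([page["title"] for page in by_title])
print([page["title"] for page in by_order])
```

Alphabetical sorting needs no extra metadata, whereas an explicit attribute gives full control at the cost of maintaining it on every page.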

Read More →

Gatsby Redirects

Month of Gatsby

Redirects instruct web browsers to automatically reroute from one URL to another. They are especially important when a website’s structure changes, pages get deleted, or content moves to a new location. Whether you’re rebranding, restructuring, or simply improving your site’s user experience, Gatsby offers tools for handling redirects. In this post we’ll look at how to implement and manage redirects with Gatsby so that your visitors always land in the right place.

Read More →

Adding a Sitemap with Gatsby

A sitemap serves as a navigational blueprint for search engines, helping them to efficiently crawl and index the essential pages of a website. By providing a structured list of URLs, a sitemap makes content easier to discover, especially on large or complex sites. It also helps search engines to notice updates and new content promptly, which improves the site’s visibility in search results.

Read More →

Gatsby Starter Project

Gatsby is a modern, fast framework for building optimized, high-performance websites. It’s a static site generator that compiles a site into static files at build time. Under the hood it uses React (user interface library) and GraphQL (data query language).

Compared with tools like WordPress or Joomla, Gatsby feels a lot more technical and less user-friendly. The learning curve is steeper and it takes longer to get things set up. However, the reward is more flexibility and granular control over all aspects of the site.

This post runs through the steps for setting up a minimal Gatsby site.

Read More →

Why Do Sports Odds Change?

Many sports trading strategies hinge on odds changing over time. For instance, a strategy might involve laying a market at lower odds, anticipating the opportunity to back it at higher odds later on. Conversely, one might back a market at higher odds, hoping to lay it at lower odds in the future. Some strategies work with short term odds fluctuations, while others depend on longer term odds variations.

In this post I’ll take a look at some examples of odds dynamics and unpack why the odds change.
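To see why a change in odds matters, here is a small worked example of backing at higher odds and then laying at lower odds. It assumes decimal odds and no exchange commission, and the function names and figures are invented for illustration.

```python
def lay_stake_to_hedge(back_stake, back_odds, lay_odds):
    """Lay stake that gives the same profit whether the selection wins or loses."""
    return back_stake * back_odds / lay_odds


def hedged_profit(back_stake, back_odds, lay_odds):
    """Return (profit if selection wins, profit if selection loses)."""
    lay_stake = lay_stake_to_hedge(back_stake, back_odds, lay_odds)
    # Selection wins: the back bet pays out, the lay bet costs its liability.
    profit_if_wins = back_stake * (back_odds - 1) - lay_stake * (lay_odds - 1)
    # Selection loses: the back stake is lost, the lay stake is won.
    profit_if_loses = lay_stake - back_stake
    return profit_if_wins, profit_if_loses


# Hypothetical trade: back 10 units at odds of 3.0, later lay at odds of 2.5.
lay_stake = lay_stake_to_hedge(10, 3.0, 2.5)
win_profit, lose_profit = hedged_profit(10, 3.0, 2.5)

print(lay_stake, win_profit, lose_profit)
```

With these numbers the lay stake is 12 units and the profit is 2 units whichever way the event turns out. If the odds had instead moved higher after the back bet, the same calculation would lock in a loss.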

Read More →

Undetected ChromeDriver with noVNC

In a previous post I wrote about an Undetected ChromeDriver Docker image. A container derived from that image exposed a view of the Chrome session via VNC on port 5900. This worked really well. However, it meant having yet another app (the VNC client) running on my already cluttered desktop. I have now extended the Docker image to use noVNC, which means that I can view the Chrome session in a web browser. This is very convenient since I always have a browser running.

Read More →