Blog Posts by Andrew B. Collier / @datawookie


Kayak Stability

The stability of a kayak is intimately linked to its potential speed. A faster boat will generally be less stable. A more stable boat will normally be slower. But, of course, the skill of the paddler will also influence the likelihood of taking a swim.

Column Order: Inheritance & Declarative Base

I prefer to have my primary key columns first in a table. I recognise that column order is irrelevant to the performance of the table, but I prefer this for personal aesthetic reasons. However, from SQLAlchemy 2.0.0 there’s a change in the way that column order works with inherited base classes.

Using mailmap to Tidy Git Contributors

Do you ever contribute to a Git repository from different machines? Yeah, you probably do. Sometimes you’re on your work machine. Other times you’re on your personal laptop. Or your gaming desktop. And you might have a different Git identity on each of those. And this means that your Git log ends up looking a bit messy. Who are all of these people with similar names but different email addresses? A .mailmap file can be used to tidy things up.
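As a rough illustration of the idea, here's a minimal Python sketch of the kind of lookup a .mailmap file describes: each line gives a canonical identity first, followed by an identity as it appears in commits. The names and email addresses are hypothetical, and parse_mailmap() and canonicalise() are illustrative helpers rather than anything provided by Git. The sketch matches on the commit email only, which is a simplification of what Git itself supports.

```python
import re

# Matches "Name <email>" pairs; the name part may be empty.
IDENTITY = re.compile(r"([^<>]*)<([^<>]+)>")

# Hypothetical .mailmap content: canonical identity first, commit identity last.
MAILMAP = """
# One person, three machines.
Jane Doe <jane@example.com> <jane.doe@work.example.com>
Jane Doe <jane@example.com> jdoe <jdoe@gaming.example.com>
"""


def parse_mailmap(text):
    """Return {commit_email: (canonical_name, canonical_email)}."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pairs = [(name.strip(), email.strip().lower())
                 for name, email in IDENTITY.findall(line)]
        if not pairs:
            continue
        canonical_name, canonical_email = pairs[0]
        # The last email on the line is the one recorded in the commits.
        commit_email = pairs[-1][1]
        mapping[commit_email] = (canonical_name, canonical_email)
    return mapping


def canonicalise(name, email, mapping):
    """Map a commit identity onto its canonical identity, if one is listed."""
    canonical_name, canonical_email = mapping.get(email.lower(), (name, email))
    # An empty canonical name means "keep the name from the commit".
    return (canonical_name or name, canonical_email)


mapping = parse_mailmap(MAILMAP)
```

With this mapping both `jane.doe@work.example.com` and `jdoe@gaming.example.com` resolve to the same contributor, while an address that isn't listed is returned unchanged.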

Developing a Gatsby Site with Docker

Getting Gatsby installed and running can be a challenge. With older versions of Ubuntu I have fought extensively with Node package versions. Docker seems to be a natural solution. This post shows how to build and run a simple Docker image for serving a development Gatsby site.

Configuring BASH History

If you use BASH, then you’re probably already using the command history. BASH history allows you to access a list of previous commands executed in the shell. It can make you more productive and efficient: do more and do it quicker.

The default configuration of BASH history will suit most purposes. But, like most things in the Linux universe, it’s possible to tweak that configuration to suit your specific requirements. In this post I’ll present some of those configuration options.

Chrome DevTools Protocol & Selenium

Do you do any web scraping? If so, then you probably spend a lot of time scratching around in your browser’s Developer Tools, figuring out the DOM structure and understanding how various bits of a site are delivered. Wouldn’t it be cool to access the Developer Tools functionality from inside your scraper? Well, you can. The Chrome DevTools Protocol (CDP) provides a low-level interface for interacting with Chrome. And you can tap into that interface via Selenium.

Undetected ChromeDriver: Stay Below the Radar

There’s one major problem with ChromeDriver: anti-bot services are able to detect that a browser session is being automated (as opposed to being used by a regular meat sack) and will often impose restrictions or deny connections altogether. The Undetected ChromeDriver (undetected-chromedriver) Python package is a patched version of ChromeDriver which avoids triggering a selection of anti-bot services, allowing it to glide under the anti-bot radar.

{pagedown} Page Size & Margins

At Fathom Data we have been doing a lot of automated documentation and automated reporting. Although many of these documents are rendered to HTML, there’s an increasing demand for PDF documents. So we’ve had to raise our game in that department. The {pagedown} package has become invaluable. This is a short note showing how we tweak the page size and margins for PDF documents.

Scaling Density Plots

I’m a density plot devotee. And, using geom_density() from {ggplot2}, these plots are effortless to produce. However, sometimes the results of geom_density() are not exactly what I’m after. Here’s how I tweak them to give me precisely what I need.

Vertically Align Image & Text

I’m not a web developer. However, I do regularly use HTML and CSS to lay out pages which are then transformed into PDF documents. One of the requirements that I encounter fairly often is vertically aligning images and text. Fairly often, but not often enough to remember the solution. So I inevitably have to rediscover the solution (thanks StackOverflow!). I’m jotting this down for posterity.
