Andrew B. Collier / @datawookie

{binance} Spot Trading: Liquidity

2021-11-08 {binance} R

In previous posts we looked at creating market orders and limit orders with {binance}. We saw a couple of successful trades. However, sometimes trades are not successful and the orders are not filled. Let’s try to understand why.

Read More →

Accessing Virtual Memory from a Docker Container

2021-11-06 Docker

Memory is something I generally don’t worry about when working with Docker. It just works. This is great… but what happens when it doesn’t?

Read More →

{binance} Spot Trading: Limit Orders

2021-11-05 {binance} R

In the previous post we looked at creating market orders on Binance using the {binance} package. Today we’re going to dig into limit orders.

Read More →

{binance} Spot Trading: Market Orders

2021-11-01 {binance} R

Functionality for working with spot trades is now available in {binance}. In this post we’ll establish some background on spot trading and then explore some related functions.

Read More →

{binance} Dealing with Dust

2021-10-27 {binance} R

Dust refers to the fragments of coins which are too small to use for transactions. In the fiat world the equivalent would be those worthless coins with too little value to actually buy anything, that take up space in your wallet and end up scattered across parking areas.

Binance allows you to convert dust into BNB. In this post I discuss the functions in {binance} which support this operation.

I’ve got a bit of dust in my wallet.

Read More →

{binance} Tracking Total Account Balance

2021-10-26 {binance} R

I started dabbling in Crypto trading on Binance at the beginning of September 2021. I am really impressed with the interface, which is smooth and full featured (if perhaps a little complicated and confusing!). One of the things that has frustrated me though is not being able to get an idea of whether I’m making progress. There’s no view which shows me the overall status of my account and how this has evolved over time.

Read More →

HCRIS Field Labels

2021-10-19 {pdftools} R

Fathom Data has been doing a lot of work with the HCRIS (Healthcare Cost Report Information System) data. The underlying reports are submitted as a spreadsheet with multiple sheets. The data are then extracted and recorded in a simple tabular format, with each field linked to a worksheet code (wksht_cd), column number (clmn_num) and line number (clmn_num). These three keys are then mapped to a single compound key. The resulting data look something like this:

Read More →

{emayili} Message Threads

2021-10-18 {emayili} R

Being able to view related messages as threads is really useful. To make this possible, messages must use either the In-Reply-To or References header field to link to the Message-ID from another message.

This is now possible in {emayili}.

Read More →

{emayili} Support for Gmail, SendGrid & Mailgun

2021-10-15 {emayili} R

The {emayili} package supports configuring a generic SMTP server via the server() function. In the most recent version, v0.6.5, we add three new functions, gmail(), sendgrid() and mailgun(), which provide specific support for Gmail, SendGrid and Mailgun.

Read More →

Fathoming Email Headers

2021-10-11 email

If you pull back the curtain and take a look at what a naked email looks like, then you might be astonished.

Read More →

{emayili} Message Precedence

2021-10-06 {emayili} R

Sometimes you need to have a message delivered immediately. Other times it doesn’t matter when it’s delivered. Similarly, you might want the recipient to read a message immediately. Or you may not really care when they read it. The ability to specify message priority and importance in {emayili} has been added to address both scenarios.

library(emayili)

packageVersion("emayili")

[1] '0.6.1'

Importance

The Importance header specifies how important a message is (surprise!). It reflects how important the sender thinks the message is, which might not necessarily agree with the recipient’s opinion. According to RFC 4021 this (optional) field can assume one of three values: low, normal or high.

Read More →

{emayili} Message Integrity

2021-10-05 {emayili} R

How can you be sure that the contents of an email haven’t been tampered with? The best approach would probably be to have a digital signature on each component of the message. Perhaps I’ll look at integrating that into {emayili} some time in the future. However, today I’m writing about the first step in that direction: MD5 checksums.

Read More →

Working with Fairly Wide Data

2021-10-04 csvkit CSV SQL R

The concept of “wide data” is relative. In some domains 100 columns is considered “wide”, while in others that’s perfectly normal and you’d need to have thousands (or tens of thousands!) of columns for it to be considered even remotely “wide”. The data that we work with at Fathom Data generally lies in the first domain, but from time to time we do work on data that is considerably wider.

Read More →

Medusa: A Multi-Headed Tor Proxy

2021-10-01 web scraping proxy Tor

At Fathom Data we have a few projects which require us to send HTTP requests from an evolving selection of IP addresses. This post details the Medusa proxy docker image which uses Tor (The Onion Router) as a proxy.

What is a Proxy Server?

A proxy server acts as an intermediary between a client and a server. When a request goes through a proxy server there is no direct connection between the client and the server. The client connects to the proxy and the proxy then connects to the server. Requests and responses pass through the proxy.

Read More →

{emayili} Right-to-Left

2021-09-24 Markdown {emayili} R

Yoav Raskin suggested that it would be useful to support right-to-left (RTL) text in {emayili}, so that languages like Hebrew, Arabic and Aramaic would render properly. I’ll be honest, this was not something that I had previously considered. But agreed, it would be a cool feature.

Read More →

{emayili} Styling Figures

2021-09-23 Markdown {emayili} R

By default <img> tags are wrapped in a tight <p></p> embrace by {knitr}. In general this works really well. However, I want to have more control over image formatting for {emayili}.

Read More →

{emayili} Managing CSS

2021-09-22 Markdown {emayili} R

I love the clean simplicity of an R Markdown document. But sometimes it can feel a little bare and utilitarian. This is especially the case if it’s rendered into the body of an email. How about injecting a little more pizzazz?

Read More →

{emayili} R Markdown Parameters

2021-09-20 Markdown {emayili} R

I don’t frequently use parameters in R Markdown documents. The initial implementation of render() in {emayili} did not cater for them. A small tweak makes it possible though.

Read More →

{emayili} Rendering R Markdown

2021-09-17 Markdown {emayili} R

In a previous post I documented a new feature in {emayili}, the ability to render Plain Markdown directly into the body of an email message.

Read More →

{emayili} Rendering Plain Markdown

2021-09-10 Markdown {emayili} R

We’ve been able to attach text and HTML content to messages with {emayili}. But something that I’ve really been wanting to do is render Markdown directly into an email.

In version 0.4.19 I’ve added the ability to directly render Plain Markdown into a message. That version is not on CRAN, so you’ll need to install from GitHub.

Read More →

{clockify} Time Tracking from R

2021-09-09 {clockify} R

At Fathom Data we use Clockify to keep detailed records of the time that we spend working on our clients’ projects. Up until fairly recently we manually generated timesheets at the end of each month that were sent through to the clients along with their invoices. Our experience has been that providing detailed timesheets helps foster trust and transparency. However, with a growing team and an expanding clientele, generating these timesheets has become progressively more laborious. Time to automate!

Read More →

Setting up a Tiny HTTP Proxy

2021-09-08 AWS proxy

It’s often handy to have access to a HTTP proxy. I use this recipe from time to time to quickly fling together a proxy server which I can use to relay HTTP requests from a different origin.

Read More →

Pre-Commit Hook for Processing README.Rmd

2021-09-05 R Git

When writing an R package I usually create a README.Rmd file that I render to README.md. I use {pkgdown} to then create documentation. I run the last step via CI, so once it’s set up I never need to think about it again.

The problem is that I regularly forget to process the README.Rmd file, which means that despite keeping that up to date, everything else lags behind.

What if I automated the process? I created a simple pre-commit hook which processes README.Rmd whenever I make a commit and automatically adds any changes to the commit.

Read More →

{emayili} Interpolating Message Content

2021-09-03 {emayili} R

A small new feature added to {emayili}: the ability to interpolate content into the email message body.

Read More →

Websockify with Token Target Selection

2021-08-22 NGINX noVNC Websockify

In a previous post I looked at how to set up Websockify behind an NGINX proxy. The ultimate goal was to accommodate multiple simultaneous users. Although the setup in that post worked, if the number of users is large then it becomes very resource hungry because there’s a Websockify instance running for each user.

Read More →

{emayili} Rudimentary Email Address Validation

2021-08-21 {emayili} R

A recent issue on the {emayili} GitHub repository prompted me to think a bit more about email address validation. When I started looking into this I was somewhat surprised to learn that it’s such a complicated problem. Who would have thought that something as apparently simple as an email address could be linked with such complexity?

Read More →

Old ‘Hood, New ‘Hood

2021-08-11 {geosphere} {ggmap} spatial R

Image adapted from the cover of 'Old Hat New Hat' by Dr Seuss.

I recently moved from suburban South Africa to rural England. I’m figuring out my new environment. Making some maps seemed to be a good way to get familiar with the surroundings.

In the process I wanted to figure out two things:

how to get maps with a consistent aspect ratio at different latitudes; and
how to overlay a partially transparent map layer.

To make things more interesting I’ll create maps of both my old and new locations.

Read More →

Websockify & noVNC: Adding SSL

2021-08-08 NGINX noVNC Websockify

If you’re going to be exposing noVNC on the (public) internet, then it’s vital that you take some security measures. You should install a suitable SSL certificate and serve noVNC via HTTPS rather than HTTP. Getting that all up and running can be moderately tricky. Here’s a quick recipe to get a minimal setup working.

Read More →

Websockify & noVNC behind an NGINX Proxy

2021-08-07 NGINX noVNC Websockify

At Fathom Data we are developing a framework which will enable remote access to Linux desktops via a browser. There’s nothing new to this idea. However, we have a very specific application in mind, so we need to roll our own solution. Importantly, there need to be multiple independent connections catering for a group of users. In this post I’ll show how we used the following tools to make this possible:

Read More →

Creating an AMI using the AWS CLI

2021-07-31 AWS

This post describes the process of building a custom AMI (Amazon Machine Image) using the AWS CLI. The goal is to automate the entire process, making it completely repeatable.

Read More →

TomTom Traffic

2021-07-27 {tomtom} R

In the previous post I introduced the {tomtom} package and showed how it can be used for geographic routing. Now we’re going to look at the traffic statistics returned by the TomTom API.

Read More →

TomTom Routing

2021-07-26 spatial {tomtom} R

While working with the Google Mobility Data I stumbled upon the TomTom Traffic Index. I then learned that TomTom has a public API which exposes a bunch of useful and interesting data.

Seemed like another opportunity to create a smaller R package. Enter {tomtom}.

{tomtom} Package

The {tomtom} package can be found here.

Install the package.

remotes::install_github("datawookie/tomtom")

Load the package.

library(tomtom)

API Key

Getting a key for the API is quick and painless. I stored mine in the environment then retrieved it with Sys.getenv().

Read More →

Fixing Truncated Logs on Gitlab CI/CD

2021-07-24 GitLab CI

I’ve got a few CI/CD jobs running on GitLab that produce long logs, which in turn get truncated. Since the most interesting stuff normally happens towards the end of the logs (like errors!), this can be really counter-productive.

Job's log exceeded limit of 4194304 bytes.

There’s a fundamental problem with this though: if something’s going to break then it’s inevitably going to happen after the logs have been truncated so I won’t be able to actually see what’s broken.

Read More →

Mobility & Unrest in South Africa

2021-07-24 {saffer} {mobility} R

Did the recent unrest in South Africa have a detectable effect on mobility patterns?

Read More →

SSH Tunnel from Docker

2021-06-25 SSH Docker

I’m building a crawler which I’m going to wrap up in a Docker image. The crawler writes data to a remote MySQL database. However, there’s a catch: the database connection is via an SSH tunnel. Another wrinkle: the crawler is going to be run on ECS, so the whole thing (including setting up the SSH tunnel) needs to be baked into the Docker image.

This post illustrates the process of connecting to a remote MySQL data via a SSH tunnel from Docker. I’m not sure how secure this is. And there are probably better ways to do this. But it’s a start and it works!

Read More →

Shiny on ECS

2021-06-24 Shiny ECS R

A recipe for setting up a simple Shiny app on ECS.

Read More →

Adding Swap Space on Ubuntu

2021-06-10 Linux

Most people running a Linux system would agree that you should set up swap. According to the poll below, only 28% believe that no swap is required. And I think that they are misguided. Always put some swap on your system. You’ll never regret it.

Read More →

Scrapy with a Rotating Tor Proxy

2021-06-09 proxy Tor Scrapy Python Docker

This post shows an approach to using a rotating Tor proxy with Scrapy.

I’m using the scrapy-rotating-proxies download middleware package to rotate through a set of proxies, ensuring that my requests are originating from a selection of IP addresses. However, I need to have those IP addresses evolve over time too, so I’m using the Tor network.

Setup

I’ve got the following in the settings.py for my Scrapy project:

Read More →

RAM & CPU Requirements for a Selenium Crawler

2021-06-04 ECS AWS Selenium web scraping

How much memory and CPU resources should be allocated to a simple Selenium crawler? I’ve been fudging these parameters but the time has come to man up and do this right.

I want my task to have sufficient resources that it’s able to perform its function. It should never be starved of resources! But, at the same, I also don’t want to extravagantly allocate excess resources. More resources → higher costs. I want to allocate the minimal resources to get the job done.

Read More →

Shiny Inception: JavaScript in Rendered Markdown

2021-06-03 Shiny R

I’m busy helping a colleague with a Shiny application. The application includes HTML content rendered from a .Rmd document. However, there’s a catch: the .Rmd uses the {DT} package to render a dynamic DataTable. It turns out that this doesn’t immediately work because the JavaScript in the embedded document isn’t run.

I’ll use a simple document and application structure to illustrate the problem.

Static Document

Let’s start with a .Rmd document which renders two different static views of the Palmer Archipelago (Antarctica) Penguin Data.

Read More →

Building an Airflow Environment in Docker

2021-05-31 Docker Airflow Ubuntu

We’re developing some training about Apache Airflow and need to have a robust and portable environment for running demos and labs which we can make available to the class. This will reduce the frustration and time wasted getting everybody set up and ensure that everybody is working in the same environment.

Read More →

Desktop in Docker

2021-05-30 Docker Ubuntu

We’re building a new training program around Apache Airflow. The major technical challenge with delivering this sort of program is ensuring that everybody in the class has access to a working version of the technology. Since there is generally a diverse range of setups (operating systems, corporate firewalls and personal configurations) this can really be a nightmare.

Read More →

Using {pagedown} in Docker

2021-05-28 Docker {pagedown} R

I’m building an automated reporting system which generates PDF reports. My approach is to use R Markdown to write the report and render to PDF using the excellent {pagedown} package.

Read More →

AWS EC2: Setting up a Load Balancer

2021-05-09 EC2 AWS

An Application Load Balancer receives requests and distributes them across a selection of processing resources. These processing resources are divided into Target Groups (see previous post for how to set one up).

Creating an Application Load Balancer

We’re setting up a Flask API which is deployed as a Docker image and running on ECS. We’re going to create a load balancer which will accept requests on port 80 and route them to port 5000 on the API container.

Read More →

AWS EC2: Creating a Target Group

2021-05-08 EC2 AWS

If we want to have an ECS service which is visible to the public, then we need to set up an Application Load Balancer. There are a couple of steps to this process, the first of which is creating a Target Group.

Read More →

AWS Containers #8: Setting up a Service

2021-05-06 ECS AWS

In this post we’ll look at setting up an ECS Service. A service is just a persistent set of tasks running on a cluster.

Read More →

AWS EC2: Security Groups

2021-05-02 EC2 AWS

A security group is a collection of rules which control the traffic into and out of an AWS resource.

Read More →

AWS Containers #4: Dependencies

2021-04-28 ECS AWS

We saw in a previous post that it’s important to ensure that the Selenium container is running and accepting requests before the crawler actually gets started. This is because the crawler depends on Selenium being available. We can use ECS task dependencies to assert this dependency.

Read More →

AWS Containers #5: Health Checks

2021-04-28 ECS AWS

Can we create a health check that will check if the Selenium service is available? Yes! We will need to do two things:

tell the crawler container to wait for the Selenium container to be HEALTHY and
add a health check to the Selenium container.

Let’s do it!

Read More →

AWS Containers #3: Image on ECR

2021-04-27 Docker ECR AWS

In the previous post we saw how to deploy a simple Selenium crawler on ECS. However, the Docker image for the crawler was stored in a Docker Hub repository. Now we’re going to see how to use the AWS Elastic Container Registry (ECR) instead.

Read More →