Blog Posts by Andrew B. Collier / @datawookie

{filebin} Quick & Easy File Sharing

At Fathom Data we have a number of workflows that require us to share various bits of data for a short time. The data are not sensitive, so we can freely share them. We have been doing this manually via platforms like Google Drive, Box or Dropbox. However, we then need to remember to go back and delete the file some time later, which is not ideal. What we needed was a simple “fire and forget” solution that would let us share files and have them disappear automatically after some time. Well, this is precisely what Filebin does.

Read More →

{binance} Dealing with Dust

Dust refers to the fragments of coins which are too small to use for transactions. In the fiat world the equivalent would be those worthless coins with too little value to actually buy anything, that take up space in your wallet and end up scattered across parking areas. Binance allows you to convert dust into BNB. In this post I discuss the functions in {binance} which support this operation.

Read More →

{binance} Tracking Total Account Balance

I started dabbling in crypto trading on Binance at the beginning of September 2021. I am really impressed with the interface, which is smooth and full-featured (if perhaps a little complicated and confusing!). One of the things that has frustrated me, though, is not being able to get an idea of whether I’m making progress. There’s no view which shows me the overall status of my account and how this has evolved over time.

Read More →

HCRIS Field Labels

Fathom Data has been doing a lot of work with the HCRIS (Healthcare Cost Report Information System) data. The underlying reports are submitted as a spreadsheet with multiple sheets. The data are then extracted and recorded in a simple tabular format, with each field linked to a worksheet code (wksht_cd), line number (line_num) and column number (clmn_num). These three keys are then mapped to a single compound key.

Read More →
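As a rough illustration of mapping the three keys to one compound key, here is a minimal base R sketch. The sample values and the underscore-separated key format are assumptions made for illustration only; they are not taken from the HCRIS data or from the post.

```r
# Illustrative records only: the values below are made up for this sketch.
fields <- data.frame(
  wksht_cd = c("S300001", "G300000"),
  line_num = c("00100", "00300"),
  clmn_num = c("00200", "00100"),
  stringsAsFactors = FALSE
)

# Combine the three keys into a single compound key.
# The "_" separator is an assumption, not the format used in the post.
fields$key <- with(fields, paste(wksht_cd, line_num, clmn_num, sep = "_"))

print(fields)
```

Keeping the components as character strings preserves any leading zeros, which a numeric conversion would silently drop.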

{emayili} Message Threads

Being able to view related messages as threads is really useful. To make this possible, messages must use either the In-Reply-To or References header field to link to the Message-ID from another message.

This is now possible in {emayili}.
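To make the header mechanics concrete, here is a minimal base R sketch of how a reply's threading headers can be derived from its parent message. It does not use the {emayili} API; `thread_headers()` is a hypothetical helper and the Message-ID values are invented for illustration.

```r
# Build threading headers for a reply (hypothetical helper, base R only).
# - In-Reply-To holds the Message-ID of the message being replied to.
# - References holds the parent's References (if any) followed by the
#   parent's Message-ID, so the whole chain can be reconstructed.
thread_headers <- function(parent_id, parent_references = character()) {
  c(
    "In-Reply-To" = parent_id,
    "References"  = paste(c(parent_references, parent_id), collapse = " ")
  )
}

# Invented Message-IDs: a reply to msg2, which itself replied to msg1.
headers <- thread_headers(
  parent_id = "<msg2@example.com>",
  parent_references = "<msg1@example.com>"
)

cat(sprintf("%s: %s", names(headers), headers), sep = "\n")
```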

Read More →

{emayili} Message Precedence

Sometimes you need to have a message delivered immediately. Other times it doesn’t matter when it’s delivered. Similarly, you might want the recipient to read a message right now! Or you may not really care when they read it. To address both of these scenarios I have added the ability to specify message priority and importance in {emayili}.

library(emayili)
packageVersion("emayili")
[1] '0.6.1'

Importance

The Importance header specifies how important a message is (surprise!

Read More →

{emayili} Message Integrity

How can you be sure that the contents of an email haven’t been tampered with? The best approach would probably be to have a digital signature on each component of the message. Perhaps I’ll look at integrating that into {emayili} some time in the future. However, today I’m writing about the first step in that direction: MD5 checksums.
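The idea behind a checksum can be sketched in base R as follows. This is not how {emayili} implements it: `md5_of_string()` is a hypothetical helper, and it returns a hexadecimal digest, which may differ from the encoding used in an actual message header.

```r
# Compute the MD5 checksum of a string (hypothetical helper, base R only).
# tools::md5sum() works on files, so the text is written to a temporary file.
md5_of_string <- function(text) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(charToRaw(text), path)
  unname(tools::md5sum(path))
}

original <- "Hello, World!"
tampered <- "Hello, World?"

checksum <- md5_of_string(original)

# The recipient recomputes the checksum and compares it with the one sent.
identical(md5_of_string(original), checksum)   # content unchanged
identical(md5_of_string(tampered), checksum)   # content altered
```

A checksum like this only detects accidental or naive changes: anyone who can alter the content can also recompute the checksum, which is why a digital signature is the stronger approach.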

Read More →

Working with Fairly Wide Data

The concept of “wide data” is relative. In some domains 100 columns is considered “wide”, while in others that’s perfectly normal and you’d need to have thousands (or tens of thousands!) of columns for it to be considered even remotely “wide”. The data that we work with at Fathom Data generally lie in the first domain, but from time to time we do work on data that are considerably wider.

Read More →

Medusa: A Multi-Headed Tor Proxy

At Fathom Data we have a few projects which require us to send HTTP requests from an evolving selection of IP addresses. This post details the Medusa proxy Docker image, which uses Tor (The Onion Router) as a proxy.

What is a Proxy Server?

A proxy server acts as an intermediary between a client and a server. When a request goes through a proxy server there is no direct connection between the client and the server.

Read More →