Docker Images for Spark
I recently put together a short training course on Spark. One of the initial components of the course involved deploying a Spark cluster on AWS. I wanted Jupyter Notebook and RStudio Server available on the master node too, and the easiest way to make that happen was to install Docker and then run the appropriate images.
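Running those images comes down to a couple of `docker run` invocations on the master node. A minimal sketch, assuming the standard published ports (8888 for Jupyter, 8787 for RStudio Server) and default tags:

```shell
# Jupyter with PySpark, exposed on port 8888 of the host.
docker run -d -p 8888:8888 jupyter/pyspark-notebook

# RStudio Server from the rocker project, exposed on port 8787.
# The PASSWORD variable sets the login password for the rstudio user.
docker run -d -p 8787:8787 -e PASSWORD=changeme rocker/verse
```

The `-d` flag detaches the containers so they keep running in the background after the shell session ends.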
There’s already a jupyter/pyspark-notebook image which includes both Spark and Jupyter. It’s a simple matter to extend the rocker/verse image (which already includes RStudio Server, the Tidyverse, devtools and some publishing utilities) to include the sparklyr package.
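The extension amounts to a two-line Dockerfile. A minimal sketch, assuming the `latest` tag and the default CRAN mirror (both are choices, not requirements):

```dockerfile
# Start from the rocker image that already bundles RStudio Server,
# the Tidyverse, devtools and publishing utilities.
FROM rocker/verse:latest

# Add the sparklyr package so R can talk to the Spark cluster.
RUN R -e "install.packages('sparklyr', repos = 'https://cran.rstudio.com')"
```

Build and run it with something like `docker build -t verse-sparklyr .` followed by `docker run -d -p 8787:8787 -e PASSWORD=changeme verse-sparklyr` (the image name here is hypothetical).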