Andrew B. Collier / @datawookie


Day 28: Hypothesis Tests

2015-10-05 Julia Month of Julia

It’s all very well generating myriad statistics characterising your data. How do you know whether or not those statistics are telling you something interesting? Hypothesis Tests. To that end, we’ll be looking at the HypothesisTests package today.

Day 27: Distributions

2015-10-02 Julia Month of Julia

Today I’m looking at the Distributions package.

Day 26: Statistics

2015-10-01 Julia Month of Julia


Day 25: Interfacing with Other Languages

2015-09-30 Julia R Python Month of Julia

Julia has native support for calling C and Fortran functions. There are also add-on packages which provide interfaces to C++, R and Python. We’ll have a brief look at the support for C and R here. Further details on these and the other supported languages can be found on GitHub.
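As a rough illustration of the native C support mentioned above, here is a minimal sketch using ccall. It assumes a platform where the C library's strlen is visible to the Julia process (typically Linux or macOS); the string is purely illustrative.

```julia
# Call strlen from the C standard library directly; no wrapper code is needed.
# Assumption: libc's strlen can be resolved by name in the running process.
n = Int(ccall(:strlen, Csize_t, (Cstring,), "julia"))

println("strlen reports ", n, " bytes")
```

The arguments to ccall are the function name, the return type, a tuple of argument types, and then the actual arguments.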

Day 24: Graphs

2015-09-29 Julia Month of Julia


Day 23: Data Structures

2015-09-28 Julia Month of Julia


Day 22: Optimisation

2015-09-25 Julia Month of Julia

Sudoku-as-a-Service is a great illustration of Julia’s integer programming facilities. Julia has several packages which implement various flavours of optimisation: JuMP, JuMPeR, Gurobi, CPLEX, DReal, CoinOptServices and OptimPack. We’re not going to look at anything quite as elaborate as Sudoku today, but focus instead on finding the extrema in some simple (or perhaps not so simple) mathematical functions. At this point you might find it interesting to browse through this catalog of test functions for optimisation.

Day 21: Differential Equations

2015-09-24 Julia Month of Julia


Day 20: Calculus

2015-09-23 Julia Month of Julia


Day 19: Units of Measurement

2015-09-22 Julia Month of Julia


Day 18: Plotting

2015-09-21 Julia Month of Julia

There’s a variety of options for plotting in Julia. We’ll focus on those provided by Gadfly and Plotly.

PhysicalConstants.jl: Julia Package of Physical Constants

2015-09-21 Julia

PhysicalConstants is a Julia package which has the values of a range of physical constants. Currently MKS and CGS units are supported.

Day 17: Datasets from R

2015-09-18 Julia R Month of Julia

R has an extensive range of built-in datasets, which are useful for experimenting with the language. The RDatasets package makes many of these available within Julia. We’ll see another way of accessing R’s datasets in a couple of days’ time too. In the meantime though, check out the documentation for RDatasets and then read on below.

Day 16: Databases

2015-09-17 Julia Month of Julia


Setting up ODBC for SQLite on Ubuntu

2015-09-17 Linux SQLite

First install the SQLiteODBC and unixODBC packages. Have a quick look at the documentation for unixODBC and SQLiteODBC.

Day 15: Time Series

2015-09-16 Julia Month of Julia


Day 14: DataFrames & DataArrays

2015-09-15 Julia Month of Julia


urlshorteneR: A package for shortening URLs

2015-09-14 R

This is a small package I put together quickly to satisfy an immediate need: generating abbreviated URLs in R. As it happens I require this functionality in a couple of projects, so it made sense to have a package to handle the details. It’s not perfect but it does the job. The code is available from GitHub along with vague usage information.

In essence the functionality is simple: first authenticate to a shortening service (goo.gl and Bitly are supported at present), then shorten or expand URLs as required. The {longurl} package will perform the latter function too, possibly with greater efficiency.

Day 13: Packages

2015-09-14 Julia Month of Julia


Day 12: Parallel Processing

2015-09-11 Julia Month of Julia

The previous post looked at metaprogramming in Julia, considering how to write code that will generate or modify other code. Today’s post considers a somewhat less esoteric, yet powerful topic: Parallel processing.

As opposed to many other languages, where parallel computing is bolted on as an afterthought, Julia was designed from the start with parallel computing in mind. It has a number of native features which lend themselves to efficient implementation of parallel algorithms. It also has packages which facilitate cluster computing (using MPI, for example). We won’t be looking at those, but focusing instead on coroutines, generic parallel processing and parallel loops.
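To give a flavour of two of the topics listed above, here is a minimal sketch of a coroutine-style producer and a parallel loop. It assumes Julia 1.3 or later, where these features are spelled Channel and Threads.@threads (the syntax at the time of writing differed); the function name squares is hypothetical.

```julia
# Coroutine-style producer: a task feeds values into a Channel on demand.
# `squares` is a hypothetical name used only for illustration.
function squares(n)
    Channel{Int}() do ch
        for i in 1:n
            put!(ch, i^2)
        end
    end
end

# Iterating the channel consumes values until the producer task finishes.
collected = collect(squares(5))
println(collected)

# Parallel loop: iterations are divided among the available threads.
# Each iteration writes to its own index, so no locking is required.
results = zeros(Int, 8)
Threads.@threads for i in 1:8
    results[i] = i^2
end
println(results)
```

The loop only runs in parallel if Julia was started with more than one thread; with a single thread it still produces the same result.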

Day 11: Metaprogramming

2015-09-10 Julia Month of Julia


A ggplot2 oddity

2015-09-10 R


Day 10: Modules

2015-09-09 Julia Month of Julia


Day 9: Input/Output

2015-09-08 Julia Month of Julia

Your code won’t be terribly interesting without ways of getting data in and out. Ways to do that with Julia will be the subject of today’s post.

Console IO

Direct output to the Julia terminal is done via print() and println(), where the latter appends a newline to the output.

julia> print(3, " blind "); print("mice!\n")
3 blind mice!
julia> println("Hello World!")
Hello World!

Terminal input is something that I never do, but it’s certainly possible. readline() will read keyboard input until the first newline.
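A minimal sketch of readline() in action follows. So that it runs without a keyboard, an IOBuffer stands in for the terminal (an assumption made for illustration); calling readline() with no arguments reads from standard input in the same way. The keep keyword assumes Julia 1.x.

```julia
# An IOBuffer stands in for keyboard input so the sketch runs non-interactively.
input = IOBuffer("3 blind mice\nsecond line\n")

# Reads up to the first newline; in Julia 1.x the newline itself is dropped.
first_line = readline(input)
println("Read: ", first_line)

# Passing keep=true retains the trailing newline.
second_line = readline(input; keep=true)
print("Read: ", second_line)
```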

Day 8: Iteration, Conditionals and Exceptions

2015-09-07 Julia Month of Julia

Yesterday I had a look at Julia’s support for Functional Programming. Naturally it also has structures for conventional program flow like conditionals, iteration and exception handling.

Day 7: Functional Programming

2015-09-06 Julia Month of Julia

An earlier post looked at how to work with functions in Julia. This time we’ll dig into Functional Programming, an approach to coding which, not surprisingly, depends on writing functions.

Day 6: Composite Types

2015-09-05 Julia Month of Julia


Day 5: Collections

2015-09-04 Julia Month of Julia


Day 4: Functions

2015-09-03 Julia Month of Julia

Benchmarks of various Programming Languages.

Julia performs Just-in-Time (JIT) compilation using a Low Level Virtual Machine (LLVM) to create machine-specific assembly code. The first time a function is called, Julia compiles the function’s source code and the results are cached and used for any subsequent calls to the same function. However, there are some additional wrinkles to this story.
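The effect of compiling on the first call can be seen with a rough timing sketch like the one below. The function sumsq is hypothetical, and the actual timings will vary from machine to machine.

```julia
# `sumsq` is a hypothetical function used only for illustration.
sumsq(n) = sum(i^2 for i in 1:n)

# The first call includes the time taken to compile the function.
t_first = @elapsed sumsq(1000)

# The second call reuses the cached machine code.
t_second = @elapsed sumsq(1000)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

On most systems the first figure should be noticeably larger than the second, although a single measurement like this is only indicative.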

Day 3: Variables and Data Types

2015-09-02 Julia Month of Julia

The previous post considered a selection of development environments for working with Julia. Now we’re going to look at a topic which is central to almost every programming task: variables.

Most coding involves the assignment and manipulation of variables. Julia is dynamically typed, which means that you don’t need to declare explicitly a variable’s data type. It also means that a single variable name can be associated with different data types at various times. Julia has a sophisticated, yet extremely flexible, system for dealing with data types, which is covered in great detail by the official documentation. My notes below simply highlight some salient points I uncovered while digging around.
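A minimal sketch of what dynamic typing looks like in practice is given below. The type names in the comments assume a recent Julia release on a 64-bit machine; older versions reported some of them differently.

```julia
# The same name can be bound to values of different types over time.
x = 42
t1 = typeof(x)      # Int64 on a 64-bit machine
println(t1)

x = "forty-two"
t2 = typeof(x)      # String in recent Julia versions
println(t2)

x = 3.14
t3 = typeof(x)      # Float64
println(t3)
```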

Day 2: Development Environments

2015-09-01 Julia Month of Julia


Day 1: Installation and Orientation

2015-08-31 Julia Month of Julia

As a long-term R user I’ve found that there are few tasks (analytical or otherwise) that R cannot immediately handle. Or be made to handle after a bit of hacking! However, I’m always interested in learning new tools. A month or so ago I attended a talk entitled Julia’s Approach to Open Source Machine Learning by John Myles White at ICML in Lille, France. What John told us about Julia was impressive and intriguing. I felt compelled to take a closer look. As with most research tasks, my first stop was the Wikipedia entry, which was suitably informative.

Shiny Bayesian Updates

2015-07-24 R Bayesian Shiny

Reading Bayesian Computation with R by Jim Albert (Springer, 2009) inspired a fit of enthusiasm. Admittedly, I was on a plane coming back from Amsterdam and looking for distractions. I decided to put together a Shiny app to illustrate successive Bayesian updates. I had not yet seen anything that did this to my satisfaction. I like to think that my results come pretty close.

Lightning on your Twitter Feed

2015-07-13

As an aside for a Social Media Automation project I have constructed a bot which uses data from the World Wide Lightning Location Network (WWLLN) to construct daily animated maps of global lightning activity and post them on my Twitter feed. The bot runs remotely and autonomously on an EC2 instance.

Constructing a Word Cloud for ICML 2015

2015-07-10 Conference

Word clouds have become a bit cliché, but I still think that they have a place in giving a high level overview of the content of a corpus. Here are the steps I took in putting together the word cloud for the International Conference on Machine Learning (2015).

ICML 2015 (Lille, France): Day 5 (Workshops)

2015-07-10 Conference


ICML 2015 (Lille, France): Day 4

2015-07-10 Conference

Sundry notes from the fourth day of the International Conference on Machine Learning (ICML 2015) in Lille, France. Some of this might not be entirely accurate. Caveat emptor.

Celeste: Variational inference for a generative model of astronomical images (Jeffrey Regier, Andrew Miller, Jon McAuliffe, Ryan Adams, Matt Hoffman, Dustin Lang, David Schlegel, Prabhat)

Colour modelled as a 4 dimensional vector. The Physics (Planck’s Law) places some constraints on the components of these vectors. Light density model accounts for rotation as well as asymmetry of galactic axes.

ICML 2015 (Lille, France): Day 3

2015-07-08 Conference

Selected scribblings from the third day at the International Conference on Machine Learning (ICML 2015) in Lille, France. I’m going out on a limb with some of this, since the more talks I attend, the more acutely aware I become of my limited knowledge of the cutting edge of Machine Learning. Caveat emptor.

Adaptive Belief Propagation (Georgios Papachristoudis, John Fisher)

Belief Propagation describes the passage of messages across a network. The focus of this talk was Belief Propagation within a tree. The authors considered an adaptive algorithm and found that their technique, AdaMP, was significantly better than the current state-of-the-art algorithm, RCTreeBP.

ICML 2015 (Lille, France): Day 2

2015-07-08 Conference

Some notes from the second day at the International Conference on Machine Learning (ICML 2015) in Lille, France. Don’t quote me on any of this because it’s just stuff that I jotted down during the course of the day. Also, much of the material discussed in these talks lies outside my field of expertise. Caveat emptor.

Two Big Challenges in Machine Learning (Léon Bottou)

Machine Learning is an Exact Science. It’s also an Experimental Science. It’s also Engineering.

ICML 2015 (Lille, France): Day 1 (Tutorials)

2015-07-07 Conference

Started the day with a run through the early morning streets of Lille. This city seems to wake up late because it was still nice and quiet well after sunrise. Followed by a valiant attempt to sample everything on the buffet breakfast. I’ll know where to focus my attention tomorrow.


The first day of the International Conference on Machine Learning (ICML 2015) in Lille consisted of tutorials in two parallel streams. Evidently the organisers are not aware of my limited attention span because these tutorials each had nominal durations of longer than 2 hours, punctuated by a break in the middle.

Machine Learning with R Cookbook

2015-07-03 R Machine Learning book review

Cover of 'Machine Learning with R Cookbook'.

“Machine Learning with R Cookbook” by Chiu Yu-Wei is nothing more or less than it purports to be: a collection of 110 recipes for applying Data Analysis and Machine Learning techniques in R. I was asked by the publishers to review this book and found it to be an interesting and informative read. It will not help you understand how Machine Learning works (that’s not the goal!) but it will help you quickly learn how to apply Machine Learning techniques to your own problems.

Flashes from the Ashes: Volcanic Lightning

2015-07-03


Excel: Copying with Relative Links

2015-06-26 Excel


Disney: Quality over Quantity

2015-06-15

I read an interesting article in the 10 June 2015 edition of The Wall Street Journal. The graphic below concisely illustrates the focused strategy being adopted by Disney: a steady decline in the number of movies released per year accompanied by a steady ascent in operating margin.

R Recipe: RStudio and UNC Paths

2015-06-04 R

RStudio does not like Uniform Naming Convention (UNC) paths. This can be a problem if, for example, you install it under Citrix. The solution is to create a suitable environment file.

Hosting Shiny on Amazon EC2

2015-05-30 R AWS Shiny

I recently finished some work on a Shiny application which incorporated a Random Forest model. The model was stored in a .rda file and loaded by server.R during initialisation. This worked fine when tested locally but when I tried to deploy the application on shinyapps.io I ran into a problem: evidently you can only upload server.R and ui.R files. Nothing else.

Comrades Marathon Medal Predictions

2015-05-28 R running Shiny

With only a few days to go until race day, most Comrades Marathon athletes will be focusing on resting, getting enough sleep, hydrating, eating and giving a wide berth to anybody who looks even remotely ill.

They will probably also be thinking a lot about Sunday’s race. What will the weather be like? Will it be cold at the start? (Unlikely since it’s been so warm in Durban.) How will they feel on the day? Will they manage to find their seconds along the route?

R Recipe: Aligning Axes in ggplot2

2015-05-27 R

Faceted plots in ggplot2 are phenomenal. They give you a simple way to break down an array of plots according to the values of one or more categorical variables. But what if you want to stack plots of different variables? Not quite so simple. But certainly possible. I gathered together this solution from a variety of sources on Stack Overflow, notably this one and this other one. A similar issue for vertical alignment is addressed here.

R Recipe: Reordering Columns in a Flexible Way

2015-05-16 R


Recent Common Ancestors: Simple Model

2015-05-15 R

An interesting paper (Modelling the recent common ancestry of all living humans, Nature, 431, 562–566, 2004) by Rohde, Olson and Chang concludes with the words:
