Andrew B. Collier / @datawookie



Graph from Sparse Adjacency Matrix

I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I’d have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which in retrospect seems rather trivial) was that elements in my adjacency matrix were occupied by the pipe symbol.
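A minimal sketch of one way the pipe symbol can turn up, offered as an assumption rather than as the fix described in the full post: when sparseMatrix() is called without values, the Matrix package returns a pattern matrix, and its non-zero entries print as "|". The edge list below is made up for illustration, and graph_from_adjacency_matrix() is used as the current igraph name for graph.adjacency().

```r
library(Matrix)
library(igraph)

# Hypothetical edge list for a 4-vertex directed graph (illustrative only).
from <- c(1, 1, 2, 3)
to   <- c(2, 3, 4, 4)

# No values supplied: the result is a pattern matrix (class ngCMatrix),
# which prints "|" wherever an entry is present.
adj.pattern <- sparseMatrix(i = from, j = to, dims = c(4, 4))

# Values supplied via x (recycled): the result is a numeric sparse matrix
# (class dgCMatrix), which prints 1 for each edge.
adj.numeric <- sparseMatrix(i = from, j = to, x = 1, dims = c(4, 4))

# Build the graph from the numeric sparse matrix.
g <- graph_from_adjacency_matrix(adj.numeric, mode = "directed")
```

Printing adj.pattern and adj.numeric side by side shows the difference in how the two classes are displayed.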

Read More →

LIBOR and Bond Yields

I’ve just been looking at the historical relationship between the London Interbank Offered Rate (LIBOR) and government bond yields. LIBOR data can be found at Quandl and comes in CSV format, so it’s pretty simple to digest. The bond data can be sourced from the US Department of the Treasury. It comes as XML and requires a little more work.

library(XML)

# Parse the Treasury yield feed (an Atom document with data service properties).
treasury.xml = xmlParse('data/treasury-yield.xml')
# Return the text of every <d:name> element found inside an entry's content.
xml.field = function(name) {
  xpathSApply(xmlRoot(treasury.xml), paste0('//ns:entry/ns:content//d:', name),
              xmlValue,
              # Namespace URIs must match the document exactly (Atom uses http, not https).
              namespaces = c(ns = 'http://www.w3.org/2005/Atom',
                             d = 'http://schemas.microsoft.com/ado/2007/08/dataservices'))
}
# Assemble one row per date with yields for a selection of maturities.
bonds = data.frame(
  date = strptime(xml.field('NEW_DATE'), format = '%Y-%m-%dT%H:%M:%S', tz = 'GMT'),
  yield_1m = as.numeric(xml.field('BC_1MONTH')),
  yield_6m = as.numeric(xml.field('BC_6MONTH')),
  yield_1y = as.numeric(xml.field('BC_1YEAR')),
  yield_5y = as.numeric(xml.field('BC_5YEAR')),
  yield_10y = as.numeric(xml.field('BC_10YEAR'))
)

Once I had a data frame for each time series, the next step was to convert each of them to an xts object. With the data in xts format it was a simple matter to enforce temporal overlap and merge the data into a single time series object. The final step in the analysis was to calculate the slope coefficient, or beta, of a least squares fit of LIBOR on bond yield, using both a 1 month and a 1 year moving window. Both could be achieved quite easily using rollapply() from the zoo package.
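The merge and rolling regression can be sketched roughly as follows. This is not the code from the full post: the two series are synthetic stand-ins, the column names are hypothetical, and the 20-observation window is an arbitrary choice standing in for the 1 month and 1 year windows.

```r
library(xts)  # attaches zoo as well, which provides rollapply()

set.seed(1)
dates <- seq(as.Date("2014-01-01"), by = "day", length.out = 60)

# Synthetic data (illustrative only): the two series cover different date ranges.
base  <- 2 + cumsum(rnorm(60, sd = 0.01))
bond  <- xts(base[1:50], order.by = dates[1:50])
libor <- xts(0.5 * base[11:60] + rnorm(50, sd = 0.005), order.by = dates[11:60])

# An inner join keeps only the dates present in both series.
both <- merge(libor, bond, join = "inner")
colnames(both) <- c("libor", "bond")

# Slope (beta) of a least squares fit of LIBOR on bond yield for one window.
beta <- function(w) {
  w <- as.data.frame(w)
  unname(coef(lm(libor ~ bond, data = w))[2])
}

# Apply the fit over a moving window of 20 observations.
rolling.beta <- rollapply(both, width = 20, FUN = beta,
                          by.column = FALSE, align = "right")
```

Setting by.column = FALSE matters here because the regression needs both columns of each window at once.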

Read More →

Guy Kawasaki on Personal Branding

Kelsey Jones of Search Engine Journal interviews Guy Kawasaki of Canva. The key take-home message is that maintaining a personal brand is vital even if you are permanently employed. Specifically, it’s important to keep a visible record of who you have worked for and your personal successes.

"I'm living proof. I did one thing right for Apple thirty years ago. I've been coasting ever since. Just need to do one thing really right." (Guy Kawasaki)

The quote above is, of course, tongue in cheek, but it bears a nugget of truth: showcase your achievements on LinkedIn and other social media because they all contribute to your personal brand.

Read More →

Beautiful Data

I’ve just finished reading Beautiful Data (published by O’Reilly in 2009), a collection of essays edited by Toby Segaran and Jeff Hammerbacher. The 20 essays from 39 contributors address a diverse array of topics relating to data and how it’s collected, analysed and interpreted.

Read More →

Day 29: Distances

Month of Julia
Using various Distance Measures in Julia.

Today we’ll be looking at the Distances package, which implements a range of distance metrics. This might seem a rather obscure topic, but distance calculation is at the core of all clustering techniques (which are next on the agenda), so it’s prudent to know a little about how these metrics work.

Read More →