Installing Spark on Ubuntu
I’m busy experimenting with Spark. This is what I did to set up a local cluster on my Ubuntu machine. Before you embark on this you should first set up Hadoop.
Read More →
In the process of uploading a massive CSV file to my Django application my session data are getting pretty big. As a result I’m getting these errors:
(1153, "Got a packet bigger than 'max_allowed_packet' bytes")
and
(2006, 'MySQL server has gone away')
The second error is potentially unrelated.
After some research it became apparent that the source of the problem is my max_allowed_packet setting.
I’ve been meaning to set up a VPN and this morning seemed like a good time to tick it off the bucket list. This is a quick outline of my experience, which included one minor hiccup.
Read More →
A short note on how to set up Jupyter Notebooks with Python 3 on Ubuntu. The instructions are specific to Xenial Xerus (16.04) but are likely to be helpful elsewhere too.
Read More →
I’m in the process of deploying a scraper on a DigitalOcean instance. The scraper uses RSelenium with the PhantomJS browser. I ran into a problem though. Although it worked flawlessly on my local machine, on the remote instance it broke with the following error.
I have been looking at methods for clustering time domain data and recently read TSclust: An R Package for Time Series Clustering by Pablo Montero and José Vilar. Here are the results of my initial experiments with the TSclust package.
The Bulgaria Web Summit happened on 7 and 8 April 2017 at the Inter Expo Center in Sofia, Bulgaria.
Read More →
There are a variety of ways to predict running times over the standard marathon distance (42.2 km). You could dust off your copy of The Lore of Running (Tim Noakes). My treasured Third Edition discusses predicting likely marathon times on p. 366, referring to tables published by other authors to actually make predictions. There’s also a variety of online services, for example:
Of these I particularly like the offering from Running for Fitness which produces a neatly tabulated set of predicted times over an extensive range of distances using a selection of techniques including Riegel’s Formula and Cameron’s Model.
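As a rough illustration of the first of those techniques, here is a minimal sketch of Riegel’s Formula, which scales a known race time to another distance as T2 = T1 × (D2 / D1)^1.06. The function names and the example 10 km time are made up for illustration, and this is not the code behind the Running for Fitness calculator.

```python
def riegel_predict(known_time_s, known_dist_km, target_dist_km, exponent=1.06):
    """Predict the time (in seconds) over target_dist_km from one known result.

    Riegel's Formula: T2 = T1 * (D2 / D1) ** exponent, with 1.06 as the
    commonly quoted exponent.
    """
    return known_time_s * (target_dist_km / known_dist_km) ** exponent


def format_hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical input: a 10 km race run in 50 minutes.
    ten_km_time_s = 50 * 60
    marathon_s = riegel_predict(ten_km_time_s, 10.0, 42.2)
    print("Predicted marathon time:", format_hms(marathon_s))
```

Because the exponent is greater than 1, the predicted pace slows as the distance grows, which is the whole point of the model.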
Read More →
Amazon seems to really understand me. Or, at least, my reading preferences. Running, garlic, data. Yup, that pretty much sums me up.
Read More →
Spent a very diverting few minutes playing with Quick, Draw! this morning, which is one of the cool AI Experiments hosted by Google.
Read More →
A simple problem sent through to me by one of my running friends:
There are 6 red cards and 1 black card in a box. Busi and Khanya take turns to draw a card at random from the box, with Busi being the first one to draw. The first person who draws the black card will win the game (assume that the game can go on indefinitely). If the cards are drawn with replacement, determine the probability that Khanya will win, showing all working.
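One way to check an answer to this problem is to sum the geometric series exactly and compare it with a quick simulation. The sketch below is only an illustration (the function names are invented here, and it is not necessarily the working used in the full post). With replacement, each draw finds the black card with probability p = 1/7, and Khanya wins when the first black card appears on an even-numbered draw.

```python
import random
from fractions import Fraction


def khanya_win_probability(red=6, black=1):
    """Exact probability that the second player draws the black card first."""
    p = Fraction(black, red + black)  # chance of black on any single draw
    q = 1 - p                         # chance of red on any single draw
    # Khanya wins on draw 2, 4, 6, ...: q*p + q**3*p + q**5*p + ...
    # This is a geometric series with first term q*p and ratio q**2.
    return q * p / (1 - q ** 2)


def simulate(trials=100_000, red=6, black=1, seed=0):
    """Estimate the same probability by playing the game many times."""
    rng = random.Random(seed)
    p = black / (red + black)
    khanya_wins = 0
    for _ in range(trials):
        draw = 1
        while rng.random() >= p:  # a red card was drawn, so play continues
            draw += 1
        if draw % 2 == 0:         # even-numbered draws belong to Khanya
            khanya_wins += 1
    return khanya_wins / trials


if __name__ == "__main__":
    print("Exact:    ", khanya_win_probability())  # 6/13
    print("Simulated:", simulate())
```

The exact result is 6/13, slightly less than one half, because Busi has the advantage of drawing first.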
Read More →
satRday Cape Town will happen on 18 February 2017 at Workshop 17, Victoria & Alfred Waterfront, Cape Town, South Africa.
Read More →
I followed up a reference to fast-neural-style from Twitter and spent a glorious hour experimenting with this code. Very cool stuff indeed. It’s documented in Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi and Fei-Fei Li.
The basic idea is to use feed-forward convolutional neural networks to generate image transformations. The networks are trained using perceptual loss functions and effectively apply style transfer.
What is “style transfer”? You’ll see in a moment.
Read More →
I’m generally not too interested in fitting analytical distributions to my data. With large enough samples (which I am normally fortunate enough to have!) I can safely assume normality for most statistics of interest.
Recently I had a relatively small chunk of data and finding a decent analytical approximation was important. I had a look at the tools available in R for addressing this problem. The {fitdistrplus} package seemed like a good option. Here’s a sample workflow.
I’m busy working my way through Kyle Banker’s MongoDB in Action. Much of the example code in the book is given in Ruby. Despite the fact that I’d love to learn more about Ruby, for the moment it makes more sense for me to follow along with Python.
Read More →
Sometimes you’ll want to see how a site behaves on a slower connection. This can be easily emulated using Chrome DevTools. Go to the Network tab and press the “No throttling” dropdown, which will give you a selection of presets and the option to configure custom connections.
Read More →
When figuring out how to formulate the contents of a POST request it’s often useful to see the “typical” fields submitted directly from a web form.
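For reference, this is roughly what those form fields look like once they are encoded into a POST body. It is a minimal sketch using only the Python standard library. The URL and field names are hypothetical, and the request is built but never sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_form_post(url, fields):
    """Build (but do not send) a POST request carrying form-encoded fields."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical fields, as they might be copied from a browser's Network tab.
    fields = {"username": "alice", "comment": "hello world", "csrf_token": "abc123"}
    request = build_form_post("https://example.com/submit", fields)

    print(request.get_method())            # POST
    print(request.data)                    # the encoded body as bytes
    print(parse_qs(request.data.decode())) # decoded back into fields
```

Decoding the body with parse_qs is a handy way to compare what your code would send against what the browser actually submitted.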
Read More →
Seems that I am doing this a lot: deleting my entire graph (all nodes and relationships) and rebuilding from scratch. I guess that this is part of the learning process.
Read More →