The Stan.jl package uses the CmdStan command line client for interacting with Stan. This is what I did to install CmdStan on my Ubuntu 18.04 system.
wget -P /tmp/ https://github.com/stan-dev/cmdstan/releases/download/v2.17.1/cmdstan-2.17.1.tar.gz
I chose to unpack the archive under /opt/.
cd /opt
sudo tar -zxvf /tmp/cmdstan-2.17.1.tar.gz
cd cmdstan-2.17.1
sudo make build
Since the distribution was unpacked under /opt/ it was read-only for non-root users. As a result I had a minor issue actually running my first Stan model: in the process Stan tried to create the precompiled header model_header.hpp.gch under /opt/, which, of course, resulted in a permission denied error. Quick fix for this: apply the changes from this pull request and rebuild. Sorted!
Upon success you can clean up the download.
rm /tmp/cmdstan-2.17.1.tar.gz
There are a couple of ways to tell Julia where to find stanc. You can either set an environment variable (this should ideally be appended to the end of ~/.bashrc so that it is applied to each session):
export CMDSTAN_HOME=/opt/cmdstan-2.17.1/
Alternatively, if you don’t want to clutter up your shell namespace you can define it in your personal Julia initialisation file, ~/.juliarc.jl:
const CmdStanDir = "/opt/cmdstan-2.17.1/";
I prefer the second approach.
Now fire up Julia, then install and load the Stan.jl package.
Pkg.add("Stan")
using Stan
Find links to pertinent documentation in the package repository.
Julia has a few packages aimed at image processing. We’ll start by looking at the TestImages package, which hosts a selection of sample images, then briefly visit the ImageView package before moving onto the Images package, which implements a range of functions for image manipulation.
The TestImages package currently provides 25 sample images, which form a convenient basis for experimentation.
using TestImages
readdir(joinpath(homedir(), ".julia/v0.4/TestImages/images/"))
25-element Array{ByteString,1}:
"autumn_leaves.png"
"cameraman.tif"
"earth_apollo17.jpg"
"fabio.png"
"house.tif"
"jetplane.tif"
"lake.tif"
"lena_color_256.tif"
"lena_color_512.tif"
"lena_gray_256.tif"
"lena_gray_512.tif"
"lighthouse.png"
"livingroom.tif"
"mandril_color.tif"
"mandril_gray.tif"
"mandrill.tiff"
"moonsurface.tiff"
"mountainstream.png"
"peppers_color.tif"
"peppers_gray.tif"
"pirate.tif"
"toucan.png"
"walkbridge.tif"
"woman_blonde.tif"
"woman_darkhair.tif"
We’ll load the archetypal test image (the November 1972 Playboy centerfold of Lena Söderberg).
lena = testimage("lena_color_256.tif");
Of course, now that we’ve loaded that image, we’ll want to take a look at it. To do that we’ll need the ImageView package.
using ImageView
view(lena)
(ImageCanvas,ImageSlice2d: zoom = Graphics.BoundingBox(0.0,256.0,0.0,256.0))
You can optionally specify the pixel spacing as a parameter to view(), which then ensures that the aspect ratio of the image is conserved on resizing. There are various other bells and whistles associated with view(): you can click-and-drag within the image to zoom in on a particular region; various simple transformations (flipping and rotation) are possible; images can be annotated; and multiple images can be arranged on a canvas for simultaneous viewing.
Outside of the test images, an arbitrary image file can be loaded using imread() from the Images package. Naturally, there are also functions for writing images, imwrite() and writemime().
using Images
earth = imread(joinpath(homedir(), ".julia/v0.4/TestImages/images/earth_apollo17.jpg"))
RGB Images.Image with:
data: 3000x3002 Array{ColorTypes.RGB{FixedPointNumbers.UfixedBase{UInt8,8}},2}
properties:
IMcs: sRGB
spatialorder: x y
pixelspacing: 1 1
The default representation for the Image object tells us its dimensions, storage type and colour space. The spatial order indicates that the image data are stored using row major ordering. It’s also possible to specify physical units for the pixel spacing, which is particularly important if you are analysing images where absolute scale matters (for example, medical imaging). There are convenience methods for a few image properties.
colorspace(earth)
"RGB"
height(earth)
3002
width(earth)
3000
We can examine individual pixels within the image using the indexing operator.
earth[1,1]
RGB{U8}(0.047,0.008,0.0)
Each pixel is of type RGB (defined in the Colors package), which encapsulates a tuple giving the proportion of red, green and blue for that pixel. The underlying image data can also be accessed via the data() method.
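The U8 channel values are 8-bit fixed-point fractions: the stored byte is interpreted as byte / 255. A quick Python sketch of that encoding (the raw bytes 12 and 2 are inferred here from the pixel displayed above; they are not shown directly by Julia):

```python
# Each U8 channel value is an unsigned byte interpreted as byte / 255.
def u8_to_float(byte):
    """Decode an 8-bit fixed-point channel value to a float in [0, 1]."""
    return byte / 255

def float_to_u8(value):
    """Encode a float in [0, 1] as the nearest 8-bit fixed-point byte."""
    return round(value * 255)

# The pixel earth[1,1] printed as RGB{U8}(0.047, 0.008, 0.0), which
# corresponds to the raw bytes (12, 2, 0).
print(round(u8_to_float(12), 3))  # 0.047
print(round(u8_to_float(2), 3))   # 0.008
print(float_to_u8(0.0))           # 0
```

This is why U8 images occupy one byte per channel per pixel despite displaying as fractional values.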
The image can be split into its component colour channels using separate().
earth_rgb = separate(earth)
RGB Images.Image with:
data: 3002x3000x3 Array{FixedPointNumbers.UfixedBase{UInt8,8},3}
properties:
IMcs: sRGB
colorspace: RGB
colordim: 3
spatialorder: y x
pixelspacing: 1 1
Note that the result is a three-dimensional Array. The spatial order has also changed, which means that the data are now represented using column major ordering. The data are thus effectively transposed.
Kernel-based filtering can be applied using imfilter() or imfilter_fft(), where the latter is better suited to larger kernels. There’s a variety of helper functions for constructing kernels, like imaverage() and gaussian2d().
lena_smooth = imfilter(lena, imaverage([3, 3]));
lena_very_smooth = imfilter_fft(lena, ones(10, 10) / 100);
lena_gauss_smooth = imfilter_gaussian(lena, [1, 2]);
The effects of the above smoothing operations can be seen below, with the original image on the left, followed by the 3-by-3 and 10-by-10 boxcar filtered versions and finally the Gaussian filtered image.
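A boxcar (moving-average) filter simply replaces each sample by the mean of its neighbourhood, which is why the 10-by-10 kernel above is a matrix of ones divided by 100. The one-dimensional analogue is easy to see in NumPy (an illustration of the principle, not the Images.jl code):

```python
import numpy as np

# A 1D signal with a single spike.
signal = np.array([0.0, 0.0, 9.0, 0.0, 0.0])

# A length-3 boxcar kernel: each output sample is the mean of a
# 3-sample window, so the spike gets smeared over its neighbours.
kernel = np.ones(3) / 3
smoothed = np.convolve(signal, kernel, mode="same")
print(smoothed)  # [0. 3. 3. 3. 0.]
```

The same averaging, applied in two dimensions, produces the blurring visible in the filtered images.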
The imgradients() function calculates gradients across the image. You can choose from a set of methods for calculating the gradient. The morphological dilation and erosion operations are available via dilate() and erode().
(lena_sobel_x, lena_sobel_y) = imgradients(lena, "sobel");
lena_dilate = dilate(lena);
lena_erode = erode(lena);
Below are the two components of the image gradient calculated using the Sobel operator, followed by the results of dilate() and erode().
The ImageMagick package implements further imaging functionality. If, in the future, it provides an interface to the full functionality of the ImageMagick suite then it will be a truly phenomenal resource. Also worth looking at is the PiecewiseAffineTransforms package, which implements a technique for warping portions of an image.
If you’d like to exercise your image processing and machine learning skills in Julia, take a look at the First Steps With Julia competition on Kaggle.
The Fourier Transform is often applied to signal processing and other analyses. It allows a signal to be transformed between the time domain and the frequency domain. The efficient Fast Fourier Transform (FFT) algorithm is implemented in Julia using the FFTW library.
Let’s start by looking at the Fourier Transform in one dimension. We’ll create test data in the time domain using a wide rectangle function.
f = [abs(x) <= 1 ? 1 : 0 for x in -5:0.1:5];
length(f)
101
This is what the data look like:
We’ll transform the data into the frequency domain using fft().
F = fft(f);
typeof(F)
Array{Complex{Float64},1}
length(F)
101
F = fftshift(F);
The frequency domain data are an array of Complex type with the same length as the time domain data. Since each Complex number consists of two parts (real and imaginary), it seems that we have somehow doubled the information content of our signal. This is not true, because half of the frequency domain data are redundant: for a real-valued signal the transform is conjugate-symmetric. The fftshift() function conveniently rearranges the data in the frequency domain so that the negative frequencies are on the left.
This is what the resulting amplitude and power spectra look like:
The analytical Fourier Transform of the rectangle function is the sinc function, which agrees well with the numerical data in the plots above.
Let’s make things a bit more interesting: we’ll look at the analogous two-dimensional problem. But this time we’ll go in the opposite direction, starting with a two-dimensional sinc function and taking its Fourier Transform.
Building the array of sinc data is easy using a list comprehension.
f = [(r = sqrt(x^2 + y^2); sinc(r)) for x in -6:0.125:6, y in -6:0.125:6];
typeof(f)
Array{Float64,2}
size(f)
(97,97)
It doesn’t make sense to think about a two-dimensional function in the time domain. But the Fourier Transform is quite egalitarian: it’s happy to work with a temporal signal or a spatial signal (or a signal in pretty much any other domain). So let’s suppose that our two-dimensional data are in the spatial domain. This is what it looks like:
Generating the Fourier Transform is again a simple matter of applying fft(). No change in syntax: very nice indeed!
F = fft(f);
typeof(F)
Array{Complex{Float64},2}
F = fftshift(F);
The power spectrum demonstrates that the result is the 2D analogue of the rectangle function.
It’s just as easy to apply the FFT to higher dimensional data, although in my experience this is rarely required.
Most of the FFTW library’s functionality has been implemented in the Julia interface. For example:

plan_fft() precomputes an optimised plan for repeated transforms;
dct() yields the Discrete Cosine Transform;
rfft() transforms real data, returning only the non-redundant half of the spectrum; and
FFTW.set_num_threads() enables multi-threaded transforms.

Watch the video below in which Steve Johnson demonstrates many of the features of FFTs in Julia.
Markdown is a lightweight format specification language developed by John Gruber. Markdown can be converted to HTML, LaTeX or other document formats. You probably knew all that already. The syntax is pretty simple. Check out this useful cheatsheet.
In the latest stable version of Julia, support for Markdown is provided in the Base package.
using Base.Markdown
import Base.Markdown: MD, Paragraph, Header, Italic, Bold, LineBreak, plain, term, html,
Table, Code, LaTeX, writemime
Markdown is stored in objects of type Base.Markdown.MD. As you’ll see below, there are at least two ways to construct markdown objects: either directly from a string (using markdown syntax) or programmatically (using a selection of formatting functions).
d1 = md"foo *italic foo* **bold foo** `code foo`";
d2 = MD(Paragraph(["foo ", Italic("italic foo"), " ", Bold("bold foo"), " ",
Code("code foo")]));
typeof(d1)
Base.Markdown.MD
d1 == d2
true
You’ll find that Base.Markdown.MD objects are rendered with appropriate formatting in your console.
Functions html() and latex() convert Base.Markdown.MD objects into other formats. Another way of rendering markdown elements is with writemime(), where the output is determined by specifying a MIME type.
html(d1)
"<p>foo <em>italic foo</em> <strong>bold foo</strong> <code>code foo</code></p>\n"
latex(d1)
"foo \\emph{italic foo} \\textbf{bold foo} \\texttt{code foo}\n"
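Under the hood, html() essentially walks the parsed element tree and emits a tag for each node. A toy regex-based sketch of the inline rules in Python (purely illustrative; a real Markdown parser is considerably more careful than this):

```python
import re

def inline_html(text):
    """Convert **bold**, *italic* and `code` spans to HTML tags."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)  # bold first
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)              # then italic
    text = re.sub(r"`(.+?)`", r"<code>\1</code>", text)
    return "<p>" + text + "</p>"

print(inline_html("foo *italic foo* **bold foo** `code foo`"))
# <p>foo <em>italic foo</em> <strong>bold foo</strong> <code>code foo</code></p>
```

Note that the bold rule must run before the italic rule, otherwise `**` would be consumed as two single asterisks.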
Markdown has support for section headers, both ordered and unordered lists, tables, code fragments and block quotes.
d3 = md"""# Chapter Title
## Section Title
### Subsection Title""";
d4 = MD(Header{2}("Section Title"));
d3 |> html
"<h1>Chapter Title</h1>\n<h2>Section Title</h2>\n<h3>Subsection Title</h3>\n"
latex(d4)
"\\subsection{Section Title}\n"
Most Julia packages come with a README.md markdown file which provides an overview of the package. The readme() function gives you direct access to these files’ contents.
readme("Quandl")
Quandl.jl
============
(Image: Build Status)
(Image: Coverage Status)
(Image: Quandl)
Documentation is provided by Read the Docs.
See the Quandl API Help Page for further details about the Quandl API. This package
closely follows the nomenclature used by that documentation.
We can also use parse_file() to treat the contents of a file as markdown.
d6 = Markdown.parse_file(joinpath(homedir(), ".julia/v0.4/NaNMath/README.md"));
This is rendered below as LaTeX.
\section{NaNMath}
Implementations of basic math functions which return \texttt{NaN} instead of throwing a
\texttt{DomainError}.
Example:
\begin{verbatim}
import NaNMath
NaNMath.log(-100) # NaN
NaNMath.pow(-1.5,2.3) # NaN
\end{verbatim}
In addition this package provides functions that aggregate one dimensional arrays and ignore
elements that are NaN. The following functions are implemented:
\begin{verbatim}
sum
maximum
minimum
mean
var
std
\end{verbatim}
Example:
\begin{verbatim}
using NaNMath; nm=NaNMath
nm.sum([1., 2., NaN]) # result: 3.0
\end{verbatim}
\href{https://travis-ci.org/mlubin/NaNMath.jl}{\begin{figure}
\centering
\includegraphics{https://api.travis-ci.org/mlubin/NaNMath.jl.svg?branch=master}
\caption{Build Status}
\end{figure}
}
What particularly appeals to me about the markdown functionality in Julia is the potential for automated generation of documentation and reports. To see more details of my dalliance with Julia and markdown, visit github.
A lot of my data reflects events happening at different geographic locations (and, incidentally, at different times, but that’s another story). So it’s not surprising that I’m interested in mapping those data. Julia has an OpenStreetMap package which presents an interface to the OpenStreetMap service. The package is well documented and has an extensive range of functionality. As with a number of previous posts in this series, I’m just going to skim the surface of what’s available.
We’ll need to load up the Requests package to retrieve the map data and the OpenStreetMap package to manipulate and process those data.
using Requests
using OpenStreetMap
As far as I can see the OpenStreetMap package doesn’t implement functionality for downloading the map data. So we do this directly through an HTTP request. We’ll specify a map area by giving the latitude and longitude of the bottom-left and top-right corners.
const MAPFILE = "map.osm";
minLon = 30.8821;
maxLon = minLon + 0.05;
minLat = -29.8429;
maxLat = minLat + 0.05;
We then build the query URL using Julia’s convenient string interpolation and execute a GET request against the OpenStreetMap API.
URL = "http://overpass-api.de/api/map?bbox=$(minLon),$(minLat),$(maxLon),$(maxLat)"
"http://overpass-api.de/api/map?bbox=30.8821,-29.8429,30.932100000000002,-29.7929"
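Note the parameter order in the bbox: minimum longitude, minimum latitude, maximum longitude, maximum latitude (i.e. west, south, east, north). The same interpolation can be sketched in Python for comparison (illustrative only, not part of the Julia session):

```python
# Overpass API bounding boxes are ordered west,south,east,north.
min_lon, min_lat = 30.8821, -29.8429
max_lon, max_lat = min_lon + 0.05, min_lat + 0.05

url = f"http://overpass-api.de/api/map?bbox={min_lon},{min_lat},{max_lon},{max_lat}"

# Sanity check: four comma-separated values, longitudes and latitudes
# each in min < max order.
parts = [float(x) for x in url.split("bbox=")[1].split(",")]
assert len(parts) == 4
assert parts[0] < parts[2] and parts[1] < parts[3]
print(url)
```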
osm = get(URL)
Response(200 OK, 10 headers, 1958494 bytes in body)
save(osm, MAPFILE)
"map.osm"
We save the resulting data (it’s just a large blob of XML) to a file. Feel free to open this file in an editor and browse around. Although there is currently no official schema for the OpenStreetMap XML, the documentation gives a solid overview of the format.
file map.osm
map.osm: OpenStreetMap XML data
We process the contents of the XML file using getOSMData().
nodes, highways, buildings, features = getOSMData(MAPFILE);
println("Number of nodes: $(length(nodes))")
Number of nodes: 9360
println("Number of highways: $(length(highways))")
Number of highways: 592
println("Number of buildings: $(length(buildings))")
Number of buildings: 5
println("Number of features: $(length(features))")
Number of features: 12
The call to getOSMData() returns all of the data required to build a map. Amongst these you’ll find a dictionary of features broken down by :class, :detail and :name. It’s always handy to know where the nearest Woolworths is, and this area has two of them.
features
Dict{Int64,OpenStreetMap.Feature} with 12 entries:
1871785198 => OpenStreetMap.Feature("amenity","pharmacy","Clicks")
270909308 => OpenStreetMap.Feature("amenity","fuel","BP")
1932067048 => OpenStreetMap.Feature("shop","supermarket","Spar")
747740685 => OpenStreetMap.Feature("shop","supermarket","Westville mall")
3011871215 => OpenStreetMap.Feature("amenity","restaurant","Lupa")
1871785313 => OpenStreetMap.Feature("shop","clothes","Woolworths")
1871785167 => OpenStreetMap.Feature("shop","supermarket","Checkers")
747740690 => OpenStreetMap.Feature("amenity","school","Westville Girl's High")
1872497461 => OpenStreetMap.Feature("shop","supermarket","Pick n Pay")
1554106907 => OpenStreetMap.Feature("amenity","pub","Waxy O'Conner's")
1872497555 => OpenStreetMap.Feature("shop","supermarket","Woolworths")
1932067047 => OpenStreetMap.Feature("amenity","bank","Standard Bank")
fieldnames(OpenStreetMap.Feature)
3-element Array{Symbol,1}:
:class
:detail
:name
There are other dictionaries which list the highways and buildings in the area.
Although we specified the latitudinal and longitudinal extremes of the map originally, we can retrieve these wrapped up in a data structure. Note that these values are given in Latitude-Longitude-Altitude (LLA) coordinates. There’s functionality for transforming to other coordinate systems like East-North-Up (ENU).
bounds = getBounds(parseMapXML(MAPFILE))
Geodesy.Bounds{Geodesy.LLA}(-29.8429,-29.7929,30.8821,30.9321)
We’re ready to take a look at the map using plotMap().
const WIDTH = 800;
plotMap(nodes,
highways = highways,
buildings = buildings,
features = features,
bounds = bounds,
width = WIDTH,
roadways = roads)
And here’s what it looks like. There are ways to further customise the look and feel of the map.
Plotting maps is just the beginning. You can use findIntersections() to find highway intersections; generate a transportation network using createGraph(); and find the shortest and fastest routes between locations using shortestRoute() and fastestRoute(). The package is a veritable trove of cool and useful things.
There might be interesting synergies between this package and the GeoInterface, GeoIP, GeoJSON and Geodesy packages. Those will have to wait for another day. But feel free to experiment in the meantime!
Today’s post is a mashup of various things relating to networking with Julia. We’ll have a look at FTP transfers, HTTP requests and using the Twitter API.
Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it. Linus Torvalds (1996)
Back in the mid-90s Linus Torvalds was a big fan of FTP. I suspect that his sentiments have not changed, although now he’d probably modify that statement with 's/upload/push/;s/ftp/github/'. He might have made it more gender neutral too, but it’s hard to be sure.
FTP seems a little “old school”, but if you grew up in the 1980s, before scp and sftp came along, then you’ll probably feel (like me) that FTP is an intrinsic part of the internet experience. There are still a lot of anonymous FTP sites in operation. You can find a list here, although it appears to have last been updated in 2003, so some of that information might no longer be valid. We’ll use ftp://speedtest.tele2.net/ for illustrative purposes since it also allows uploads.
First we initiate a connection to the FTP server.
using FTPClient
ftp_init();
ftp = FTP(host = "speedtest.tele2.net", user = "anonymous", pswd = "hiya@gmail.com")
Host: ftp://speedtest.tele2.net/
User: anonymous
Transfer: passive mode
Security: None
Grab a list of files available for download.
readdir(ftp)
18-element Array{ByteString,1}:
"1000GB.zip"
"100GB.zip"
"100KB.zip"
"100MB.zip"
"10GB.zip"
"10MB.zip"
"1GB.zip"
"1KB.zip"
"1MB.zip"
"200MB.zip"
"20MB.zip"
"2MB.zip"
"3MB.zip"
"500MB.zip"
"50MB.zip"
"512KB.zip"
"5MB.zip"
"upload"
This site (as its name would imply) has the sole purpose of conducting speed tests. So the content of those files is not too interesting. But that’s not going to stop me from downloading one.
binary(ftp) # Change transfer mode to BINARY
download(ftp, "1KB.zip", "local-1KB.zip");
Generally anonymous FTP sites do not allow uploads, but this site is an exception. We’ll test that out too.
cd(ftp, "upload")
ascii(ftp) # Change transfer mode to ASCII
upload(ftp, "papersize", open("/etc/papersize"));
Close the connection when you’re done.
ftp_cleanup()
close(ftp);
Okay, I’m over the historical reminiscences now. Onto something more current.
There are a few Julia packages implementing HTTP methods. We’ll focus on the Requests package. The package homepage makes use of http://httpbin.org/ to illustrate the various bits of functionality. This is a good choice since it allows essentially all of the functionality in Requests to be exercised. We’ll take a different approach and apply a subset of the functionality to a couple of more realistic scenarios. Specifically, we’ll look at GET and POST requests.
First we’ll use a GET request to retrieve information from Google Books, using an ISBN to specify a particular book. The get() call below is equivalent to opening this URL in your browser.
r1 = get("https://www.googleapis.com/books/v1/volumes";
query = {"q" => "isbn:178328479X"});
We check that everything went well with the request: the status code of 200 indicates that it was successful. The response headers provide some additional metadata.
r1.status
200
r1.headers
Dict{AbstractString,AbstractString} with 18 entries:
"Alt-Svc" => "quic=\":443\"; p=\"1\"; ma=604800"
"Date" => "Mon, 12 Oct 2015 06:01:13 GMT"
"http_minor" => "1"
"Keep-Alive" => "1"
"status_code" => "200"
"Cache-Control" => "private, max-age=0, must-revalidate, no-transform"
"Server" => "GSE"
"Expires" => "Mon, 12 Oct 2015 06:01:13 GMT"
"ETag" => "\"65-LEm5ATkHVhzLpHrk8rG7RWww/xI4TbmPbZwN2eJh_EyxSqn0UHDU\""
"X-XSS-Protection" => "1; mode=block"
"Content-Length" => "2092"
"X-Content-Type-Options" => "nosniff"
"Vary" => "X-Origin"
"http_major" => "1"
"Alternate-Protocol" => "443:quic,p=1"
"Content-Type" => "application/json; charset=UTF-8"
"X-Frame-Options" => "SAMEORIGIN"
"Content-Language" => "en"
The actual content is found in the JSON payload, which is stored as an array of unsigned bytes in the data field. We can have a look at the text content of the payload using Requests.text(), but accessing fields in these data is done via Requests.json(). Finding the data you’re actually looking for in the resulting data structure may take a bit of trial and error.
typeof(r1.data)
Array{UInt8,1}
Requests.json(r1)["items"][1]["volumeInfo"] # Parsed JSON
Dict{AbstractString,Any} with 17 entries:
"publisher" => "Packt Publishing"
"industryIdentifiers" => Any[Dict{AbstractString,Any}("identifier"=>"178328479X","type"=>"ISBN_10"),Dict{AbstractString,Any}("identifier"=>"9781783…
"language" => "en"
"contentVersion" => "preview-1.0.0"
"imageLinks" => Dict{AbstractString,Any}("smallThumbnail"=>"http://books.google.co.za/books/content?id=Rc0drgEACAAJ&printsec=frontcover&im…
"readingModes" => Dict{AbstractString,Any}("image"=>false,"text"=>false)
"printType" => "BOOK"
"infoLink" => "http://books.google.co.za/books?id=Rc0drgEACAAJ&dq=isbn:178328479X&hl=&source=gbs_api"
"previewLink" => "http://books.google.co.za/books?id=Rc0drgEACAAJ&dq=isbn:178328479X&hl=&cd=1&source=gbs_api"
"allowAnonLogging" => false
"publishedDate" => "2015-02-26"
"canonicalVolumeLink" => "http://books.google.co.za/books/about/Getting_Started_with_Julia_Programming_L.html?hl=&id=Rc0drgEACAAJ"
"title" => "Getting Started with Julia Programming Language"
"categories" => Any["Computers"]
"pageCount" => 214
"authors" => Any["Ivo Balbaert"]
"maturityRating" => "NOT_MATURE"
We see that the book in question was written by Ivo Balbaert and entitled “Getting Started with Julia Programming Language”. It was published by Packt Publishing earlier this year. It’s a pretty good book, well worth checking out.
If the payload is not JSON then we process the data differently. For example, after using get() to download CSV content from Quandl you’d simply use readtable() from the DataFrames package to produce a data frame.
URL = "https://www.quandl.com/api/v1/datasets/EPI/8.csv";
using DataFrames
population = readtable(IOBuffer(get(URL).data), separator = ',', header = true);
names!(population, [symbol(i) for i in ["Year", "Industrial", "Developing"]]);
head(population)
6x3 DataFrames.DataFrame
| Row | Year | Industrial | Developing |
|-----|--------------|------------|------------|
| 1 | "2100-01-01" | 1334.79 | 8790.14 |
| 2 | "2099-01-01" | 1333.72 | 8786.27 |
| 3 | "2098-01-01" | 1332.64 | 8782.08 |
| 4 | "2097-01-01" | 1331.54 | 8777.6 |
| 5 | "2096-01-01" | 1330.43 | 8772.83 |
| 6 | "2095-01-01" | 1329.32 | 8767.78 |
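The pattern used above, wrapping the raw response bytes in an in-memory buffer and handing that to a table reader, translates directly to other languages. With Python’s standard library it looks like this (an illustrative sketch with made-up bytes, not the actual Quandl response):

```python
import csv
import io

# Pretend this blob of bytes came back from an HTTP GET.
payload = b"Year,Industrial,Developing\n2100-01-01,1334.79,8790.14\n"

# Wrap the bytes in an in-memory text buffer and parse it as CSV.
buffer = io.StringIO(payload.decode("utf-8"))
rows = list(csv.DictReader(buffer))
print(rows[0]["Industrial"])  # 1334.79
```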
Of course, as we saw on Day 15, if you’re going to access data from Quandl it would make more sense to use the Quandl package.
Those two queries above were submitted using GET. What about POST? We’ll directly access the Twitter public API to see how many times the URL http://julialang.org/ has been included in a tweet.
r3 = post("http://urls.api.twitter.com/1/urls/count.json";
query = {"url" => "http://julialang.org/"}, data = "Quite a few times!");
Requests.json(r3)
Dict{AbstractString,Any} with 2 entries:
"count" => 2639
"url" => "http://julialang.org/"
The JSON payload has an element count which indicates that, to date, that URL has been included in 2639 distinct tweets.
We’ve just seen how to directly access the Twitter API using a POST request. We also know that there is a Quandl package which provides a wrapper around the Quandl API. Not too surprisingly there’s also a wrapper for the Twitter API in the Twitter package. This package greatly simplifies interacting with the Twitter API. No doubt wrappers for other services will follow.
First you need to load the package and authenticate yourself. I’ve got my keys and secrets stored in environment variables, which I retrieve from the ENV[] global array.
using Twitter
consumer_key = ENV["CONSUMER_KEY"];
consumer_secret = ENV["CONSUMER_SECRET"];
oauth_token = ENV["OAUTH_TOKEN"];
oauth_secret = ENV["OAUTH_SECRET"];
twitterauth(consumer_key, consumer_secret, oauth_token, oauth_secret)
I’ll take this opportunity to pander to my own vanity, looking at which of my tweets have been retweeted. To make sense of the results, convert them to a DataFrame.
retweets = DataFrame(get_retweets_of_me());
retweets[:, [:created_at, :text]]
20x2 DataFrames.DataFrame
| Row | created_at | text |
|-----|----------------------------------|-------------------------------------------------------------------------------------------------------------------|
| 1 | "Mon Oct 12 21:03:57 +0000 2015" | "Sparkline theory and practice Edward Tufte http://t.co/THgFkv3ZZS #Statistics @EdwardTufte" |
| 2 | "Mon Oct 12 18:33:49 +0000 2015" | "R Developer Fluent in Shiny and ggvis (\$100 for ~2 hours gig) http://t.co/sM8JRVOKiA #jobs" |
| 3 | "Mon Oct 12 15:31:39 +0000 2015" | "Installing LightTable and Juno on Ubuntu http://t.co/2sbEFR7MXR http://t.co/ZMmQ0QHEZs" |
| 4 | "Sun Oct 11 20:05:08 +0000 2015" | "On Forecast Intervals \"too Wide to be Useful\" http://t.co/pxqrpgkewu #Statistics" |
| 5 | "Sun Oct 11 20:04:01 +0000 2015" | "P-value madness: A puzzle about the latest test ban (or dont ask, dont tell) http://t.co/aBSgVYCb3E #Statistics" |
| 6 | "Sat Oct 10 19:04:37 +0000 2015" | "Seasonal adjusment on the fly with X-13ARIMA-SEATS, seasonal and ggplot2 http://t.co/hB9gW8LPn5 #rstats" |
| 7 | "Sat Oct 10 14:34:04 +0000 2015" | "Doomed to fail: A pre-registration site for parapsychology http://t.co/NTEfpJim5k #Statistics" |
| 8 | "Sat Oct 10 13:34:41 +0000 2015" | "Doomed to fail: A pre-registration site for parapsychology http://t.co/7NwYJZRsky #Statistics" |
| 9 | "Sat Oct 10 08:34:43 +0000 2015" | "Too Much Information Can Ruin Your Presentation http://t.co/RdRp9V6EDd #Presentation #speaking" |
| 10 | "Fri Oct 09 20:03:32 +0000 2015" | "Manage The Surge In Unstructured Data http://t.co/fhqfNCNq6O #visualization #infographics" |
| 11 | "Fri Oct 09 12:33:50 +0000 2015" | "Julia 0.4 Release Announcement http://t.co/jqaKWflomJ #julialang" |
| 12 | "Fri Oct 09 12:04:22 +0000 2015" | "User-friendly scaling http://t.co/P9rYu38FeD #rstats" |
| 13 | "Thu Oct 08 16:03:37 +0000 2015" | "#MonthOfJulia Day 31: Regression http://t.co/HBJv5xDHcy #julialang" |
| 14 | "Thu Oct 08 15:33:06 +0000 2015" | "MIT Master's Program To Use MOOCs As 'Admissions Test' http://t.co/OjF8CVYBzW #slashdot" |
| 15 | "Thu Oct 08 06:03:36 +0000 2015" | "Announcing: Calls For Speakers For 2016 Conferences http://t.co/HOqzeAJ3Bx #Presentation #speaking" |
| 16 | "Wed Oct 07 21:05:45 +0000 2015" | "Spark Turns Five Years Old! http://t.co/TislhgsDrz #bigdata" |
| 17 | "Wed Oct 07 21:03:49 +0000 2015" | "5 Reasons To Learn Hadoop http://t.co/ZdmSdkoJUI #bigdata" |
| 18 | "Wed Oct 07 16:04:56 +0000 2015" | "#MonthOfJulia Day 30: Clustering http://t.co/dh6AUqSqKe #julialang" |
| 19 | "Wed Oct 07 15:01:04 +0000 2015" | "#MonthOfJulia Day 30: Clustering http://t.co/IEm60jRNYp http://t.co/tn9iZ65L4j" |
| 20 | "Wed Oct 07 00:34:48 +0000 2015" | "What is Hadoop? Great Infographics Explains How it Works http://t.co/36Cm2raL1w #visualization #infographics" |
You can have a lot of fun playing around with the features in the Twitter API. Trust me.
The HttpServer package provides low level functionality for implementing an HTTP server in Julia. The Mux package implements a higher level of abstraction. There are undoubtedly easier ways of serving your HTTP content, but being able to do it from the ground up in Julia is cool if nothing else!
That’s it for today. I realise that I have already broken through the “month” boundary. I still have a few more topics that I want to cover. It might end up being something more like “A Month and a Week of Julia”.
Grab the distribution from the Light Table homepage. Unpack it and move the resulting folder somewhere suitable.
tar -zxvf LightTableLinux64.tar.gz
sudo mv LightTable /opt/
Go ahead and fire it up.
/opt/LightTable/LightTable
At this stage Light Table is just a generic editor: it doesn’t know anything about Julia or Juno. We’ll need to install a plugin to make that connection. In the Light Table IDE type Ctrl-Space, which will open the Commands dialog. Type show plugin manager into the search field and then click on the resulting entry.
Search for Juno among the list of available plugins and select Install.
Open the Commands dialog again using Ctrl-Space. Type settings into the search field.
Click on the User behaviors entry.
Add the following line to the configuration file:
[:app :lt.objs.langs.julia/julia-path "julia"]
At this point you should start up Julia in a terminal and install the Jewel package.
Pkg.add("Jewel")
I ran into some issues with the configuration file for the Julia plugin, so I replaced the contents of ~/.config/LightTable/plugins/Julia/jl/init.jl with the following:
using Jewel
Jewel.server(map(parseint, ARGS)..., true)
That strips out a lot of error checking, but as long as you have a recent installation of Julia and you have installed the Jewel package, you’re all good.
Time to restart Light Table.
/opt/LightTable/LightTable
You should find that it starts in Juno mode.
Finally, to make things easier we can define a shell alias for Juno.
alias juno='/opt/LightTable/LightTable'
juno
Enjoy.
There are two packages implementing evolutionary computation in Julia: GeneticAlgorithms and Evolutionary. Today we’ll focus on the latter. The Evolutionary package already has an extensive range of functionality and is under active development. The documentation is a little sparse but the author is very quick to respond to any questions or issues you might raise.
I used a GA to optimize seating assignments at my wedding reception. 80 guests over 10 tables. Evaluation function was based on keeping people with their dates, putting people with something in common together, and keeping people with extreme opposite views at separate tables. I ran it several times. Each time, I got nine good tables, and one with all the odd balls. In the end, my wife did the seating assignments. Adrian McCarthy on stackoverflow
Let’s get the package loaded up and then we’ll be ready to begin.
using Evolutionary
We’ll be using a genetic algorithm to solve the knapsack problem. We first need to set up an objective function, which in turn requires data giving the utility and mass of each item we might consider putting in our knapsack. Suppose we have nine potential items with the following characteristics:
utility = [10, 20, 15, 2, 30, 10, 30, 45, 50];
mass = [1, 5, 10, 1, 7, 5, 1, 2, 10];
To get an idea of their relative worth we can look at the utility per unit mass.
utility ./ mass
9-element Array{Float64,1}:
10.0
4.0
1.5
2.0
4.28571
2.0
30.0
22.5
5.0
Evidently item 7 has the highest utility/mass ratio, followed by item 8. So these two items are quite likely to be included in an optimal solution.
The objective function is simply the total utility for a set of selected items. We impose a penalty on the total mass of the knapsack by setting the total utility to zero if our knapsack becomes too heavy (the maximum permissible mass is set to 20).
function summass(n::Vector{Bool})
sum(mass .* n)
end
summass (generic function with 1 method)
function objective(n::Vector{Bool})
(summass(n) <= 20) ? sum(utility .* n) : 0
end
objective (generic function with 1 method)
We’ll give those a whirl just to check that they make sense. Suppose our knapsack holds items 3 and 9.
summass([false, false, true, false, false, false, false, false, true])
20
objective([false, false, true, false, false, false, false, false, true])
65
Looks about right. Note that we want to maximise the objective function (total utility) subject to the mass constraints of the knapsack.
We’re ready to run the genetic algorithm. Note that ga()
takes as its first argument a function which it will minimise. We therefore give it the reciprocal of the objective function.
best, invbestfit, generations, tolerance, history = ga(
x -> 1 / objective(x), # Function to MINIMISE
9, # Length of chromosome
initPopulation = collect(randbool(9)),
selection = roulette, # Options: sus
mutation = inversion, # Options: insertion, swap2, scramble, shifting
crossover = singlepoint, # Options: twopoint, uniform
mutationRate = 0.2,
crossoverRate = 0.5,
ɛ = 0.1, # Elitism
debug = false,
verbose = false,
iterations = 200,
populationSize = 50,
interim = true
);
best
9-element Array{Bool,1}:
true
true
false
true
false
false
true
true
true
The optimal solution consists of items 1, 2, 4, 7, 8 and 9. Note that items 7 and 8 (with the highest utility per unit mass) are included. We can check up on the mass constraint and total utility for the optimal solution.
summass(best)
20
objective(best)
157
1 / invbestfit
157.0
Examining the debug output from ga()
is rather illuminating (set the debug
and verbose
parameters to true
). You’ll want to limit the population size and number of iterations when you do this though, otherwise the information deluge can get a little out of hand. The output shows how each member of the population is initialised with the same set of values. The last field on each line is the corresponding value of the objective function.
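A limited debug run of that sort might be sketched as follows. This reuses the objective and operators from the earlier call; it is a sketch rather than the exact invocation that produced the output shown, and the return values without interim = true may differ, so the result is captured as a single object.

```julia
# Hypothetical debug run: same objective and operators as before, but with a
# tiny population and few iterations so the debug output stays readable.
result = ga(
    x -> 1 / objective(x),     # Function to MINIMISE
    9,                         # Length of chromosome
    selection = roulette,
    mutation = inversion,
    crossover = singlepoint,
    populationSize = 5,
    iterations = 10,
    debug = true,
    verbose = true
);
```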
INIT 1: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 2: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 3: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 4: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
INIT 5: Bool[true,true,false,true,false,true,true,false,false] : 71.99999999999885
Each subsequent iteration dumps output like this:
BEST: [1,2,4,3,5]
MATE 2+4>: Bool[true,true,false,true,true,true,true,false,false] : Bool[true,true,false,true,false,false,true,true,true]
MATE >2+4: Bool[true,true,false,true,true,true,true,true,true] : Bool[true,true,false,true,false,false,true,false,false]
MATE 5+1>: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MATE >5+1: Bool[true,true,false,true,false,false,true,true,true] : Bool[true,true,false,true,false,false,true,true,true]
MUTATED 2>: Bool[true,true,false,true,false,false,true,false,false]
MUTATED >2: Bool[true,false,true,false,false,true,false,true,false]
MUTATED 4>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >4: Bool[true,true,false,false,true,false,true,true,true]
MUTATED 5>: Bool[true,true,false,true,false,false,true,true,true]
MUTATED >5: Bool[true,true,false,true,true,true,true,false,false]
ELITE 1=>4: Bool[true,true,false,true,false,false,true,true,true] => Bool[true,true,false,false,true,false,true,true,true]
FIT 1: 0.0
FIT 2: 79.99999999999858
FIT 3: 101.9999999999977
FIT 4: 156.99999999999451
FIT 5: 101.9999999999977
BEST: 0.006369426751592357: Bool[true,true,false,true,false,false,true,true,true], G: 8
BEST: [4,3,5,2,1]
We start with a list of the members from the preceding iteration in order of descending fitness (so member 1 has the highest fitness to start with). MATE records detail crossover interactions between pairs of members. These are followed by MUTATED records which specify which members undergo random mutation. ELITE records show which members are promoted unchanged to the following generation (these will always be selected from the fittest of the previous generation). Next we have the FIT records which give the fitness of each of the members of the new population (after crossover, mutation and elitism have been applied). Here we can see that the new member 1 has violated the total mass constraint and thus has a fitness of zero. Two BEST records follow. The first gives details of the single best member from the new generation. Somewhat disconcertingly the first number in this record is the reciprocal of fitness. The second BEST record again rates the members of the new generation in terms of descending fitness.
Using the history of interim results generated by ga()
I could produce the Plotly visualisation below which shows the average and maximum fitness as a function of generation. It’s clear to see how the algorithm rapidly converges on an optimal solution. Incidentally, I asked the package author to modify the code to return these interim results and he complied with a working solution within hours.
In addition to genetic algorithms, the Evolutionary package also implements two other evolutionary algorithms which I will not pretend to understand. Not even for a moment. However, you might want to check out es()
and cmaes()
to see how well they work on your problem. For me, that’s an adventure for another day.
Other related projects you should peruse:
This series is drawing to a close. Still a few more things I want to write about (although I have already violated the “Month” constraint). I’ll be back later in the week.
]]>Yesterday we had a look at Julia’s regression model capabilities. A natural counterpart to these are models which perform classification. We’ll be looking at the GLM and DecisionTree packages. But, before I move on to that, I should mention the MLBase package which provides a load of functionality for data preprocessing, performance evaluation, cross-validation and model tuning.
Logistic regression lies on the border between the regression techniques we considered yesterday and the classification techniques we’re looking at today. In effect though it’s really a classification technique. We’ll use some data generated in yesterday’s post to illustrate. Specifically we’ll look at the relationship between the Boolean field valid
and the three numeric fields.
head(points)
6x4 DataFrame
| Row | x | y | z | valid |
|-----|----------|---------|---------|-------|
| 1 | 0.867859 | 3.08688 | 6.03142 | false |
| 2 | 9.92178 | 33.4759 | 2.14742 | true |
| 3 | 8.54372 | 32.2662 | 8.86289 | true |
| 4 | 9.69646 | 35.5689 | 8.83644 | true |
| 5 | 4.02686 | 12.4154 | 2.75854 | false |
| 6 | 6.89605 | 27.1884 | 6.10983 | true |
To further refresh your memory, the plot below shows the relationship between valid
and the variables x
and y
. We’re going to attempt to capture this relationship in our model.
Logistic regression is also applied with the glm()
function from the GLM package. The call looks very similar to the one used for linear regression except that the error family is now Binomial()
and we’re using a logit link function.
model = glm(valid ~ x + y + z, points, Binomial(), LogitLink())
DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Binomial,LogitLink},
DensePredChol{Float64,Cholesky{Float64}}},Float64}:
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) -23.1457 3.74348 -6.18295 <1e-9
x -0.260122 0.269059 -0.966786 0.3337
y 1.36143 0.244123 5.5768 <1e-7
z 0.723107 0.14739 4.90606 <1e-6
According to the model there is a significant relationship between valid
and both y
and z
but not x
. Looking at the plot above we can see that x
does have an influence on valid
(there is a gradual transition from false to true with increasing x
), but that this effect is rather “fuzzy”, hence the large p-value. By comparison there is a very clear and abrupt change in valid
at y
values of around 15. The effect of y
is also about twice as strong as that of z
. All of this makes sense in light of the way that the data were constructed.
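Assuming the fitted model supports GLM’s predict() method, we can also pull out the fitted probabilities on the response scale; values near 0 or 1 indicate confident classifications.

```julia
# Fitted probabilities of valid == true for the first few observations
# (assumed to be returned on the response scale, i.e. as probabilities).
fitted = predict(model);
fitted[1:6]
```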
Now we’ll look at another classification technique: decision trees. First load the required packages and then grab the iris data.
using MLBase, DecisionTree
using RDatasets, Distributions
iris = dataset("datasets", "iris");
iris[1:5,:]
5x5 DataFrame
| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|-----|-------------|------------|-------------|------------|---------|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | "setosa" |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | "setosa" |
We’ll also define a Boolean variable to split the data into training and testing sets.
train = rand(Bernoulli(0.75), nrow(iris)) .== 1;
We split the data into features and labels and then feed those into build_tree()
. In this case we are building a classifier to identify whether or not a particular iris is of the versicolor variety.
features = array(iris[:,1:4]);
labels = [n == "versicolor" ? 1 : 0 for n in iris[:Species]];
model = build_tree(labels[train], features[train,:]);
Let’s have a look at the product of our labours.
print_tree(model)
Feature 3, Threshold 3.0
L-> 0 : 36/36
R-> Feature 3, Threshold 4.8
L-> Feature 4, Threshold 1.7
L-> 1 : 38/38
R-> 0 : 1/1
R-> Feature 3, Threshold 5.1
L-> Feature 1, Threshold 6.7
L-> Feature 2, Threshold 3.2
L-> Feature 4, Threshold 1.8
L-> Feature 1, Threshold 6.3
L-> 0 : 1/1
R-> 1 : 1/1
R-> 0 : 5/5
R-> 1 : 1/1
R-> 1 : 2/2
R-> 0 : 29/29
The textual representation of the tree above breaks the decision process down into a number of branches, where the model decides whether to go to the left (L) or right (R) branch according to whether the value of a given feature is below or above a threshold value. So, for example, on the third line of the output we must decide whether to move to the left or right depending on whether feature 3 (PetalLength) is less than or greater than 4.8.
We can then apply the decision tree model to the testing data and see how well it performs using standard metrics.
predictions = apply_tree(model, features[!train,:]);
ROC = roc(labels[!train], convert(Array{Int32,1}, predictions))
ROCNums{Int64}
p = 8
n = 28
tp = 7
tn = 28
fp = 0
fn = 1
precision(ROC)
1.0
recall(ROC)
0.875
A true positive rate of 87.5% and true negative rate of 100% is not too bad at all!
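The counts above can also be combined into an overall accuracy. This is plain arithmetic on the fields of the ROCNums object, assuming they are accessible by the names shown in its output.

```julia
# Correct predictions (true positives plus true negatives) over all cases.
accuracy = (ROC.tp + ROC.tn) / (ROC.p + ROC.n)   # 35/36 with the counts above
```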
The DecisionTree package also implements random forest and boosting models. Other related packages are:
Definitely worth checking out if you have the time. My time is up though. Come back soon to hear about what Julia provides for evolutionary programming.
]]>Today we’ll be looking at two packages for regression analyses in Julia: GLM and GLMNet. Let’s get both of those loaded so that we can begin.
using GLM, GLMNet
Next we’ll create a synthetic data set which we’ll use for illustration purposes.
using Distributions, DataFrames
points = DataFrame();
points[:x] = rand(Uniform(0.0, 10.0), 500);
points[:y] = 2 + 3 * points[:x] + rand(Normal(1.0, 3.0) , 500);
points[:z] = rand(Uniform(0.0, 10.0), 500);
points[:valid] = 2 * points[:y] + points[:z] + rand(Normal(0.0, 3.0), 500) .> 35;
head(points)
6x4 DataFrame
| Row | x | y | z | valid |
|-----|----------|---------|---------|-------|
| 1 | 0.867859 | 3.08688 | 6.03142 | false |
| 2 | 9.92178 | 33.4759 | 2.14742 | true |
| 3 | 8.54372 | 32.2662 | 8.86289 | true |
| 4 | 9.69646 | 35.5689 | 8.83644 | true |
| 5 | 4.02686 | 12.4154 | 2.75854 | false |
| 6 | 6.89605 | 27.1884 | 6.10983 | true |
By design there is a linear relationship between the x
and y
fields. We can extract that relationship from the data using glm()
.
model = glm(y ~ x, points, Normal(), IdentityLink())
DataFrameRegressionModel{GeneralizedLinearModel{GlmResp{Array{Float64,1},Normal,IdentityLink},
DensePredChol{Float64,Cholesky{Float64}}},Float64}:
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) 2.69863 0.265687 10.1572 <1e-23
x 2.99845 0.0474285 63.2204 <1e-99
The third and fourth arguments to glm()
stipulate that we are applying simple linear regression where we expect the residuals to have a Normal distribution. The parameter estimates are close to what was expected, taking into account the additive noise introduced into the data. The call to glm()
seems rather verbose for something as simple as linear regression and, consequently, there is a shortcut lm()
which gets the same result with less fuss.
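For example, a refit using the lm() shortcut, which should reproduce the same estimates:

```julia
# Equivalent to glm(y ~ x, points, Normal(), IdentityLink()), with less fuss.
model2 = lm(y ~ x, points)
```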
Using the result of glm()
we can directly access the estimated coefficients along with their standard errors and the associated covariance matrix.
coef(model)
2-element Array{Float64,1}:
2.69863
2.99845
stderr(model)
2-element Array{Float64,1}:
0.265687
0.0474285
vcov(model)
2x2 Array{Float64,2}:
0.0705897 -0.0107768
-0.0107768 0.00224947
The data along with the linear regression fit are shown below.
Moving on to the GLMNet package, which implements linear models with penalised maximum likelihood estimators. We’ll use the Boston housing data from R’s MASS package for illustration.
using RDatasets
boston = dataset("MASS", "Boston");
X = array(boston[:,1:13]);
y = array(boston[:,14]); # Median value of houses in units of $1000
Running glmnet()
which will fit models for various values of the regularisation parameter, λ.
path = glmnet(X, y);
The result is a set of 76 different models. We’ll have a look at the intercepts and coefficients for the first ten models (which correspond to the largest values of λ). The coefficients are held in the betas
field, which is an array with a column for each model and a row for each coefficient. Since the first few models are strongly penalised, each has only a few non-zero coefficients.
path.a0[1:10]
10-element Array{Float64,1}:
22.5328
23.6007
23.6726
21.4465
19.4206
17.5746
15.8927
14.3602
12.9638
12.5562
path.betas[:,1:10]
13x10 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.127841 0.569442 0.971462 1.33777 1.67153 1.97564 2.25274 2.47954
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.040168
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 -0.0843998 -0.153581 -0.196981 -0.236547 -0.272599 -0.305447 -0.335377 -0.36264 -0.384493
Now that we’ve got a bundle of models, how do we choose among them? Cross-validation, of course!
path = glmnetcv(X, y)
Least Squares GLMNet Cross Validation
76 models for 13 predictors in 10 folds
Best λ 0.028 (mean loss 24.161, std 3.019)
We find that the best results (on the basis of loss) were achieved when λ had a value of 0.028, which is relatively weak regularisation. We’ll put the parameters of the corresponding model neatly in a data frame.
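Assuming the cross-validation object exposes the λ sequence as a lambda field (consistent with its printed summary), the winning value can be recovered directly:

```julia
best = indmin(path.meanloss);   # Index of the model with the lowest mean loss
path.lambda[best]               # Should be the λ ≈ 0.028 reported above
```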
DataFrame(variable = names(boston)[1:13], beta = path.path.betas[:,indmin(path.meanloss)])
13x2 DataFrame
| Row | variable | beta |
|-----|----------|------------|
| 1 | Crim | -0.0983463 |
| 2 | Zn | 0.0414416 |
| 3 | Indus | 0.0 |
| 4 | Chas | 2.68519 |
| 5 | NOx | -16.3066 |
| 6 | Rm | 3.86694 |
| 7 | Age | 0.0 |
| 8 | Dis | -1.39602 |
| 9 | Rad | 0.252687 |
| 10 | Tax | -0.0098268 |
| 11 | PTRatio | -0.929989 |
| 12 | Black | 0.00902588 |
| 13 | LStat | -0.5225 |
From the fit coefficients we can conclude, for example, that average house value increases with the number of rooms in the house (Rm
) but decreases with nitrogen oxides concentration (NOx
), which is a proxy for traffic intensity.
Whew! That was exhilarating but exhausting. As a footnote, please have a look at the thesis “RegTools: A Julia Package for Assisting Regression Analysis” by Muzhou Liang. The RegTools package is available here. As always, the full code for today is available on github. Next time we’ll look at classification models. Below are a couple of pertinent videos to keep you busy in the meantime.
]]>Today we’re going to look at the Clustering package, the documentation for which can be found here. As usual, the first step is loading the package.
using Clustering
We’ll use the RDatasets package to select the xclara data and rename the columns in the resulting data frame.
using RDatasets
xclara = dataset("cluster", "xclara");
names!(xclara, [symbol(i) for i in ["x", "y"]]);
Using Gadfly to generate a plot we can clearly see that there are three well defined clusters in the data.
Next we need to transform the data into an Array and then transpose it so that each point lies in a separate column (remember that this is key to calculating distances!).
xclara = convert(Array, xclara);
xclara = xclara';
Before we can run the clustering algorithm we need to identify seed points which act as the starting locations for clusters. There are a number of options for doing this. We’re simply going to choose three points in the data at random. How did we arrive at three starting points (as opposed to, say, six)? Well, in this case it was simply visual inspection: there appear to be three clear clusters in the data. When the data are more complicated (or have higher dimensionality) then choosing the number of clusters becomes a little more tricky.
initseeds(:rand, xclara, 3)
3-element Array{Int64,1}:
2858
980
2800
Now we’re ready to run the clustering algorithm. We’ll start with k-means clustering.
xclara_kmeans = kmeans(xclara, 3);
A quick plot will confirm that it has recognised the three clusters that we intuitively identified in the data.
We can have a look at the cluster centers, the number of points assigned to each cluster and (a subset of) the cluster assignments.
xclara_kmeans.centers
2x3 Array{Float64,2}:
9.47805 69.9242 40.6836
10.6861 -10.1196 59.7159
xclara_kmeans.counts
3-element Array{Int64,1}:
899
952
1149
xclara_kmeans.assignments[1:10]
10-element Array{Int64,1}:
2
2
2
2
2
2
2
2
2
2
The k-means algorithm is limited to using the Euclidean metric to calculate the distance between points. An alternative, k-medoids clustering, is also supported in the Clustering package. The kmedoids()
function accepts a distance matrix (from an arbitrary metric) as its first argument, allowing for a far greater degree of flexibility.
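A minimal sketch of k-medoids on the same data follows, assuming that the result type exposes medoids and assignments fields:

```julia
using Distances
# k-medoids works on a precomputed distance matrix, so any metric will do;
# here the city block metric, which k-means could not have used.
D = pairwise(Cityblock(), xclara);
xclara_kmedoids = kmedoids(D, 3);
xclara_kmedoids.medoids        # Indices of the three representative points
```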
The final algorithm implemented by Clustering is DBSCAN, which is a density based clustering algorithm. In addition to a distance matrix, dbscan()
also requires a neighbourhood radius and the minimum number of points per cluster.
using Distances
dclara = pairwise(SqEuclidean(), xclara);
xclara_dbscan = dbscan(dclara, 10, 40);
As is apparent from the plot below, DBSCAN results in a dramatically different set of clusters. The loosely packed blue points on the periphery of each of the three clusters have been identified as noise by the DBSCAN algorithm. Only the high density cores of these clusters are now separately identified.
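Assuming the DBSCAN result exposes counts and assignments fields like the k-means result does, with noise points labelled 0, we can quantify this:

```julia
# Hypothetical field access: points per dense cluster core, plus the number
# of points written off as noise (assuming noise carries assignment 0).
xclara_dbscan.counts
sum(xclara_dbscan.assignments .== 0)
```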
That’s it for the moment about clusters. The full code for today can be found on github. Tomorrow we’ll take a look at regression. In the meantime, take a few minutes to watch the video below about using Julia’s clustering capabilities for climate classification.
]]>Today we’ll be looking at the Distances package, which implements a range of distance metrics. This might seem a rather obscure topic, but distance calculation is at the core of all clustering techniques (which are next on the agenda), so it’s prudent to know a little about how they work.
Note that there is a Distance package as well (singular!), which was deprecated in favour of the Distances package. So please install and load the latter.
using Distances
We’ll start by finding the distance between a pair of vectors.
x = [1., 2., 3.];
y = [-1., 3., 5.];
A simple application of Pythagoras’ Theorem will tell you that the Euclidean distance between the tips of those vectors is 3. We can confirm our maths with Julia though. The general form of a distance calculation uses evaluate()
, where the first argument is a distance type. Common distance metrics (like Euclidean distance) also come with convenience functions.
evaluate(Euclidean(), x, y)
3.0
euclidean(x, y)
3.0
We can just as easily calculate other metrics like the city block (or Manhattan), cosine or Chebyshev distances.
evaluate(Cityblock(), x, y)
5.0
cityblock(x, y)
5.0
evaluate(CosineDist(), x, y)
0.09649209709474871
evaluate(Chebyshev(), x, y)
2.0
Moving on to distances between the columns of matrices. Again we’ll define a pair of matrices for illustration.
X = [0 1; 0 2; 0 3];
Y = [1 -1; 1 3; 1 5];
With colwise()
distances are calculated between corresponding columns in the two matrices. If one of the matrices has only a single column (see the example with Chebyshev()
below) then the distance is calculated between that column and all columns in the other matrix.
colwise(Euclidean(), X, Y)
2-element Array{Float64,1}:
1.73205
3.0
colwise(Hamming(), X, Y)
2-element Array{Int64,1}:
3
3
colwise(Chebyshev(), X[:,1], Y)
2-element Array{Float64,1}:
1.0
5.0
We also have the option of using pairwise()
which gives the distances between all pairs of columns from the two matrices. This is precisely the distance matrix that we would use for a cluster analysis.
pairwise(Euclidean(), X, Y)
2x2 Array{Float64,2}:
1.73205 5.91608
2.23607 3.0
pairwise(Euclidean(), X)
2x2 Array{Float64,2}:
0.0 3.74166
3.74166 0.0
pairwise(Mahalanobis(eye(3)), X, Y) # Effectively just the Euclidean metric
2x2 Array{Float64,2}:
1.73205 5.91608
2.23607 3.0
pairwise(WeightedEuclidean([1.0, 2.0, 3.0]), X, Y)
2x2 Array{Float64,2}:
2.44949 9.69536
3.74166 4.24264
As you might have observed from the last example above, it’s also possible to calculate weighted versions of some of the metrics.
Finally a less contrived example. We’ll look at the distances between observations in the iris data set. We first need to extract only the numeric component of each record and then transpose the resulting matrix so that observations become columns (rather than rows).
using RDatasets
iris = dataset("datasets", "iris");
iris = convert(Array, iris[:,1:4]);
iris = transpose(iris);
dist_iris = pairwise(Euclidean(), iris);
dist_iris[1:5,1:5]
5x5 Array{Float64,2}:
0.0 0.538516 0.509902 0.648074 0.141421
0.538516 0.0 0.3 0.331662 0.608276
0.509902 0.3 0.0 0.244949 0.509902
0.648074 0.331662 0.244949 0.0 0.648074
0.141421 0.608276 0.509902 0.648074 0.0
The full distance matrix is illustrated below as a heatmap using Plotly. Note the clearly defined blocks for each of the iris species setosa, versicolor, and virginica.
Tomorrow we’ll be back to look at clustering in Julia.
]]>It’s all very well generating myriad statistics characterising your data. How do you know whether or not those statistics are telling you something interesting? Hypothesis Tests. To that end, we’ll be looking at the HypothesisTests package today.
The first (small) hurdle is loading the package.
using HypothesisTests
That wasn’t too bad. Next we’ll assemble some synthetic data.
using Distributions
srand(357)
x1 = rand(Normal(), 1000);
x2 = rand(Normal(0.5, 1), 1000);
x3 = rand(Binomial(100, 0.25), 1000); # 25% success rate on samples of size 100
x4 = rand(Binomial(50, 0.50), 1000); # 50% success rate on samples of size 50
x5 = rand(Bernoulli(0.25), 100) .== 1;
We’ll apply a one sample t-test to x1
and x2
. The output below indicates that x2
has a mean which differs significantly from zero while x1
does not. This is consistent with our expectations based on the way that these data were generated. I’m impressed by the level of detail in the output from OneSampleTTest()
: different aspects of the test are neatly broken down into sections (population, test summary and details) and there is automated high level interpretation of the test results.
t1 = OneSampleTTest(x1)
One sample t-test
------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: -0.013027816861268473
95% confidence interval: (-0.07587776077157478,0.04982212704903784)
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 0.6842692696393744 (not signficant)
Details:
number of observations: 1000
t-statistic: -0.40676289562651996
degrees of freedom: 999
empirical standard error: 0.03202803648352013
t2 = OneSampleTTest(x2)
One sample t-test
------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: 0.5078522467069418
95% confidence interval: (0.44682036100064954,0.5688841324132342)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 2.6256160116367554e-53 (extremely significant)
Details:
number of observations: 1000
t-statistic: 16.328833826939398
degrees of freedom: 999
empirical standard error: 0.031101562554276502
Using pvalue()
we can further interrogate the p-values generated by these tests. The values reported in the output above are for the two-sided test, but we can look specifically at values associated with either the left- or right tails of the distribution. This makes the outcome of the test a lot more specific.
pvalue(t1)
0.6842692696393744
pvalue(t2)
2.6256160116367554e-53
pvalue(t2, tail = :left) # Not significant.
1.0
pvalue(t2, tail = :right) # Very significant indeed!
1.3128080058183777e-53
The associated confidence intervals are also readily accessible. We can choose between two-sided or left/right one-sided intervals as well as change the significance level.
ci(t2, tail = :both) # Two-sided 95% confidence interval by default
(0.44682036100064954,0.5688841324132342)
ci(t2, tail = :left) # One-sided 95% confidence interval (left)
(-Inf,0.5590572480083876)
ci(t2, 0.01, tail = :right) # One-sided 99% confidence interval (right)
(0.43538291818831604,Inf)
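Since x1 and x2 were drawn from Normal distributions whose means differ by 0.5, a two-sample t-test should pick up the difference. HypothesisTests provides EqualVarianceTTest() for precisely this:

```julia
t3 = EqualVarianceTTest(x1, x2)
pvalue(t3)    # Expect an extremely small p-value: the means genuinely differ
```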
As a second (and final) example we’ll look at BinomialTest()
. There are various ways to call this function. First, without looking at any particular data, we’ll check whether 25 successes from 100 samples is inconsistent with a 25% success rate (obviously not and, as a result, we fail to reject this hypothesis).
BinomialTest(25, 100, 0.25)
Binomial test
-----
Population details:
parameter of interest: Probability of success
value under h_0: 0.25
point estimate: 0.25
95% confidence interval: (0.16877973809934183,0.3465524957588082)
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 1.0 (not signficant)
Details:
number of observations: 100
number of successes: 25
Next we’ll see whether the Bernoulli samples in x5
provide contradictory evidence to an assumed 50% success rate (based on the way that x5
was generated we are not surprised to find an infinitesimal p-value and the hypothesis is soundly rejected).
BinomialTest(x5, 0.5)
Binomial test
-----
Population details:
parameter of interest: Probability of success
value under h_0: 0.5
point estimate: 0.18
95% confidence interval: (0.11031122915326055,0.26947708596681197)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: 6.147806615048005e-11 (extremely significant)
Details:
number of observations: 100
number of successes: 18
There are a number of other tests available in this package, including a range of non-parametric tests which I have not even mentioned above. Certainly HypothesisTests should cover most of the bases for statistical inference. For more information, read the extensive documentation. Check out the sample code on github for further examples.
Look here for an explanation of the xkcd cartoon (although if you are reading this blog, then that probably won’t be necessary).
]]>Today I’m looking at the Distributions package.
Let’s get things rolling by loading it up.
using Distributions
There’s some overlap between the functionality in Distributions and what we saw yesterday in the StatsFuns package. So, instead of looking at functions to evaluate various aspects of PDFs and CDFs, we’ll focus on sampling from distributions and calculating summary statistics.
Julia has native support for sampling from a uniform distribution. We’ve seen this before, but here’s a reminder.
srand(359) # Set random number seed.
rand() # Random number on [0, 1)
0.4770241944535658
What if you need to generate samples from a more exotic distribution? The Normal distribution, although not particularly exotic, seems like a natural place to start. The Distributions package exposes a type for each supported distribution. For the Normal distribution the type is appropriately named Normal
. It’s derived from Distribution
with characteristics Univariate
and Continuous
.
super(Normal)
Distribution{Univariate,Continuous}
names(Normal)
2-element Array{Symbol,1}:
:μ
:σ
The constructor accepts two parameters: mean (μ) and standard deviation (σ). We’ll instantiate a Normal
object with mean 1 and standard deviation 3.
d1 = Normal(1.0, 3.0)
Normal(μ=1.0, σ=3.0)
params(d1)
(1.0,3.0)
d1.μ
1.0
d1.σ
3.0
Thanks to the wonders of multiple dispatch we are then able to generate samples from this object with the rand()
method.
x = rand(d1, 1000);
We’ll use Gadfly to generate a histogram to validate that the samples are reasonable. They look pretty good.
There are functions like pdf()
, cdf()
, logpdf()
and logcdf()
which allow the density function of our distribution object to be evaluated at particular points. Check those out. We’re moving on to truncating a portion of the distribution, leaving a Truncated
distribution object.
d2 = Truncated(d1, -4.0, 6.0);
Again we can use Gadfly to get an idea of what this looks like. This time we’ll plot the actual PDF rather than a histogram of samples.
The Distributions package implements an extensive selection of other continuous distributions, like Exponential, Poisson, Gamma and Weibull. The basic interface for each of these is consistent with what we’ve seen for Normal
above, although there are some methods which are specific to some distributions.
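For instance, a sketch with an Exponential distribution (parameterised by its scale, θ) follows the same pattern as Normal above:

```julia
d3 = Exponential(2.0)   # Scale θ = 2, so the theoretical mean is 2
mean(d3)                # 2.0
rand(d3, 5)             # Five samples from the distribution
```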
Let’s look at a discrete distribution, using a Bernoulli distribution with success rate of 25% as an example.
d4 = Bernoulli(0.25)
Bernoulli(p=0.25)
rand(d4, 10)
10-element Array{Int64,1}:
0
1
0
1
1
0
0
0
1
0
What about a Binomial distribution? Suppose that we have a success rate of 25% per trial and want to sample the number of successes in a batch of 100 trials.
d5 = Binomial(100, 0.25)
Binomial(n=100, p=0.25)
rand(d5, 10)
10-element Array{Int64,1}:
22
21
30
23
28
25
26
26
28
21
Finally let’s look at an example of fitting a distribution to a collection of samples using Maximum Likelihood.
x = rand(d1, 10000);
fit(Normal, x)
Normal(μ=1.0015796782177036, σ=3.033914550184868)
Yup, those values are in pretty good agreement with the mean and standard deviation we specified for our Normal
object originally.
That’s it for today. There’s more to the Distributions package though. Check out my github repository to see other examples which didn’t make it into the today’s post.
]]>JuliaStats is a meta-project which consolidates various packages related to statistics and machine learning in Julia. Well worth taking a look if you plan on working in this domain.
x = rand(10);
mean(x)
0.5287191472784906
std(x)
0.2885446536178459
Julia already has some builtin support for statistical operations, so additional packages are not strictly necessary. However, they do increase the scope and ease of possible operations, as we’ll see below. Let’s kick off by loading all the packages that we’ll be looking at today.
using StatsBase, StatsFuns, StreamStats
The documentation for StatsBase can be found here. As the package name implies, it provides support for basic statistical operations in Julia.
High level summary statistics are generated by summarystats()
.
summarystats(x)
Summary Stats:
Mean: 0.528719
Minimum: 0.064803
1st Quartile: 0.317819
Median: 0.529662
3rd Quartile: 0.649787
Maximum: 0.974760
Weighted versions of the mean, variance and standard deviation are implemented. There’re also geometric and harmonic means.
w = WeightVec(rand(1:10, 10)); # A weight vector.
mean(x, w) # Weighted mean.
0.48819933297961043
var(x, w) # Weighted variance.
0.08303843715334995
std(x, w) # Weighted standard deviation.
0.2881639067498738
skewness(x, w)
0.11688162715805048
kurtosis(x, w)
-0.9210456851144664
mean_and_std(x, w)
(0.48819933297961043,0.2881639067498738)
There’s a weighted median as well as functions for calculating quantiles.
median(x) # Median.
0.5296622773635412
median(x, w) # Weighted median.
0.5729104703595038
quantile(x)
5-element Array{Float64,1}:
0.0648032
0.317819
0.529662
0.649787
0.97476
nquantile(x, 8)
9-element Array{Float64,1}:
0.0648032
0.256172
0.317819
0.465001
0.529662
0.60472
0.649787
0.893513
0.97476
iqr(x) # Inter-quartile range.
0.3319677541313941
Sampling from a population is also catered for, with a range of algorithms which can be applied to the sampling procedure.
sample(['a':'z'], 5) # Sampling (with replacement).
5-element Array{Char,1}:
'w'
'x'
'e'
'e'
'o'
wsample(['T', 'F'], [5, 1], 10) # Weighted sampling (with replacement).
10-element Array{Char,1}:
'F'
'T'
'T'
'T'
'F'
'T'
'T'
'T'
'T'
'T'
There’s also functionality for empirical estimation of distributions from histograms and a range of other interesting and useful goodies.
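One of those goodies is `ecdf()`, which builds the empirical cumulative distribution function of a sample and returns it as a callable. A small sketch (the exact return type has varied between StatsBase versions, but it has long been callable):

```julia
using StatsBase

x = [1.0, 2.0, 2.0, 3.0, 4.0]
F = ecdf(x)    # Empirical CDF as a callable

F(2.0)         # Fraction of samples <= 2.0, here 3/5
F(0.0)         # Below the sample minimum: 0.0
F(4.0)         # At the sample maximum: 1.0
```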
The StatsFuns package provides constants and functions for statistical computing. The constants are by no means essential but certainly very handy. Take, for example, twoπ
and sqrt2
.
There are some mildly exotic mathematical functions available like logistic, logit and softmax.
logistic(-5)
0.0066928509242848554
logistic(5)
0.9933071490757153
logit(0.25)
-1.0986122886681098
logit(0.75)
1.0986122886681096
softmax([1, 3, 2, 5, 3])
5-element Array{Float64,1}:
0.0136809
0.101089
0.0371886
0.746952
0.101089
Finally there is a suite of functions relating to various statistical distributions. The functions for the Normal distribution are illustrated below, but there’re functions for the Beta and Binomial distributions, the Gamma and Hypergeometric distributions and many others. The function naming convention is consistent across all distributions.
normpdf(0); # PDF
normlogpdf(0); # log PDF
normcdf(0); # CDF
normccdf(0); # Complementary CDF
normlogcdf(0); # log CDF
normlogccdf(0); # log Complementary CDF
norminvcdf(0.5); # inverse-CDF
norminvccdf(0.99); # inverse-Complementary CDF
norminvlogcdf(-0.693147180559945); # inverse-log CDF
norminvlogccdf(-0.693147180559945); # inverse-log Complementary CDF
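These functions are mutually consistent: the inverse-CDF undoes the CDF, and the log variants agree with taking `log` of the plain versions (notice that the -0.693147… used above is just log(0.5)). A quick check:

```julia
using StatsFuns

normcdf(0.0)        # 0.5, by symmetry of the Normal distribution
norminvcdf(0.5)     # 0.0, inverting the line above

# The log variant agrees with log of the plain version.
abs(normlogcdf(0.0) - log(normcdf(0.0))) < 1e-10
```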
Finally, the StreamStats package supports calculating online statistics for a stream of data which is being continuously updated.
average = StreamStats.Mean()
Online Mean
* Mean: 0.000000
* N: 0
variance = StreamStats.Var()
Online Variance
* Variance: NaN
* N: 0
for x in rand(10)
update!(average, x)
update!(variance, x)
@printf("x = %f: mean = %.3f | variance = %.3f\n", x, state(average), state(variance))
end
x = 0.928564: mean = 0.929 | variance = NaN
x = 0.087779: mean = 0.508 | variance = 0.353
x = 0.253300: mean = 0.423 | variance = 0.198
x = 0.778306: mean = 0.512 | variance = 0.164
x = 0.566764: mean = 0.523 | variance = 0.123
x = 0.812629: mean = 0.571 | variance = 0.113
x = 0.760074: mean = 0.598 | variance = 0.099
x = 0.328495: mean = 0.564 | variance = 0.094
x = 0.303542: mean = 0.535 | variance = 0.090
x = 0.492716: mean = 0.531 | variance = 0.080
In addition to the mean and variance illustrated above, the package also supports online versions of min() and max(), and can be used to generate incremental confidence intervals for Bernoulli and Poisson processes.
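Under the hood, online statistics of this kind can be maintained in constant memory. Here’s a minimal sketch of Welford’s algorithm for a running mean and population variance in plain Julia, just to show the idea (StreamStats itself is more general):

```julia
# Welford's online algorithm: one pass, constant memory.
function run_welford(xs)
    n, mu, m2 = 0, 0.0, 0.0
    for x in xs
        n += 1
        delta = x - mu
        mu += delta / n
        m2 += delta * (x - mu)    # Uses the freshly updated mean
    end
    (mu, m2 / n)                  # Mean and population variance
end

mu, varp = run_welford([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mu ≈ 5.0 and varp ≈ 4.0 for this sample
```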
That’s it for today. Check out the full code on github and watch the video below.
]]>Why would you want to call other languages from within Julia? Here are a couple of reasons:
The second reason should apply relatively seldom because, as we saw some time ago, Julia provides performance which rivals native C or FORTRAN code.
C functions are called via ccall()
, where the name of the C function and the library it lives in are passed as a tuple in the first argument, followed by the return type of the function and the types of the function arguments, and finally the arguments themselves. It’s a bit klunky, but it works!
ccall((:sqrt, "libm"), Float64, (Float64,), 64.0)
8.0
It makes sense to wrap a call like that in a native Julia function.
csqrt(x) = ccall((:sqrt, "libm"), Float64, (Float64,), x);
csqrt(64.0)
8.0
This function will not be vectorised by default (just try calling csqrt()
on a vector!), but it’s a simple matter to produce a vectorised version using the @vectorize_1arg
macro.
@vectorize_1arg Real csqrt;
methods(csqrt)
# 4 methods for generic function "csqrt":
csqrt{T<:Real}(::AbstractArray{T<:Real,1}) at operators.jl:359
csqrt{T<:Real}(::AbstractArray{T<:Real,2}) at operators.jl:360
csqrt{T<:Real}(::AbstractArray{T<:Real,N}) at operators.jl:362
csqrt(x) at none:6
Note that a few extra specialised methods have been introduced and now calling csqrt()
on a vector works perfectly.
csqrt([1, 4, 9, 16])
4-element Array{Float64,1}:
1.0
2.0
3.0
4.0
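The same pattern works for any C function with a known signature; you just list the argument types in the tuple. For instance, wrapping `pow` from libm (the library name here assumes Linux; on other platforms the maths library may live elsewhere):

```julia
# Wrapper around C's pow(double, double) from libm.
cpow(x, y) = ccall((:pow, "libm"), Float64, (Float64, Float64), x, y)

cpow(2.0, 10.0)    # 1024.0
cpow(9.0, 0.5)     # 3.0, same as sqrt(9.0)
```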
I’ll freely admit that I don’t dabble in C too often these days. R, on the other hand, is a daily workhorse. So being able to import R functionality into Julia is very appealing. The first thing that we need to do is load up a few packages, the most important of which is RCall
. There’s great documentation for the package here.
using RCall
using DataArrays, DataFrames
We immediately have access to R’s builtin data sets and we can display them using rprint()
.
rprint(:HairEyeColor)
, , Sex = Male
Eye
Hair Brown Blue Hazel Green
Black 32 11 10 3
Brown 53 50 25 15
Red 10 10 7 7
Blond 3 30 5 8
, , Sex = Female
Eye
Hair Brown Blue Hazel Green
Black 36 9 5 2
Brown 66 34 29 14
Red 16 7 7 7
Blond 4 64 5 8
We can also copy those data across from R to Julia.
airquality = DataFrame(:airquality);
head(airquality)
6x6 DataFrame
| Row | Ozone | Solar.R | Wind | Temp | Month | Day |
|-----|-------|---------|------|------|-------|-----|
| 1 | 41 | 190 | 7.4 | 67 | 5 | 1 |
| 2 | 36 | 118 | 8.0 | 72 | 5 | 2 |
| 3 | 12 | 149 | 12.6 | 74 | 5 | 3 |
| 4 | 18 | 313 | 11.5 | 62 | 5 | 4 |
| 5 | NA | NA | 14.3 | 56 | 5 | 5 |
| 6 | 28 | NA | 14.9 | 66 | 5 | 6 |
rcopy()
provides a high-level interface to function calls in R.
rcopy("runif(3)")
3-element Array{Float64,1}:
0.752226
0.683104
0.290194
However, for some complex objects there is no simple way to translate between R and Julia, and in these cases rcopy()
fails. We can see in the case below that the object of class lm
returned by lm()
does not diffuse intact across the R-Julia membrane.
"fit <- lm(bwt ~ ., data = MASS::birthwt)" |> rcopy
ERROR: `rcopy` has no method matching rcopy(::LangSxp)
in rcopy at no file
in map_to! at abstractarray.jl:1311
in map_to! at abstractarray.jl:1320
in map at abstractarray.jl:1331
in rcopy at /home/colliera/.julia/v0.3/RCall/src/sexp.jl:131
in rcopy at /home/colliera/.julia/v0.3/RCall/src/iface.jl:35
in |> at operators.jl:178
But the call to lm()
was successful and we can still look at the results.
rprint(:fit)
Call:
lm(formula = bwt ~ ., data = MASS::birthwt)
Coefficients:
(Intercept) low age lwt race
3612.51 -1131.22 -6.25 1.05 -100.90
smoke ptl ht ui ftv
-174.12 81.34 -181.95 -336.78 -7.58
You can use R to generate plots with either the base functionality or that provided by libraries like ggplot2 or lattice.
reval("plot(1:10)"); # Will pop up a graphics window...
reval("library(ggplot2)");
rprint("ggplot(MASS::birthwt, aes(x = age, y = bwt)) + geom_point() + theme_classic()")
reval("dev.off()") # ... and close the window.
Watch the videos below for some other perspectives on multi-language programming with Julia. Also check out the complete code for today (including examples with C++, FORTRAN and Python) on github.
]]>
If you’re not too familiar with Graph Theory, it might be an idea to take a moment to get the basics. Graphs are an extremely versatile data structure for storing data consisting of linked entities. I’m going to look at two packages for managing graphs in Julia: LightGraphs and Graphs.
As usual, the first step is to load the package.
using LightGraphs
LightGraphs has methods which generate a selection of standard graphs like StarGraph()
, WheelGraph()
and FruchtGraph()
. There are also functions for random graphs, for example, erdos_renyi()
and watts_strogatz()
. We’ll start off by creating two small graphs. One will have 10 nodes connected by 20 random edges. The other will be a directed star graph consisting of four nodes, the central node being connected to every other node.
g1 = Graph(10, 20)
{10, 20} undirected graph
g2 = StarDiGraph(4)
{4, 3} directed graph
edges(g2)
Set{Pair{Int64,Int64}}({edge 1 - 2,edge 1 - 4,edge 1 - 3})
It’s simple to find the degree and neighbours of a given node.
degree(g1, 4) # How many neighbours for vertex 4?
6
neighbors(g1, 4) # Find neighbours of vertex 4
6-element Array{Int64,1}:
1
3
6
2
9
7
There’s a straightforward means to add and remove edges from the graph.
add_edge!(g1, 4, 8) # Add edge between vertices 4 and 8
edge 4 - 8
rem_edge!(g1, 4, 6) # Remove edge between vertices 4 and 6
edge 6 - 4
The package has functionality for performing high level tests on the graph (checking, for instance, whether it is cyclic or connected). There’s also support for path based algorithms, but we’ll dig into those when we look at the Graphs package.
Before we get started with the Graphs package you might want to restart your Julia session to purge all of that LightGraphs goodness. Take a moment to browse the Graphs.jl documentation, which is very comprehensive.
using Graphs
As with LightGraphs, there are numerous options for generating standard graphs.
g1a = simple_frucht_graph()
Undirected Graph (20 vertices, 18 edges)
g1b = simple_star_graph(8)
Directed Graph (8 vertices, 7 edges)
g1c = simple_wheel_graph(8)
Directed Graph (8 vertices, 14 edges)
Graphs uses the GraphViz library to generate plots.
plot(g1a)
Of course, a graph can also be constructed manually.
g2 = simple_graph(4)
Directed Graph (4 vertices, 0 edges)
add_edge!(g2, 1, 2)
edge [1]: 1 - 2
add_edge!(g2, 1, 3)
edge [2]: 1 - 3
add_edge!(g2, 2, 3)
edge [3]: 2 - 3
Individual vertices (a vertex is the same as a node) can be interrogated. Since we are considering a directed graph we look separately at the edges exiting and entering a node.
num_vertices(g2)
4
vertices(g2)
1:4
out_degree(1, g2)
2
out_edges(1, g2)
2-element Array{Edge{Int64},1}:
edge [1]: 1 - 2
edge [2]: 1 - 3
in_degree(2, g2)
1
in_edges(2, g2)
1-element Array{Edge{Int64},1}:
edge [1]: 1 - 2
Vertices can be created with labels and attributes.
V1 = ExVertex(1, "V1");
V1.attributes["size"] = 5.0
5.0
V2 = ExVertex(2, "V2");
V2.attributes["size"] = 3.0
3.0
V3 = ExVertex(3, "V3")
vertex [3] "V3"
Those vertices can then be used to define edges, which in turn can have labels and attributes.
E1 = ExEdge(1, V1, V2)
edge [1]: vertex [1] "V1" - vertex [2] "V2"
E1.attributes["distance"] = 50
50
E1.attributes["color"] = "green"
"green"
Finally the collection of vertices and edges can be gathered into a graph.
g3 = edgelist([V1, V2], [E1], is_directed = true)
Directed Graph (2 vertices, 1 edges)
It’s possible to systematically visit all connected vertices in a graph, applying an operation at every vertex. traverse_graph()
performs the graph traversal using either a depth first or breadth first algorithm. In the sample code below the operation applied at each vertex is LogGraphVisitor()
, which is a simple logger.
traverse_graph(g1c, DepthFirst(), 1, LogGraphVisitor(STDOUT))
discover vertex: 1
examine neighbor: 1 -> 2 (vertexcolor = 0, edgecolor= 0)
discover vertex: 2
open vertex: 2
examine neighbor: 2 -> 3 (vertexcolor = 0, edgecolor= 0)
discover vertex: 3
open vertex: 3
examine neighbor: 3 -> 4 (vertexcolor = 0, edgecolor= 0)
discover vertex: 4
open vertex: 4
examine neighbor: 4 -> 5 (vertexcolor = 0, edgecolor= 0)
discover vertex: 5
open vertex: 5
examine neighbor: 5 -> 6 (vertexcolor = 0, edgecolor= 0)
discover vertex: 6
open vertex: 6
examine neighbor: 6 -> 7 (vertexcolor = 0, edgecolor= 0)
discover vertex: 7
open vertex: 7
examine neighbor: 7 -> 8 (vertexcolor = 0, edgecolor= 0)
discover vertex: 8
open vertex: 8
examine neighbor: 8 -> 2 (vertexcolor = 1, edgecolor= 0)
close vertex: 8
close vertex: 7
close vertex: 6
close vertex: 5
close vertex: 4
close vertex: 3
close vertex: 2
examine neighbor: 1 -> 3 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 4 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 5 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 6 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 7 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 8 (vertexcolor = 2, edgecolor= 0)
close vertex: 1
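The machinery above is easy to picture as a plain traversal over adjacency lists. Here’s a minimal breadth-first version in ordinary Julia, independent of either graph package, just to show the mechanics that `traverse_graph()` generalises with its visitor callbacks:

```julia
# Breadth-first traversal over an adjacency-list graph (no packages needed).
function bfs_order(adj, start)
    visited = falses(length(adj))
    order = Int[]
    queue = [start]
    visited[start] = true
    i = 1
    while i <= length(queue)
        v = queue[i]; i += 1      # "Pop" the front by advancing an index
        push!(order, v)
        for w in adj[v]
            if !visited[w]
                visited[w] = true
                push!(queue, w)
            end
        end
    end
    order
end

# A small graph: vertex 1 is a hub joined to 2, 3 and 4; 3 also joins 2 and 4.
adj = Any[[2, 3, 4], [1, 3], [1, 2, 4], [1, 3]]
bfs_order(adj, 1)    # [1, 2, 3, 4]
```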
We can use Dijkstra’s Algorithm to calculate the distance from a given vertex to all other vertices in the graph. We see, for instance, that the distance from vertex 1 to vertex 4 is three steps. Since vertex 1 and vertex 20 are not connected, the distance between them is infinite. There are a couple of other algorithms available for calculating shortest paths.
distances = ones(num_edges(g1a)); # Assign distance of 1 to each edge.
d = dijkstra_shortest_paths(g1a, distances, 1);
d.dists # Vector of distances to all other vertices.
20-element Array{Float64,1}:
0.0
1.0
2.0
3.0
3.0
2.0
1.0
1.0
3.0
4.0
2.0
2.0
Inf
Inf
Inf
Inf
Inf
Inf
Inf
Inf
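For intuition, Dijkstra’s Algorithm itself fits in a few lines of plain Julia: repeatedly settle the closest unvisited vertex and relax its outgoing edges. A simple O(V²) sketch over a weight matrix, where `Inf` marks the absence of an edge (not tied to either package):

```julia
function dijkstra(W, source)
    n = size(W, 1)
    dist = fill(Inf, n)
    done = falses(n)
    dist[source] = 0.0
    for k in 1:n
        # Pick the closest vertex not yet settled.
        u = 0; best = Inf
        for v in 1:n
            if !done[v] && dist[v] < best
                u = v; best = dist[v]
            end
        end
        u == 0 && break           # Remaining vertices are unreachable
        done[u] = true
        for v in 1:n              # Relax edges out of u
            if dist[u] + W[u, v] < dist[v]
                dist[v] = dist[u] + W[u, v]
            end
        end
    end
    dist
end

# 4 vertices: edges 1-2 (1.0), 2-3 (2.0), 1-3 (5.0); vertex 4 is isolated.
W = fill(Inf, 4, 4)
W[1, 2] = W[2, 1] = 1.0
W[2, 3] = W[3, 2] = 2.0
W[1, 3] = W[3, 1] = 5.0
dijkstra(W, 1)    # [0.0, 1.0, 3.0, Inf]
```

Note how the direct 1-3 edge (weight 5) loses out to the two-hop route via vertex 2 (weight 3), and the isolated vertex keeps its infinite distance, just like vertices 13-20 above.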
As with most of the packages that I have looked at already, the functionality summarised above is just a small subset of what’s available. Have a look at the home pages for these packages and check out the full code for today (which looks at a number of other features) on github. Some time in the future I plan on looking at the EvolvingGraphs package, which caters for graphs where the structure changes with time.
Although Julia has integrated support for various data structures (arrays, tuples, dictionaries, sets), it doesn’t exhaust the full gamut of options. More exotic structures (like queues and deques, stacks, counters, heaps, tries and variations on sets and dictionaries) are implemented in the DataStructures package.
As always we start by loading the required package.
julia> using DataStructures
I won’t attempt to illustrate all structures offered by the package (that would make for an absurdly dull post), but will focus instead on queues and counters. The remaining types are self-explanatory and well illustrated in the package documentation.
Let’s start off with a queue. The data type being queued must be specified at instantiation. We’ll make a queue which can hold items of Any
type. Can’t get more general than that.
queue = Queue(Any);
The rules of a queue are such that new items are always added to the back. Adding items is done with enqueue!()
.
enqueue!(queue, "First in.");
for n in [2:4]; enqueue!(queue, n); end
enqueue!(queue, "Last in.")
Queue{Deque{Any}}(Deque [{"First in.",2,3,4,"Last in."}])
length(queue)
5
The queue now holds five items. We can take a look at the items at the front and back of the queue using front()
and back()
. Note that indexing does not work on a queue (that would violate the principles of queuing!).
front(queue)
"First in."
back(queue)
"Last in."
Finally we can remove items from the front of the queue using dequeue!()
. The queue implements FIFO.
dequeue!(queue)
"First in."
The counter()
function returns an Accumulator
object, which is used to assemble item counts.
cnt = counter(ASCIIString)
Accumulator{ASCIIString,Int64}(Dict{ASCIIString,Int64}())
Using a Noah’s Ark example we’ll count the instances of different types of domestic animals.
push!(cnt, "dog") # Add 1 dog
1
push!(cnt, "cat", 3) # Add 3 cats
3
push!(cnt, "cat") # Add another cat (returns current count)
4
push!(cnt, "mouse", 5) # Add 5 mice
5
Let’s see what the counter looks like now.
cnt
Accumulator{ASCIIString,Int64}(["mouse"=>5,"cat"=>4,"dog"=>1])
We can return (and remove) the count for a particular item using pop!()
.
pop!(cnt, "cat")
4
cnt["cat"] # How many cats do we have now? All gone.
0
And simply accessing the count for an item is done using []
indexing notation.
cnt["mouse"] # But we still have a handful of mice.
5
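An `Accumulator` is essentially a `Dict` with a default count of zero, which makes the idea easy to sketch in plain Julia:

```julia
# A bare-bones counter: a Dict with a default of zero,
# which is roughly what Accumulator wraps up for you.
function count_items(items)
    counts = Dict{eltype(items), Int}()
    for item in items
        counts[item] = get(counts, item, 0) + 1
    end
    counts
end

animals = ["dog", "cat", "cat", "cat", "mouse", "cat", "mouse"]
counts = count_items(animals)
counts["cat"]                  # 4
get(counts, "goldfish", 0)     # 0 -- absent items count as zero
```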
I’ve just finished reading through the second early access version of Julia in Action by Chris von Csefalvay. In the chapter on Strings the author present a nice example in which he counts the times each character speaks in Shakespeare’s Hamlet. I couldn’t help but think that this would’ve been even more elegant using an Accumulator
.
Tomorrow we’ll take a look at an extremely useful data structure: a graph. Until then, feel free to check out the full code for today on github.
Sudoku-as-a-Service is a great illustration of Julia’s integer programming facilities. Julia has several packages which implement various flavours of optimisation: JuMP, JuMPeR, Gurobi, CPLEX, DReal, CoinOptServices and OptimPack. We’re not going to look at anything quite as elaborate as Sudoku today, but focus instead on finding the extrema in some simple (or perhaps not so simple) mathematical functions. At this point you might find it interesting to browse through this catalog of test functions for optimisation.
We’ll start out by using the Optim package to find extrema in Himmelblau’s function:
$$ f(x, y) = (x^2+y-11)^2 + (x+y^2-7)^2. $$
This function has one maximum and four minima. One of the minima is conveniently located at
$$ (x, y) = (3, 2). $$
As usual the first step is to load the required package.
using Optim
Then we set up the objective function along with its gradient and Hessian functions.
function himmelblau(x::Vector)
(x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
end
himmelblau (generic function with 1 method)
function himmelblau_gradient!(x::Vector, gradient::Vector)
gradient[1] = 4 * x[1] * (x[1]^2 + x[2] - 11) + 2 * (x[1] + x[2]^2 - 7)
gradient[2] = 2 * (x[1]^2 + x[2] - 11) + 4 * x[2] * (x[1] + x[2]^2 - 7)
end
himmelblau_gradient! (generic function with 1 method)
function himmelblau_hessian!(x::Vector, hessian::Matrix)
hessian[1, 1] = 4 * (x[1]^2 + x[2] - 11) + 8 * x[1]^2 + 2
hessian[1, 2] = 4 * x[1] + 4 * x[2]
hessian[2, 1] = 4 * x[1] + 4 * x[2]
hessian[2, 2] = 4 * (x[1] + x[2]^2 - 7) + 8 * x[2]^2 + 2
end
himmelblau_hessian! (generic function with 1 method)
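Before handing these to a solver it’s worth sanity-checking them at the known minimum: at (x, y) = (3, 2) both bracketed terms vanish, so the function and its gradient should be exactly zero. A standalone check:

```julia
# Himmelblau's function and its gradient, checked at the minimum (3, 2).
f(x, y)  = (x^2 + y - 11)^2 + (x + y^2 - 7)^2
fx(x, y) = 4 * x * (x^2 + y - 11) + 2 * (x + y^2 - 7)
fy(x, y) = 2 * (x^2 + y - 11) + 4 * y * (x + y^2 - 7)

f(3.0, 2.0)     # 0.0 -- both squared terms vanish
fx(3.0, 2.0)    # 0.0
fy(3.0, 2.0)    # 0.0
```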
There are a number of algorithms at our disposal. We’ll start with the Nelder Mead method which only uses the objective function itself. I am very happy with the detailed output provided by the optimize()
function and clearly it converges on a result which is very close to what we expected.
optimize(himmelblau, [2.5, 2.5], method = :nelder_mead)
Results of Optimization Algorithm
* Algorithm: Nelder-Mead
* Starting Point: [2.5,2.5]
* Minimum: [3.0000037281643586,2.0000105449945313]
* Value of Function at Minimum: 0.000000
* Iterations: 35
* Convergence: true
* |x - x'| < NaN: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: true
* |g(x)| < NaN: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 69
* Gradient Call: 0
Next we’ll look at the limited-memory version of the BFGS algorithm. This can be applied either with or without an explicit gradient function. In this case we’ll provide the gradient function defined above. Again we converge on the right result, but this time with far fewer iterations required.
optimize(himmelblau, himmelblau_gradient!, [2.5, 2.5], method = :l_bfgs)
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [2.5,2.5]
* Minimum: [2.999999999999385,2.0000000000001963]
* Value of Function at Minimum: 0.000000
* Iterations: 6
* Convergence: true
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: true
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 25
* Gradient Call: 25
Finally we’ll try out Newton’s method, where we’ll provide both gradient and Hessian functions. The result is spot on and we’ve shaved off one iteration. Very nice indeed!
optimize(himmelblau, himmelblau_gradient!, himmelblau_hessian!, [2.5, 2.5], method = :newton)
Results of Optimization Algorithm
* Algorithm: Newton's Method
* Starting Point: [2.5,2.5]
* Minimum: [3.0,2.0]
* Value of Function at Minimum: 0.000000
* Iterations: 5
* Convergence: true
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: true
* |g(x)| < 1.0e-08: true
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 19
* Gradient Call: 19
There is also a Simulated Annealing solver in the Optim package.
NLopt is an optimisation library with interfaces for a variety of programming languages. NLopt offers a variety of optimisation algorithms. We’ll apply both a gradient-based and a derivative-free technique to maximise the function
$$ \sin\alpha \cos\beta $$
subject to the constraints
$$ 2 \alpha \leq \beta $$
and
$$ \beta \leq \pi/2. $$
Before we load the NLopt package, it’s a good idea to restart your Julia session to flush out any remnants of the Optim package.
using NLopt
We’ll need to write the objective function and a generalised constraint function.
count = 0;
function objective(x::Vector, grad::Vector)
if length(grad) > 0
grad[1] = cos(x[1]) * cos(x[2])
grad[2] = - sin(x[1]) * sin(x[2])
end
global count
count::Int += 1
println("Iteration $count: $x")
sin(x[1]) * cos(x[2])
end
objective (generic function with 1 method)
function constraint(x::Vector, grad::Vector, a, b, c)
if length(grad) > 0
grad[1] = a
grad[2] = b
end
a * x[1] + b * x[2] - c
end
constraint (generic function with 1 method)
The COBYLA (Constrained Optimization BY Linear Approximations) algorithm is a local optimiser which doesn’t use the gradient function.
opt = Opt(:LN_COBYLA, 2); # Algorithm and dimension of problem
ndims(opt)
2
algorithm(opt)
:LN_COBYLA
algorithm_name(opt) # Text description of algorithm
"COBYLA (Constrained Optimization BY Linear Approximations) (local, no-derivative)"
We impose generous upper and lower bounds on the solution space and use two inequality constraints. Either min_objective!()
or max_objective!()
is used to specify the objective function and whether or not it is a minimisation or maximisation problem. Constraints can be either inequalities using inequality_constraint!()
or equalities using equality_constraint!()
.
lower_bounds!(opt, [0., 0.])
upper_bounds!(opt, [pi, pi])
xtol_rel!(opt, 1e-6)
max_objective!(opt, objective)
inequality_constraint!(opt, (x, g) -> constraint(x, g, 2, -1, 0), 1e-8)
inequality_constraint!(opt, (x, g) -> constraint(x, g, 0, 2, pi), 1e-8)
After making an initial guess we let the algorithm loose. I’ve purged some of the output to spare you from the floating point deluge.
initial = [0, 0]; # Initial guess
(maxf, maxx, ret) = optimize(opt, initial)
Iteration 1: [0.0,0.0]
Iteration 2: [0.7853981633974483,0.0]
Iteration 3: [0.7853981633974483,0.7853981633974483]
Iteration 4: [0.0,0.17884042066163552]
Iteration 5: [0.17562036827601815,0.3512407365520363]
Iteration 6: [0.5268611048280544,1.053722209656109]
Iteration 7: [0.7853981633974481,1.5707963267948961]
Iteration 8: [0.7526175675681757,0.9963866471510139]
Iteration 9: [0.785398163397448,1.570796326794896]
Iteration 10: [0.35124073655203625,0.7024814731040726]
.
.
.
Iteration 60: [0.42053333513020824,0.8410666702604165]
Iteration 61: [0.42053467500728553,0.8410693500145711]
Iteration 62: [0.4205360148843628,0.8410720297687256]
Iteration 63: [0.4205340050687469,0.8410680101374938]
Iteration 64: [0.4205340249920041,0.8410677994554656]
Iteration 65: [0.42053333513020824,0.8410666702604165]
Iteration 66: [0.42053456716611504,0.8410679945560181]
Iteration 67: [0.42053333513020824,0.8410666702604165]
Iteration 68: [0.42053365382801033,0.8410673076560207]
(0.27216552697496077,[0.420534,0.841067],:XTOL_REACHED)
println("got $maxf at $maxx after $count iterations.")
got 0.27216552697496077 at [0.42053365382801033,0.8410673076560207] after 68 iterations.
It takes a number of iterations to converge, but arrives at a solution which seems eminently reasonable (and which satisfies both of the constraints).
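We can cross-check that number without any optimisation library at all: with the first constraint active (β = 2α) the objective reduces to sin(α) cos(2α), which a coarse grid search maximises in a few lines:

```julia
# Brute-force check of the constrained optimum: along the active
# constraint beta = 2 * alpha the objective is sin(a) * cos(2a).
function grid_max()
    best_f, best_a = -Inf, 0.0
    for a in 0:1e-5:pi/4          # beta = 2a keeps beta within [0, pi/2]
        f = sin(a) * cos(2 * a)
        if f > best_f
            best_f, best_a = f, a
        end
    end
    (best_f, best_a)
end

best_f, best_a = grid_max()
# best_a ≈ 0.42053, best_f ≈ 0.27217 -- matching the NLopt result
```

A short calculation (setting the derivative of sin(α) cos(2α) to zero) gives sin²α = 1/6, so α = asin(1/√6) ≈ 0.42053 and a maximum of 2/(3√6) ≈ 0.27217, in agreement with the solver.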
Next we’ll use the MMA (Method of Moving Asymptotes) gradient-based algorithm.
opt = Opt(:LD_MMA, 2);
We remove the second inequality constraint and simply confine the solution space appropriately. This is definitely a more efficient approach!
lower_bounds!(opt, [0., 0.])
upper_bounds!(opt, [pi, pi / 2])
xtol_rel!(opt, 1e-6)
max_objective!(opt, objective)
inequality_constraint!(opt, (x, g) -> constraint(x, g, 2, -1, 0), 1e-8)
This algorithm converges more rapidly (because it takes advantage of the gradient function!) and we arrive at the same result.
(maxf, maxx, ret) = optimize(opt, initial)
Iteration 1: [0.0,0.0]
Iteration 2: [0.046935706114911574,0.12952531487499092]
Iteration 3: [0.1734128499487191,0.5065804625164063]
Iteration 4: [0.3449211909390502,0.7904095832845456]
Iteration 5: [0.4109653874949588,0.8281977630709889]
Iteration 6: [0.41725447118163134,0.8345944447401356]
Iteration 7: [0.4188068871033356,0.8376261095301502]
Iteration 8: [0.4200799333613666,0.8401670014914709]
Iteration 9: [0.4203495290598476,0.8406993867808531]
Iteration 10: [0.4205138682235357,0.8410278412850836]
Iteration 11: [0.4205289336960578,0.8410578710185219]
Iteration 12: [0.42053231747822034,0.8410646372592685]
Iteration 13: [0.42053444274035756,0.8410688833806734]
Iteration 14: [0.4205343574933894,0.8410687141629858]
Iteration 15: [0.4205343707980632,0.8410687434944638]
Iteration 16: [0.420534312041705,0.8410686169530415]
Iteration 17: [0.4205343317839936,0.8410686604482764]
Iteration 18: [0.42053433111342814,0.8410686565253115]
Iteration 19: [0.42053433035398824,0.8410686525997696]
(0.27216552944315736,[0.420534,0.841069],:XTOL_REACHED)
println("got $maxf at $maxx after $count iterations.")
got 0.27216552944315736 at [0.42053433035398824,0.8410686525997696] after 19 iterations.
I’m rather impressed. Both of these packages provide convenient interfaces and I could solve my test problems without too much effort. Have a look at the videos below for more about optimisation in Julia and check out github for the complete code for today’s examples. We’ll kick off next week with a quick look at some alternative data structures.
]]>Yesterday we had a look at Julia’s support for Calculus. The next logical step is to solve some differential equations. We’ll look at two packages today: Sundials and ODE.
The Sundials
package is based on a library which implements a number of solvers for differential equations. First off you’ll need to install that library. In Ubuntu this is straightforward using the package manager. Alternatively you can download the source distribution.
sudo apt-get install libsundials-serial-dev
Next install the Julia package and load it.
julia> Pkg.add("Sundials")
julia> using Sundials
To demonstrate we’ll look at a standard “textbook” problem: a damped harmonic oscillator (mass on a spring with friction). This is a second order differential equation with general form
$$ \ddot{x} + a \dot{x} + b x = 0 $$
where \(x\) is the displacement of the oscillator, while \(a\) and \(b\) characterise the damping coefficient and spring stiffness respectively. To solve this numerically we need to convert it into a system of first order equations:
$$
\begin{aligned}
\dot{x} &= v \\
\dot{v} &= - a v - b x
\end{aligned}
$$
We’ll write a function for those relationships and assign specific values to \(a\) and \(b\).
function oscillator(t, y, ydot)
ydot[1] = y[2]
ydot[2] = - 3 * y[1] - y[2] / 10
end
oscillator (generic function with 2 methods)
Next the initial conditions and time steps for the solution.
initial = [1.0, 0.0]; # Initial conditions
t = float([0:0.125:30]); # Time steps
And finally use cvode()
to integrate the system.
xv = Sundials.cvode(oscillator, initial, t);
xv[1:5,:]
5x2 Array{Float64,2}:
1.0 0.0
0.97676 -0.369762
0.908531 -0.717827
0.799076 -1.02841
0.65381 -1.28741
The results for the first few time steps look reasonable: the displacement (left column) is decreasing and the velocity (right column) is becoming progressively more negative. To be sure that the solution has the correct form, have a look at the Gadfly plot below. The displacement (black) and velocity (blue) curves are 90° out of phase, as expected, and both gradually decay with time due to damping. Looks about right to me!
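To see roughly what a solver like cvode() is doing, here’s the same damped oscillator integrated with a hand-rolled fourth-order Runge-Kutta stepper in plain Julia (a sketch only — CVODE is adaptive and far more sophisticated). The amplitude should decay towards zero, as in the plot:

```julia
# Classic fixed-step RK4 integration, returning the final state.
function rk4(f, y0, t0, t1, dt)
    y, t = copy(y0), t0
    while t < t1
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    end
    y
end

# The damped oscillator from above: x'' = -3x - x'/10.
oscillator(t, y) = [y[2]; -3 * y[1] - y[2] / 10]

xv = rk4(oscillator, [1.0, 0.0], 0.0, 30.0, 0.01)
# By t = 30 damping has shrunk the initial unit displacement well below 0.5.
```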
The ODE
package provides a selection of solvers, all of which are implemented with a consistent interface (which differs a bit from Sundials).
using ODE
Again we need to define a function to characterise our differential equations. The form of the function is a little different with the ODE package: rather than passing the derivative vector by reference, it’s simply returned as the result. I’ve considered the same problem as above, but to spice things up I added a sinusoidal driving force.
function oscillator(t, y)
[y[2]; - a * y[1] - y[2] / 10 + sin(t)]
end
oscillator (generic function with 2 methods)
We’ll solve this with ode23()
, which is a second order adaptive solver with third order error control. Because it’s adaptive we don’t need to explicitly specify all of the time steps, just the minimum and maximum.
a = 1; # Resonant
T, xv = ode23(oscillator, initial, [0.; 40]);
xv = hcat(xv...).'; # Vector{Vector{Float}} -> Matrix{Float}
The results are plotted below. Driving the oscillator at the resonant frequency causes the amplitude of oscillation to grow with time as energy is transferred to the oscillating mass.
If we move the oscillator away from resonance the behaviour becomes rather interesting.
a = 3; # Far from resonance
Now, because the oscillation and the driving force aren’t synchronised (and there’s a non-rational relationship between their frequencies) the displacement and velocity appear to change irregularly with time.
How about a double pendulum (a pendulum with a second pendulum suspended from its end)? This seemingly simple system exhibits a rich range of dynamics. Its behaviour is sensitive to initial conditions, one of the characteristics of chaotic systems.
First we set up the first order equations of motion. The details of this system are explained in the video below.
function pendulum(t, y)
Y = [
6 * (2 * y[3] - 3 * cos(y[1] - y[2]) * y[4]) / (16 - 9 * cos(y[1] - y[2])^2);
6 * (8 * y[4] - 3 * cos(y[1] - y[2]) * y[3]) / (16 - 9 * cos(y[1] - y[2])^2)
]
[
Y[1];
Y[2];
- (Y[1] * Y[2] * sin(y[1] - y[2]) + 3 * sin(y[1])) / 2;
- (sin(y[2]) - Y[1] * Y[2] * sin(y[1] - y[2])) / 2;
]
end
pendulum (generic function with 1 method)
Define initial conditions and let it run…
initial = [pi / 4, 0, 0, 0]; # Initial conditions -> deterministic behaviour
T, xv = ode23(pendulum, initial, [0.; 40]);
Below are two plots which show the results. The first is a time series showing the angular displacement of the first (black) and second (blue) mass. Next is a phase space plot which shows a different view of the same variables. It’s clear to see that there is a regular systematic relationship between them.
Next we’ll look at a different set of initial conditions. This time both masses are initially located above the primary vertex of the pendulum. This represents an initial configuration with much more potential energy.
initial = [3/4 * pi, pi, 0, 0]; # Initial conditions -> chaotic behaviour
The same pair of plots now illustrate much more interesting behaviour. Note the larger range of angles, θ₂, achieved by the second bob. With these initial conditions the pendulum is sufficiently energetic for it to “flip over”. Look at the video below to get an idea of what this looks like with a real pendulum.
It’s been a while since I’ve played with any Physics problems. That was fun. The full code for today is available at github. Come back tomorrow when I’ll take a look at Optimisation in Julia.
]]>Mathematica is the de facto standard for symbolic differentiation and integration. But many other languages also have great facilities for Calculus. For example, R has the deriv()
function in the base stats
package as well as the numDeriv, Deriv and Ryacas packages. Python has NumPy and SymPy.
Let’s check out what Julia has to offer.
First load the Calculus package.
using Calculus
The derivative() function will evaluate the numerical derivative at a specific point.
derivative(x -> sin(x), pi)
-0.9999999999441258
derivative(sin, pi, :central) # Options: :forward, :central or :complex
-0.9999999999441258
There’s also a prime notation which will do the same thing (but neatly handle higher order derivatives).
f(x) = sin(x);
f'(0.0) # cos(x)
0.9999999999938886
f''(0.0) # -sin(x)
0.0
f'''(0.0) # -cos(x)
-0.9999977482682358
There are functions for second derivatives, gradients (for multivariate functions) and Hessian matrices too. Related packages for derivatives are ForwardDiff and ReverseDiffSource.
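As a quick illustration of the multivariate functions (a sketch, assuming the Calculus package's gradient() and hessian() both accept a function and a point): the gradient of f(x, y) = x² + 3y² is (2x, 6y), so at (1, 1) we expect [2, 6], and the Hessian is the constant matrix diag(2, 6).

```julia
g(x) = x[1]^2 + 3 * x[2]^2;
Calculus.gradient(g, [1.0, 1.0])    # Numerical gradient, ≈ [2.0, 6.0]
Calculus.hessian(g, [1.0, 1.0])     # Numerical Hessian, ≈ [2 0; 0 6]
```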
Symbolic differentiation works for univariate and multivariate functions expressed as strings.
differentiate("sin(x)", :x)
:(cos(x))
differentiate("sin(x) + exp(-y)", [:x, :y])
2-element Array{Any,1}:
:(cos(x))
:(-(exp(-y)))
It also works for expressions.
differentiate(:(x^2 * y * exp(-x)), :x)
:((2x) * y * exp(-x) + x^2 * y * -(exp(-x)))
differentiate(:(sin(x) / x), :x)
:((cos(x) * x - sin(x)) / x^2)
Have a look at the JuliaDiff project which is aggregating resources for differentiation in Julia.
Numerical integration is a snap.
integrate(x -> 1 / (1 - x), -1 , 0)
0.6931471805602638
Compare that with the analytical result. Nice.
diff(map(x -> - log(1 - x), [-1, 0]))
1-element Array{Float64,1}:
0.693147
By default the integral is evaluated using Simpson’s Rule. However, we can also use Monte Carlo integration.
integrate(x -> 1 / (1 - x), -1 , 0, :monte_carlo)
0.6930203819567551
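If you're curious about what's happening under the hood, Monte Carlo integration amounts to averaging the integrand over uniform random samples. A minimal hand-rolled sketch (mc_integrate() is my own illustrative helper, not part of the Calculus package):

```julia
function mc_integrate(f, a, b, n)
    x = a + (b - a) * rand(n)        # n uniform samples on [a, b]
    (b - a) * sum(map(f, x)) / n     # Mean value of f times interval width
end
mc_integrate(x -> 1 / (1 - x), -1, 0, 100000)    # ≈ 0.693
```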
There is also an interface to the Sympy Python library for symbolic computation. You might want to restart your Julia session before loading the SymPy package.
using SymPy
Revisiting the same definite integral from above we find that we now have an analytical expression as the result.
integrate(1 / (1 - x), (x, -1, 0))
log(2)
convert(Float64, ans)
0.6931471805599453
To perform symbolic integration we first need to define a symbolic object using Sym().
x = Sym("x"); # Creating a "symbolic object"
typeof(x)
Sym (constructor with 6 methods)
sin(x) |> typeof # f(symbolic object) is also a symbolic object
Sym (constructor with 6 methods)
There’s more to be said about symbolic objects (they are the basis of pretty much everything in SymPy), but we are just going to jump ahead to constructing a function and integrating it.
f(x) = cos(x) - sin(x) * cos(x);
integrate(f(x), x)
2
sin (x)
- ─────── + sin(x)
2
What about an integral with constant parameters? No problem.
k = Sym("k");
integrate(1 / (x + k), x)
log(k + x)
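A sanity check (assuming SymPy's diff() method for symbolic objects): differentiating the result should recover the original integrand.

```julia
diff(log(k + x), x)    # Expect 1/(k + x)
```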
We have really only grazed the surface of SymPy. The capabilities of this package are deep and broad. Seriously worthwhile checking out the documentation if you are interested in symbolic computation.
I’m not ready to throw away my dated version of Mathematica just yet, but I’ll definitely be using this functionality often. Come back tomorrow when I’ll take a look at solving differential equations with Julia.
]]>The packages we’ll be looking at today should bring joy to the hearts of all Physical Scientists. Actually they should make any flavour of Scientist happy.
It is natural for man to relate the units of distance by which he travels to the dimensions of the globe that he inhabits. Thus, in moving about the earth, he may know by the simple denomination of distance its proportion to the whole circuit of the earth. This has the further advantage of making nautical and celestial measurements correspond. The navigator often needs to determine, one from the other, the distance he has traversed from the celestial arc lying between the zeniths at his point of departure and at his destination. It is important, therefore, that one of these magnitudes should be the expression of the other, with no difference except in the units. But to that end, the fundamental linear unit must be an aliquot part of the terrestrial meridian. ... Thus, the choice of the metre was reduced to that of the unity of angles. Pierre-Simon Laplace
The SIUnits
package provides unit-checked operations for quantities expressed in SI units.
using SIUnits
using SIUnits.ShortUnits
It supports both long and short forms of units and all the expected arithmetic operations.
1KiloGram + 2kg
3 kg
4Meter - 2m
2 m
4m / 2s
2.0 m s⁻¹
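The real payoff of unit-checked arithmetic is that dimensionally inconsistent operations fail loudly. Adding metres to seconds, for instance, should raise an error rather than silently producing a meaningless number (a sketch; the exact error type is SIUnits' business).

```julia
try
    1m + 1s                     # Metres plus seconds: dimensionally invalid
catch e
    println("Caught: ", e)
end
```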
Note that it only recognises the American spelling of “meter” and not the (IMHO correct) “metre”! But this is a small matter. And I don’t want to engage in any religious wars.
Speaking of small matters, it’s possible to define new units of measure. Below we’ll define the micron and Angstrom along with their conversion functions.
import Base.convert
Micron = SIUnits.NonSIUnit{typeof(Meter),:µm}()
µm
convert(::Type{SIUnits.SIQuantity},::typeof(Micron)) = Micro*Meter
convert (generic function with 461 methods)
Angstrom = SIUnits.NonSIUnit{typeof(Meter),:Å}()
Å
convert(::Type{SIUnits.SIQuantity},::typeof(Angstrom)) = Nano/10*Meter
convert (generic function with 462 methods)
And now we can freely use these new units in computations.
5Micron
5 µm
1Micron + 1m
1000001//1000000 m
5200Angstrom # Green light
5200 Å
Read on below to find out about the Physical
package.
The Physical
package is documented here. Apparently it’s not as performant as SIUnits
but it does appear to have a wider scope of functionality. We’ll use it to address an issue raised on Day 17: converting between Imperial and Metric units.
Let’s kick off by loading the package.
using Physical
There’s a lot of functionality available, but we are going to focus on just one thing: converting pounds and inches into kilograms and metres. First we define a pair of derived units. To do this, of course, we need to know the appropriate conversion factors!
Inch = DerivedUnit("in", 0.0254*Meter)
1 in
Pound = DerivedUnit("lb", 0.45359237*Kilogram)
1 lb
We can then freely change the average heights and weights that we saw earlier from Imperial to Metric units.
asbase(66Inch)
1.6764 m
asbase(139Pound)
63.04933943 kg
On a related note I’ve just put together a package of physical constants for Julia.
using PhysicalConstants
PhysicalConstants.MKS.SpeedOfLight
2.99792458e8
PhysicalConstants.MKS.Teaspoon
4.92892159375e-6
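With the speed of light to hand we can do quick order-of-magnitude calculations, like estimating how long sunlight takes to reach Earth (taking one astronomical unit as roughly 1.496 × 10¹¹ m).

```julia
au = 1.496e11                               # Earth-Sun distance in metres (approximate)
au / PhysicalConstants.MKS.SpeedOfLight     # ≈ 499 seconds, or about 8.3 minutes
```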
Did you know that a teaspoon was 4.92892 millilitres? There I was, wallowing in my ignorance, thinking that it was 5 millilitres. Pfffft. Silly me.
Units can be a contentious issue. Watch the video below to see what Richard Feynman had to say about the profusion of units used by Physicists to measure energy. Also check out the full code for today along with the index to the entire series of #MonthOfJulia posts on github.
For those who want some proof that physicists are human, the proof is in the idiocy of all the different units which they use for measuring energy. Richard P. Feynman]]>
There’s a variety of options for plotting in Julia. We’ll focus on those provided by Gadfly and Plotly.
Gadfly is the flavour of the month for plotting in Julia. It’s based on the Grammar of Graphics, so users of ggplot2 should find it familiar.
To start using Gadfly we’ll first need to load the package. To enable generation of PNG, PS, and PDF output we’ll also want the Cairo
package.
using Gadfly
using Cairo
You can easily generate plots from data vectors or functions.
plot(x = 1:100, y = cumsum(rand(100) - 0.5), Geom.point, Geom.smooth)
plot(x -> x^3 - 9x, -5, 5)
Gadfly plots are by default rendered onto a new tab in your browser. These plots are mildly interactive: you can zoom and pan across the plot area. You can also save plots directly to files of various formats.
dampedsin = plot([x -> sin(x) / x], 0, 50)
draw(PNG("damped-sin.png", 800px, 400px), dampedsin)
Let’s load up some data from the nlschools
dataset in R’s MASS
package and look at the relationship between language score test and IQ for pupils broken down according to whether or not they are in a mixed-grade class.
using RDatasets
plot(dataset("MASS", "nlschools"), x="IQ", y="Lang", color="COMB",
Geom.point, Geom.smooth(method=:lm), Guide.colorkey("Multi-Grade"))
Those two examples just scratched the surface. Gadfly can produce histograms, boxplots, ribbon plots, contours and violin plots. There’s detailed documentation with numerous examples on the homepage.
Watch the video below (Daniel Jones at JuliaCon 2014) then read on about Plotly.
The Plotly
package provides a complete interface to plot.ly, an online plotting service with interfaces for Python, R, MATLAB and now Julia. To get an idea of what’s possible with plot.ly, check out their feed. The first step towards making your own awesomeness will be loading the package.
using Plotly
Next you should set up your plot.ly credentials using Plotly.set_credentials_file()
. You only need to do this once because the values will be cached.
Data series are stored in Julia dictionaries.
p1 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "markers"];
p2 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "lines"];
p3 = ["x" => 1:10, "y" => rand(0:20, 10), "type" => "scatter", "mode" => "lines+markers"];
Plotly.plot([p1, p2, p3], ["filename" => "basic-line", "fileopt" => "overwrite"])
Dict{String,Any} with 5 entries:
"error" => ""
"message" => ""
"warning" => ""
"filename" => "basic-line"
"url" => "https://plot.ly/~collierab/17"
You can either open the URL provided in the result dictionary or do it programmatically:
Plotly.openurl(ans["url"])
By making small jumps through similar hoops it’s possible to create some rather intricate visualisations like the 3D scatter plot below. For details of how that was done, check out my code on github.
That was a static version of the plot. However, one of the major perks of Plotly is that the plots are interactive. Plus you can embed them in your site and it will, in turn, benefit from the interactivity. Feel free to interact vigorously with the plot below.
There’s also a fledgling interface to Google Charts.
Obviously plotting and visualisation in Julia are hot topics. Other plotting packages worth checking out are PyPlot
, Winston
and Gaston
. Come back tomorrow when we’ll take a look at using physical units in Julia.
The package can be installed directly from its github repository:
Pkg.clone("https://github.com/DataWookie/PhysicalConstants.jl")
Usage is pretty straightforward. Start off by loading the package.
julia> using PhysicalConstants
Now, for example, access Earth’s gravitational acceleration in MKS units.
julia> PhysicalConstants.MKS.GravAccel
9.80665
Or in CGS units.
julia> PhysicalConstants.CGS.GravAccel
980.665
Or, finally, in Imperial units.
julia> PhysicalConstants.Imperial.GravAccel
32.174049
R has an extensive range of builtin datasets, which are useful for experimenting with the language. The RDatasets
package makes many of these available within Julia. We’ll see another way of accessing R’s datasets in a couple of days' time too. In the meantime though, check out the documentation for RDatasets
and then read on below.
As always, the first thing that we need to do is load the package.
using RDatasets
We can get a list of the R packages which are supported by RDatasets.
RDatasets.packages()
33x2 DataFrame
| Row | Package | Title |
|-----|----------------|---------------------------------------------------------------------------|
| 1 | "COUNT" | "Functions, data and code for count data." |
| 2 | "Ecdat" | "Data sets for econometrics" |
| 3 | "HSAUR" | "A Handbook of Statistical Analyses Using R (1st Edition)" |
| 4 | "HistData" | "Data sets from the history of statistics and data visualization" |
| 5 | "ISLR" | "Data for An Introduction to Statistical Learning with Applications in R" |
| 6 | "KMsurv" | "Data sets from Klein and Moeschberger (1997), Survival Analysis" |
| 7 | "MASS" | "Support Functions and Datasets for Venables and Ripley's MASS" |
| 8 | "SASmixed" | "Data sets from \"SAS System for Mixed Models\"" |
| 9 | "Zelig" | "Everyone's Statistical Software" |
| 10 | "adehabitatLT" | "Analysis of Animal Movements" |
| 11 | "boot" | "Bootstrap Functions (Originally by Angelo Canty for S)" |
| 12 | "car" | "Companion to Applied Regression" |
| 13 | "cluster" | "Cluster Analysis Extended Rousseeuw et al." |
| 14 | "datasets" | "The R Datasets Package" |
| 15 | "gap" | "Genetic analysis package" |
| 16 | "ggplot2" | "An Implementation of the Grammar of Graphics" |
| 17 | "lattice" | "Lattice Graphics" |
| 18 | "lme4" | "Linear mixed-effects models using Eigen and S4" |
| 19 | "mgcv" | "Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation" |
| 20 | "mlmRev" | "Examples from Multilevel Modelling Software Review" |
| 21 | "nlreg" | "Higher Order Inference for Nonlinear Heteroscedastic Models" |
| 22 | "plm" | "Linear Models for Panel Data" |
| 23 | "plyr" | "Tools for splitting, applying and combining data" |
| 24 | "pscl" | "Political Science Computational Laboratory, Stanford University" |
| 25 | "psych" | "Procedures for Psychological, Psychometric, and Personality Research" |
| 26 | "quantreg" | "Quantile Regression" |
| 27 | "reshape2" | "Flexibly Reshape Data: A Reboot of the Reshape Package." |
| 28 | "robustbase" | "Basic Robust Statistics" |
| 29 | "rpart" | "Recursive Partitioning and Regression Trees" |
| 30 | "sandwich" | "Robust Covariance Matrix Estimators" |
| 31 | "sem" | "Structural Equation Models" |
| 32 | "survival" | "Survival Analysis" |
| 33 | "vcd" | "Visualizing Categorical Data" |
Next we’ll get a list of all datasets supported across all of those R packages. There are a lot of them! Also we see some specific statistics about the number of records and fields in each of them.
sets = RDatasets.datasets();
size(sets)
(733,5)
head(sets)
6x5 DataFrame
| Row | Package | Dataset | Title | Rows | Columns |
|-----|---------|-------------|-------------|------|---------|
| 1 | "COUNT" | "affairs" | "affairs" | 601 | 18 |
| 2 | "COUNT" | "azdrg112" | "azdrg112" | 1798 | 4 |
| 3 | "COUNT" | "azpro" | "azpro" | 3589 | 6 |
| 4 | "COUNT" | "badhealth" | "badhealth" | 1127 | 3 |
| 5 | "COUNT" | "fasttrakg" | "fasttrakg" | 15 | 9 |
| 6 | "COUNT" | "lbw" | "lbw" | 189 | 10 |
Or we can find out what datasets are available from a particular R package.
RDatasets.datasets("vcd")
31x5 DataFrame
| Row | Package | Dataset | Title | Rows | Columns |
|-----|---------|-------------------|--------------------------------------------|-------|---------|
| 1 | "vcd" | "Arthritis" | "Arthritis Treatment Data" | 84 | 5 |
| 2 | "vcd" | "Baseball" | "Baseball Data" | 322 | 25 |
| 3 | "vcd" | "BrokenMarriage" | "Broken Marriage Data" | 20 | 4 |
| 4 | "vcd" | "Bundesliga" | "Ergebnisse der Fussball-Bundesliga" | 14018 | 7 |
| 5 | "vcd" | "Bundestag2005" | "Votes in German Bundestag Election 2005" | 16 | 6 |
| 6 | "vcd" | "Butterfly" | "Butterfly Species in Malaya" | 24 | 2 |
| 7 | "vcd" | "CoalMiners" | "Breathlessness and Wheeze in Coal Miners" | 32 | 4 |
| 8 | "vcd" | "DanishWelfare" | "Danish Welfare Study Data" | 180 | 5 |
| 9 | "vcd" | "Employment" | "Employment Status" | 24 | 4 |
| 10 | "vcd" | "Federalist" | "'May' in Federalist Papers" | 7 | 2 |
| 11 | "vcd" | "Hitters" | "Hitters Data" | 154 | 4 |
| 12 | "vcd" | "HorseKicks" | "Death by Horse Kicks" | 5 | 2 |
| 13 | "vcd" | "Hospital" | "Hospital data" | 3 | 4 |
| 14 | "vcd" | "JobSatisfaction" | "Job Satisfaction Data" | 8 | 4 |
| 15 | "vcd" | "JointSports" | "Opinions About Joint Sports" | 40 | 5 |
| 16 | "vcd" | "Lifeboats" | "Lifeboats on the Titanic" | 18 | 8 |
| 17 | "vcd" | "NonResponse" | "Non-Response Survey Data" | 12 | 4 |
| 18 | "vcd" | "OvaryCancer" | "Ovary Cancer Data" | 16 | 5 |
| 19 | "vcd" | "PreSex" | "Pre-marital Sex and Divorce" | 16 | 5 |
| 20 | "vcd" | "Punishment" | "Corporal Punishment Data" | 36 | 5 |
| 21 | "vcd" | "RepVict" | "Repeat Victimization Data" | 8 | 9 |
| 22 | "vcd" | "Saxony" | "Families in Saxony" | 13 | 2 |
| 23 | "vcd" | "SexualFun" | "Sex is Fun" | 4 | 5 |
| 24 | "vcd" | "SpaceShuttle" | "Space Shuttle O-ring Failures" | 24 | 6 |
| 25 | "vcd" | "Suicide" | "Suicide Rates in Germany" | 306 | 6 |
| 26 | "vcd" | "Trucks" | "Truck Accidents Data" | 24 | 5 |
| 27 | "vcd" | "UKSoccer" | "UK Soccer Scores" | 5 | 6 |
| 28 | "vcd" | "VisualAcuity" | "Visual Acuity in Left and Right Eyes" | 32 | 4 |
| 29 | "vcd" | "VonBort" | "Von Bortkiewicz Horse Kicks Data" | 280 | 4 |
| 30 | "vcd" | "WeldonDice" | "Weldon's Dice Data" | 11 | 2 |
| 31 | "vcd" | "WomenQueue" | "Women in Queues" | 11 | 2 |
Finally, the most interesting bit: accessing data from a particular dataset. Below we load up the women dataset from R’s built-in datasets package.
women = dataset("datasets", "women")
15x2 DataFrame
| Row | Height | Weight |
|-----|--------|--------|
| 1 | 58 | 115 |
| 2 | 59 | 117 |
| 3 | 60 | 120 |
| 4 | 61 | 123 |
| 5 | 62 | 126 |
| 6 | 63 | 129 |
| 7 | 64 | 132 |
| 8 | 65 | 135 |
| 9 | 66 | 139 |
| 10 | 67 | 142 |
| 11 | 68 | 146 |
| 12 | 69 | 150 |
| 13 | 70 | 154 |
| 14 | 71 | 159 |
| 15 | 72 | 164 |
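Since the columns are just data, we can immediately compute summary quantities. For example, height and weight in these data are almost perfectly correlated, and a least-squares slope falls straight out of the column statistics (a sketch; convert() works here because there are no missing values).

```julia
h = convert(Array, women[:Height]);
w = convert(Array, women[:Weight]);
cor(h, w)                     # Very close to 1
cov(h, w) / var(h)            # Slope: extra pounds per extra inch of height
```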
From these data we learn that the average mass of American women of height 66 inches is around 139 pounds. If you are from a country which uses the Metric system (like me!) then these numbers might seem a little mysterious. Come back in a couple of days and we’ll see how Julia can convert pounds and inches into kilograms and metres.
That’s all for now. Code for today is available on github.
]]>Yesterday we looked at how time series data can be sucked into Julia from Quandl. What happens if your data are sitting in a database? No problem, Julia can handle that too. There are a number of database packages available. I’ll be focusing on SQLite
and ODBC
, but it might be worthwhile checking out JDBC
, LevelDB
and LMDB
too.
SQLite is a lightweight transactional SQL database engine that does not require a server or any major configuration. Installation is straightforward on most platforms.
The first step towards using SQLite from Julia is to load the package.
using SQLite
Next, for illustrative purposes, we’ll create a database (which exists as a single file in the working directory) and add a table which we’ll populate directly from a delimited file.
db = SQLiteDB("passwd.sqlite")
SQLiteDB{UTF8String}("passwd.sqlite",Ptr{Void} @0x00000000059cde38,0)
create(
db,
"passwd",
readdlm("/etc/passwd", ':'),
["username", "password", "UID", "GID", "comment", "homedir", "shell"]
)
1x1 ResultSet
| Row | "Rows Affected" |
|-----|-----------------|
| 1 | 0 |
Then the interesting bit: we execute a simple query.
query(db, "SELECT username, homedir FROM passwd LIMIT 10;")
10x2 ResultSet
| Row | "username" | "homedir" |
|-----|------------|-------------------|
| 1 | "root" | "/root" |
| 2 | "daemon" | "/usr/sbin" |
| 3 | "bin" | "/bin" |
| 4 | "sys" | "/dev" |
| 5 | "sync" | "/bin" |
| 6 | "games" | "/usr/games" |
| 7 | "man" | "/var/cache/man" |
| 8 | "lp" | "/var/spool/lpd" |
| 9 | "mail" | "/var/mail" |
| 10 | "news" | "/var/spool/news" |
Most of the expected SQL operations are supported by SQLite (check the documentation) and hence also by the Julia interface. When we’re done we close the database connection.
close(db)
Of course, the database we created in Julia is now available through the shell too.
ls -l passwd.sqlite
-rw-r--r-- 1 colliera colliera 6144 Sep 18 07:21 passwd.sqlite
sqlite3 passwd.sqlite
SQLite version 3.8.7.4 2014-12-09 01:34:36
Enter ".help" for usage hints.
sqlite> pragma table_info(passwd);
0|username|TEXT|0||0
1|password|TEXT|0||0
2|UID|REAL|0||0
3|GID|REAL|0||0
4|comment|TEXT|0||0
5|homedir|TEXT|0||0
6|shell|TEXT|0||0
sqlite>
If you need to access an enterprise DB (for example, Oracle, PostgreSQL, MySQL, Microsoft SQL Server or DB2) then the ODBC interface will be the way to go. To avoid the overhead of using one of these fancy DBs, I will demonstrate Julia’s ODBC functionality using the SQLite database we created above. Before we do that though, you’ll need to setup ODBC for SQLite. It’s not an onerous procedure at all. Then we fire up the ODBC
package and we’re ready to roll.
using ODBC
First we’ll check which drivers are available for ODBC (just SQLite in my case) and what data source names (DSNs) are registered.
listdrivers()
(String["SQLite","SQLite3"],String["Description=SQLite ODBC Driver\0Driver=libsqliteodbc.so\0Setup=libsqliteodbc.so\0UsageCount=1\0","Description=SQLite3 ODBC Driver\0Driver=libsqlite3odbc.so\0Setup=libsqlite3odbc.so\0UsageCount=1\0"])
listdsns()
(String["passwd"],String["SQLite3"])
We see that there is a DSN available for the passwd
database. So we create a connection:
db = ODBC.connect("passwd")
ODBC Connection Object
----------------------
Connection Data Source: passwd
passwd Connection Number: 1
Contains resultset(s)? No
At this point I’d like to execute a query. However, somewhat disappointingly, this doesn’t work. No error message but also no results. I’ve logged an issue with the package maintainer, so hopefully this will be resolved soon.
query("SELECT * FROM passwd LIMIT 5;", db)
0x0 DataFrame
What’s promising though is that I can still retrieve the metadata for that query.
querymeta("SELECT * FROM passwd LIMIT 5;", db)
Resultset metadata for executed query
-------------------------------------
Query: SELECT * FROM passwd LIMIT 5
Columns: 7
Rows: 0
7x5 DataFrame
| Row | Names | Types | Sizes | Digits | Nullable |
|-----|------------|------------------------|-------|--------|----------|
| 1 | "username" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 2 | "password" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 3 | "UID" | ("SQL_DOUBLE",8) | 54 | 0 | 1 |
| 4 | "GID" | ("SQL_DOUBLE",8) | 54 | 0 | 1 |
| 5 | "comment" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 6 | "homedir" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 7 | "shell" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
Again, when we’re done, we close the database connection.
disconnect(db)
We’ve now covered a number of ways of getting data into Julia. Over the next few days we’ll be looking at Julia’s capabilities for analysing data. Stay tuned. In the meantime you can check out the code for today (and previous days) on github. Take a look at the talk below. Also, there’s a great tutorial on working with SQLite, which is well worth looking at.
]]>Yesterday we looked at Julia’s support for tabular data, which can be represented by a DataFrame
. The TimeSeries
package implements another common data type: time series. We’ll start by loading the TimeSeries
package, but we’ll also add the Quandl
package, which provides an interface to a rich source of time series data from Quandl.
using TimeSeries
using Quandl
We’ll start by getting our hands on some data from Yahoo Finance. By default these data will be of type TimeArray
, although it is possible to explicitly request a DataFrame
instead.
google = quandl("YAHOO/GOOGL"); # GOOGL at (default) daily intervals
typeof(google)
TimeArray{Float64,2,DataType} (constructor with 1 method)
apple = quandl("YAHOO/AAPL", frequency = :weekly); # AAPL at weekly intervals
mmm = quandl("YAHOO/MMM", from = "2015-07-01"); # MMM starting at 2015-07-01
rht = quandl("YAHOO/RHT", format = "DataFrame"); # As a DataFrame
typeof(rht)
DataFrame (constructor with 11 methods)
Having a closer look at one of the TimeSeries
objects we find that it actually consists of multiple data series, each represented by a separate column. The colnames
attribute gives names for each of the component series, while the timestamp
and values
attributes provide access to the data themselves. We’ll see more convenient means for accessing those data in a moment.
google
100x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15
Open High Low Close Volume Adjusted Close
2015-04-24 | 580.05 584.7 568.35 573.66 4608400 573.66
2015-04-27 | 572.77 575.52 562.3 566.12 2403100 566.12
2015-04-28 | 564.32 567.83 560.96 564.37 1858900 564.37
2015-04-29 | 560.51 565.84 559.0 561.39 1681100 561.39
⋮
2015-09-10 | 643.9 654.9 641.7 651.08 1384600 651.08
2015-09-11 | 650.21 655.31 647.41 655.3 1736100 655.3
2015-09-14 | 655.63 655.92 649.5 652.47 1497100 652.47
2015-09-15 | 656.71 668.85 653.34 665.07 1761800 665.07
names(google)
4-element Array{Symbol,1}:
:timestamp
:values
:colnames
:meta
google.colnames
6-element Array{UTF8String,1}:
"Open"
"High"
"Low"
"Close"
"Volume"
"Adjusted Close"
google.timestamp[1:5]
5-element Array{Date,1}:
2015-04-24
2015-04-27
2015-04-28
2015-04-29
2015-04-30
google.values[1:5,:]
5x6 Array{Float64,2}:
580.05 584.7 568.35 573.66 4.6084e6 573.66
572.77 575.52 562.3 566.12 2.4031e6 566.12
564.32 567.83 560.96 564.37 1.8589e6 564.37
560.51 565.84 559.0 561.39 1.6811e6 561.39
558.56 561.11 546.72 548.77 2.362e6 548.77
The TimeArray type caters for a full range of indexing operations which allow you to slice and dice those data to your exacting requirements. to()
and from()
extract subsets of the data before or after a specified instant.
google[1:5]
5x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-30
Open High Low Close Volume Adjusted Close
2015-04-24 | 580.05 584.7 568.35 573.66 4608400 573.66
2015-04-27 | 572.77 575.52 562.3 566.12 2403100 566.12
2015-04-28 | 564.32 567.83 560.96 564.37 1858900 564.37
2015-04-29 | 560.51 565.84 559.0 561.39 1681100 561.39
2015-04-30 | 558.56 561.11 546.72 548.77 2362000 548.77
google[[Date(2015,8,7):Date(2015,8,12)]]
4x6 TimeArray{Float64,2,DataType} 2015-08-07 to 2015-08-12
Open High Low Close Volume Adjusted Close
2015-08-07 | 667.78 668.8 658.87 664.39 1374100 664.39
2015-08-10 | 667.09 671.62 660.23 663.14 1403900 663.14
2015-08-11 | 699.58 704.0 684.32 690.3 5264100 690.3
2015-08-12 | 694.49 696.0 680.51 691.47 2924900 691.47
google["High","Low"]
100x2 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15
High Low
2015-04-24 | 584.7 568.35
2015-04-27 | 575.52 562.3
2015-04-28 | 567.83 560.96
2015-04-29 | 565.84 559.0
⋮
2015-09-10 | 654.9 641.7
2015-09-11 | 655.31 647.41
2015-09-14 | 655.92 649.5
2015-09-15 | 668.85 653.34
google["Close"][3:5]
3x1 TimeArray{Float64,1,DataType} 2015-04-28 to 2015-04-30
Close
2015-04-28 | 564.37
2015-04-29 | 561.39
2015-04-30 | 548.77
We can shift observations forward or backward in time using lag() or lead().
lag(google[1:5])
4x6 TimeArray{Float64,2,DataType} 2015-04-27 to 2015-04-30
Open High Low Close Volume Adjusted Close
2015-04-27 | 580.05 584.7 568.35 573.66 4608400 573.66
2015-04-28 | 572.77 575.52 562.3 566.12 2403100 566.12
2015-04-29 | 564.32 567.83 560.96 564.37 1858900 564.37
2015-04-30 | 560.51 565.84 559.0 561.39 1681100 561.39
lead(google[1:5], 3)
2x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-27
Open High Low Close Volume Adjusted Close
2015-04-24 | 560.51 565.84 559.0 561.39 1681100 561.39
2015-04-27 | 558.56 561.11 546.72 548.77 2362000 548.77
We can also calculate the percentage change between observations.
percentchange(google["Close"], method = "log")
99x1 TimeArray{Float64,1,DataType} 2015-04-27 to 2015-09-15
Close
2015-04-27 | -0.0132
2015-04-28 | -0.0031
2015-04-29 | -0.0053
2015-04-30 | -0.0227
⋮
2015-09-10 | 0.0119
2015-09-11 | 0.0065
2015-09-14 | -0.0043
2015-09-15 | 0.0191
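Moving window calculations follow the same pattern. A sketch of a 10-day moving average of the closing price, assuming the moving(series, f, window) signature:

```julia
moving(google["Close"], mean, 10)
```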
Well, that’s the core functionality in TimeSeries. There are also methods for aggregation and moving window operations, as well as time series merging. You can check out some examples in the documentation as well as on github. Finally, watch the video below from JuliaCon 2014.
The DataFrame
type in Julia is not dissimilar to the analogous types in R and Python/pandas. It provides a way of grouping data which is convenient for analysis and reminiscent of a database table.
I’m assuming that you’ve already installed the DataFrames package. If not, take a look at yesterday’s post. The first step is then to load it up:
using DataFrames
Next we can start assembling our data. A DataFrame
can be built up one field at a time (as is done in the example below) or by passing all of the data at once to the constructor.
people = DataFrame();
people[:name] = ["Andrew", "Claire", "Bob", "Alice"];
people[:gender] = [0, 1, 0, 1];
people[:age] = [43, 35, 27, 32];
people
4x3 DataFrame
| Row | name | gender | age |
|-----|----------|--------|-----|
| 1 | "Andrew" | 0 | 43 |
| 2 | "Claire" | 1 | 35 |
| 3 | "Bob" | 0 | 27 |
| 4 | "Alice" | 1 | 32 |
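The same table can be created in one shot by passing keyword arguments to the constructor.

```julia
people2 = DataFrame(name = ["Andrew", "Claire", "Bob", "Alice"],
                    gender = [0, 1, 0, 1],
                    age = [43, 35, 27, 32]);
```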
names()
and eltypes()
provide a high level overview of the data, giving the names and data types respectively for each column.
names(people)
3-element Array{Symbol,1}:
:name
:gender
:age
eltypes(people)
3-element Array{Type{T<:Top},1}:
ASCIIString
Int64
Int64
You can dig deeper with describe()
, which gives a simple statistical summary of each column. It does essentially the same thing as summary()
in R.
Indexing operations allow you to access the data in various ways. There’s also head()
and tail()
, which return the first and last few records in the data.
people[:age]
4-element DataArray{Int64,1}:
43
35
27
32
people[2]
4-element DataArray{Int64,1}:
0
1
0
1
people[:,2]
4-element DataArray{Int64,1}:
0
1
0
1
people[1,:]
1x3 DataFrame
| Row | name | gender | age |
|-----|----------|--------|-----|
| 1 | "Andrew" | 0 | 43 |
You can apply a range of operations to columns. Note, however, that there is a subtle difference in syntax: while == is the normal equality operator, .== is the element-wise equality operator which must be applied to columns in order to make element-by-element comparisons. A similar syntax pertains to other operators like .<= and .>.
people[:gender] = ifelse(people[:gender] .== 1, 'F', 'M');
people
4x3 DataFrame
| Row | name | gender | age |
|-----|----------|--------|-----|
| 1 | "Andrew" | 'M' | 43 |
| 2 | "Claire" | 'F' | 35 |
| 3 | "Bob" | 'M' | 27 |
| 4 | "Alice" | 'F' | 32 |
people[:gender] .== 'M'
4-element DataArray{Bool,1}:
true
false
true
false
people[:age] .<= 40
4-element DataArray{Bool,1}:
false
true
true
true
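These boolean vectors are most useful for filtering: indexing the data frame with a mask selects the matching rows.

```julia
people[people[:age] .<= 40, :]    # Everyone aged 40 or under
```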
Of course you’re not likely to construct any serious collection of data manually. It’s more likely to come from a database or file. There are various ways to accomplish this. The simplest is reading from a delimited file.
passwd = readtable("/etc/passwd", separator = ':', header = false);
names!(passwd, [symbol(i) for i in ["username", "passwd", "UID", "GID",
"comment", "home", "shell"]]);
passwd[1:5,:]
5x7 DataFrame
| Row | username | passwd | UID | GID | comment | home | shell |
|-----|----------|--------|-----|-------|----------|-------------|---------------------|
| 1 | "root" | "x" | 0 | 0 | "root" | "/root" | "/bin/bash" |
| 2 | "daemon" | "x" | 1 | 1 | "daemon" | "/usr/sbin" | "/usr/sbin/nologin" |
| 3 | "bin" | "x" | 2 | 2 | "bin" | "/bin" | "/usr/sbin/nologin" |
| 4 | "sys" | "x" | 3 | 3 | "sys" | "/dev" | "/usr/sbin/nologin" |
| 5 | "sync" | "x" | 4 | 65534 | "sync" | "/bin" | "/bin/sync" |
Note how names!()
was used to alter the column names. There are other ways of loading data from a delimited text file that will handle column names more elegantly. We’ll get to those in a few days time.
Watch the video below and then read further to find out about the DataArrays
package.
Data are seldom perfect and missing values are not uncommon. Now, you might use a particular numerical value (like -9999, for example) to indicate a missing datum. However, this is a bit of a kludge, difficult to maintain and open to ambiguity. The DataArrays
package introduces the singleton NA
type which can be used to unambiguously indicate missing data.
A vector with missing data is created using the @data
macro.
using DataArrays
x = @data([1, 2, 3, 4, NA, 6])
6-element DataArray{Int64,1}:
1
2
3
4
NA
6
Functions anyna()
and allna()
can be used to test whether any or all of the elements of a vector are missing.
Two ways of dealing with NAs are to either drop them or replace them with another value.
dropna(x)
5-element Array{Int64,1}:
1
2
3
4
6
convert(Array, x, -1)
6-element Array{Int64,1}:
1
2
3
4
-1
6
Data frames have support for NA
s already baked in.
people[:age][2] = NA;
people
4x3 DataFrame
| Row | name | gender | age |
|-----|----------|--------|-----|
| 1 | "Andrew" | 'M' | 43 |
| 2 | "Claire" | 'F' | NA |
| 3 | "Bob" | 'M' | 27 |
| 4 | "Alice" | 'F' | 32 |
mean(people[:age])
NA
mean(dropna(people[:age]))
34.0
Note how dropna()
was used to calculate the mean of the non-missing data.
The DataFramesMeta
package provides a handful of macros for applying metaprogramming techniques to data frames. For example:
using DataFramesMeta
@with(passwd, maximum(:UID))
65534
@select(people, :gender)
4x1 DataFrame
| Row | gender |
|-----|--------|
| 1 | 'M' |
| 2 | 'F' |
| 3 | 'M' |
| 4 | 'F' |
Further examples can be found on the github page for MonthOfJulia.
A lot of Julia’s functionality is implemented as add-on packages (or “modules”). An extensive (though possibly not exhaustive) list of available packages can be found at https://juliapackages.com/. If you browse through that list I can guarantee that you will find a number of packages that pique your curiosity. How to install them? Read on.
Package management is handled via Pkg
. Pkg.dir()
will tell you where the installed packages are stored on your file system. Before installing any new packages, always call Pkg.update()
to update your local metadata and repository (it will update any installed packages to their most recent version).
Installing a new package is done with Pkg.add()
. Any dependencies are handled automatically during the install process.
Pkg.add("VennEuler")
INFO: Cloning cache of VennEuler from git://github.com/HarlanH/VennEuler.jl.git
INFO: Installing VennEuler v0.0.1
INFO: Building NLopt
INFO: Building Cairo
INFO: Package database updated
Pkg.available()
generates a complete list of all available packages while Pkg.installed()
or Pkg.status()
can be used to find the versions of installed packages.
Pkg.installed()["VennEuler"]
v"0.0.1"
Pkg.installed("VennEuler")
v"0.0.1"
Pkg.pin()
will fix a package at a specific version (no updates will be applied). Pkg.free()
releases the effects of Pkg.pin()
.
The using
directive loads the functions exported by a package into the global namespace. You can get a view of the capabilities of a package by typing its name followed by a period and then hitting the Tab key. Alternatively, names()
will give a list of symbols exported by a package.
using VennEuler
names(VennEuler)
9-element Array{Symbol,1}:
:optimize
:render
:optimize_iteratively
:VennEuler
:EulerObject
:EulerState
:make_euler_object
:EulerSpec
:random_state
The package manager provides a host of other functionality which you can read about here. Check out the videos below to find out more about Julia’s package ecosystem. From tomorrow I’ll start looking at specific packages. To get yourself prepared for that, why not go ahead and install the following packages: Cpp, PyCall, DataArrays, DataFrames and RCall.
As opposed to many other languages, where parallel computing is bolted on as an afterthought, Julia was designed from the start with parallel computing in mind. It has a number of native features which lend themselves to efficient implementation of parallel algorithms. It also has packages which facilitate cluster computing (using MPI, for example). We won’t be looking at those, but focusing instead on coroutines, generic parallel processing and parallel loops.
Coroutines are not strictly parallel processing (in the sense of “many tasks running at the same time”) but they provide a lightweight mechanism for having multiple tasks defined (if not active) at once. According to Donald Knuth, coroutines are generalised subroutines (with which we are probably all familiar).
Under these conditions each module may be made into a _coroutine_; that is, it may be coded as an autonomous program which communicates with adjacent modules as if they were input or output subroutines. Thus, coroutines are subroutines all at the same level, each acting as if it were the master program when in fact there is no master program. There is no bound placed by this definition on the number of inputs and outputs a coroutine may have.
Conway, Design of a Separable Transition-Diagram Compiler, 1963.
Coroutines are implemented using produce()
and consume()
. In a moment you’ll see why those names are appropriate. To illustrate we’ll define a function which generates elements from the Lucas sequence. For reference, the first few terms in the sequence are 2, 1, 3, 4, 7, … If you know about Python’s generators then you’ll find the code below rather familiar.
function lucas_producer(n)
    a, b = (2, 1)
    for i = 1:n
        produce(a)
        a, b = (b, a + b)
    end
end
lucas_producer (generic function with 1 method)
This function is then wrapped in a Task
, which has state :runnable
.
lucas_task = Task(() -> lucas_producer(10))
Task (runnable) @0x0000000005b5ee60
lucas_task.state
:runnable
Now we’re ready to start consuming data from the Task
. Data elements can be retrieved individually or via a loop (in which case the Task
acts like an iterable object and no consume()
is required).
consume(lucas_task)
2
consume(lucas_task)
1
consume(lucas_task)
3
for n in lucas_task
    println(n)
end
4
7
11
18
29
47
76
Between invocations the Task
is effectively asleep. The task temporarily springs to life every time data is requested, before becoming dormant once more.
It’s possible to simultaneously set up an arbitrary number of coroutine tasks.
Coroutines don’t really feel like “parallel” processing because they are not working simultaneously. However it’s rather straightforward to get Julia to metaphorically juggle many balls at once. The first thing that you’ll need to do is launch the interpreter with multiple worker processes.
julia -p 4
There’s always one more process than specified on the command line (we specified the number of worker processes; add one for the master process).
nprocs()
5
workers() # Identifiers for the worker processes.
4-element Array{Int64,1}:
2
3
4
5
We can launch a job on one of the workers using remotecall()
.
W1 = workers()[1];
P1 = remotecall(W1, x -> factorial(x), 20)
RemoteRef(2,1,6)
fetch(P1)
2432902008176640000
@spawn
and @spawnat
are macros which launch jobs on individual workers. The @everywhere
macro executes code across all processes (including the master).
@everywhere p = 5
@everywhere println(@sprintf("ID %d: %f %d", myid(), rand(), p))
ID 1: 0.686332 5
From worker 4: ID 4: 0.107924 5
From worker 5: ID 5: 0.136019 5
From worker 2: ID 2: 0.145561 5
From worker 3: ID 3: 0.670885 5
To illustrate how easy it is to set up parallel loops, let’s first consider a simple serial implementation of a Monte Carlo technique to estimate π.
function findpi(n)
    inside = 0
    for i = 1:n
        x, y = rand(2)
        if (x^2 + y^2 <= 1)
            inside += 1
        end
    end
    4 * inside / n
end
findpi (generic function with 1 method)
The quality of the result as well as the execution time (and memory consumption!) depend directly on the number of samples.
@time findpi(10000)
elapsed time: 0.051982841 seconds (1690648 bytes allocated, 81.54% gc time)
3.14
@time findpi(100000000)
elapsed time: 9.533291187 seconds (8800000096 bytes allocated, 42.97% gc time)
3.1416662
@time findpi(1000000000)
elapsed time: 95.436185105 seconds (88000002112 bytes allocated, 43.14% gc time)
3.141605352
The parallel version is implemented using the @parallel
macro, which takes a reduction operator (in this case +
) as its first argument.
function parallel_findpi(n)
    inside = @parallel (+) for i = 1:n
        x, y = rand(2)
        x^2 + y^2 <= 1 ? 1 : 0
    end
    4 * inside / n
end
parallel_findpi (generic function with 1 method)
There is some significant overhead associated with setting up the parallel jobs, so that the parallel version actually performs worse for a small number of samples. But when you run sufficient samples the speedup becomes readily apparent.
@time parallel_findpi(10000)
elapsed time: 0.45212316 seconds (9731736 bytes allocated)
3.1724
@time parallel_findpi(100000000)
elapsed time: 3.870065625 seconds (154696 bytes allocated)
3.14154744
@time parallel_findpi(1000000000)
elapsed time: 39.029650365 seconds (151080 bytes allocated)
3.141653704
For reference, these results were achieved with 4 worker processes on a DELL laptop with the following CPU:
lshw | grep product | head -n 1
product: Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz
More information on parallel computing facilities in Julia can be found in the documentation. As usual the code for today’s Julia journey can be found on github.
Metaprogramming in Julia is a big topic and it’s covered extensively in both the official documentation as well as in the Introducing Julia wikibook. The idea behind metaprogramming is to write code which itself will either generate or change other code. There are two main features of the language which support this idea:
A symbol (data type Symbol
) represents an unevaluated chunk of code. As such, symbols are a means to refer to a variable (or expression) itself rather than the value it contains.
n = 5 # Assign to variable n.
5
n # Refer to contents of variable n.
5
typeof(n)
Int64
:n # Refer to variable n itself using quote operator.
:n
typeof(:n)
Symbol
eval(:n)
5
E = :(2x + y) # Unevaluated expression is also a symbol.
:(2x + y)
typeof(E)
Expr
The quote operator, :
, prevents the evaluation of its argument.
Expressions are made up of three parts: the operation (head
), the arguments to that operation (args
) and finally the return type from the expression (typ
).
names(E)
3-element Array{Symbol,1}:
:head
:args
:typ
E.head
:call
E.args
3-element Array{Any,1}:
:+
:(2x)
:y
E.typ
Any
We can evaluate an expression using eval()
. Not only does eval()
return the result of the evaluated expression but it also applies any side effects from the expression (for example, variable assignment).
x = 3; y = 5; eval(E)
11
eval(:(x = 4))
4
eval(E)
13
No real surprises there. But the true potential of all this lies in the fact that the code itself has an internal representation which can be manipulated. For example, we could change the arguments of the expression created above.
E.args[3] = :(3y) # 2x + y becomes 2x + 3y
:(3y)
E
:(2x + 3y)
eval(E)
21
That still seems a little tame. What about manipulating a function?
F = :(x -> x^2)
:(x->begin # none, line 1:
x^2
end)
eval(F)(2) # Evaluate x -> x^2 for x = 2
4
F.args[2].args[2].args[3] = 3 # Change function to x -> x^3
3
eval(F)(2) # Evaluate x -> x^3 for x = 2
8
Macros are a little like functions in that they accept arguments and return a result. However they are different because they are evaluated at parse time and return an unevaluated expression.
macro square(x)
    :($x * $x)
end
@square(5)
25
@square 5
25
macroexpand(:(@square(x)))
:(x * x)
macroexpand(:(@square(5)))
:(5 * 5)
macroexpand(:(@square(x+2)))
:((x + 2) * (x + 2))
macroexpand()
is used to look at the code generated by the macro. Note that parentheses were automatically inserted to ensure the correct order of operations.
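Building on @square, here is a sketch of a slightly richer macro (the @repeat name and design are invented for illustration, not part of the original post). It splices an expression into a loop at parse time; esc() keeps the caller’s variables visible inside the generated code.

```julia
# Hypothetical macro: run an expression n times. esc() stops macro hygiene
# from renaming variables that belong to the caller.
macro repeat(n, ex)
    quote
        for _ in 1:$(esc(n))
            $(esc(ex))
        end
    end
end

results = Int[]
@repeat 3 push!(results, length(results) + 1)
results    # [1, 2, 3]
```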
Julia has a plethora of predefined macros which do things like return the execution time for an expression (@time
), apply an assertion (@assert
), test approximate equality (@test_approx_eq
) and execute code only in a UNIX environment (@unix_only
).
The fact that one can use code to build and edit other code made me start thinking about self-replicating machines, self-reconfiguring modular robots, grey goo and utility fog. If we can do it in software, why not in hardware too? More evidence of my tinkering with metaprogramming in Julia can be found on github. No self-reconfiguring modular robots though, I’m afraid.
Modules in Julia are separate global variable workspaces. Modules allow you to create top-level definitions without worrying about name conflicts when your code is used together with somebody else’s. Within a module, you can control which names from other modules are visible (via importing), and specify which of your names are intended to be public (via exporting). Julia Documentation
To illustrate the concept, let’s define two new modules:
julia> module AfrikaansModule
           __init__() = println("Initialising the Afrikaans module.")
           greeting() = "Goeie môre!"
           bonappetit() = "Smaaklike ete"
           export greeting
       end
Initialising the Afrikaans module.
julia> module ZuluModule
           greeting() = "Sawubona!"
           bonappetit() = "Thokoleza ukudla"
       end
If an __init__()
function is present in the module then it’s executed when the module is defined. Is it my imagination or does the syntax for that function have an uncanny resemblance to something in another popular scripting language?
The greeting() function in the above modules does not exist in the global namespace (which is why the first function call below fails). But you can access functions from either of the modules by explicitly giving the module name as a prefix.
julia> greeting()
ERROR: greeting not defined
julia> AfrikaansModule.greeting()
"Goeie môre!"
julia> ZuluModule.greeting()
"Sawubona!"
The Afrikaans module exports the greeting() function, which becomes available in the global namespace once the module has been loaded.
julia> using AfrikaansModule
julia> greeting()
"Goeie môre!"
But it’s still possible to import into the global namespace functions which have not been exported.
julia> import ZuluModule.bonappetit
julia> bonappetit()
"Thokoleza ukudla"
In addition to functions, modules can obviously also encapsulate variables.
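A minimal sketch (the module and its names are invented for illustration) showing that constants and variables follow the same export rules as functions:

```julia
module ConstantsModule
export answer
const answer = 42     # exported: visible in the global namespace after loading
secret = "hidden"     # not exported: needs the module prefix
end

using .ConstantsModule    # recent Julia needs the leading dot; plain `using` on older versions
answer                    # 42
ConstantsModule.secret    # "hidden"
```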
That’s pretty much the essence of it although there are a number of subtleties detailed in the official documentation. Well worth a look if you want to suck all the marrow out of Julia’s modules. As usual the code for today’s flirtation can be found on github.
Direct output to the Julia terminal is done via print()
and println()
, where the latter appends a newline to the output.
julia> print(3, " blind "); print("mice!\n")
3 blind mice!
julia> println("Hello World!")
Hello World!
Terminal input is something that I never do, but it’s certainly possible. readline()
will read keyboard input until the first newline.
julia> response = readline();
Yo!
julia> response
"Yo!\n"
Writing to a file is pretty standard. Below we create a suitable name for a temporary file, open a stream to that file, write some text to the stream and then close it.
filename = tempname()
fid = open(filename, "w")
write(fid, "Some temporary text...")
close(fid)
print()
and println()
can also be used in the same way as write()
for sending data to a stream. STDIN
, STDOUT
and STDERR
are three predefined constants for standard console streams.
There are various approaches to reading data from files. One of which would be to use code similar to the example above. Another would be to do something like this (I’ve truncated the output because it really is not too interesting after a few lines):
julia> open("/etc/passwd") do fid
           readlines(fid)
       end
46-element Array{Union(UTF8String,ASCIIString),1}:
"root❌0:0:root:/root:/bin/bash\n"
"daemon❌1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
"bin❌2:2:bin:/bin:/usr/sbin/nologin\n"
"sys❌3:3:sys:/dev:/usr/sbin/nologin\n"
"sync❌4:65534:sync:/bin:/bin/sync\n"
Here readlines()
returns the entire contents of the file as an array, where each element corresponds to a line of content. readall()
would return everything in a single string. A somewhat different approach would be to use eachline()
which creates an iterator allowing you to process each line of the file individually.
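For instance, a self-contained sketch that writes a throwaway file and then counts its lines with eachline():

```julia
fname = tempname()
open(fname, "w") do fid
    write(fid, "alpha\nbeta\ngamma\n")
end

# eachline() yields one line at a time rather than loading the whole file.
nlines = open(fname) do fid
    n = 0
    for line in eachline(fid)
        n += 1
    end
    n
end
rm(fname)
nlines    # 3
```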
Data can be read from a delimited file using readdlm()
, where the delimiter is specified explicitly. For a simple Comma Separated Value (CSV) file it’s more direct to simply use readcsv()
.
julia> passwd = readdlm("/etc/passwd", ':');
julia> passwd[1,:]
1x7 Array{Any,2}:
"root" "x" 0.0 0.0 "root" "/root" "/bin/bash"
The analogues writedlm()
and writecsv()
are used for writing delimited data.
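A quick round trip through a temporary file (a sketch, assuming write access to the system temporary directory; on current Julia these functions live in the DelimitedFiles standard library):

```julia
using DelimitedFiles   # standard library on current Julia; built in when this post was written

fname = tempname()
writedlm(fname, [1 2; 3 4], ',')   # write a small matrix as comma-delimited text
M = readdlm(fname, ',')            # read it back (numeric data comes back as Float64)
rm(fname)
M    # [1.0 2.0; 3.0 4.0]
```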
These functions will be essential if you are going to use Julia for data analyses. There is also functionality for reading and writing data in a variety of other formats like xls and xlsx, HDF5 (see embedded media below from JuliaCon2015), Matlab and Numpy data files and WAV audio files.
Julia implements a full range of file manipulation methods, most of which have names similar to their UNIX counterparts.
A few other details of my dalliance with Julia’s input/output functionality can be found on github.
Conditionals allow you to branch the course of execution on the basis of one or more logical outcomes.
n = 8;
if (n > 7) # The parentheses are optional.
    println("high")
elseif n < 3
    println("low")
else
    println("medium")
end
high
The ternary conditional operator provides a compact syntax for a conditional returning one of two possible values.
if n > 3 0 else 1 end # Conditional.
0
n > 3 ? 0 : 1 # Ternary conditional.
0
I’m still a little gutted that R does not have a ternary operator. Kudos to Python for at least having something similar, even if the syntax is somewhat convoluted.
There are a few different ways of achieving iteration in Julia. The simplest of these is the humble for
loop.
for n in [1:10]
    println("number $n.")
end
number 1.
number 2.
number 3.
number 4.
number 5.
number 6.
number 7.
number 8.
number 9.
number 10.
In the code above we used the range operator, :
, to construct an iterable sequence of integers between 1 and 10. This might be a good place to take a moment to look at ranges, which might not work in quite the way you’d expect. To get the range to actually expand into an array you need to enclose it in []
, otherwise it remains a Range
object.
typeof(1:7)
UnitRange{Int64} (constructor with 1 method)
typeof([1:7])
Array{Int64,1}
1:7
1:7
[1:7]
7-element Array{Int64,1}:
1
2
3
4
5
6
7
A for
loop can iterate over any iterable object, including strings and dictionaries. Using enumerate()
in conjunction with a for loop gives a compact way to number items in a collection.
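A trivial example of enumerate() in a for loop:

```julia
# enumerate() pairs each element with its 1-based position.
for (i, animal) in enumerate(["cat", "dog", "mouse"])
    println("$i: $animal")
end
```

This prints "1: cat", "2: dog" and "3: mouse", one per line.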
The while
construct gives a slightly different approach to iteration and is probably most useful when combined with continue
and break
statements which can be used to skip over iterations or prematurely exit from the loop.
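A small sketch combining all three (the function is invented for illustration):

```julia
function odd_numbers_up_to(limit)
    result = Int[]
    n = 0
    while true
        n += 1
        n % 2 == 0 && continue   # skip even numbers
        n > limit && break       # bail out once past the limit
        push!(result, n)
    end
    result
end

odd_numbers_up_to(7)    # [1, 3, 5, 7]
```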
The details of exception handling are well covered in the documentation, so I’ll just provide a few examples. Functions generate exceptions when something goes wrong.
factorial(-1)
ERROR: DomainError
in factorial_lookup at combinatorics.jl:26
in factorial at combinatorics.jl:35
super(DomainError)
Exception
All exceptions are derived from the Exception
base class.
An exception is explicitly launched via throw()
. To handle the exception in an elegant way you’ll want to enclose that dodgy bit of code in a try
block.
!(n) = n < 0 ? throw(DomainError()) : n < 2 ? 1 : n * !(n-1)
! (generic function with 7 methods)
!10
3628800
!0
1
!-1
ERROR: DomainError
in ! at none:1
try
    !-1
catch
    println("Well, that didn't work!")
end
Well, that didn't work!
Exceptional conditions can be flagged by the error()
function. Somewhat less aggressive are warn()
and info()
.
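For example, error() throws an ErrorException which can be caught like any other (the checked_sqrt function is invented for illustration):

```julia
function checked_sqrt(x)
    x < 0 && error("can't take the square root of a negative number")
    sqrt(x)
end

checked_sqrt(16.0)    # 4.0

try
    checked_sqrt(-4.0)
catch err
    println("Caught: ", err)
end
```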
I’ve dug a little deeper into conditionals, loops and exceptions in the code on github.
Functional Programming is characterised by higher order functions which accept other functions as arguments. Typically a Functional Programming language has facilities for anonymous “lambda” functions and ways to apply map, reduce and filter operations. Julia ticks these boxes.
We’ve seen anonymous functions before, but here’s a quick reminder of the syntax:
julia> x -> x^2
(anonymous function)
Let’s start with map()
which takes a function as its first argument followed by one or more collections. The function is then mapped onto each element of the collections. The first example below applies an anonymous function which squares its argument.
julia> map(x -> x^2, [1:5])
5-element Array{Int64,1}:
1
4
9
16
25
julia> map(/, [16, 9, 4], [8, 3, 2])
3-element Array{Float64,1}:
2.0
3.0
2.0
The analogues for this operation in Python and R are map()
and mapply()
or Map()
respectively.
filter()
, as its name would suggest, filters out elements from a collection for which a specific function evaluates to true. In the example below the function isprime()
is applied to integers between 1 and 50 and only the prime numbers in that range are returned.
julia> filter(isprime, [1:50])
15-element Array{Int64,1}:
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
The equivalent operation in Python and R is carried out using filter()
and Filter()
respectively.
The fold operation is implemented by reduce()
which builds up its result by applying a bivariate function across a collection of objects and using the result of the previous operation as one of the arguments. Hmmmm. That’s a rather convoluted definition. Hopefully the link and examples below will illustrate. The related functions, foldl()
and foldr()
, are explicit about the order in which their arguments are associated.
julia> reduce(/, 1:4)
0.041666666666666664
julia> ((1 / 2) / 3) / 4
0.041666666666666664
The fold operation is applied with reduce()
and Reduce()
in Python and R respectively.
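The difference in association order between foldl() and foldr() is easiest to see with a non-associative operator like subtraction:

```julia
foldl(-, 1:4)    # ((1 - 2) - 3) - 4 = -8
foldr(-, 1:4)    # 1 - (2 - (3 - 4)) = -2
```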
Finally there’s a shortcut to achieve both map and reduce together.
julia> mapreduce(x -> x^2, +, [1:5])
55
julia> (((1^2 + 2^2) + 3^2) + 4^2) + 5^2
55
A few extra bits and pieces about Functional Programming with Julia can be found on github.
Composite types are declared with the type
keyword. To illustrate we’ll declare a type for storing geographic locations, with attributes for latitude, longitude and altitude. The type immediately has two methods: a default constructor and a constructor specialised for arguments with data types corresponding to those of the type’s attributes. More information on constructors can be found in the documentation.
type GeographicLocation
    latitude::Float64
    longitude::Float64
    altitude::Float64
end
methods(GeographicLocation)
# 2 methods for generic function "GeographicLocation":
GeographicLocation(latitude::Float64,longitude::Float64,altitude::Float64)
GeographicLocation(latitude,longitude,altitude)
Creating instances of this new type is simply a matter of calling the constructor. The second instance below clones the type of the first instance. I don’t believe I’ve seen that being done with another language. (That’s not to say that it’s not possible elsewhere! I just haven’t seen it.)
g1 = GeographicLocation(-30, 30, 15)
GeographicLocation(-30.0,30.0,15.0)
typeof(g1) # Interrogate type
GeographicLocation (constructor with 3 methods)
g2 = typeof(g1)(5, 25, 165) # Create another object of the same type.
GeographicLocation(5.0,25.0,165.0)
We can list, access and modify instance attributes.
names(g1)
3-element Array{Symbol,1}:
:latitude
:longitude
:altitude
g1.latitude
-30.0
g1.longitude
30.0
g1.latitude = -25 # Attributes are mutable
-25.0
Additional “outer” constructors can provide alternative ways to instantiate the type.
GeographicLocation(lat::Real, lon::Real) = GeographicLocation(lat, lon, 0)
GeographicLocation (constructor with 3 methods)
g3 = GeographicLocation(-30, 30)
GeographicLocation(-30.0,30.0,0.0)
Of course, we can have collections of composite types. In fact, these composite types have essentially all of the rights and privileges of the built in types.
locations = [g1, g2, g3]
3-element Array{GeographicLocation,1}:
GeographicLocation(-25.0,30.0,15.0)
GeographicLocation(5.0,25.0,165.0)
GeographicLocation(-30.0,30.0,0.0)
The GeographicLocation
type declared above is a “concrete” type because it has attributes and can be instantiated. You cannot derive subtypes from a concrete type. You can, however, declare an abstract type which acts as a place holder in the type hierarchy. As opposed to concrete types, an abstract type cannot be instantiated but it can have subtypes.
abstract Mammal
type Cow <: Mammal
end
Mammal() # You can't instantiate an abstract type!
ERROR: type cannot be constructed
Cow()
Cow()
The immutable
keyword will create a type where the attributes cannot be modified after instantiation.
Additional ramblings and examples of composite types can be found on github. Also I’ve just received an advance copy of Julia in Action by Chris von Csefalvay which I’ll be reviewing over the next week or so.
An Array
is really the most important workhorse collection (IMHO). Julia can handle arrays of arbitrary dimension, but we’ll only have a look at the most commonly used, which are 1D and 2D.
julia> x = [-7, 1, 2, 3, 5]
5-element Array{Int64,1}:
-7
1
2
3
5
julia> typeof(x)
Array{Int64,1}
julia> eltype(x)
Int64
julia> y = [3, "foo", 'a'] # Elements can be of mixed type
3-element Array{Any,1}:
3
"foo"
'a'
julia> typeof(y) # Type of the Array itself
Array{Any,1}
julia> eltype(y) # Type of the elements in the Array
Any
Type promotion is applied to an array with mixed content (like the second example above, which contains an integer, a string and a character), elevating the element type to a common ancestor, which in the example is Any
.
The usual indexing operations apply, noting that in Julia indices are 1-based.
julia> x[1] # First element
-7
julia> getindex(x, [1, 3]) # Alternative syntax
2-element Array{Int64,1}:
-7
2
julia> x[end] # Last element
5
julia> x[end-1] # Penultimate element
3
julia> x[2:4] # Slicing
3-element Array{Int64,1}:
1
2
3
julia> x[2:4] = [1, 5, 9] # Slicing with assignment
3-element Array{Int64,1}:
1
5
9
An Array
can be treated like a stack or queue, where additional items can be popped from or pushed onto the “end” of the collection. Functions shift!()
and unshift!()
do analogous operations to the “front” of the collection.
julia> pop!(x) # Returns last element and remove it from array.
5
julia> push!(x, 12) # Append value to end of array.
5-element Array{Int64,1}:
-7
1
5
9
12
julia> append!(x, 1:3) # Append one array to the end of another array.
8-element Array{Int64,1}:
-7
1
5
9
12
1
2
3
What about a 2D array (or matrix)? Not too many surprises here. With reference to the examples above we can see that a 1D array is effectively a column vector.
julia> M = [1 2 3; 4 5 6; 7 8 9]
3x3 Array{Int64,2}:
1 2 3
4 5 6
7 8 9
julia> N = [1 2; 2 3; 3 4]
3x2 Array{Int64,2}:
1 2
2 3
3 4
julia> M[2,2] # [row,column]
5
julia> M[1:end,1]
3-element Array{Int64,1}:
1
4
7
julia> M[1,:] # : is the same as 1:end
1x3 Array{Int64,2}:
1 2 3
Collections are copied by reference. A shallow copy can be created with copy()
. If you want a truly distinct collection of objects you need to use deepcopy()
.
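A quick sketch of the difference:

```julia
a = [[1, 2], [3, 4]]
b = copy(a)       # shallow copy: new outer array, but the inner arrays are shared
c = deepcopy(a)   # fully independent copy

b[1][1] = 99
a[1][1]           # 99, because the shallow copy shares its inner arrays with a
c[1][1]           # still 1
```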
And now a taste of the other collection types, starting with the tuple.
julia> a, b, x, text = 1, 2, 3.5, "Hello"
(1,2,3.5,"Hello")
julia> a, b = b, a # I never get tired of this!
(1,2)
A dictionary is just a collection of key-value pairs.
julia> stuff = {"number" => 43, 1 => "zap!", 2.5 => 'x'}
Dict{Any,Any} with 3 entries:
"number" => 43
2.5 => 'x'
1 => "zap!"
julia> stuff["number"]
43
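A few other common dictionary operations (shown here with the Dict() constructor form):

```julia
prices = Dict("apple" => 1.5, "pear" => 2.0)
prices["plum"] = 3.25       # add a new key-value pair
haskey(prices, "apple")     # true
get(prices, "mango", 0.0)   # 0.0, the supplied default for a missing key
delete!(prices, "pear")
length(prices)              # 2
```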
Sets are unordered collections which are not indexed and do not allow duplicates.
julia> S1 = Set([1, 2, 3, 4, 5]) # Set{Int64}
Set{Int64}({4,2,3,5,1})
julia> S2 = Set({3, 4, 5, 6, 7}) # Set{Any}
Set{Any}({7,4,3,5,6})
julia> union(S1, S2)
Set{Any}({7,4,2,3,5,6,1})
julia> intersect(S1, S2)
Set{Int64}({4,3,5})
julia> setdiff(S2, S1)
Set{Any}({7,6})
We’ll see more about collections when we look at Julia’s functional programming capabilities, which will be in the next but one installment. In the meantime you can find the full code for today’s flirtation with Julia on github.
Julia performs Just-in-Time (JIT) compilation using a Low Level Virtual Machine (LLVM) to create machine-specific assembly code. The first time a function is called, Julia compiles the function’s source code and the results are cached and used for any subsequent calls to the same function. However, there are some additional wrinkles to this story.
Julia caters for generic functions which will operate on variables with a variety of data types. But it gains a lot of speed by compiling specialised versions of a function which are optimised to work on particular data types. As a result the code generated by Julia achieves performance which is comparable to that of lower level languages like C and FORTRAN, as illustrated by the benchmark results below.
There are multiple routes to defining a function in Julia, ranging from verbose to succinct. The last statement in a function definition becomes the return value, so it is not necessary to have an explicit return
statement (although for clarity it is often a good idea to be explicit!).
# Verbose form.
#
function square(x)
    return x^2 # The return keyword is redundant, but still a good idea.
end
# One-line form.
#
square(x) = x^2
hypotenuse(x, y) = sqrt(x^2 + y^2)
x² = x -> x*x # To get x² type x\^2 followed by Tab in the REPL.
# Anonymous (lambda) form.
#
x -> x^2
#
anonsquare = function (x) # Creates an anonymous function and assigns it to anonsquare.
    x^2
end
The functions defined above are generic in the sense that they do not pertain to arguments of any particular data type: they should work for all (reasonable) arguments.
Functions can return multiple values as a tuple.
julia> function divide(n, m)
           div(n,m), n%m
       end
divide (generic function with 1 method)
julia> divide(7,3)
(2,1)
Julia has an interesting convention for functions with side effects: function names which end in a ! modify their first argument. For example, sort()
will return a sorted version of its input, while sort!()
will reorder the elements of the input in place. Maintaining this syntax is important since all function arguments are passed by reference and it is thus perfectly permissible that the values of arguments be modified within a function.
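For example:

```julia
v = [3, 1, 2]
sort(v)     # returns a sorted copy: [1, 2, 3]
v           # still [3, 1, 2]

sort!(v)    # sorts the array in place
v           # now [1, 2, 3]
```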
We can get a glimpse of at what happens under the hood during the compilation process. code_llvm()
returns the bytecode generated by the LLVM for a generic function when its applied to an argument of a given type. For example, below we see the bytecode for the function sqrt()
applied to a 64 bit floating point argument.
julia> code_llvm(sqrt, (Float64,))
define double @julia_sqrt_20446(double) {
top:
%1 = fcmp uge double %0, 0.000000e+00, !dbg !1150
br i1 %1, label %pass, label %fail, !dbg !1150
fail: ; preds = %top
%2 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !1150, !tbaa %jtbaa_const
call void @jl_throw_with_superfluous_argument(%jl_value_t* %2, i32 131), !dbg !1150
unreachable, !dbg !1150
pass: ; preds = %top
%3 = call double @llvm.sqrt.f64(double %0), !dbg !1150
ret double %3, !dbg !1150
}
Digging deeper we can scrutinise the native assembly instructions using code_native()
.
julia> code_native(sqrt, (Float64,))
.text
Filename: math.jl
Source line: 131
push RBP
mov RBP, RSP
xorpd XMM1, XMM1
ucomisd XMM1, XMM0
Source line: 131
ja 6
sqrtsd XMM0, XMM0
pop RBP
ret
movabs RAX, 140208263704872
mov RDI, QWORD PTR [RAX]
movabs RAX, 140208248879392
mov ESI, 131
call RAX
Not being a Computer Scientist, this information is of limited use to me. But it will serve to illustrate the next point. As one might expect, the assembly code generated for a particular function depends on type of the arguments being fed into that function. If, for example, we look at the assembly code for sqrt()
applied to a 64 bit integer argument then we see that there are a number of differences.
julia> code_native(sqrt, (Int64,))
.text
Filename: math.jl
Source line: 133
push RBP
mov RBP, RSP
Source line: 133
cvtsi2sd XMM0, RDI
xorpd XMM1, XMM1
ucomisd XMM1, XMM0
ja 6
sqrtsd XMM0, XMM0
pop RBP
ret
movabs RAX, 140208263704872
mov RDI, QWORD PTR [RAX]
movabs RAX, 140208248879392
mov ESI, 133
call RAX
So the JIT compiler is effectively generating versions of the generic function sqrt()
which are specialised for different argument types. This is good for performance because the code being executed is always optimised for the appropriate argument types.
Julia implements multiple dispatch, which means that the code executed for a specific function depends dynamically on the data type of the arguments. In the case of generic functions, each time that a function is called with a new data type specialised code is generated. But it’s also possible to define different versions of a function which are selected on the basis of the argument data type. These would then be applied in lieu of the corresponding generic function. There is an obvious parallel with function overloading.
The Julia documentation articulates these ideas better than I can:
To facilitate using many different implementations of the same concept smoothly, functions need not be defined all at once, but can rather be defined piecewise by providing specific behaviors for certain combinations of argument types and counts. A definition of one possible behavior for a function is called a method. Thus far, we have presented only examples of functions defined with a single method, applicable to all types of arguments. However, the signatures of method definitions can be annotated to indicate the types of arguments in addition to their number, and more than a single method definition may be provided. When a function is applied to a particular tuple of arguments, the most specific method applicable to those arguments is applied. Thus, the overall behavior of a function is a patchwork of the behaviors of its various method definitions. If the patchwork is well designed, even though the implementations of the methods may be quite different, the outward behavior of the function will appear seamless and consistent.
Specific data types are indicated by the ::
type annotation operator. So, for example, we can create a version of square()
which does something unique (and inane) for integer arguments.
julia> function square(x::Int64)
println("Squaring an integer.")
x^2
end
square (generic function with 2 methods)
julia> square(1.5) # Using the generic function
2.25
julia> square(2) # Using the function specialised for Int64 arguments
Squaring an integer.
4
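The “multiple” in multiple dispatch refers to the fact that the method is chosen by looking at the types of all of the arguments, not just the first. A small sketch (the combine() function and its messages are purely illustrative, and the first method assumes a 64 bit machine where integer literals are Int64):

```julia
# Dispatch considers every argument's type when picking a method.
combine(a::Int64, b::Int64) = "two integers"
combine(a::Int64, b::Float64) = "an integer and a float"
combine(a, b) = "something else"

println(combine(1, 2))       # "two integers"
println(combine(1, 2.5))     # "an integer and a float"
println(combine("x", 2))     # "something else"
```

The most specific applicable method always wins, so the fully generic fallback is only used when nothing better matches.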
You can get a list of the methods associated with a function using methods()
. Below we have both the specialised (Int64
) version as well as the fully generic version of square()
.
julia> methods(square)
# 2 methods for generic function "square":
square(x::Int64) at none:2
square(x) at none:1
Type annotations can also be used within the local scope of a function body (as well as for
, while
, try
, let
, and type
blocks).
function foo(x)
y::Float64 = 3
return(x + y)
end
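To see the effect of that local annotation: the assignment converts the integer 3 to a Float64, so foo() always performs floating point addition regardless of the argument type. A quick sketch (assuming a reasonably recent Julia):

```julia
# The annotation y::Float64 converts the assigned value 3 to 3.0,
# so the addition below is always done in floating point.
function foo(x)
    y::Float64 = 3
    return(x + y)
end

println(foo(1))      # Float64 result even for an integer argument
println(foo(2.5))
```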
I’ve really only scratched the surface here. There’s a lot more to say about default, positional and keyword arguments, operators, parametric types and a host of other topics. You can find further details of today’s Julia ramblings on github and there’s even more in the documentation for functions and methods. Finally, check out some thoughts on writing good Julia functions from Chris von Csefalvay.
]]>One of the major drawbacks of dynamically typed languages is that they generally sacrifice performance for convenience. This does not apply with Julia, as explained in the quote below.
The landscape of computing has changed dramatically over the years. Modern scientific computing environments such as MATLAB, R, Mathematica, Octave, Python (with NumPy), and SciLab have grown in popularity and fall under the general category known as dynamic languages or dynamically typed languages. In these programming languages, programmers write simple, high-level code without any mention of types like int, float or double that pervade statically typed languages such as C and Fortran. Julia is dynamically typed, but it is different as it approaches statically typed performance. New users can begin working with Julia as they did in the traditional numerical computing languages, and work their way up when ready. In Julia, types are implied by the computation itself together with input values. As a result, Julia programs are often completely generic and compute with data of different input types without modification—a feature known as “polymorphism.”
Variable assignment and evaluation work in precisely the way you’d expect.
x = 3
3
x^2
9
Julia supports Unicode, so ß, Ɛ and Ȁ are perfectly legitimate (although possibly not too sensible) variable names.
ß = 3; 2ß
6
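That 2ß evaluates to 6 is because a numeric literal placed immediately before a variable is implicit multiplication. This works with ordinary ASCII names too:

```julia
x = 3
println(2x + 1)      # a literal before a variable means multiplication
println(2(x + 1))    # also works before a parenthesised expression
```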
Chained and simultaneous assignments are possible.
a = b = c = 5
5
d, e = 3, 5
(3,5)
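A handy consequence of simultaneous assignment is that two variables can be swapped without introducing a temporary. A small illustration:

```julia
d, e = 3, 5
d, e = e, d          # swap without a temporary variable
println((d, e))      # (5, 3)
```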
Multiple expressions can be combined into a single compound expression. Julia supports both verbose and compact syntax.
x = begin
p = 2
q = 3
p + q
end
5
x = (p = 2; q = 3; p + q)
5
Constants, declared as const
, are immutable variables (forgive the horrendous contradiction, but I trust that you know what I mean!).
const SPEED_OF_LIGHT = 299792458;
Somewhat disconcertingly, it is possible to change the value of a constant. You just can’t change its type. This restriction has more to do with performance than anything else.
There are numerous predefined constants too.
pi
π = 3.1415926535897...
e
e = 2.7182818284590...
VERSION
v"0.3.10"
“The practical implications of picking number types are pretty important. Efficient programming requires you to use the least amount of memory.” (Julia in Action by Chris von Csefalvay)
Julia has an extensive type hierarchy with its root at the universal Any
type. You can query the current data type for a variable using typeof()
. As mentioned above, this is dynamic and a variable can readily be reassigned a value with a different type.
x = 3.5;
typeof(x)
Float64
x = "I am a string";
typeof(x)
ASCIIString (constructor with 2 methods)
There are various other functions for probing the type hierarchy. For example, you can use isa()
to check whether a variable or constant is of a particular type.
isa(x, ASCIIString)
true
isa(8, Int64)
true
isa(8, Number)
true
isa(8, ASCIIString)
false
So we see that 8 is both an Int64
and a Number
. Not only does that make mathematical sense, it also suggests that there is a relationship between the Int64
and Number
data types. In fact Int64
is a subtype of Signed
, which derives from Integer
, which is a subtype of Real
, which derives from Number
…
super(Int64)
Signed
super(Signed)
Integer
super(Integer)
Real
super(Real)
Number
Formally this can be written as
Int64 <: Signed <: Integer <: Real <: Number
true
where <:
is the “derived from” operator. We can explore the hierarchy in the opposite direction too, where subtypes()
descends one level in the type hierarchy.
subtypes(Integer)
5-element Array{Any,1}:
BigInt
Bool
Char
Signed
Unsigned
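Combining these queries with a loop lets you walk the whole chain from a concrete type up to Any. The sketch below uses supertype(), the name that replaced super() in later Julia versions:

```julia
# Climb the type hierarchy from Int64 all the way up to Any.
T = Int64
while T !== Any
    print(T, " <: ")
    global T = supertype(T)   # renamed from super() in Julia 0.5
end
println(Any)
```

This prints the same chain as before: Int64 <: Signed <: Integer <: Real <: Number <: Any.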
Julia supports integers between Int8 and Int128, with Int32 or Int64 being the default depending on the hardware and operating system. A “U” prefix indicates unsigned variants, like UInt64. Arbitrary precision integers are supported via the BigInt type.
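You can query the representable range of each fixed-width integer type with typemin() and typemax(). A quick sketch; note that fixed-width arithmetic wraps silently on overflow, while BigInt does not:

```julia
println(typemin(Int8), " to ", typemax(Int8))    # -128 to 127
println(typemax(Int64) + 1 == typemin(Int64))    # true: overflow wraps around
println(BigInt(typemax(Int64)) + 1)              # arbitrary precision, no wrap
```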
Floating point numbers are stored by Float16, Float32 and Float64 types. Arbitrary precision floats are supported via the BigFloat type. Single and double precision floating point constants are given with specific syntax.
typeof(1.23f-1)
Float32
typeof(1.23e-1)
Float64
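Floating point types also come with special values and a queryable machine epsilon. A few examples, assuming the default IEEE 754 Float64 behaviour:

```julia
println(1/0)             # Inf
println(0/0)             # NaN
println(eps(Float64))    # machine epsilon, 2.0^-52
```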
In Julia complex and rational numbers are parametric types, for example Complex{Float32}
and Rational{Int64}
. More information on complex and rational numbers in Julia can be found in the documentation.
(3+4im)^2
-7 + 24im
typeof(3+4im)
Complex{Int64} (constructor with 1 method)
typeof(3.0+4im)
Complex{Float64} (constructor with 1 method)
typeof(3//4)
Rational{Int64} (constructor with 1 method)
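Rational arithmetic is exact, which sidesteps the rounding errors inherent in floating point. For example:

```julia
println(1//3 + 1//6)     # 1//2, computed exactly
println(0.1 + 0.2)       # floating point rounding creeps in here
println(float(1//3))     # convert a rational to a float when needed
```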
An encyclopaedic selection of mathematical operations is supported on numeric types.
1 + 2
3
1 / 2
0.5
div(5, 3), 5 % 3
(1,2)
sqrt(2)
1.4142135623730951
Julia distinguishes between strings and characters. Strings are enclosed in double quotes, while individual characters are designated by single quotes. Strings are immutable and can be either ASCII (type ASCIIString
) or unicode (type UTF8String
). The indexing operator []
is used to extract slices from within strings. Evaluating variables within a string is done with a syntax which will be familiar to most shell warriors.
name = "Julia"
"Julia"
name[4]
'i'
name[end]
'a'
length(name)
5
"My name is $name and I'm $(2015-2012) years old."
"My name is Julia and I'm 3 years old."
There is a lot of functionality attached to the String class. To get an idea of the range, have a look at the output from methodswith(String)
.
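To give a flavour, here are a few string functions, a small sketch rather than an exhaustive tour:

```julia
name = "Julia"
println(uppercase(name))     # "JULIA"
println(reverse(name))       # "ailuJ"
println("Hello, " * name)    # concatenation uses the * operator
println(name ^ 3)            # repetition: "JuliaJuliaJulia"
```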
Julia supports Perl regular expressions. Regular expression objects are of type Regex
and are defined by a string preceded by the character ‘r’ and possibly followed by a modifier character (‘i’, ‘s’, ‘m’ or ‘x’).
username_regex = r"^[a-z0-9_-]{3,16}$"
r"^[a-z0-9_-]{3,16}$"
ismatch(username_regex, "my-us3r_n4m3")
true
ismatch(username_regex, "th1s1s-wayt00_l0ngt0beausername")
false
hex_regex = r"#?([a-f0-9]{6}|[a-f0-9]{3})"i;
m = match(hex_regex, "I like the color #c900b5.")
RegexMatch("#c900b5", 1="c900b5")
m.match
"#c900b5"
m.offset
18
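Beyond a single match(), eachmatch() iterates over every match in a string. A quick sketch using the same (case-insensitive) hex colour pattern:

```julia
hex_regex = r"#?([a-f0-9]{6}|[a-f0-9]{3})"i
for m in eachmatch(hex_regex, "Palette: #c900b5, #FF0000 and #0f0.")
    println(m.match)    # prints each colour in turn
end
```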
Explicit type conversion works either by using convert()
or the lower-case type name as a function.
convert(Float64, 3)
3.0
float64(3)
3.0
int64(2.5)
3
string(2.5)
"2.5"
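Going the other way, strings can be parsed into numbers. The snippet below uses parse(), the interface in more recent Julia versions (older releases used functions like parseint() and parsefloat() instead):

```julia
println(parse(Float64, "2.5") + 1)   # parse a string into a Float64
println(parse(Int64, "42") + 1)      # parse a string into an Int64
```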
Although Julia is dynamically typed, it’s still possible (and often desirable) to stipulate the type for a particular variable. Furthermore, from a performance perspective it’s beneficial that a variable retains a particular type once it has been assigned. This is known as type stability. We’ll find out more about these issues when we look at defining functions in Julia.
As before, my detailed notes on today’s foray into Julia can be found on github.
]]>Juno is the current IDE of choice for Julia. Installation is pretty straightforward. If you’ve worked in any other IDE then finding your way around Juno will be simple.
When you start Juno for the first time it opens a tutorial file called, not surprisingly, Tutorial.jl. It would be worth your while to take the few minutes required to work your way through this. The image below shows some of the tutorial content.
Useful features:
There are some other cool bells and whistles too. For example, while you are working through the tutorial you will evaluate double(10)
and find that by clicking and dragging on the function argument you can change its value and the value of the expression will be updated accordingly. I’ve not seen that in another IDE.
You can produce inline plots and the same click-and-drag mechanism mentioned above allows you to change the parameters of the plot and see the results in real-time.
You can run Julia code directly in your browser using the IJulia notebooks at JuliaBox. These notebooks are based on functionality from IPython. You can read more about how they work here.
🚨 Unfortunately JuliaBox is no longer operational.
Sign in using your Google identity. Everybody has one of those, right?
Open the tutorial folder and then select the 00 - Start Tutorial notebook. It’s worthwhile browsing through the other parts of the tutorial too, which cover topics like plotting and metaprogramming.
You can access the notebook functionality locally via the IJulia package. As before, these instructions pertain to Ubuntu Linux. First you’ll need to install IPython.
sudo apt-get install ipython-notebook
Then install and load the IJulia package. Finally run the notebook()
function, which will launch an IJulia notebook in your browser.
Pkg.add("IJulia")
using IJulia
notebook()
2015-08-03 07:35:33.009 [NotebookApp] Using existing profile dir: u'/home/colliera/.ipython/profile_julia'
2015-08-03 07:35:33.013 [NotebookApp] Using system MathJax
2015-08-03 07:35:33.020 [NotebookApp] Serving notebooks from local directory: /home/colliera/
2015-08-03 07:35:33.021 [NotebookApp] 0 active kernels
2015-08-03 07:35:33.021 [NotebookApp] The IPython Notebook is running at: http://localhost:8998/
2015-08-03 07:35:33.021 [NotebookApp] Use Control-C to stop this server and shut down all kernels.
Created new window in existing browser session.
Alternatively you can use an IJulia notebook directly from the shell prompt.
ipython notebook --profile julia
That’s a little more direct than first running the Julia interpreter. For ease of use I created a shell alias.
alias ijulia='ipython notebook --profile julia'
ijulia
Update (2018/07/22): Once you’ve installed the IJulia
package you’ll find that there is also a Julia kernel available in Jupyter.
There is good support for Julia in various editors. The ones I use (vim, gedit, Notepad++, emacs and Sublime Text) all have Julia capabilities.
Julia Studio is a project which is no longer supported and has been incorporated into Epicenter, which is a platform for hosting server-side models. It facilitates the creation of interactive web and mobile applications. I haven’t given it a go yet, but it looks interesting and it’s certainly on the agenda.
]]>And so I embarked on a Month of Julia. Over the next 30 days (give or take a few) I will be posting about my experiences as a new Julia user. I’ll start with some of the language basics and then dig into a few of the supplementary packages. Today I’ll start with installation and a quick tour of the interpreter.
If you have the time and patience you’ll find it instructive to read these:
But, if you’re like me, then you’ll get half way through the first paper and have the overpowering urge to start tinkering. Don’t delay. Tinker away.
You should undoubtedly read the papers in their entirety once the first wave of tinkering subsides.
I’m running Ubuntu 15.04, so this will be rather specific. Installation was extremely simple (handled entirely by the package manager) and it looks like other platforms are similarly straightforward (check the Downloads page). I will be using Julia version 0.3.10.
I’m not quite sure why I was in the Pictures
folder when I kicked off the install, but this is by no means a requirement!
The Julia interpreter is launched by typing julia
at the command prompt. At launch it displays a nice logo which, incidentally, can be reproduced at any time for your amusement and pleasure using Base.banner()
.
The default colours in the interpreter are not going to work for me, so I tracked down the configuration file (~/.juliarc.jl
) and made some changes. Ah, that’s better.
Update (2018/07/21): The above approach will not work with contemporary Ubuntu. Here’s what you should do instead:
Go to https://julialang.org/downloads/ and download a generic Linux binary appropriate for your machine. In my case this was an archive named julia-0.6.4-linux-x86_64.tar.gz
.
Extract the archive.
cd /opt/
sudo tar -zxvf ~/Downloads/julia-0.6.4-linux-x86_64.tar.gz
Link the resulting executable into the PATH
. The path to the Julia executable will likely be different depending on the version.
cd /usr/bin/
sudo ln -s /opt/julia-9d11f62bcb/bin/julia
Start Julia.
julia
The Julia REPL is designed for interaction. I will pick out a few of the key features (you can find further details here).
- The result of the most recent evaluation is stored in the ans variable.
- Get help by typing ? at the prompt or by using help().
- Search the documentation with apropos().
- Run shell commands by typing ; at the prompt.
- Tab completion extends to LaTeX-style symbols: type x\^2 followed by Tab. (When I first saw that this worked I almost wet my pants.)
If you’re like me then you’ll want to put your code into script files. Julia scripts normally have a “.jl” extension and are executed in the interpreter either by specifying the script name on the command line like this:
or from within the interpreter using
include('test-script.jl')
To ensure that you have the most recent released version of Julia (not necessarily available through the default Ubuntu repositories), you can add specific PPAs.
sudo add-apt-repository ppa:staticfloat/juliareleases
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
I’ll be posting code samples on GitHub.
More to come tomorrow. In the meantime though, do yourself a favour and watch the videos below.
]]>