Day 15: Time Series

Yesterday we looked at Julia’s support for tabular data, which can be represented by a DataFrame. The TimeSeries package implements another common data type: time series. We’ll start by loading the TimeSeries package, but we’ll also add the Quandl package, which provides an interface to a rich source of time series data from Quandl.

using TimeSeries
using Quandl

We’ll start by getting our hands on some data from Yahoo Finance. By default these data will be of type TimeArray, although it is possible to explicitly request a DataFrame instead,

google = quandl("YAHOO/GOOGL"); # GOOGL at (default) daily intervals
typeof(google)
TimeArray{Float64,2,DataType} (constructor with 1 method)
apple = quandl("YAHOO/AAPL", frequency = :weekly); # AAPL at weekly intervals
mmm = quandl("YAHOO/MMM", from = "2015-07-01"); # MMM starting at 2015-07-01
rht = quandl("YAHOO/RHT", format = "DataFrame"); # As a DataFrame
typeof(rht)
DataFrame (constructor with 11 methods)

Having a closer look at one of the TimeSeries objects we find that it actually consists of multiple data series, each represented by a separate column. The colnames attribute gives names for each of the component series, while the timestamp and values attributes provide access to the data themselves. We’ll see more convenient means for accessing those data in a moment.

google
100x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15

             Open     High     Low      Close    Volume    Adjusted Close
2015-04-24 | 580.05   584.7    568.35   573.66   4608400   573.66
2015-04-27 | 572.77   575.52   562.3    566.12   2403100   566.12
2015-04-28 | 564.32   567.83   560.96   564.37   1858900   564.37
2015-04-29 | 560.51   565.84   559.0    561.39   1681100   561.39
⋮
2015-09-10 | 643.9    654.9    641.7    651.08   1384600   651.08
2015-09-11 | 650.21   655.31   647.41   655.3    1736100   655.3
2015-09-14 | 655.63   655.92   649.5    652.47   1497100   652.47
2015-09-15 | 656.71   668.85   653.34   665.07   1761800   665.07
names(google)
4-element Array{Symbol,1}:
 :timestamp
 :values
 :colnames
 :meta
google.colnames
6-element Array{UTF8String,1}:
 "Open"
 "High"
 "Low"
 "Close"
 "Volume"
 "Adjusted Close"
google.timestamp[1:5]
5-element Array{Date,1}:
 2015-04-24
 2015-04-27
 2015-04-28
 2015-04-29
 2015-04-30
google.values[1:5,:]
5x6 Array{Float64,2}:
 580.05   584.7    568.35   573.66   4.6084e6   573.66
 572.77   575.52   562.3    566.12   2.4031e6   566.12
 564.32   567.83   560.96   564.37   1.8589e6   564.37
 560.51   565.84   559.0    561.39   1.6811e6   561.39
 558.56   561.11   546.72   548.77   2.362e6    548.77

The TimeArray type caters for a full range of indexing operations which allow you to slice and dice those data to your exacting requirements. to() and from() extract subsets of the data before or after a specified instant.

google[1:5]
5x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-30

             Open     High     Low      Close    Volume    Adjusted Close
2015-04-24 | 580.05   584.7    568.35   573.66   4608400   573.66
2015-04-27 | 572.77   575.52   562.3    566.12   2403100   566.12
2015-04-28 | 564.32   567.83   560.96   564.37   1858900   564.37
2015-04-29 | 560.51   565.84   559.0    561.39   1681100   561.39
2015-04-30 | 558.56   561.11   546.72   548.77   2362000   548.77
google[[Date(2015,8,7):Date(2015,8,12)]]
4x6 TimeArray{Float64,2,DataType} 2015-08-07 to 2015-08-12

             Open     High     Low      Close    Volume    Adjusted Close
2015-08-07 | 667.78   668.8    658.87   664.39   1374100   664.39
2015-08-10 | 667.09   671.62   660.23   663.14   1403900   663.14
2015-08-11 | 699.58   704.0    684.32   690.3    5264100   690.3
2015-08-12 | 694.49   696.0    680.51   691.47   2924900   691.47
google["High","Low"]
100x2 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-09-15

             High     Low
2015-04-24 | 584.7    568.35
2015-04-27 | 575.52   562.3
2015-04-28 | 567.83   560.96
2015-04-29 | 565.84   559.0
⋮
2015-09-10 | 654.9 641.7
2015-09-11 | 655.31 647.41
2015-09-14 | 655.92 649.5
2015-09-15 | 668.85 653.34
google["Close"][3:5]
3x1 TimeArray{Float64,1,DataType} 2015-04-28 to 2015-04-30

             Close
2015-04-28 | 564.37
2015-04-29 | 561.39
2015-04-30 | 548.77

We can shift observations forward or backward in time using lag() or lead().

lag(google[1:5])
4x6 TimeArray{Float64,2,DataType} 2015-04-27 to 2015-04-30

             Open     High     Low      Close    Volume    Adjusted Close
2015-04-27 | 580.05   584.7    568.35   573.66   4608400   573.66
2015-04-28 | 572.77   575.52   562.3    566.12   2403100   566.12
2015-04-29 | 564.32   567.83   560.96   564.37   1858900   564.37
2015-04-30 | 560.51   565.84   559.0    561.39   1681100   561.39
lead(google[1:5], 3)
2x6 TimeArray{Float64,2,DataType} 2015-04-24 to 2015-04-27

             Open     High     Low      Close    Volume    Adjusted Close
2015-04-24 | 560.51   565.84   559.0    561.39   1681100   561.39
2015-04-27 | 558.56   561.11   546.72   548.77   2362000   548.77

We can also calculate the percentage change between observations.

percentchange(google["Close"], method = "log")
99x1 TimeArray{Float64,1,DataType} 2015-04-27 to 2015-09-15

             Close
2015-04-27 | -0.0132
2015-04-28 | -0.0031
2015-04-29 | -0.0053
2015-04-30 | -0.0227

2015-09-10 | 0.0119
2015-09-11 | 0.0065
2015-09-14 | -0.0043
2015-09-15 | 0.0191

Well, that’s the core functionality in TimeSeries. Methods for aggregation, moving window operations and time series merging are also supported. You can check out some examples in the documentation as well as on GitHub. Finally, watch the video below from JuliaCon 2014.