Blog Posts by Andrew B. Collier / @datawookie


Desert Island Docker: Python Edition

Over the years that I’ve been dabbling in public speaking I’ve generally developed a talk, presented it once and then moved on. However, I’ve noticed other speakers who give the same (or similar) talk at different events, where the talk evolves and improves over time.

Read More →

{emayili} Support for Mailtrap

The {emayili} package has adapters which make it simple to send email via a variety of services. For example, it caters specifically for ZeptoMail, MailerSend, Mailfence and Sendinblue. The latest version of {emayili}, 0.8.0, published on 23 April 2024, adds an adapter for Mailtrap.

Read More →

Backtesting

A farmer hoeing his field in the style of John Constable.

The key to successful backtesting is to ensure that you only use the data that were available at the time of the prediction. No “future” data can be included in the model training set, otherwise the model will suffer from look-ahead bias (having unrealistic access to future data).
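One way to respect this constraint is an expanding-window (walk-forward) split, where each test period comes strictly after its training data. The sketch below is a minimal illustration, not the method used in the post; the function name `walk_forward_splits` and its parameters are hypothetical.

```python
def walk_forward_splits(n_obs, initial_train, horizon=1):
    """Yield (train_indices, test_indices) pairs for an expanding window.

    Every test index is strictly later than every train index, so the
    model never sees data from after the prediction time.
    """
    start = initial_train
    while start + horizon <= n_obs:
        train = list(range(0, start))               # everything known so far
        test = list(range(start, start + horizon))  # the next unseen period
        yield train, test
        start += horizon


splits = list(walk_forward_splits(n_obs=6, initial_train=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because the training set only ever grows forwards in time, no observation from the test period (or later) can leak into the model fit.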

Read More →

Asset Allocation

A farmer, his sheep and equipment in the style of John Constable.

The Two-Fund Separation Theorem introduced by James Tobin, a Nobel Prize-winning economist, is a fundamental concept in investment theory. It addresses how investors can optimally allocate their assets. In an efficient market an optimal portfolio is a combination of a risk-free asset and a market portfolio.

Read More →

Logging like a Lumberjack

Cut logs floating down a river in the Amazon.

Sprinkling status messages across your code using print() statements can be a good temporary fix for tracking down issues.

But it can get messy and there’s no way for you to selectively disable them. Sure, you can redirect output to a file or /dev/null, but this is an all-or-nothing solution. How about disabling some messages and retaining others?

This is where the logging module comes into its own.
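As a rough sketch of what selective filtering looks like with the standard library, the example below sets a logger's level so that DEBUG messages are dropped while INFO and WARNING messages are kept. The logger name and the messages are made up for illustration, and the records are written to an in-memory buffer only to make the result easy to inspect.

```python
import io
import logging

# Send log records to an in-memory buffer so the result is easy to inspect.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("lumberjack_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep records out of the root logger

logger.setLevel(logging.INFO)  # DEBUG is dropped; INFO and above are kept
logger.debug("Loading configuration.")
logger.info("Processing started.")
logger.warning("Disk space is low.")

output = buffer.getvalue()
print(output, end="")
```

Changing the single `setLevel()` call to `logging.DEBUG` or `logging.WARNING` is enough to show more or fewer messages, without touching the individual logging statements.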

Read More →

Risk/Reward Tradeoff

A painting of a river valley. On the left the countryside is verdant and green. On the right it's dry and brown.

The two quantities we have been modelling (the time-dependent average and standard deviation of the returns) represent, respectively, the (potential) reward and risk associated with an asset. The relationship between these two quantities is implicit in the GARCH model. However, sometimes the return depends directly on the risk. A variant of the GARCH model can take this explicit relationship into account.

Read More →

Docker Image from Scratch

A minimal image evocative of a whale.

Most often when you are creating a new Docker image it will be based on one of the standard Docker base images like ubuntu, alpine, python or nginx. But sometimes you might want to truly roll your own image. Starting with literally nothing. From scratch. Tabula rasa.

Read More →

Model Validation

A farmer inspecting a cow. Image in style of John Constable.

Is this a “good” model? This post looks at how to validate a model and determine whether it’s a good representation of the training data and likely to produce robust and reliable predictions.

Read More →

Leverage Effect

Two ladies on a seesaw in a field. In style of John Constable.

The models we have been looking at do not differentiate between positive and negative residuals: both errors are treated the same. However, this does not align with reality, where the volatility resulting from a large negative return is higher than that for the corresponding positive return.

Read More →

Skewed Returns

A house tilted to the side in the middle of a river. Painting in the style of John Constable.

In the previous post we assumed that returns had a normal distribution. This assumption implied that the distribution was symmetric and a positive return was as likely as the corresponding negative return. In reality this assumption is just not true and returns are asymmetrically distributed.

Read More →

What is a GARCH Model?

A landscape in the style of John Constable.

A GARCH (Generalised Autoregressive Conditional Heteroskedasticity) model is a statistical tool used to forecast volatility by analysing patterns in past price movements and volatility.
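To make the idea concrete, here is a minimal sketch of the variance recursion for the simplest common case, GARCH(1,1), where today's conditional variance depends on yesterday's squared residual and yesterday's variance. The parameter values and residuals are illustrative assumptions rather than estimates from real data, and the function name `garch11_variance` is hypothetical.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    # Start from the long-run (unconditional) variance.
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps**2 + beta * sigma2[-1])
    return sigma2  # the last element is the one-step-ahead forecast


residuals = [0.01, -0.03, 0.02]  # made-up return residuals
variances = garch11_variance(residuals, omega=0.00001, alpha=0.1, beta=0.8)
print(variances)
```

The large residual of -0.03 pushes the next variance well above its long-run level, and the `beta` term then lets that shock decay gradually rather than disappear at once. In practice the parameters would be fitted to data rather than chosen by hand.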

Read More →

Rolling Volatility & Returns

An image of barrels being loaded onto carts in a style similar to that of John Constable.

In the previous post we loaded stock data into R and then calculated return volatility, both for the entire time series and shorter intervals. We saw that volatility is not constant but can change appreciably with time. One way to get a clear view of changes in volatility is by calculating it using a moving (or “rolling”) window.
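The post itself works in R; the snippet below is only a Python sketch of the rolling-window idea, using the standard library and a handful of made-up returns. The function name `rolling_volatility` is hypothetical.

```python
from statistics import stdev


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        stdev(returns[i - window:i])  # the `window` most recent returns
        for i in range(window, len(returns) + 1)
    ]


returns = [0.01, -0.02, 0.015, 0.03, -0.04, 0.005]  # illustrative values only
vol = rolling_volatility(returns, window=3)
print(vol)
```

A shorter window reacts faster to changes in volatility but gives noisier estimates, while a longer window is smoother but slower to respond.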

Read More →

Loading Financial Time Series

An image of farm workers loading hay onto the back of a wagon in a style similar to that of John Constable.

I’m going to be writing a series of posts which will look at some applications of R (and perhaps Python) to financial modelling. We’ll start here by pulling some stock data into R, calculating the daily returns and then looking at correlations and simple volatility estimates.

Read More →