Installing XGBoost on Ubuntu

XGBoost is the flavour of the moment for serious competitors on Kaggle. It was developed by Tianqi Chen and provides a particularly efficient implementation of the Gradient Boosting algorithm. Although there is a CLI implementation of XGBoost, you’ll probably be more interested in using it from either R or Python. Below are instructions for getting it installed for each of these languages. It’s pretty painless.

Installing for R

Installation in R is extremely simple.

> install.packages('xgboost')
> library(xgboost)

It’s also supported as a model in caret, which is especially handy for feature selection and model parameter tuning.

Installing for Python

This might be as simple as

$ pip install xgboost
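To check whether the pip install worked, a short script like the one below can help. It is a sketch that assumes Python 3.8 or later (for importlib.metadata). The function name installed_version is only an illustrative choice. Run it with the same interpreter you plan to use, because pip may belong to a different Python installation.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="xgboost"):
    """Return the installed version string, or None if the package can't be found."""
    if importlib.util.find_spec(package) is None:
        return None  # not importable from this interpreter
    try:
        return version(package)  # read from the installed distribution metadata
    except PackageNotFoundError:
        # Importable (e.g. found on sys.path) but without distribution metadata.
        return "unknown"


if __name__ == "__main__":
    v = installed_version()
    print("xgboost not found" if v is None else f"xgboost {v} is installed")
```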

If you run into trouble with that, try the alternative approach below.

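Download the latest version from the GitHub repository. The simplest way to do this is to grab the archive of a recent release. Unpack the archive, change into the unpacked directory (its name depends on the archive you downloaded) and execute the following. The --user flag installs the Python package into your home directory, so there is no need to become root for these steps:

$ cd xgboost-master
$ make
$ cd python-package/
$ python setup.py install --user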

And you’re ready to roll:

import xgboost

If you run into trouble during the build, you might have to install a few other packages first:

# apt-get install g++ gfortran
# apt-get install python-dev python-numpy python-scipy python-matplotlib python-pandas
# apt-get install libatlas-base-dev
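If you are not sure which of these are missing, a rough check such as the following can narrow it down. It is only a sketch. The lists of tools and modules are assumptions: they are based on the packages above, plus make, which the build step runs. The check cannot detect libatlas-base-dev, because that is a library rather than a command or a Python module.

```python
import importlib.util
import shutil

# Commands the source build relies on (assumed list; make is used by the build step).
BUILD_TOOLS = ["make", "g++", "gfortran"]
# Python modules provided by the python-* packages installed above.
PYTHON_MODULES = ["numpy", "scipy", "matplotlib", "pandas"]


def missing_prerequisites(tools=BUILD_TOOLS, modules=PYTHON_MODULES):
    """Return (missing_tools, missing_modules) for the current PATH and interpreter."""
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_prerequisites()
    print("missing tools:", ", ".join(tools) or "none")
    print("missing Python modules:", ", ".join(modules) or "none")
```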

Conclusion

Enjoy building great models with this absurdly powerful tool. I’ve found that it effortlessly consumes vast data sets that grind other algorithms to a halt. Get started by looking at some code examples. Also worth looking at are

an Introduction to Boosted Trees;
a tutorial showing how XGBoost was applied to the Otto Group Product Classification Challenge;
Understanding Gradient Boosting (Part 1); and
a presentation by Alexander Ihler.