XGBoost is the flavour of the moment for serious competitors on Kaggle. It was developed by Tianqi Chen and provides a particularly efficient implementation of the Gradient Boosting algorithm. Although there is a CLI implementation of XGBoost, you’ll probably be more interested in using it from either R or Python. Below are instructions for getting it installed for each of these languages. It’s pretty painless.
Installing for R
Installation in R is extremely simple.
> install.packages('xgboost')
> library(xgboost)
It’s also supported as a model in caret, which is especially handy for feature selection and model parameter tuning.
Installing for Python
This might be as simple as
$ pip install xgboost
If you run into trouble with that, try the alternative approach below.
Download the latest version from the GitHub repository. The simplest way to do this is to grab the archive of a recent release. Unpack the archive, become root, and then execute the following:
# cd xgboost-master
# make
# cd python-package/
# python setup.py install --user
And you’re ready to roll.
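A quick way to confirm that the install actually worked is to ask Python for the package version and then fit a throwaway model. The snippet below is only a sketch: it assumes Python 3.8 or later (for importlib.metadata), the helper names installed_version and smoke_test are made up for illustration, and the random data and parameter values are arbitrary.

```python
import importlib.metadata
import importlib.util


def installed_version(package):
    """Return the installed version of `package` as a string, or None if it is absent."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a source checkout on the path).
        return "unknown"


def smoke_test():
    """Fit a tiny classifier on random data and return its predicted probabilities."""
    # Imported here so the version check above still works when xgboost is missing.
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] > 0).astype(int)  # label depends only on the first feature

    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "max_depth": 2}  # arbitrary values
    booster = xgb.train(params, dtrain, num_boost_round=5)
    return booster.predict(dtrain)


if __name__ == "__main__":
    version = installed_version("xgboost")
    if version is None:
        print("xgboost is not installed")
    else:
        print("xgboost version:", version)
        print("predictions for the first 3 rows:", smoke_test()[:3])
```

If the script reports that xgboost is not installed, check that you are running the same Python interpreter that pip or setup.py installed into.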
If you run into trouble during the process you might have to install a few other packages:
# apt-get install g++ gfortran
# apt-get install python-dev python-numpy python-scipy python-matplotlib python-pandas
# apt-get install libatlas-base-dev
Enjoy building great models with this absurdly powerful tool. I’ve found that it effortlessly consumes vast data sets that grind other algorithms to a halt. Get started by looking at some code examples. Also worth looking at are:
- an Introduction to Boosted Trees;
- a tutorial showing how XGBoost was applied to the Otto Group Product Classification Challenge;
- Understanding Gradient Boosting (Part 1); and
- a presentation by Alexander Ihler.