Docker Images for Spark
I recently put together a short training course on Spark. One of the initial components of the course involved deploying a Spark cluster on AWS. I wanted Jupyter Notebook and RStudio Server available on the master node too, and the easiest way to make that happen was to install Docker and then run the appropriate images.
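Running those images comes down to a couple of `docker run` invocations on the master node. A minimal sketch, assuming the standard published ports (8888 for Jupyter, 8787 for RStudio Server) and default tags:

```shell
# Jupyter with PySpark, exposed on port 8888 of the host.
docker run -d -p 8888:8888 jupyter/pyspark-notebook

# RStudio Server from the rocker project, exposed on port 8787.
# The PASSWORD variable sets the login password for the rstudio user.
docker run -d -p 8787:8787 -e PASSWORD=changeme rocker/verse
```

The `-d` flag detaches the containers so they keep running in the background after the shell session ends.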
There’s already a jupyter/pyspark-notebook image which includes both Spark and Jupyter. It’s a simple matter to extend the rocker/verse image (which already includes RStudio Server, the Tidyverse, devtools and some publishing utilities) to include the sparklyr package.
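The extension amounts to a two-line Dockerfile. A minimal sketch, assuming the `latest` tag and the default CRAN mirror (both are choices, not requirements):

```dockerfile
# Start from the rocker image that already bundles RStudio Server,
# the Tidyverse, devtools and publishing utilities.
FROM rocker/verse:latest

# Add the sparklyr package so R can talk to the Spark cluster.
RUN R -e "install.packages('sparklyr', repos = 'https://cran.rstudio.com')"
```

Build and run it with something like `docker build -t verse-sparklyr .` followed by `docker run -d -p 8787:8787 -e PASSWORD=changeme verse-sparklyr` (the image name here is hypothetical).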