Accessing PySpark from a Jupyter Notebook

Tags: Jupyter, Spark

Published 4 Jul 2017 12:00

It’d be great to interact with PySpark from a Jupyter Notebook. This post describes how to set that up. It assumes that you’ve already installed Spark.

  1. Install the findspark package.

     ```bash
     pip3 install findspark
     ```

  2. Make sure that the `SPARK_HOME` environment variable is defined.
  3. Launch a Jupyter Notebook.

     ```bash
     jupyter notebook
     ```

  4. Import the findspark package, use `findspark.init()` to locate the Spark process, and then load the `pyspark` module. See below for a simple example.

A Jupyter notebook using the findspark and pyspark packages.