Accessing PySpark from a Jupyter Notebook

It’d be great to interact with PySpark from a Jupyter Notebook. This post describes how to get that set up. It assumes that you’ve installed Spark like this.

Install the findspark package.
```
pip3 install findspark
```
Make sure that the `SPARK_HOME` environment variable is defined and points to the directory where Spark is installed.

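If you want to check this from Python before going further, a minimal sketch like the following reads `SPARK_HOME` and confirms that it names an existing directory. The helper `find_spark_home` is hypothetical and is not part of findspark. From a terminal, `echo $SPARK_HOME` gives the same quick answer.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


# Hypothetical helper (not part of findspark): report where SPARK_HOME points.
def find_spark_home(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return SPARK_HOME as a Path if it is set to an existing directory, else None."""
    env = os.environ if env is None else env
    value = env.get("SPARK_HOME")
    if not value:
        return None  # variable is missing or empty
    path = Path(value).expanduser()
    return path if path.is_dir() else None  # must be a real directory


if __name__ == "__main__":
    home = find_spark_home()
    print(home if home else "SPARK_HOME is not set to an existing directory")
```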
Launch a Jupyter Notebook.
```
jupyter notebook
```
Import the findspark package and call `findspark.init()`, which locates your Spark installation (via `SPARK_HOME`) and adds it to `sys.path`; after that you can import the `pyspark` module. See below for a simple example.

A Jupyter notebook using the `findspark` and `pyspark` packages.
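The notebook pictured isn't reproduced here as text, but a minimal sketch of the same idea might look like this. It assumes a local Spark installation that findspark can find through `SPARK_HOME`. The function name `make_spark_session`, the default app name and the `local[*]` master are illustrative choices, not requirements. The imports sit inside the function because, when Spark was installed from a download rather than with pip, `pyspark` is only importable after `findspark.init()` has run.

```python
def make_spark_session(app_name: str = "jupyter-pyspark"):
    """Hypothetical helper: locate Spark with findspark, then build a SparkSession."""
    import findspark

    # With no arguments, findspark.init() looks for Spark via SPARK_HOME
    # and puts Spark's Python libraries on sys.path.
    findspark.init()

    # pyspark can be imported only once findspark.init() has run.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")  # assumption: run Spark locally, using all cores
        .appName(app_name)
        .getOrCreate()       # reuses an existing session if one is already running
    )
```

In a notebook cell you would then run `spark = make_spark_session()`. As a quick smoke test, `spark.range(5).count()` should return 5.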