We’re developing some training about Apache Airflow and need to have a robust and portable environment for running demos and labs which we can make available to the class. This will reduce the frustration and time wasted getting everybody set up and ensure that everybody is working in the same environment.
Docker seems like a good option.
There are various Docker images which package Airflow. What I’m setting up here is somewhat different though: a desktop environment in Docker which includes Airflow.
Build the Image
I’m building an image based on the dorowu/ubuntu-desktop-lxde-vnc image, which is Ubuntu with an LXDE desktop.
FROM dorowu/ubuntu-desktop-lxde-vnc:focal
COPY requirements.txt .
RUN apt-get update -qq && \
    # Install Python.
    apt-get install -y \
        python3 \
        python3-pip && \
    # Install Python packages.
    pip3 install -r requirements.txt && \
    rm -f requirements.txt && \
    # Install R.
    apt-get install -y \
        r-base && \
    Rscript -e "install.packages(c('littler', 'docopt'))" && \
    cd /usr/local/bin && \
    ln -s ../lib/R/site-library/littler/examples/install2.r && \
    ln -s ../lib/R/site-library/littler/bin/r && \
    # Install R packages.
    install2.r -e \
        dplyr \
        ggplot2
We’re going to be running pipelines using Python (via the PythonOperator) and R (via the BashOperator). Interpreters for both of these languages are thus installed, along with a few packages.
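To make this concrete, here is a minimal sketch of a DAG with one Python task and one R task. The DAG id, the task ids, the `mean_of` function and the R one-liner are all made up for illustration and are not taken from the training material. The `try`/`except` around the Airflow imports is only there so that the plain Python pieces can be tried on a machine without Airflow; a real DAG file would import Airflow unconditionally.

```python
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
except ImportError:
    # Airflow is not installed here, so only the plain pieces below are usable.
    DAG = None


def mean_of(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical R one-liner: mean fuel consumption from R's built-in mtcars data,
# computed with dplyr (one of the R packages installed in the image).
R_COMMAND = 'Rscript -e "library(dplyr); print(summarise(mtcars, mean_mpg = mean(mpg)))"'

if DAG is not None:
    with DAG(
        dag_id="python_and_r_demo",  # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        # Python task: Airflow calls mean_of(values=[1.0, 2.0, 3.0]).
        python_task = PythonOperator(
            task_id="mean_in_python",
            python_callable=mean_of,
            op_kwargs={"values": [1.0, 2.0, 3.0]},
        )

        # R task: Airflow runs the Rscript command in a Bash shell.
        r_task = BashOperator(task_id="mean_in_r", bash_command=R_COMMAND)

        # Run the Python task first, then the R task.
        python_task >> r_task
```

Assuming the default `AIRFLOW_HOME`, saving this as a `.py` file under `~/airflow/dags/` should make the DAG appear in the web UI once the scheduler has picked it up.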
To build the image:
docker build -t lxde-airflow .
Run the Image
Once the image is built we can give it a whirl.
docker run --rm --name airflow -p 8080:80 -p 5900:5900 lxde-airflow
Access the desktop either via a browser at http://127.0.0.1:8080/ or at 127.0.0.1:5900 using a VNC client.
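If neither address responds, it can help to check whether the two published ports are accepting connections at all. Below is a small sketch using only the Python standard library. The function name `port_open` is my own, and the ports are the ones published by the `docker run` command above.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080 is the browser (noVNC) port on the host, 5900 the VNC port.
    for label, port in (("browser desktop", 8080), ("VNC", 5900)):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{label} port {port}: {state}")
```

A port reported as closed usually means the container isn't running or the `-p` mapping is missing.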
Setup
Once you’ve connected to the desktop, fire up a terminal and let’s get Airflow up and running.
First check the version of Airflow installed. This is locked down via the requirements.txt file.
airflow version
2.1.0
Next initialize the metadata database.
airflow db init
Now create a user. Since no password is given on the command line, you’ll be prompted for one.
airflow users create \
    --username admin \
    --firstname FIRST_NAME \
    --lastname LAST_NAME \
    --role Admin \
    --email admin@example.org
Start the scheduler (in the background).
airflow scheduler &
Start the web server.
airflow webserver
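Fire up a browser (within the Docker desktop!) and go to http://127.0.0.1:8080/. Log in with the username and password from earlier. Note that this is port 8080 inside the container, where the Airflow web server listens by default. On the host, port 8080 is mapped to the desktop itself (container port 80), so Airflow is not reachable from there.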
Voila! Airflow running in a Docker desktop. The commands to get Airflow up and running could be baked into the image but this will be part of the training, so we prefer to leave them out and do it manually.
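For reference, here is roughly what scripting those steps might look like. This is only a sketch: the function names are made up, the password is passed with `--password` to avoid the interactive prompt, and the scheduler is started with `--daemon` rather than `&`.

```python
import subprocess


def setup_commands(password, username="admin", email="admin@example.org"):
    """Build the Airflow CLI calls from the manual steps, as argument lists."""
    return [
        # Initialize the metadata database.
        ["airflow", "db", "init"],
        # Create the admin user without an interactive password prompt.
        [
            "airflow", "users", "create",
            "--username", username,
            "--firstname", "FIRST_NAME",
            "--lastname", "LAST_NAME",
            "--role", "Admin",
            "--email", email,
            "--password", password,
        ],
        # Start the scheduler as a background daemon.
        ["airflow", "scheduler", "--daemon"],
        # Start the web server in the foreground (this call blocks).
        ["airflow", "webserver"],
    ]


def run_setup(password):
    """Run each command in order, stopping at the first failure."""
    for command in setup_commands(password):
        subprocess.run(command, check=True)
```

Calling `run_setup("choose-a-password")` from a terminal inside the desktop would then run all four steps, with the web server staying in the foreground.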
Resources
Here are some resources if you want to try this yourself: