Building an Airflow Environment in Docker

We’re developing a training course on Apache Airflow and need a robust, portable environment for running demos and labs that we can make available to the class. This will cut the time and frustration of getting everybody set up and ensure that everybody is working in the same environment.

Docker seems like a good option.

There are various Docker images which package Airflow. What I’m setting up here is somewhat different though: a desktop environment in Docker which includes Airflow.

Build the Image

I’m building an image based on the dorowu/ubuntu-desktop-lxde-vnc image, which is Ubuntu with an LXDE desktop.

FROM dorowu/ubuntu-desktop-lxde-vnc:focal

COPY requirements.txt .

RUN apt-get update -qq && \
    # Install Python.
    apt-get install -y \
      python3 \
      python3-pip && \
    # Install Python packages.
    pip3 install -r requirements.txt && \
    rm -f requirements.txt && \
    # Install R.
    apt-get install -y \
      r-base && \
    Rscript -e "install.packages(c('littler', 'docopt'))" && \
    cd /usr/local/bin && \
    ln -s ../lib/R/site-library/littler/examples/install2.r && \
    ln -s ../lib/R/site-library/littler/bin/r && \
    # Install R packages.
    install2.r -e \
      dplyr \
      ggplot2

We’re going to be running pipelines using Python (via the PythonOperator) and R (via the BashOperator). Interpreters for both of these languages are thus installed, along with a few packages.
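The Dockerfile copies a requirements.txt file that isn’t shown here. As a minimal sketch, assuming it pins nothing but Airflow at the version reported further down (2.1.0), it could be created like this. The real file may well list more packages.

```shell
# Hypothetical requirements.txt: the single pinned package is an assumption,
# based on the Airflow version mentioned in the Setup section.
cat > requirements.txt <<'EOF'
apache-airflow==2.1.0
EOF
```

The file needs to sit next to the Dockerfile so that the COPY instruction can find it.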

To build the image:

docker build -t lxde-airflow .

Run the Image

Once the image is built we can give it a whirl.

docker run --rm --name airflow -p 8080:80 -p 5900:5900 lxde-airflow

Access the desktop either via a browser at http://127.0.0.1:8080/ or at 127.0.0.1:5900 using a VNC client.
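The container can take a few seconds before the desktop is ready. Here is a rough sketch of a helper that polls the browser URL until it answers. The function name wait_for_http and its default values are made up for illustration, and it assumes curl is available on the host.

```shell
# Poll a URL until it answers, or give up after a number of attempts.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    url=$1
    attempts=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failures, -o: discard the body.
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example, run from a second terminal on the host:
# wait_for_http http://127.0.0.1:8080/ && echo "Desktop is up."
```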

Setup

Once you’ve connected to the desktop, fire up a terminal and let’s get Airflow up and running.

First check the version of Airflow installed. This is locked down via the requirements.txt file.

airflow version
2.1.0
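While you’re in the terminal it’s worth confirming that the other tools from the image are on the PATH too. A small sketch follows. The check_tools helper is hypothetical, not something shipped with the image.

```shell
# Report which of the given commands can be found on the PATH.
# Returns 0 if all are present, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run inside the desktop terminal (not on the host):
# check_tools python3 pip3 airflow Rscript install2.r
```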

Next initialise the metadata database (SQLite by default).

airflow db init

Now create a user. You’ll be prompted for a password.

airflow users create \
      --username admin \
      --firstname FIRST_NAME \
      --lastname LAST_NAME \
      --role Admin \
      --email admin@example.org

Start the scheduler (in the background).

airflow scheduler &

Start the web server.

airflow webserver

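Fire up a browser (within the Docker desktop!) and go to http://127.0.0.1:8080/. On the host, port 8080 is mapped to the desktop itself, so the Airflow UI is only reachable from inside the container. Log in with the username and password from earlier.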

Figure: Airflow in a Docker desktop.

Voilà! Airflow running in a Docker desktop. The commands to get Airflow up and running could be baked into the image, but they will be part of the training, so we prefer to leave them out and run them manually.
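If you did want to automate the setup, the steps above could be collected into a single helper along these lines. This is only a sketch: the start_airflow name is invented, and it assumes the password is supplied through the --password option of airflow users create rather than typed at the prompt. That is convenient for a throwaway training container, but it exposes the password on the command line.

```shell
# Sketch of a startup helper that strings together the manual steps.
# Usage: start_airflow ADMIN_PASSWORD
start_airflow() {
    password=$1
    # Initialise the metadata database.
    airflow db init || return 1
    # Create the admin user without the interactive password prompt.
    airflow users create \
        --username admin \
        --firstname FIRST_NAME \
        --lastname LAST_NAME \
        --role Admin \
        --email admin@example.org \
        --password "$password" || return 1
    # Scheduler in the background, web server in the foreground.
    airflow scheduler &
    airflow webserver
}
```

Calling start_airflow with a password of your choice from the desktop terminal would then replace the four manual steps.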

Resources

Here are some resources if you want to try this yourself: