Camoufox in Docker

My scrapers often run in a serverless environment. If I’m using Camoufox, then it needs to be baked into my Docker image too.

Basic Dockerfile

First, a basic Dockerfile that does the following:

  1. Derives from a Python base image.
  2. Installs some system dependencies required to run Firefox.
  3. Installs the camoufox package.
  4. Fetches the Camoufox data (browser, fingerprint data and addons).
  5. Copies a simple script.

FROM python:3.12

RUN apt-get update && \
    apt-get install -y \
        libgtk-3-0 \
        libasound2 \
        libx11-xcb1

RUN pip3 install "camoufox[geoip]"
RUN camoufox fetch

COPY script.py .

CMD ["python3", "script.py"]

Here’s the content of the script. It loads a page and prints its HTML.

from camoufox.sync_api import Camoufox

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://www.example.com")

    print(page.content())

Fiddly Dockerfile

Here’s a different approach. For various reasons I needed to download the uBlock Origin addon separately, rather than having it downloaded during camoufox fetch. Unpacking the addon into the correct location (under ~/.cache/camoufox/) is simple. However, there’s a snag: camoufox fetch normally starts by deleting the contents of ~/.cache/camoufox/, which would wipe the addon that was just unpacked. The Dockerfile below uses sed to remove the lines responsible for that cleanup from pkgman.py in the installed camoufox package.
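To make the sed step concrete: the expression /Cleaning up cache/{N;d;} finds a line containing that text, pulls in the following line with N, and deletes both with d. Here’s a rough Python equivalent, as a sketch only. The function name drop_match_and_next and the sample lines are made up for illustration; they are not the real contents of pkgman.py.

```python
def drop_match_and_next(lines: list[str], needle: str) -> list[str]:
    """Drop each line containing `needle` together with the line after it.

    Mirrors sed '/needle/{N;d;}' for a plain-text needle. Like GNU sed, a
    match on the very last line is kept, because there is no next line to join.
    """
    kept = []
    i = 0
    while i < len(lines):
        if needle in lines[i] and i + 1 < len(lines):
            i += 2  # skip the matching line and the line after it
            continue
        kept.append(lines[i])
        i += 1
    return kept


# Hypothetical stand-in for a source file, NOT the real pkgman.py.
sample = [
    "def fetch():",
    "    print('Cleaning up cache...')",
    "    remove_cache_dir()",
    "    download_browser()",
]

for line in drop_match_and_next(sample, "Cleaning up cache"):
    print(line)
```

With the sample above, only the def line and the download_browser() call remain. The approach depends on the cleanup call sitting directly after the log message, so it may need adjusting if the package source changes.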

FROM python:3.12

RUN apt-get update && \
    apt-get install -y \
        libgtk-3-0 \
        libasound2 \
        libx11-xcb1 \
        wget

RUN wget https://addons.mozilla.org/firefox/downloads/latest/ublock-origin/latest.xpi

RUN pip3 install "camoufox[geoip]"

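# Locate site-packages, then delete the "Cleaning up cache" line and the line
# after it from pkgman.py so that camoufox fetch leaves ~/.cache/camoufox/ alone.
RUN SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])") && \
    sed -i '/Cleaning up cache/{N;d;}' "$SITE_PACKAGES/camoufox/pkgman.py"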

RUN mkdir -p /root/.cache/camoufox/addons/UBO/ && \
    unzip -qq latest.xpi -d /root/.cache/camoufox/addons/UBO/

RUN camoufox fetch
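
Since the whole point is that the unpacked addon survives camoufox fetch, it’s worth checking for it once the image is built. Here’s a minimal sketch, assuming the paths from the Dockerfile above. The helper name addon_is_unpacked is hypothetical, and manifest.json is simply the manifest file that a WebExtension .xpi contains at its top level.

```python
from pathlib import Path

# Location used by the Dockerfile above when unpacking latest.xpi.
UBO_DIR = Path("/root/.cache/camoufox/addons/UBO")


def addon_is_unpacked(addon_dir: Path = UBO_DIR) -> bool:
    """Return True if the directory holds an unpacked WebExtension manifest."""
    return (addon_dir / "manifest.json").is_file()


# Report whether the addon is still in place (hypothetical check, run inside the image).
print("addon present:", addon_is_unpacked())
```

Asserting on the return value in a later build step would surface a wiped cache at build time rather than when a scraper runs.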