My scrapers often run in a serverless environment. If I’m using Camoufox then that needs to be baked into my Docker image too.
Basic Dockerfile
First, a basic Dockerfile that does the following:
- Derives from a Python base image.
- Installs some system dependencies required to run Firefox.
- Installs the camoufox package.
- Fetches the Camoufox data (browser, fingerprint data and addons).
- Copies a simple script.
FROM python:3.12
RUN apt-get update && \
    apt-get install -y \
    libgtk-3-0 \
    libasound2 \
    libx11-xcb1
RUN pip3 install camoufox[geoip]
RUN camoufox fetch
COPY script.py .
CMD ["python3", "script.py"]
Here’s the script. It loads a page and prints its HTML.
from camoufox.sync_api import Camoufox

# Launch a headless browser, load a page and dump its HTML.
with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://www.example.com")
    print(page.content())
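Since the whole point of running camoufox fetch at build time is to have the data baked into the image, a quick sanity check before launching the browser can be handy. The following is only a minimal sketch: the helper name cache_is_populated is hypothetical, and it assumes the data lives under ~/.cache/camoufox/ (the Linux location that comes up later in this post). It only checks that the directory exists and is not empty, not that its contents are valid.

```python
from pathlib import Path


def cache_is_populated(cache_dir: Path) -> bool:
    """Return True if cache_dir exists and contains at least one entry.

    Hypothetical helper: it does not validate what is inside the directory.
    """
    return cache_dir.is_dir() and any(cache_dir.iterdir())


if __name__ == "__main__":
    # Assumed location of the Camoufox data on Linux.
    cache_dir = Path.home() / ".cache" / "camoufox"
    if cache_is_populated(cache_dir):
        print(f"Camoufox data found in {cache_dir}")
    else:
        print(f"No Camoufox data in {cache_dir}; was 'camoufox fetch' run?")
```

Running this inside the container should report that the data was found if the fetch step of the build succeeded.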
Fiddly Dockerfile
Here’s a different approach. For various reasons I needed to download the uBlock Origin addon separately, and I wanted to prevent it from being downloaded during camoufox fetch. Unpacking the addon into the correct location (under ~/.cache/camoufox/) is simple. However, there’s a snag: normally camoufox fetch starts by deleting the contents of ~/.cache/camoufox/, which would wipe out the addon I had just unpacked. The Dockerfile below uses sed to remove the lines in pkgman.py that do this cleanup.
FROM python:3.12
RUN apt-get update && \
    apt-get install -y \
    libgtk-3-0 \
    libasound2 \
    libx11-xcb1 \
    wget
RUN wget https://addons.mozilla.org/firefox/downloads/latest/ublock-origin/latest.xpi
RUN pip3 install camoufox[geoip]
RUN SITE_PACKAGES=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])") && \
    sed -i '/Cleaning up cache/{N;d;}' $SITE_PACKAGES/camoufox/pkgman.py
RUN mkdir -p /root/.cache/camoufox/addons/UBO/ && \
    unzip -qq latest.xpi -d /root/.cache/camoufox/addons/UBO/
RUN camoufox fetch
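The sed expression /Cleaning up cache/{N;d;} is terse, so here is a rough Python sketch of what it does: for every line matching the pattern, N pulls the following line into the pattern space and d deletes both. The function name delete_match_and_next and the sample lines are made up for illustration; the sample is not the real content of pkgman.py.

```python
import re


def delete_match_and_next(lines: list[str], pattern: str) -> list[str]:
    """Mimic sed '/pattern/{N;d;}': drop each matching line and the one after it."""
    kept = []
    i = 0
    while i < len(lines):
        if re.search(pattern, lines[i]) and i + 1 < len(lines):
            # N appends the next line to the pattern space; d discards both.
            i += 2
        else:
            # Non-matching lines pass through. A match on the very last line
            # is also kept here, on the assumption that GNU sed prints the
            # pattern space when N finds no further input.
            kept.append(lines[i])
            i += 1
    return kept


# Hypothetical stand-in for the relevant part of pkgman.py (not the real source).
sample = [
    "def install(self):",
    "    print('Cleaning up cache...')",
    "    self.cleanup()",
    "    self.download()",
]

for line in delete_match_and_next(sample, "Cleaning up cache"):
    print(line)
```

With this sample only the first and last lines survive. The edit depends on the cleanup call sitting on the line directly after the log message, so it is worth rechecking pkgman.py whenever the camoufox package is upgraded.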