This post will show you how to set up the following:
- a Selenium instance and
- a simple script connecting to Selenium.
Both of these will run in Docker containers and will communicate over the host network.
Selenium Service
Create a Selenium container and publish its port 4444 on the host. The -p 4444:4444 option maps port 4444 on the host to port 4444 in the container, so all requests to port 4444 on the host will be forwarded to the same port in the container.
docker run --name selenium -p 4444:4444 selenium/standalone-chrome:3.141
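The container takes a few seconds to start, so a script that connects immediately may be refused. Here is a minimal sketch of a readiness check. It assumes that the Selenium 3 standalone server exposes a JSON status document at /wd/hub/status with a value.ready flag. The names is_ready, fetch_status and wait_for_selenium are made up for this illustration.

```python
import json
import time
import urllib.request

# Assumption: the Selenium 3 standalone server reports readiness at this path.
STATUS_URL = "http://localhost:4444/wd/hub/status"


def is_ready(payload: dict) -> bool:
    """Return True if a status payload says the server accepts sessions."""
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def fetch_status(url: str = STATUS_URL) -> dict:
    """Fetch and decode the JSON status document from the server."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.load(response)


def wait_for_selenium(fetch=fetch_status, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll until the server reports ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Connection refused, timeout or invalid JSON: not up yet, try again.
            pass
        time.sleep(delay)
    return False
```

Calling wait_for_selenium() before creating the webdriver session should return True once the server is up and False if it never becomes ready within the given number of attempts.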
Scraper Template
First we’ll create the framework for a simple scraper in Python. Save it as google-selenium.py.
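from selenium import webdriver

SELENIUM_URL = "localhost:4444"

# Connect to the remote Selenium server and request a Chrome session.
browser = webdriver.Remote(
    f"http://{SELENIUM_URL}/wd/hub",
    {'browserName': 'chrome'}
)
try:
    browser.get("https://www.google.com")
    print(f"Retrieved URL: {browser.current_url}.")
finally:
    # quit() ends the session on the server, even if the request above fails.
    browser.quit()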
It doesn’t actually do any scraping, but it does start a Selenium session and open a URL. These are the biggest technical hurdles.
The script connects to Selenium at http://localhost:4444. Since localhost maps to the loopback IP address, 127.0.0.1, you can also use http://127.0.0.1:4444.
SELENIUM_URL = "127.0.0.1:4444"
In either case the script will produce the following result:
python3 google-selenium.py
Retrieved URL: https://www.google.com/.
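Since the address is the only thing that differs between these two variants, it can be handy to read it from the environment rather than editing the script. Here is a small sketch. The SELENIUM_URL environment variable and the hub_url helper are assumptions made for this example, not something the Selenium library defines.

```python
import os


def hub_url(address: str) -> str:
    """Build the WebDriver hub URL from a host:port address."""
    return f"http://{address}/wd/hub"


# Hypothetical override: fall back to localhost when the variable is not set.
SELENIUM_URL = os.environ.get("SELENIUM_URL", "localhost:4444")

print(f"Connecting to {hub_url(SELENIUM_URL)}")
```

With this in place, running SELENIUM_URL=127.0.0.1:4444 python3 google-selenium.py would switch to the loopback IP address without touching the code.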
Scraper Template in Docker
Now we’re going to wrap that script up in its own Docker image. Here’s the Dockerfile:
FROM python:3.8.5-slim AS base
RUN pip3 install selenium==3.141.0
COPY google-selenium.py /
CMD python3 google-selenium.py
Let’s build the image.
docker build -t google-selenium .
Now run it.
docker run --net=host google-selenium
Retrieved URL: https://www.google.com/.
We’ve specified --net=host, so we’re using the host’s network and the scraper container is accessing the Selenium instance via port 4444 on the host.
The result is precisely the same as earlier (when we ran the scraper script directly), but now both Selenium and the scraper run in Docker. Only the networking still goes through the host.
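If the scraper container cannot reach Selenium, a quick way to narrow things down is to test whether port 4444 accepts TCP connections from wherever the script runs. The sketch below uses only the standard library. The port_open helper is a name invented for this example.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the port as closed.
        return False


if __name__ == "__main__":
    print(f"Selenium port reachable: {port_open('localhost', 4444)}")
```

Run inside a container started with --net=host, this should report the port as reachable whenever the Selenium container is up. Without --net=host, localhost refers to the scraper container itself and the check would be expected to fail.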
Appendix: Network Details
To learn a little more about using the host network with Docker, let’s scratch beneath the surface. We’ll run a Bash shell using the scraper image.
docker run -it --net=host google-selenium /bin/bash
Now we’re the root user inside the container. It’s a very lightweight image, so we need to install some networking tools.
root@propane:/# apt update && apt install -y iproute2
Now we can check on what the network configuration looks like in the container.
root@propane:/# ip -br -c a
lo UNKNOWN 127.0.0.1/8 ::1/128
wlp3s0 UP 10.0.0.8/24 fe80::363:4c7c:a305:62a7/64
br-f3c6be594433 DOWN 172.19.0.1/16
docker0 UP 172.17.0.1/16 fe80::42:eeff:fe7f:173f/64
br-e48eb1ef2d48 DOWN 172.22.0.1/16
br-81ccbe03027c DOWN 172.20.0.1/16 fe80::42:b0ff:fe48:4b7f/64
vethd2e0b5e@if55 UP fe80::d4a7:3ff:fefa:47e9/64
This is precisely the same network configuration that you’d see on the host: with --net=host the container shares the host’s network stack rather than getting its own. This is simple, convenient and performant. But it also means that the container’s networking is not isolated from the host.
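The same comparison can be made from Python, without installing iproute2, since the image already ships an interpreter. This is a rough sketch using socket.if_nameindex() from the standard library. The interface_names helper is just a name chosen for this example.

```python
import socket


def interface_names() -> list:
    """Return the sorted names of the network interfaces visible to this process."""
    return sorted(name for _index, name in socket.if_nameindex())


print(interface_names())
```

Run on the host and again inside a container started with --net=host, the two lists should match. In a container on the default bridge network you would typically see only lo and a single eth0 interface instead.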
Check out the next post where we’ll use a bridge network to create the same setup.