You might need to run a Selenium crawler in a GitLab CI pipeline. Here’s how to get that set up.
The Crawler
Well, it’s not much of a crawler, but it illustrates the setup. This is the script, run.py, that I want to run via GitLab CI.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Configure Chrome for a containerised CI environment.
options = Options()
options.add_argument("--no-sandbox")
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")

driver = webdriver.Chrome(options=options)
try:
    driver.get('http://www.example.com')
    print(driver.page_source)
finally:
    # Shut down Chrome and ChromeDriver even if the page load fails.
    driver.quit()
There are a couple of critical options specified.
- --no-sandbox: This is necessary if you’re going to be launching Chrome as the root user.
- --headless: Since we’re running Chrome in Docker it needs to be headless.
- --window-size=1920,1080: Not strictly necessary but I like to set a specific window size. Superstition!
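If the same script also runs outside of CI as a regular user, the sandbox flag could be applied only when it is actually needed. Here is a minimal sketch of that idea. The helper names chrome_arguments and is_root are hypothetical and are not part of the script above.

```python
import os


def chrome_arguments(running_as_root: bool, width: int = 1920, height: int = 1080) -> list[str]:
    """Return the Chrome flags discussed above (hypothetical helper)."""
    args = ["--headless", f"--window-size={width},{height}"]
    if running_as_root:
        # Chrome refuses to start as root unless the sandbox is disabled.
        args.insert(0, "--no-sandbox")
    return args


def is_root() -> bool:
    # os.geteuid() only exists on Unix, which covers the Docker image used in CI.
    return hasattr(os, "geteuid") and os.geteuid() == 0


if __name__ == "__main__":
    print(chrome_arguments(is_root()))
```

Each flag in the returned list can then be passed to options.add_argument() in turn.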
The CI Pipeline
And here’s the .gitlab-ci.yml file to create the pipeline. We’re installing Chrome and ChromeDriver from Chrome for Testing.
chrome:
  image: python:3.11.4
  stage: test
  before_script:
    - apt-get update -qq -y
    - >
      apt-get install -y wget unzip fonts-liberation libasound2 libatk-bridge2.0-0 libatk1.0-0 libatspi2.0-0
      libcups2 libdbus-1-3 libdrm2 libgbm1 libgtk-4-1 libnspr4 libnss3 libu2f-udev libvulkan1 libxcomposite1
      libxdamage1 libxfixes3 libxkbcommon0 libxrandr2 xdg-utils
    # Chrome
    - wget -q -O chrome-linux64.zip https://bit.ly/chrome-linux64-121-0-6167-85
    - unzip chrome-linux64.zip
    - rm chrome-linux64.zip
    - mv chrome-linux64 /opt/chrome/
    - ln -s /opt/chrome/chrome /usr/local/bin/
    # Chromedriver
    - wget -q -O chromedriver-linux64.zip https://bit.ly/chromedriver-linux64-121-0-6167-85
    - unzip -j chromedriver-linux64.zip chromedriver-linux64/chromedriver
    - rm chromedriver-linux64.zip
    - mv chromedriver /usr/local/bin/
    - chrome --version
    - chromedriver --version
    - pip3 install selenium
  script:
    - python3 run.py
What’s going on there? Here’s a breakdown of the steps:
- Install wget and unzip. We’ll use wget for two downloads and unzip to unpack a ZIP archive. Also install a bunch of dependencies that are required by Chrome.
- Download a specific version of Chrome. For brevity I’m using a shortened URL. The full URL is https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chrome-linux64.zip.
- Download a specific version of ChromeDriver. The full URL is https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chromedriver-linux64.zip. Unzip and install this directly into the execution path.
- Check on the versions of Chrome and ChromeDriver.
- Install the Selenium package for Python.
- Run the script.
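The two full URLs in the steps above share a common layout, so they can be built from a version number instead of being hard-coded. The sketch below infers the pattern from just those two URLs. It's an assumption that other versions and platforms follow the same layout, and the function name cft_download_url is hypothetical.

```python
# Base path taken from the two full URLs listed in the steps above.
BASE_URL = "https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing"


def cft_download_url(version: str, binary: str = "chrome", platform: str = "linux64") -> str:
    """Build a Chrome for Testing download URL (layout inferred, not guaranteed)."""
    return f"{BASE_URL}/{version}/{platform}/{binary}-{platform}.zip"


if __name__ == "__main__":
    for binary in ("chrome", "chromedriver"):
        print(cft_download_url("121.0.6167.85", binary))
```

With the version in one place, upgrading Chrome and ChromeDriver together means changing a single string.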
The CI log will show the installed versions of Chrome and ChromeDriver.
$ chrome --version
Google Chrome 121.0.6167.85
$ chromedriver --version
ChromeDriver 121.0.6167.85
And the script will dump the HTML contents from http://www.example.com.
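ChromeDriver generally expects a matching version of Chrome, so the two version lines in the CI log are worth comparing rather than just eyeballing. Here is a rough sketch that parses output of that shape and compares the major versions. The function names parse_version and same_major are hypothetical, and the regular expression assumes a four-part version number like the ones in the log.

```python
import re


def parse_version(output: str) -> str:
    """Extract a four-part version number such as 121.0.6167.85."""
    match = re.search(r"\d+\.\d+\.\d+\.\d+", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return match.group(0)


def same_major(chrome_output: str, driver_output: str) -> bool:
    """Compare only the first component of the two version numbers."""
    chrome_major = parse_version(chrome_output).split(".")[0]
    driver_major = parse_version(driver_output).split(".")[0]
    return chrome_major == driver_major


if __name__ == "__main__":
    print(same_major("Google Chrome 121.0.6167.85", "ChromeDriver 121.0.6167.85"))
```

In a pipeline, the two strings could come from running chrome --version and chromedriver --version with subprocess.run, failing the job if the check returns False.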