There’s one major problem with ChromeDriver: anti-bot services are able to detect that a browser session is being automated (as opposed to being used by a regular meat sack) and will often impose restrictions or deny connections altogether. The Undetected ChromeDriver (undetected-chromedriver) Python package is a patched version of ChromeDriver which avoids triggering a selection of anti-bot services, allowing it to glide under the anti-bot radar.
What is ChromeDriver?
ChromeDriver is used for testing websites and apps, as well as for web scraping. It is often used via Selenium, which provides a consistent, high-level interface for controlling a browser. It’s useful to understand the relationship between client programming languages, Selenium, ChromeDriver and the controlled browser.
It’s useful to be able to choose from a selection of browsers. If you’re testing an app or website then you’ll want to be confident that it works on a variety of browsers. If you’re web scraping then your choice of browser might be based on subtle changes in the way that a site is rendered on different browsers, differences in performance and memory footprint, or just personal preference.
The WebDriver specification defines a protocol for remotely inspecting and controlling user agents (which in this context is just a general term for “browsers”). It’s a general specification, which means that it is language and browser agnostic. ChromeDriver and GeckoDriver are implementations of WebDriver for browsers built on the Chromium and Mozilla codebases respectively. They provide the mechanism for controlling a specific browser.
The WebDriver specification provides a low-level protocol for communicating with a browser. Using this protocol directly would be hard work. Selenium provides a high-level interface to WebDriver, which makes writing client code easier and more efficient.
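To get a feel for what the protocol looks like underneath Selenium, the sketch below builds the raw HTTP requests that a WebDriver client sends to start a session and load a page. The endpoint paths and payload shapes follow the W3C WebDriver specification. The helper names (new_session_request, navigate_request) and the placeholder session id are invented for illustration, and nothing is actually sent over the wire.

```python
import json


def new_session_request(browser_name="chrome"):
    """Build the request that asks a WebDriver server to start a browser session."""
    body = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return "POST", "/session", json.dumps(body)


def navigate_request(session_id, url):
    """Build the request that tells an existing session to load a URL."""
    return "POST", f"/session/{session_id}/url", json.dumps({"url": url})


# "abc123" is a placeholder; a real id comes back in the new-session response.
for method, path, body in (
    new_session_request(),
    navigate_request("abc123", "https://nowsecure.nl"),
):
    print(method, path, body)
```

With Selenium, roughly the same two steps collapse into creating a driver object and calling driver.get(), which is why most client code never touches the protocol directly.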
Undetected ChromeDriver in Docker
You can install the undetected-chromedriver package using pip:
pip install undetected-chromedriver
Many applications get wrapped up in a Docker image, so it’s rather useful to have Python, the undetected-chromedriver package, ChromeDriver and a browser all neatly enclosed in a single image.
There’s an Undetected ChromeDriver Docker image. However, the corresponding Dockerfile is not available and I like to understand what’s gone into an image. So I rolled my own, which can be found here.
We’re going to access two sites:
- https://nowsecure.nl — a test site with “max anti-bot protection” and
- https://datadome.co — a provider of “bot management software”.
💡 If you’re trying this out yourself then you might want to run the Undetected ChromeDriver examples first and come back to the Selenium ones afterwards, because the Selenium examples will likely result in your IP address being flagged.
To run these examples I launched a Selenium Docker container exposing VNC on port 5900 and the Selenium hub on port 4444.
docker run -p 4444:4444 -p 5900:5900 selenium/standalone-chrome-debug:3.141.59
First we’ll visit https://nowsecure.nl and take a screenshot.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.CHROME)
driver.get("https://nowsecure.nl")
driver.set_window_size(1000, 900)
driver.save_screenshot('selenium-nowsecure.png')
This is what the screenshot looks like.
It’s a little underwhelming, but it indicates that one of the anti-bot mechanisms on the site is blocking us. Hang on, you’ll see shortly what it should look like. Or just visit the site now. If you experience a sensory assault then it’s confirmation that you’re not a bot.
Now let’s take a swing at https://datadome.co. We’ll take another screenshot to record the result.
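The code for this step isn’t reproduced here. As a minimal sketch, assuming a driver object like the one created in the previous example, the visit, resize and screenshot sequence can be wrapped in a small helper. The names screenshot_name and capture, and the file naming scheme (modelled on selenium-nowsecure.png), are assumptions made for this illustration.

```python
from urllib.parse import urlparse


def screenshot_name(url, prefix):
    """Derive a file name such as 'selenium-datadome.png' from a URL."""
    host = urlparse(url).hostname or "page"
    if host.startswith("www."):
        host = host[4:]
    # Use the first label of the host name, e.g. "datadome" from "datadome.co".
    return f"{prefix}-{host.split('.')[0]}.png"


def capture(driver, url, prefix, size=(1000, 900)):
    """Load a page, resize the window and save a screenshot; return the file name."""
    driver.get(url)
    driver.set_window_size(*size)
    path = screenshot_name(url, prefix)
    driver.save_screenshot(path)
    return path


print(screenshot_name("https://datadome.co", "selenium"))
```

With a live session, capture(driver, "https://datadome.co", "selenium") would save the screenshot under the name printed by the last line.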
Aha! It looks like we’ve been spotted. A CAPTCHA indicates that the site regards the request as suspicious and would normally scupper our attempts to browse the site.
Using Undetected ChromeDriver
Now we’ll try the same sites using Undetected ChromeDriver. These examples were run in an interactive session using the Undetected ChromeDriver Docker image. Again, VNC is exposed on port 5900.
docker run -it -p 5900:5900 datawookie/undetected-chromedriver:latest
Let’s visit https://nowsecure.nl.
import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get("https://nowsecure.nl")
A screenshot indicates that we have penetrated the anti-bot measures.
What about DataDome?
🚨 If your IP address has already been flagged by an anti-bot mechanism then using Undetected ChromeDriver is probably not going to help you. Well, not from the compromised IP address. If you can get a fresh IP address then you’re back in business.
Extending the Undetected ChromeDriver Image
The benefit of having a Docker image with the Undetected ChromeDriver functionality is that you can easily create a derived image with additional capabilities. Suppose, for example, that I wanted an Undetected ChromeDriver script that also used the pyjokes package (because why wouldn’t you?). The script, doit.py, might look like this:
import undetected_chromedriver as uc
import pyjokes

driver = uc.Chrome()
driver.get("https://nowsecure.nl")
print(driver.page_source)
print(pyjokes.get_joke())
And the corresponding Dockerfile would be:
FROM datawookie/undetected-chromedriver:latest
RUN pip3 install pyjokes
COPY doit.py .
CMD ["python", "doit.py"]
This is based on the Undetected ChromeDriver image but adds the pyjokes package and includes the script itself (a container will automatically run the script).
🚨 Don’t install the following Python packages into the derived image because the correct versions are already in the base image:
It can be useful to record the ChromeDriver logs. This is especially handy if you have trouble launching Chrome. To do this, simply pass the service_log_path argument when you instantiate a driver:
driver = uc.Chrome(service_log_path="chromedriver.log")
Sometimes you might get the following error:
unknown error: session deleted because of page crash
This is most likely due to Chrome running out of shared memory in the Docker container. To get around this, pass --shm-size="2g" to docker run when starting the container.
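To make the placement of the flag concrete, here is a small sketch that assembles the docker run command used earlier with the extra shared-memory option. The helper name docker_run_command is made up for this example, and the sketch only prints the command rather than running it.

```python
import shlex


def docker_run_command(image, shm_size="2g", vnc_port=5900):
    """Assemble a docker run command that enlarges /dev/shm in the container."""
    return [
        "docker", "run", "-it",
        f"--shm-size={shm_size}",        # size of the container's shared memory
        "-p", f"{vnc_port}:{vnc_port}",  # expose VNC, as in the earlier examples
        image,
    ]


cmd = docker_run_command("datawookie/undetected-chromedriver:latest")
print(shlex.join(cmd))
```

The printed line can be pasted straight into a shell. If 2g turns out to be more than you need, shm_size is the only value to change.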