Undetected ChromeDriver: Stay Below the Radar

There’s one major problem with ChromeDriver: anti-bot services are able to detect that a browser session is being automated (as opposed to being used by a regular meat sack) and will often impose restrictions or deny connections altogether. The Undetected ChromeDriver (undetected-chromedriver) Python package is a patched version of ChromeDriver which avoids triggering a selection of anti-bot services, allowing it to glide under the anti-bot radar.

What is ChromeDriver?

ChromeDriver is used for testing websites and apps, as well as for web scraping. It is often used via Selenium, which provides a consistent, high-level interface for controlling a browser. It’s useful to understand the relationship between client programming languages, Selenium, ChromeDriver and the controlled browser.

Relationship between ChromeDriver, client code and browser.

Browsers

It’s useful to be able to choose from a selection of browsers. If you’re testing an app or website then you’ll want to be confident that it works on a variety of browsers. If you’re web scraping then your choice of browser might be based on subtle changes in the way that a site is rendered on different browsers, differences in performance and memory footprint, or just personal preference.

WebDriver

The WebDriver specification defines a protocol for remotely inspecting and controlling user agents (which in this context is just a general term for “browsers”). It’s a general specification, which means that it is language and browser agnostic. ChromeDriver and GeckoDriver are implementations of WebDriver for browsers built on the Chromium and Mozilla codebases respectively. They provide the mechanism for controlling a specific browser.

Selenium

The WebDriver specification provides a low-level protocol for communicating with a browser. Using this protocol directly would be hard work. Selenium provides a high-level interface to WebDriver, which makes writing client code easier and more efficient.
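To get a feel for what "low level" means here, this is a minimal sketch of the HTTP requests a WebDriver client issues. The endpoints and payload shapes follow the W3C WebDriver specification. The base URL, the session ID and the helper names are assumptions for illustration only. Nothing is sent over the network; the functions just describe the requests.

```python
import json
from typing import NamedTuple, Optional


class WireRequest(NamedTuple):
    """One HTTP request in the WebDriver wire protocol."""
    method: str
    url: str
    body: Optional[str]  # JSON-encoded payload, or None for GET requests


# Assumed address of a locally running driver; adjust for your own setup.
BASE_URL = "http://127.0.0.1:9515"


def new_session(browser_name: str = "chrome") -> WireRequest:
    """Request that asks the driver to start a browser session."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return WireRequest("POST", f"{BASE_URL}/session", json.dumps(payload))


def navigate(session_id: str, url: str) -> WireRequest:
    """Request that loads a URL in an existing session."""
    payload = {"url": url}
    return WireRequest("POST", f"{BASE_URL}/session/{session_id}/url", json.dumps(payload))


def screenshot(session_id: str) -> WireRequest:
    """Request that returns a base64-encoded PNG of the current page."""
    return WireRequest("GET", f"{BASE_URL}/session/{session_id}/screenshot", None)
```

Every click, keystroke and page load becomes a request like these, which is exactly the bookkeeping that Selenium hides.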

Clients

There are wrappers for the Selenium library which make it accessible from a variety of languages. Possibly the most frequently used languages for this purpose are (IMHO) Java, Python and R, but you could also use C#, Ruby or JavaScript.

Undetected ChromeDriver in Docker

You can install the undetected-chromedriver package using pip.

pip install undetected-chromedriver

Many applications get wrapped up in a Docker image, so it’s rather useful to have Python, the undetected-chromedriver package, ChromeDriver and a browser all neatly enclosed in a single image.

There’s an Undetected ChromeDriver Docker image. However, the corresponding Dockerfile is not available and I like to understand what’s gone into an image. So I rolled my own, which can be found here.

Example

We’re going to access two sites:

  • https://nowsecure.nl and
  • https://datadome.co.

💡 If you’re trying this out yourself then you might want to run the examples using Undetected ChromeDriver first before coming back to Selenium because the latter will likely result in your IP address being flagged.

Using Selenium

To run these examples I launched a Selenium Docker container exposing VNC on port 7900 and the Selenium hub on port 4444.

docker run -p 4444:4444 -p 7900:7900 selenium/standalone-chrome-debug:3.141.59

First we’ll visit https://nowsecure.nl and take a screenshot.

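from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to the Selenium hub in the container (Selenium 3 style API).
driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.CHROME)

# Load the page.
driver.get("https://nowsecure.nl")

# Fix the window size so that screenshots are consistent, then save one.
driver.set_window_size(1000, 900)
driver.save_screenshot('selenium-nowsecure.png')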

This is what the screenshot looks like.

Cloudflare checking for a secure site connection.

It’s a little underwhelming, but it indicates that one of the anti-bot mechanisms on the site is blocking us. Hang on, you’ll see shortly what it should look like. Or just visit the site now. If you experience a sensory assault then it’s confirmation that you’re not a bot.

Now let’s take a swing at https://datadome.co. We’ll take another screenshot to record the result.

driver.get("https://datadome.co")

driver.save_screenshot('selenium-datadome.png')

Aha! It looks like we’ve been spotted. A CAPTCHA indicates that the site regards the request as suspicious and would normally scupper our attempts to browse the site.

CAPTCHA triggered on DataDome website.
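How might a site tell? One widely documented signal is the navigator.webdriver property, which the WebDriver specification says a browser should set to true while it is being automated. The sketch below reads that flag through any Selenium-style driver. Anti-bot services combine many signals, so treat this as a single illustrative check rather than a description of what Cloudflare or DataDome actually do. The function name is hypothetical.

```python
def reports_automation(driver) -> bool:
    """Return True if pages can see navigator.webdriver set to true.

    `driver` is any object with Selenium's execute_script() method.
    """
    return bool(driver.execute_script("return navigator.webdriver"))
```

With the Selenium session above, reports_automation(driver) should return True, since a stock ChromeDriver session sets the flag.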

Using Undetected ChromeDriver

Now we’ll try the same sites using Undetected ChromeDriver. These examples were run in an interactive session using the Undetected ChromeDriver Docker image. Again VNC is exposed on port 7900.

docker run -it -p 7900:7900 --shm-size=2gb datawookie/undetected-chromedriver:latest

The --shm-size option is not always necessary. Depending on the resources used by specific web pages, it might or might not be required. If in doubt, use it! See this post for an explanation of shared memory and Docker.
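If you want to see how much shared memory a container actually has, you can check from Python. This sketch assumes a Linux container where shared memory is mounted at /dev/shm; the function name is illustrative.

```python
import os
import shutil
from typing import Optional


def shared_memory_mib(path: str = "/dev/shm") -> Optional[float]:
    """Total size of the shared memory mount in MiB, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


print(shared_memory_mib())
```

Docker gives a container 64 MB of shared memory unless --shm-size says otherwise, so a value around 64 suggests that the option was not applied.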

Let’s visit https://nowsecure.nl.

import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get("https://nowsecure.nl")

A screenshot indicates that we have penetrated the anti-bot measures.

Successfully penetrated anti-bot measures.

What about DataDome?

driver.get("https://datadome.co")

Looks good!

DataDome landing page.
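The screenshots can be captured in the same way as in the Selenium example, since a uc.Chrome object exposes the usual Selenium driver methods. Here is a small helper to keep the visits consistent. The helper name and the output file names are hypothetical.

```python
def snapshot(driver, url: str, path: str, size=(1000, 900)) -> bool:
    """Load `url`, resize the window and save a PNG screenshot to `path`.

    Works with any Selenium-style driver, including uc.Chrome().
    Returns whatever save_screenshot() reports (True on success).
    """
    driver.get(url)
    driver.set_window_size(*size)
    return driver.save_screenshot(path)
```

For example, snapshot(driver, "https://nowsecure.nl", "uc-nowsecure.png") followed by snapshot(driver, "https://datadome.co", "uc-datadome.png").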

🚨 If your IP address has already been flagged by an anti-bot mechanism then using Undetected ChromeDriver is probably not going to help you. Well, not from the compromised IP address. If you can get a fresh IP address then you’re back in business.

Extending the Undetected ChromeDriver Image

The benefit of having a Docker image with the Undetected ChromeDriver functionality is that you can easily create a derived image with additional capabilities. Suppose, for example, that I wanted an Undetected ChromeDriver script that also used the pyjokes package (because why wouldn’t you?). The script, doit.py, might look like this:

import undetected_chromedriver as uc
import pyjokes

driver = uc.Chrome()
driver.get("https://nowsecure.nl")

print(driver.page_source)

print(pyjokes.get_joke())

And the corresponding Dockerfile would be:

FROM datawookie/undetected-chromedriver:latest

RUN pip3 install pyjokes

COPY doit.py .

CMD ["python", "doit.py"]

This is based on the Undetected ChromeDriver image but adds the pyjokes package and includes the script itself (a container will automatically run the script).

🚨 Don’t install the following Python packages into the derived image because the correct versions are already in the base image:

  • selenium
  • requests
  • urllib3 or
  • undetected_chromedriver.
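To see which versions the base image ships before adding anything to a derived image, you can query the package metadata. This sketch uses only the standard library (Python 3.8 or later). The distribution names are assumed to match those on PyPI, and the function name is illustrative.

```python
from importlib import metadata

# Packages that the base image already provides (PyPI distribution names).
PINNED = ["selenium", "requests", "urllib3", "undetected-chromedriver"]


def installed_versions(names=PINNED) -> dict:
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the base image and again inside the derived image is a quick way to confirm that nothing was upgraded by accident.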

Logging

It can be useful to record the ChromeDriver logs. This is especially handy if you have trouble launching Chrome. To do this, pass the service_log_path argument when you instantiate a Chrome object.

driver = uc.Chrome(service_log_path="chromedriver.log")
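Once the log exists, the last few lines are usually the interesting ones. This is a minimal sketch for reading them, assuming the log file name from the example above. The helper name is hypothetical.

```python
from collections import deque
from pathlib import Path


def tail(path: str, lines: int = 20) -> list:
    """Return the last `lines` lines of a text file, or [] if it is missing."""
    log = Path(path)
    if not log.exists():
        return []
    with log.open(encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


for line in tail("chromedriver.log"):
    print(line)
```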

Troubleshooting

Sometimes you might get the following WebDriverException:

unknown error: session deleted because of page crash

This is most likely due to the process running out of memory in the Docker container. To get around this use --shm-size="2g" when running the image.
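If the script runs unattended then it helps to recognise this failure and report the likely cause. The sketch below matches on the message text shown above. It catches any exception so that it doesn’t depend on how Selenium is installed, and both function names are hypothetical.

```python
def is_page_crash(error: Exception) -> bool:
    """True if the exception message looks like a Chrome page crash."""
    return "page crash" in str(error).lower()


def load(driver, url: str) -> None:
    """Load a page, adding a hint about shared memory if Chrome crashes."""
    try:
        driver.get(url)
    except Exception as error:
        if is_page_crash(error):
            raise RuntimeError(
                "Chrome page crashed; try running the container with a larger --shm-size"
            ) from error
        raise
```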