Headless Browser Hacks

Sometimes a site will work fine with Selenium or Playwright until you try headless mode. Then it might fling up some anti-bot mechanism. Or just stop responding altogether. Fortunately, there are some simple things that you can do to work around this.

These are the approaches that I usually take.

Realistic Explicit User Agent

Using an explicit User Agent (rather than the one used by default with Selenium or Playwright) is often enough to persuade a site that you are a legitimate browser.

Get a User Agent string from a recently updated browser.

USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0"

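That’s from Firefox. You might have more success with a Chrome User Agent, especially since the examples below drive Chrome or Chromium, and a User Agent that doesn’t match the actual browser engine can itself look suspicious. Try various options.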
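Since it can take a few attempts to find a User Agent that a particular site accepts, one approach is to keep a short list of candidates and move on to the next one whenever a run is blocked. The sketch below assumes that workflow. `crawl_with_fallback` and the `crawl` callable it accepts are hypothetical names, and the Chrome string only illustrates the format; copy real strings from up-to-date browsers.

```python
# Candidate User Agent strings. The Chrome entry is an illustrative example
# of the format, not necessarily a current release.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36",
]


def crawl_with_fallback(crawl, user_agents):
    """Try each User Agent in turn.

    `crawl` is a hypothetical callable that launches the browser with the
    given User Agent and returns True if the page loaded without being
    blocked. Returns the first User Agent that worked, or None.
    """
    for user_agent in user_agents:
        if crawl(user_agent):
            return user_agent
    return None
```

Returning the successful string makes it easy to log which User Agent a site accepted, so later runs can start with it.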

With Selenium:

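from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument(f"--user-agent={USER_AGENT}")

driver = webdriver.Chrome(options=options)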

With Playwright:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(user_agent=USER_AGENT)
    page = context.new_page()

Realistic Window Size

In the same way that a realistic User Agent can improve the browser fingerprint, so too can setting a window size other than the default used by the automation tool, since a default size is easy for a site to recognise.

With Selenium:

driver.set_window_size(1250, 750)

With Playwright:

context = browser.new_context(viewport={"width": 1250, "height": 750})

Appearing to be Less Headless

There are other settings that can be applied to make the browser less likely to be flagged as automated.

With Selenium:

# Chrome
options.add_argument("--disable-blink-features=AutomationControlled")

With Playwright:

context = browser.new_context(
    device_scale_factor=1,
    is_mobile=False,
    has_touch=False,
)

context.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
""")
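The settings from the sections above can be collected in one place so that Selenium and Playwright present the same fingerprint. The sketch below uses two hypothetical helper functions: one returns the Chrome command-line switches for Selenium (headless mode, User Agent, window size and the AutomationControlled flag), the other returns keyword arguments for Playwright’s `browser.new_context()`. Passing `--window-size` at launch is an alternative to calling `driver.set_window_size()` afterwards.

```python
USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0"


def chrome_arguments(user_agent, width, height, headless=True):
    """Chrome switches to pass one by one to Selenium's options.add_argument()."""
    args = [
        f"--user-agent={user_agent}",
        f"--window-size={width},{height}",
        # Intended to stop Blink from reporting navigator.webdriver as true.
        "--disable-blink-features=AutomationControlled",
    ]
    if headless:
        args.append("--headless=new")
    return args


def playwright_context_options(user_agent, width, height):
    """Keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": user_agent,
        "viewport": {"width": width, "height": height},
        "device_scale_factor": 1,
        "is_mobile": False,
        "has_touch": False,
    }
```

With Selenium, loop over `chrome_arguments(USER_AGENT, 1250, 750)` and call `options.add_argument(arg)` for each entry. With Playwright, unpack the dictionary: `browser.new_context(**playwright_context_options(USER_AGENT, 1250, 750))`.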

Virtual Framebuffer

Another approach is to simply not use a headless browser at all.

Normally I’ll use a headless browser under the following conditions:

a crawler that’s working reliably and I don’t need to monitor its progress;
a crawler that’s running on a remote server; or
a crawler that’s running in a container.

Presumably the first case is not applicable here, because we’re talking about crawlers that don’t run well in headless mode. In both of the remaining cases, using a virtual framebuffer is a good option.

First install the xvfb package. The instructions below are for Debian-based machines.

sudo apt-get update -q
sudo apt-get install xvfb

Now start the framebuffer as a background process.

Xvfb :99 -screen 0 1024x768x16 &

That will effectively launch a virtual X11 server. It’s “virtual” in the sense that it doesn’t require any physical hardware. Breaking down the arguments:

:99 — the display number;
-screen 0 — the screen number;
1024x768x16 — the width, height and colour depth of the virtual display.
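If the crawler is written in Python, it can be handy to build this command from named values. The sketch below uses two hypothetical helpers: one assembles the same argument list as the shell command above, and the other checks that a browser window fits on the virtual screen. That check is relevant here because the 1250x750 window used earlier is wider than a 1024x768 screen, and a window larger than the screen it sits on is an unusual combination for a real desktop.

```python
def xvfb_command(display=99, width=1024, height=768, depth=16, screen=0):
    """Argument list equivalent to: Xvfb :99 -screen 0 1024x768x16"""
    return ["Xvfb", f":{display}", "-screen", str(screen), f"{width}x{height}x{depth}"]


def window_fits(window, screen):
    """True if a (width, height) window fits on a (width, height) screen."""
    return window[0] <= screen[0] and window[1] <= screen[1]
```

For example, `xvfb_command(width=1920, height=1080, depth=24)` gives a screen large enough for the 1250x750 window.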

Set the DISPLAY environment variable to correspond to the display number specified when launching the framebuffer.

export DISPLAY=:99

Now you can run the crawler with headless mode turned off, but it will not open a browser window on a physical display. Instead, the browser will be rendered onto the framebuffer.
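Instead of managing the framebuffer from the shell, the crawler can also start it itself. The following is a minimal sketch that uses only the standard library and assumes Xvfb is installed and on the `PATH`. It picks a display number that isn’t already taken, relying on the conventional `/tmp/.X<n>-lock` lock files that X servers create. It then starts Xvfb with `subprocess`, points `DISPLAY` at it for the duration of a `with` block, and shuts it down afterwards. The helper names and the 1920x1080x24 default screen are assumptions, and the default was chosen to be larger than the window size used earlier.

```python
import os
import subprocess
import time
from contextlib import contextmanager


def free_display(start=99, stop=200):
    """Return the first display number with no X lock file in /tmp."""
    for n in range(start, stop):
        if not os.path.exists(f"/tmp/.X{n}-lock"):
            return n
    raise RuntimeError("no free X display number found")


@contextmanager
def virtual_display(width=1920, height=1080, depth=24):
    """Run Xvfb for the duration of a with-block and export DISPLAY."""
    display = free_display()
    proc = subprocess.Popen(
        ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]
    )
    time.sleep(1)  # crude wait for the server to start accepting clients
    if proc.poll() is not None:
        raise RuntimeError("Xvfb exited during startup")
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = f":{display}"
    try:
        yield display
    finally:
        proc.terminate()
        proc.wait()
        # Restore the original DISPLAY, or remove it if it was not set.
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous
```

Inside `with virtual_display():`, launch the browser as usual but with headless mode switched off. With Playwright, for example, that means `p.chromium.launch(headless=False)`.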