Headless Browser Hacks

Sometimes a site will work fine with Selenium or Playwright until you try headless mode. Then it might fling up some anti-bot mechanism, or just stop responding altogether. Fortunately there are some simple things that you can do to work around this.

These are the approaches that I usually take.

Explicit User Agent

Using an explicit User Agent (rather than the one used by default with Selenium or Playwright) is often enough to persuade a site that you are a legitimate browser.

Get a User Agent string from a recently updated browser.

USER_AGENT = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0"

That’s from Firefox. You might have more success with a Chrome User Agent. Try various options.

With Selenium:

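from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Override the default User Agent via a Chrome command-line switch.
options = Options()
options.add_argument(f"user-agent={USER_AGENT}")

driver = webdriver.Chrome(options=options)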

With Playwright:

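from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # The User Agent is set per browser context.
    context = browser.new_context(user_agent=USER_AGENT)
    page = context.new_page()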
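A quick way to confirm that the override has taken effect is to ask the page what it reports. The sketch below is illustrative rather than definitive: looks_headless and reported_user_agent are hypothetical helper names, and it assumes that an unmodified headless Chromium advertises itself with a "HeadlessChrome" token in its User Agent, which is commonly the case but may vary between browser versions.

```python
from typing import Optional


def looks_headless(user_agent: str) -> bool:
    """Return True if the User Agent carries the 'HeadlessChrome' token.

    Assumption: this token is what gives away a default headless Chromium.
    """
    return "HeadlessChrome" in user_agent


def reported_user_agent(user_agent: Optional[str] = None) -> str:
    """Launch headless Chromium and return the User Agent the page reports.

    Pass None to see the browser's default, or a string to override it.
    """
    # Imported lazily so that looks_headless() works without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(user_agent=user_agent)
        page = context.new_page()
        reported = page.evaluate("navigator.userAgent")
        browser.close()
    return reported
```

Calling reported_user_agent() with no argument shows the default string, while reported_user_agent(USER_AGENT) should return the explicit one. Passing either result to looks_headless() tells you whether the tell-tale token is still present.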

Virtual Framebuffer

Another approach is to simply not use a headless browser at all.

Normally I’ll use a headless browser under the following conditions:

  • a crawler that’s working reliably and I don’t need to monitor its progress;
  • a crawler that’s running on a remote server; or
  • a crawler that’s running in a container.

Presumably the first case is not applicable here because we’re talking about crawlers that don’t run well in headless mode. In both of the remaining cases using a Virtual Framebuffer is a good option.

First install the xvfb package. The instructions below are for Debian-based machines.

sudo apt-get update -q
sudo apt-get install xvfb

Now start the framebuffer as a background process.

Xvfb :99 -screen 0 1024x768x16 &

That will effectively launch a virtual X11 server. It’s “virtual” in the sense that it does all of its rendering in memory and doesn’t require a physical display. Breaking down the arguments:

  • :99 — the display number;
  • -screen 0 — the screen number;
  • 1024x768x16 — the width, height and colour depth of the virtual display.

Set the DISPLAY environment variable to correspond to the display number specified when launching the framebuffer.

export DISPLAY=:99

Now you can run the crawler without headless mode but it will not launch a browser window onto a physical display. Rather the browser will be rendered onto the framebuffer.
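If you would rather manage the framebuffer from the crawler itself, the same steps can be wrapped in Python. This is a minimal sketch, not a definitive implementation: xvfb_command and virtual_display are hypothetical names, the one-second startup delay is a crude assumption rather than a proper readiness check, and it assumes the Xvfb binary is on the PATH.

```python
import os
import subprocess
import time
from contextlib import contextmanager


def xvfb_command(display=99, width=1024, height=768, depth=16):
    """Build the argument list for the Xvfb invocation described above."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


@contextmanager
def virtual_display(display=99, startup_delay=1.0, **geometry):
    """Start Xvfb, point DISPLAY at it, and clean up again on exit."""
    server = subprocess.Popen(xvfb_command(display, **geometry))
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = f":{display}"
    try:
        # Crude wait for the server to come up (an assumption, not a check).
        time.sleep(startup_delay)
        yield server
    finally:
        # Stop the virtual X server and restore the previous DISPLAY value.
        server.terminate()
        server.wait()
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous
```

Inside a "with virtual_display():" block you would then create the Selenium driver or Playwright browser without the headless option, and the browser window is drawn onto the framebuffer for as long as the block is active.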