Playwright Browser Footprint

Playwright launches a browser. And browsers can be resource hungry beasts.

I often run Playwright on small, resource-constrained virtual machines or in a serverless environment. These normally don’t have a lot of memory or disk space. Running out of either will cause Playwright (and potentially other processes) to fall over.

Is it possible to prune Playwright so that it plays better in a resource constrained environment? Let’s see.

🚨 Caveat: These measures might make Playwright more lightweight, but this might not be all good. By stripping down the features in a browser you potentially make it an easier target for anti-bot mechanisms.

Launching a Lean Browser

Headless browsers generally consume fewer resources, so it’s a good idea to go headless. But there are a number of other startup options that you can tweak to reduce a browser’s footprint.

Chromium

There are a bunch of relevant command line arguments that can be supplied when launching Chromium.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        args=[
            "--disable-background-networking",
            "--disable-background-timer-throttling",
            "--disable-client-side-phishing-detection",
            "--disable-dev-shm-usage",
            "--disable-extensions",
            "--disable-gpu",
            "--disable-popup-blocking",
            "--disable-renderer-backgrounding",
            "--disable-setuid-sandbox",
            "--disable-software-rasterizer",
            "--mute-audio",
            "--no-first-run",
            "--no-sandbox",
        ],
    )

    context = browser.new_context(device_scale_factor=1, is_mobile=True)
    page = context.new_page()
    page.goto("https://example.com")

    page.wait_for_timeout(5000)

    browser.close()

What do they do?

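  • --disable-background-networking: Disable subsystems that make network requests in the background.
  • --disable-background-timer-throttling: Don’t throttle JavaScript timers in background pages.
  • --disable-client-side-phishing-detection: Turn off the client-side phishing detection feature.
  • --disable-dev-shm-usage: Write shared memory files to a temporary directory rather than /dev/shm, which is often too small in containers and VMs.
  • --disable-extensions: Don’t load any extensions.
  • --disable-gpu: Disable GPU hardware acceleration. This is still relevant when headless, since Chromium can start a GPU process in headless mode too.
  • --disable-popup-blocking: Don’t block popups.
  • --disable-renderer-backgrounding: Don’t lower the priority of renderer processes for background pages.
  • --disable-setuid-sandbox: Disable the setuid sandbox (Linux only).
  • --disable-software-rasterizer: Don’t fall back to the software 3D rasterizer.
  • --mute-audio: Mute any audio output.
  • --no-first-run: Skip the tasks normally performed on the browser’s first run.
  • --no-sandbox: Disable the security sandbox. This weakens process isolation, so only use it with content you trust.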

I’ve been indiscriminate and included everything that might conceivably have an effect. At some stage it would be worthwhile winnowing down the list of arguments because it’s likely that some have negligible effect.
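
One way to approach that winnowing is a leave-one-out comparison: measure the full list, then measure it again with each argument removed in turn. The sketch below only builds the reduced lists and ranks the results. Both function names are made up for illustration, and measure_peak_mb is a hypothetical callable that you’d have to supply yourself (for example, something that launches Chromium with the given arguments and returns the peak memory in MB).

```python
from typing import Callable, Dict, List, Tuple


def leave_one_out(args: List[str]) -> Dict[str, List[str]]:
    """Map each argument to the full list with that one argument removed."""
    return {arg: [a for a in args if a != arg] for arg in args}


def rank_arguments(
    args: List[str],
    measure_peak_mb: Callable[[List[str]], float],  # hypothetical, user supplied
) -> List[Tuple[str, float]]:
    """Rank arguments by how much peak memory rises when each is dropped.

    A large positive delta suggests the argument is pulling its weight.
    A delta near zero suggests it could probably be removed.
    """
    baseline = measure_peak_mb(args)
    deltas = [
        (arg, measure_peak_mb(reduced) - baseline)
        for arg, reduced in leave_one_out(args).items()
    ]
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

Memory measurements are noisy and some arguments may interact, so it’s probably worth repeating each measurement a few times before dropping anything.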

These options should work with Chrome and Edge channels too.
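
As a rough sketch, the same arguments can be passed alongside Playwright’s channel launch option. The LEAN_ARGS subset and the two helper functions are names invented for this example, and the "chrome" and "msedge" channels assume that the corresponding branded browser is already installed on the machine.

```python
from typing import Any, Dict, Optional

# Illustrative subset of the arguments listed above.
LEAN_ARGS = [
    "--disable-extensions",
    "--disable-gpu",
    "--mute-audio",
    "--no-first-run",
]


def launch_kwargs(channel: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for chromium.launch().

    channel=None uses the Chromium build bundled with Playwright.
    "chrome" or "msedge" selects an installed branded browser instead.
    """
    kwargs: Dict[str, Any] = {"headless": True, "args": list(LEAN_ARGS)}
    if channel is not None:
        kwargs["channel"] = channel
    return kwargs


def open_example(channel: Optional[str] = None) -> str:
    """Launch the browser, load a page and return its title."""
    # Imported here so the helper above can be used without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_kwargs(channel))
        page = browser.new_page()
        page.goto("https://example.com")
        title = page.title()
        browser.close()
        return title
```

Calling open_example("msedge") would run the same lean configuration against Edge, while open_example() sticks with the bundled Chromium.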

Let’s take a look at the memory profile for three configurations:

  • standard — launch a browser window;
  • headless — no browser window but otherwise default settings; and
  • minimal — using the settings above.

In each case the browser is left to idle for 5 seconds after launch.

The plot below shows the total memory consumption (the Python driver process plus the child browser processes) versus time. I used the memory_profiler module to record the memory profiles. The peak memory consumption for standard is 1094 MB, while for the headless and minimal alternatives it drops to 706 MB and 690 MB respectively. The biggest saving (388 MB, or roughly 35%) comes from simply using a headless browser. The other options shave off only a further 16 MB.
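
For reference, a measurement along these lines can be sketched with the memory_usage function from memory_profiler. This is an assumption about how such a profile might be recorded, not the exact script behind the plot. The names idle_browser, summarise and profile are purely illustrative.

```python
from typing import Dict, List


def idle_browser(headless: bool = True) -> None:
    """Launch Chromium, load a page and idle for 5 seconds."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto("https://example.com")
        page.wait_for_timeout(5000)
        browser.close()


def summarise(samples: List[float], interval: float) -> Dict[str, float]:
    """Peak memory and approximate duration (seconds) of a sampled profile."""
    return {"peak": max(samples), "duration_s": len(samples) * interval}


def profile(headless: bool = True, interval: float = 0.1) -> Dict[str, float]:
    """Sample memory while idle_browser() runs and summarise the result."""
    from memory_profiler import memory_usage

    # include_children=True adds the memory of child processes (the
    # Playwright driver and the browser) to that of the Python process.
    samples = memory_usage(
        (idle_browser, (headless,), {}),
        interval=interval,
        include_children=True,
    )
    return summarise(samples, interval)
```

Comparing profile(headless=False) with profile(headless=True) should give a rough picture of the standard and headless cases. Keep in mind that memory_profiler reports its samples in MiB.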

It also looks like minimal launches a bit quicker, while there’s little difference between the standard and headless launch times.
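
Launch time can also be checked on its own, separately from memory. Here’s a minimal sketch that times only the launch call. Both time_call and time_launch are hypothetical helpers, not part of Playwright.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn() and return its result along with the elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def time_launch(headless: bool = True) -> float:
    """Return how long chromium.launch() takes, in seconds."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser, elapsed = time_call(
            lambda: p.chromium.launch(headless=headless)
        )
        browser.close()
        return elapsed
```

A single run says very little, so it’s better to average over several launches before reading anything into the difference.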

Firefox

What about Firefox? It has a reputation for being less of a memory hog than Chromium. There are a couple of options and user preference tweaks that can be applied.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=True,
        args=["--no-remote", "--safe-mode"],
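        firefox_user_prefs={
            "browser.sessionstore.resume_from_crash": False,  # Don't restore the session after a crash.
            "browser.startup.homepage": "about:blank",  # Start on a blank page.
            "datareporting.policy.dataSubmissionEnabled": False,  # Don't submit telemetry data.
            "devtools.jsonview.enabled": False,  # Disable the built-in JSON viewer.
            "dom.ipc.processCount": 1,  # Limit the number of content processes.
            "dom.ipc.processCount.web": 1,  # Same limit for web content processes.
            "extensions.enabledScopes": 0,  # Don't load extensions from any install location.
            "layers.acceleration.disabled": True,  # Disable hardware-accelerated compositing.
            "layout.css.devPixelsPerPx": "1.0",  # One device pixel per CSS pixel.
            "media.autoplay.enabled": False,  # Don't autoplay media.
            "network.preload": False,  # Don't preload linked resources.
            "permissions.default.image": 2,  # Block image loading.
            "toolkit.telemetry.reportingpolicy.firstRun": False,  # Skip the first-run telemetry notice.
        },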
    )

    context = browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")

    page.wait_for_timeout(5000)

    browser.close()

Once again I’ve thrown in everything but the kitchen sink. It’s quite likely that some of these tweaks have little or no effect.

We’ll consider the same three configurations. The peak memory consumption for standard is 874 MB, while for the headless and minimal alternatives it drops to 826 MB and 770 MB respectively. The differences are modest: going from standard to minimal saves 104 MB, or about 12%. The standard Firefox configuration is significantly leaner than the corresponding Chromium configuration (874 MB versus 1094 MB). However, the headless and minimal Firefox configurations are actually heavier than their Chromium counterparts (706 MB and 690 MB). Again, standard is the slowest to launch while minimal is the fastest.

WebKit

I’ve not used WebKit browsers. Normally I default to Firefox or a Chrome derivative. However, I’ve heard that WebKit is a reasonable option for resource constrained environments, so it was worth finding out more. WebKit is an open-source browser engine, originally developed by Apple, that powers the Safari browser. The corresponding engines for Firefox and Chrome are Gecko and Blink.

Although Safari doesn’t run on Windows and Linux, Playwright has a custom build of WebKit that works on these platforms.
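
Launching WebKit follows the same pattern as the other two engines. The sketch below assumes that the WebKit build has already been installed (for example with playwright install webkit). The launch_engine and webkit_title helpers are illustrative names rather than anything provided by Playwright.

```python
ENGINES = ("chromium", "firefox", "webkit")


def launch_engine(playwright, engine: str = "webkit", headless: bool = True):
    """Launch one of Playwright's three engines by name."""
    if engine not in ENGINES:
        raise ValueError(f"Unknown engine: {engine!r}")
    # playwright.webkit, playwright.firefox and playwright.chromium all
    # expose the same launch() method.
    return getattr(playwright, engine).launch(headless=headless)


def webkit_title(url: str = "https://example.com") -> str:
    """Open a page in headless WebKit and return its title."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = launch_engine(p, "webkit")
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Passing the engine by name makes it easy to run the same script against all three engines when comparing them.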

I’m not aware of a way to make a WebKit driver any more minimal, so we’ll simply compare Playwright using WebKit with a UI to headless. Below is a plot of the memory consumption profiles for these two scenarios. There’s little to choose between the two regarding peak memory use, although headless does fire up a little quicker. However, compare the peak memory used by the standard and headless versions (590 MB and 588 MB) to the corresponding values for Firefox and Chromium, and you’ll see that WebKit is a very lightweight alternative.
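
To put the three browsers side by side, here’s a small worked comparison using the peak figures quoted in this post. Only the numbers come from the measurements. The PEAK_MB table and the helper functions are just a convenient way to do the arithmetic.

```python
# Peak memory (MB) as reported above for each browser and configuration.
PEAK_MB = {
    "chromium": {"standard": 1094, "headless": 706, "minimal": 690},
    "firefox": {"standard": 874, "headless": 826, "minimal": 770},
    "webkit": {"standard": 590, "headless": 588},
}


def saving_vs_standard(browser: str, config: str) -> float:
    """Percentage saved relative to the browser's standard configuration."""
    standard = PEAK_MB[browser]["standard"]
    return round(100 * (standard - PEAK_MB[browser][config]) / standard, 1)


def leanest(config: str) -> str:
    """Browser with the lowest peak memory for a given configuration."""
    candidates = {b: v[config] for b, v in PEAK_MB.items() if config in v}
    return min(candidates, key=candidates.get)


for browser, configs in PEAK_MB.items():
    for config, peak in configs.items():
        saving = saving_vs_standard(browser, config)
        print(f"{browser:<9} {config:<9} {peak:>5} MB  {saving:>5.1f}% saved")
```

Headless Chromium comes out about 35.5% below its standard configuration, whereas headless Firefox saves only about 5.5%. WebKit barely changes, but it starts from the lowest baseline of the three.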

Other Interventions

Are there other things that you can do? Yes, indeed!

Context Tweaks

Once you’ve instantiated a browser the next step is to create a context. Here too there are options for reducing the memory footprint.

context = browser.new_context(
  device_scale_factor=1,
  is_mobile=True,
  viewport={"width": 375, "height": 667}
)

The is_mobile=True setting enables mobile emulation, which should reduce memory consumption provided that the mobile version of the site isn’t more JavaScript heavy! Note that Playwright doesn’t support is_mobile in Firefox. Applying device_scale_factor=1 ensures that you don’t simulate a high resolution screen. A custom viewport gives a smaller screen, so less content is rendered at any one time, which can also save some memory.
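
Playwright’s documentation notes that is_mobile is not supported in Firefox, so if the same script has to run against more than one engine it may help to build the context options conditionally. This is a minimal sketch, and context_options is a made-up helper name.

```python
from typing import Any, Dict


def context_options(engine: str) -> Dict[str, Any]:
    """Context settings aimed at a small footprint for the given engine."""
    options: Dict[str, Any] = {
        "device_scale_factor": 1,
        "viewport": {"width": 375, "height": 667},
    }
    # Mobile emulation is not available in Firefox, so only request it
    # for the other engines.
    if engine != "firefox":
        options["is_mobile"] = True
    return options
```

With a browser already launched, the options are then unpacked into the call: context = browser.new_context(**context_options("chromium")).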

Routing Tweaks

Another option is to avoid downloading bulky resources like fonts, images and media files. This should save time too!

def handler(route, request):
    # Don't satisfy requests for these resources.
    if request.resource_type in ["font", "image", "media"]:
        route.abort()
    else:
        route.continue_()

page.route("**/*", handler)
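
The same idea can also be applied to a whole context rather than a single page, so that every page opened from it inherits the blocking rule. The sketch below is one way to do that under the same assumptions as the snippet above. BLOCKED_RESOURCE_TYPES, block_heavy_resources and fetch_title are illustrative names.

```python
# Resource types that are usually bulky and often not needed for scraping.
BLOCKED_RESOURCE_TYPES = frozenset({"font", "image", "media"})


def block_heavy_resources(route) -> None:
    """Abort requests for bulky resources and let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def fetch_title(url: str = "https://example.com") -> str:
    """Load a page with bulky resources blocked and return its title."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        # Routes registered on the context apply to all of its pages.
        context.route("**/*", block_heavy_resources)
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Some sites break without their images or fonts, so it’s worth checking that the content you’re after still loads with the rule in place.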

Conclusion

The efficacy of these options may vary from site to site. Generally a bit of trial and error is required to find the right combination. If, however, you need to run Playwright on a platform with limited resources then it’s worth the effort.