RAM & CPU Requirements for a Selenium Crawler

How much memory and CPU should be allocated to a simple Selenium crawler? I’ve been fudging these parameters, but the time has come to man up and do this right.

I want my task to have sufficient resources to perform its function. It should never be starved of resources! But at the same time, I don’t want to allocate excess resources extravagantly. More resources → higher costs. I want to allocate the minimal resources needed to get the job done.

Setup

I’ve got an ECS service running a single task. The task has the following containers:

  • Selenium (using the Docker image for headless Chrome) and
  • the crawler.

Each crawl takes around 90 minutes, and a new task is started as soon as the previous one terminates.

Minimal Task Parameters

After some experimentation I found that, despite having two containers in this task, the memory and CPU requirements were modest. I could use the smallest task size available:

  • Task memory: 512 MiB and
  • Task CPU: 0.25 vCPU (256 CPU units)

The CPU is shared evenly between the two containers (each gets 128 CPU units).

ECS task size settings.
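The task and container sizes above can be written out as an ECS task definition. The sketch below expresses one as a plain Python dict, shaped like the arguments to boto3's `register_task_definition`, and checks that the per-container CPU reservations fit inside the task CPU. It assumes the Fargate launch type and explicit per-container `cpu` values. The family name, the container names and the crawler image are hypothetical, and `selenium/standalone-chrome` is only an assumption about which headless Chrome image is used.

```python
# Task size from the article: 0.25 vCPU (256 CPU units) and 512 MiB.
TASK_DEFINITION = {
    "family": "selenium-crawler",  # hypothetical family name
    "requiresCompatibilities": ["FARGATE"],  # assumption: Fargate launch type
    "networkMode": "awsvpc",  # containers in the task share localhost
    "cpu": "256",  # task-level cpu/memory are strings in the ECS API
    "memory": "512",
    "containerDefinitions": [
        {
            "name": "selenium",
            "image": "selenium/standalone-chrome",  # assumption: image choice
            "cpu": 128,  # half of the task CPU
            "essential": True,
            "portMappings": [{"containerPort": 4444}],  # Selenium's default port
        },
        {
            "name": "crawler",
            "image": "my-crawler:latest",  # hypothetical image
            "cpu": 128,  # the other half
            # Essential: when the crawler exits, ECS stops the whole task,
            # and a service with one desired task starts a replacement.
            "essential": True,
        },
    ],
}


def check_cpu_split(task_def):
    """Return CPU units per container, checking they fit within the task CPU."""
    task_cpu = int(task_def["cpu"])
    shares = {c["name"]: c.get("cpu", 0) for c in task_def["containerDefinitions"]}
    if sum(shares.values()) > task_cpu:
        raise ValueError("container CPU reservations exceed the task CPU")
    return shares


print(check_cpu_split(TASK_DEFINITION))  # {'selenium': 128, 'crawler': 128}
```

Reserving CPU per container makes the even split explicit rather than leaving it to the scheduler.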

Robust Task Parameters

The minimal task parameters above work for a simple crawler. However, if you start visiting complex web pages or opening a series of pages, Selenium becomes much more demanding of resources and this setup will probably just fall over.

A more robust setup would allocate 2 GiB (2048 MiB) and 0.5 vCPU (512 CPU units) to Selenium alone.
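Giving Selenium alone 2 GiB and 0.5 vCPU means the task as a whole must also leave room for the crawler, and on Fargate the task size must be one of a fixed set of CPU/memory combinations. The sketch below, assuming the Fargate launch type on Linux and listing only sizes up to 4 vCPU, picks the smallest valid combination (lowest CPU first, then lowest memory) that covers the requested totals. The crawler's 128 CPU units and 256 MiB in the usage example are hypothetical figures, not measurements from this crawl.

```python
# Valid Fargate (Linux) task sizes up to 4 vCPU: CPU units -> allowed memory (MiB).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def smallest_fargate_size(cpu_units, memory_mib):
    """Return the smallest (cpu, memory) task size covering the requested totals."""
    for cpu in sorted(FARGATE_SIZES):
        if cpu < cpu_units:
            continue  # not enough CPU at this size
        for mem in FARGATE_SIZES[cpu]:
            if mem >= memory_mib:
                return cpu, mem
    raise ValueError("no Fargate size up to 4 vCPU fits this request")


print(smallest_fargate_size(256, 512))  # (256, 512): the minimal setup
print(smallest_fargate_size(512, 2048))  # (512, 2048): Selenium alone
# Selenium plus a hypothetical crawler needing 128 units and 256 MiB:
print(smallest_fargate_size(512 + 128, 2048 + 256))  # (1024, 3072)
```

Because the allowed sizes are coarse, even a small crawler on top of the robust Selenium allocation pushes the task up to the next CPU tier.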

Task Performance

We can look at the Container Insights metrics for this task to see the memory and CPU utilisation. We’re consistently using around 70% of the allocated CPU. This is about where I want to be: close to full utilisation but with some buffer. Memory utilisation averages approximately 55%. There’s still some room to spare, but since 512 MiB is the smallest task memory size available on ECS, I can’t narrow the margin any further.

Resource utilisation for a Selenium crawler.
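The utilisation percentages translate into absolute numbers with a little arithmetic. The sketch below uses the averages quoted above (averages, so peaks may well be higher); the 80% target utilisation is an assumed figure for illustration. It shows why memory can’t be trimmed: at 80% utilisation the observed usage would need only about 352 MiB, which is below the 512 MiB minimum.

```python
# Average utilisation from Container Insights, as quoted in the article.
CPU_UNITS, CPU_UTIL_PCT = 256, 70
MEMORY_MIB, MEMORY_UTIL_PCT = 512, 55


def usage(allocated, util_pct):
    """Split an allocation into (used, spare) for a utilisation percentage."""
    return allocated * util_pct / 100, allocated * (100 - util_pct) / 100


def allocation_for_target(allocated, util_pct, target_pct):
    """Allocation at which the same absolute usage would sit at target_pct."""
    return allocated * util_pct / target_pct


print(usage(CPU_UNITS, CPU_UTIL_PCT))  # (179.2, 76.8) CPU units
print(usage(MEMORY_MIB, MEMORY_UTIL_PCT))  # (281.6, 230.4) MiB
# Assumed 80% target: memory needed is below the 512 MiB floor.
print(allocation_for_target(MEMORY_MIB, MEMORY_UTIL_PCT, 80))  # 352.0
```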

I now have a task which is running reliably on ECS but using a minimal set of resources. A robust yet frugal solution.