How much memory and CPU resources should be allocated to a simple Selenium crawler? I’ve been fudging these parameters but the time has come to man up and do this right.
I want my task to have sufficient resources that it’s able to perform its function. It should never be starved of resources! But, at the same time, I don’t want to extravagantly allocate excess resources. More resources → higher costs. I want to allocate the minimal resources to get the job done.
Setup
I’ve got an ECS service running a single task. The task has the following containers:
- Selenium (using the Docker image for headless Chrome) and
- crawler.
The duration of the crawl is around 90 minutes and a new task is triggered immediately after the old task terminates.
Minimal Task Parameters
After some experimentation I found that despite having two containers in this task, the memory and CPU requirements were modest. I could use the smallest possible task size:
- Task memory: 512 MiB and
- Task CPU: 0.25 vCPU (256 CPU units)
CPU is being shared evenly between containers (each gets 128 CPU units).
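To make those numbers concrete, here is a rough sketch of what the task definition payload might look like, built as a plain Python dictionary. It assumes the task runs on Fargate (which is where the 0.25 vCPU / 512 MiB minimum comes from). The family name and both image names are placeholders of mine, not the ones actually used.

```python
import json


def minimal_task_definition():
    """Build a task definition payload for the smallest task size."""
    return {
        "family": "selenium-crawler",            # hypothetical family name
        "requiresCompatibilities": ["FARGATE"],  # assumption: Fargate launch type
        "networkMode": "awsvpc",
        "cpu": "256",     # task CPU: 0.25 vCPU
        "memory": "512",  # task memory: 512 MiB
        "containerDefinitions": [
            {
                "name": "selenium",
                "image": "selenium/standalone-chrome",  # assumed image
                "cpu": 128,  # half of the task's CPU units
                "essential": True,
            },
            {
                "name": "crawler",
                "image": "example/crawler:latest",  # hypothetical image
                "cpu": 128,  # the other half
                "essential": True,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(minimal_task_definition(), indent=2))
```

The container-level CPU values must not add up to more than the task-level value, which is why an even split gives each container 128 units.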
Robust Task Parameters
Although the minimal task parameters above will work for a simple crawler, Selenium becomes more demanding once you start hitting complicated web pages or opening a series of pages, and this setup will probably just fall over.
A more robust setup would allocate 2 GiB (2048 MiB) and 0.5 vCPU (512 CPU units) for Selenium alone.
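Since the Selenium allocation is only part of the task, the task size has to be rounded up to something ECS will accept. The sketch below picks the smallest task size that fits a set of container requirements. It assumes Fargate, and the table covers only the CPU/memory combinations up to 4 vCPU as I understand them, so check it against the current AWS documentation. The crawler's allowance (256 CPU units, 512 MiB) is a made-up figure for illustration.

```python
# Assumed (partial) table of Fargate task sizes: CPU units -> memory options in MiB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def smallest_task_size(containers):
    """Return the smallest (cpu_units, memory_mib) that fits all containers.

    `containers` is a list of (cpu_units, memory_mib) requirements.
    """
    cpu_needed = sum(cpu for cpu, _ in containers)
    mem_needed = sum(mem for _, mem in containers)
    # Try CPU sizes in ascending order, then the memory options for each.
    for task_cpu in sorted(FARGATE_SIZES):
        if task_cpu < cpu_needed:
            continue
        for task_mem in FARGATE_SIZES[task_cpu]:
            if task_mem >= mem_needed:
                return task_cpu, task_mem
    raise ValueError("requirements exceed the sizes in this table")


selenium = (512, 2048)  # from the text: 0.5 vCPU and 2 GiB for Selenium alone
crawler = (256, 512)    # hypothetical allowance for the crawler container
print(smallest_task_size([selenium, crawler]))
```

With these illustrative numbers the task would need 1 vCPU and 3 GiB, a fair step up from the minimal configuration.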
Task Performance
We can take a look at the container insights for this task to see the memory and CPU utilisation. We’re consistently using around 70% of the allocated CPU resources. This is about where I want to be: close to full utilisation but with some buffer. Memory utilisation averages approximately 55%. Still some room to spare, but since this is the smallest task memory size available on ECS I can’t narrow the margin any further.
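To put those percentages into absolute terms, here is a small back-of-the-envelope calculation. The reserved values are the task sizes from above, while the absolute usage figures are derived from the reported percentages rather than read off the dashboard. The function name is my own.

```python
def usage_and_headroom(percent_used, reserved):
    """Convert a utilisation percentage into absolute usage and spare capacity."""
    used = reserved * percent_used / 100
    return used, reserved - used


# Task-level reservations from the minimal setup.
cpu_used, cpu_spare = usage_and_headroom(70, 256)  # CPU units
mem_used, mem_spare = usage_and_headroom(55, 512)  # MiB

print(f"CPU: ~{cpu_used:.0f} units used, ~{cpu_spare:.0f} spare")
print(f"Memory: ~{mem_used:.0f} MiB used, ~{mem_spare:.0f} MiB spare")
```

So the buffer amounts to roughly 77 CPU units and 230 MiB of memory, which is not much if a page turns out to be heavier than usual.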
I now have a task which is running reliably on ECS but using a minimal set of resources. A robust yet frugal solution.