In this post we’ll look at setting up an ECS Service. A service is just a persistent set of tasks running on a cluster.
Our existing task includes both a Selenium container and a crawler container. However, the Selenium container is a bit of a bottleneck since it takes a little while to spin up. Also, if we had a number of crawlers, each of which uses Selenium, then this approach would require multiple Selenium containers. That seems extravagant.
A better design might be to have a single Selenium container which runs continuously and to which crawlers connect as required. We need a Selenium service.
Create a Selenium Task
Create a task definition which contains only a Selenium container. You can do this via the AWS console. A configuration JSON is included below for reference. 🚨 To create a robust Selenium service I’ve allocated 2 GiB (2048 MiB) of memory and 0.5 vCPU (512 CPU units) for this task. This should ensure that it’s able to handle a reasonable load. If the service tends to fall over, increase the memory and/or CPU allocation.
{
  "family": "selenium",
  "revision": 2,
  "status": "ACTIVE",
  "cpu": "512",
  "memory": "2048",
  "networkMode": "awsvpc",
  "volumes": [],
  "containerDefinitions": [
    {
      "image": "selenium/standalone-chrome:3.141",
      "name": "selenium",
      "logConfiguration": {
        "options": {
          "awslogs-group": "/ecs/selenium"
        }
      },
      "portMappings": [
        {
          "hostPort": 4444,
          "protocol": "tcp",
          "containerPort": 4444
        }
      ],
      "environment": [
        {
          "name": "SE_OPTS",
          "value": "-sessionTimeout 3600"
        }
      ],
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://localhost:4444/ || exit 1"
        ],
        "timeout": 2,
        "interval": 5,
        "retries": 3
      }
    }
  ]
}
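If you'd rather script this step than click through the console, the same definition can be registered with boto3. What follows is only a minimal sketch under a few assumptions: boto3 is installed, AWS credentials are configured, and the helper names (`selenium_task_definition`, `register_selenium_task`) are hypothetical. Logging and the health check are left out to keep it short, so this is not a complete copy of the JSON above.

```python
def selenium_task_definition():
    """Build keyword arguments for the ECS register_task_definition call."""
    return {
        "family": "selenium",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",      # 512 CPU units = 0.5 vCPU
        "memory": "2048",  # 2048 MiB = 2 GiB
        "containerDefinitions": [
            {
                "name": "selenium",
                "image": "selenium/standalone-chrome:3.141",
                "portMappings": [
                    {"containerPort": 4444, "hostPort": 4444, "protocol": "tcp"}
                ],
                "environment": [
                    {"name": "SE_OPTS", "value": "-sessionTimeout 3600"}
                ],
            }
        ],
    }


def register_selenium_task():
    """Register the task definition (assumes boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    ecs = boto3.client("ecs")
    return ecs.register_task_definition(**selenium_task_definition())
```

Keeping the request in a plain function makes it easy to inspect or tweak the CPU and memory values before anything is sent to AWS.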
Create a Selenium Service
- Login to AWS and go to the ECS Dashboard.
- Click on the link for a cluster.
- Select the Services tab (this is the default) and press the button to create a new service.
- Configure the service.
- Select the Fargate launch type.
- Choose the previously created Selenium task as the task definition family.
- Specify a service name and the number of tasks (just 1 for the moment).
- You can also tweak the capacity provider strategy, which specifies how tasks are distributed across a cluster’s capacity providers. You can either use the default strategy or click on Switch to capacity provider strategy to specify another strategy. These are the options:
  - FARGATE
  - FARGATE_SPOT: A more cost-effective strategy which uses spare compute capacity to run “interruption tolerant” tasks which may occasionally be stopped.
- Press the button.
- Configure the network.
- Choose a cluster VPC and select all subnets.
- If you want the service to have a public IP then enable it. This will ensure that a public IP address is assigned to each task launched by this service.
- Edit the security group and allow all inbound traffic. You could be more specific, but this will do for the moment. 🚨 This is very important. If you don’t permit inbound access (at least on the required ports) then this task will be unreachable.
- Check the Enable service discovery integration box.
- If you have an existing Route 53 namespace, choose that, otherwise select the option to create a new private namespace.
- Choose a service discovery name.
- Specify A as the DNS record type and 60 seconds as the DNS TTL.
- Press the button.
- Set auto scaling.
- Press the button.
- Review.
- Press the button.
It might take a few moments to create the service.
Press the button.
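The console steps above can also be approximated in code. This is a rough sketch rather than an exact equivalent of the wizard: it assumes boto3 and AWS credentials, the helper names are hypothetical, and the cluster name, subnet IDs, security group IDs and service discovery registry ARN are all values you would need to supply from your own account.

```python
def selenium_service_request(cluster, subnets, security_groups, registry_arn=None):
    """Build keyword arguments for the ECS create_service call."""
    request = {
        "cluster": cluster,
        "serviceName": "selenium",
        "taskDefinition": "selenium",  # family name; the latest ACTIVE revision is used
        "desiredCount": 1,             # just one task for the moment
        # launchType and capacityProviderStrategy are mutually exclusive.
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED",
            }
        },
    }
    if registry_arn:
        # Service discovery: attach the service to an existing registry.
        request["serviceRegistries"] = [{"registryArn": registry_arn}]
    return request


def create_selenium_service(cluster, subnets, security_groups, registry_arn=None):
    """Create the service (assumes boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily

    ecs = boto3.client("ecs")
    return ecs.create_service(
        **selenium_service_request(cluster, subnets, security_groups, registry_arn)
    )
```

To use FARGATE_SPOT you would drop `launchType` and pass a `capacityProviderStrategy` list instead.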
Find the Service Discovery Endpoint
You’ll need to find the service discovery endpoint. Go back to the cluster view and select the Services tab. Click on the link to the service. Select the Details tab. The service discovery endpoint will be listed. It might be something like selenium.datawookie.dev.
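A quick way to confirm that the endpoint resolves is to look it up from Python. The helper below is a small sketch (the name `resolve` is hypothetical). Keep in mind that a name in a private namespace will only resolve from inside the VPC, so run it from a task or instance in that VPC rather than from your laptop.

```python
import socket


def resolve(hostname, port=4444):
    """Return the sorted list of IP addresses that hostname resolves to.

    An empty list means the name could not be resolved from here.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve("selenium.datawookie.dev"))
```

With a DNS TTL of 60 seconds, the addresses returned may lag up to a minute behind a task being replaced.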
Specify Service Discovery Endpoint
We’ll update the crawler to optionally get the hostname and port for the Selenium service from environment variables.
import os, sys, logging
from time import sleep
from subprocess import Popen, DEVNULL, STDOUT

from selenium import webdriver

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)7s] %(message)s',
)

# Location of the Selenium service (defaults suit a local container).
HOST = os.getenv("SELENIUM_HOST", "localhost")
PORT = os.getenv("SELENIUM_PORT", "4444")

SELENIUM_URL = f"{HOST}:{PORT}"

RETRY = 10    # Number of connection attempts.
TIMEOUT = 5   # Timeout (seconds) for each attempt.

# Check connection to host and port.
def check_connection():
    process = Popen(
        ['nc', '-zvw', str(TIMEOUT), HOST, PORT],
        stdout=DEVNULL,
        stderr=STDOUT
    )

    if process.wait() != 0:
        logging.warning(f"⛔ Unable to communicate with {SELENIUM_URL}.")
        return False
    else:
        logging.info(f"✅ Can connect to {SELENIUM_URL}!")
        return True

# Retry until the service is reachable; give up after RETRY attempts.
for i in range(RETRY):
    if check_connection():
        break
    logging.info("Sleeping.")
    sleep(1)
else:
    logging.error(f"🚨 Failed to connect to {SELENIUM_URL}!")
    sys.exit(1)

browser = webdriver.Remote(
    f"http://{SELENIUM_URL}/wd/hub",
    {'browserName': 'chrome'}
)

browser.get("https://www.google.com")

logging.info(f"Retrieved URL: {browser.current_url}.")

browser.close()
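The connection check above shells out to nc, which means the crawler image must have netcat installed. If that's inconvenient, a similar check can be done with Python's standard socket module. This is an alternative sketch, not what the crawler above uses, and the function name `can_connect` is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

It could be dropped in as the test inside check_connection(), for example `can_connect(HOST, PORT, TIMEOUT)`, leaving the retry loop unchanged.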
After rebuilding and pushing the crawler image we need to update the crawler task. In the definition for the crawler container create a SELENIUM_HOST environment variable and set its value to the service discovery endpoint.
The task definition JSON might look something like this:
{
  "family": "crawler",
  "revision": 11,
  "status": "ACTIVE",
  "cpu": "256",
  "memory": "512",
  "networkMode": "awsvpc",
  "volumes": [],
  "containerDefinitions": [
    {
      "image": "datawookie/google-selenium",
      "name": "crawler",
      "logConfiguration": {
        "options": {
          "awslogs-group": "/ecs/crawler"
        }
      },
      "environment": [
        {
          "name": "SELENIUM_HOST",
          "value": "selenium.datawookie.dev"
        }
      ]
    }
  ]
}
Note the definition of the SELENIUM_HOST environment variable.
Run the task. If we got everything right then it will connect to the Selenium service. Let’s check the CloudWatch logs.
2021-04-28 04:52:08,210 ✅ Can connect to selenium.datawookie.dev:4444!
2021-04-28 04:52:19,385 Retrieved URL: https://www.google.com/.
🚀 Success!
Additional Notes
Turn a Task into a Service
Another way to create a service is from an existing task.
- Click on Task Definitions (menu on left).
- Follow the link to the task.
- Select the most recent revision.
- Click the Actions dropdown and select the Create Service option.
- Fill in the service details as above.
The Actions dropdown lists these options: Create Service, Update Service and Deregister.
Disabling a Service
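Suppose that you want to temporarily disable a service. No problem, just set the number of tasks to zero. The service itself remains in place, so you can re-enable it later by setting the number of tasks back to 1 (or more).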
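Scaling to zero can be scripted too. Here is a minimal sketch, again assuming boto3 and AWS credentials. The helper names are hypothetical and the cluster and service names are placeholders for your own.

```python
def scale_request(cluster, service, count):
    """Build keyword arguments for the ECS update_service call."""
    if count < 0:
        raise ValueError("count must be zero or greater")
    return {"cluster": cluster, "service": service, "desiredCount": count}


def scale_service(cluster, service, count):
    """Set the desired number of tasks (assumes boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily

    ecs = boto3.client("ecs")
    return ecs.update_service(**scale_request(cluster, service, count))
```

Calling `scale_service("my-cluster", "selenium", 0)` would stop the running task, and the same call with a count of 1 would bring it back.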