AWS Containers #8: Setting up a Service

In this post we’ll look at setting up an ECS Service. A service is just a persistent set of tasks running on a cluster.

Our existing task includes both a Selenium container and a crawler container. However, the Selenium container is a bit of a bottleneck since it takes a little while to spin up. Also, if we had a number of crawlers, each of which uses Selenium, then this approach would require multiple Selenium containers. That seems extravagant.

A better design might be to have a single Selenium container which runs continuously and to which crawlers connect as required. We need a Selenium service.

Create a Selenium Task 

Create a task definition which contains only a Selenium container. You can do this via the AWS console. A configuration JSON is included below for reference. 🚨 To create a robust Selenium service I’ve allocated 2 GiB (2048 MiB) and 0.5 vCPU (512 CPU units) for this task. This should ensure that it’s able to handle a reasonable load. If the service tends to fall over, increase the memory and/or CPU allocation.

The task definition JSON below has been abridged and edited for clarity.

{
  "family": "selenium",
  "revision": 2,
  "status": "ACTIVE",
  "cpu": "512",
  "memory": "2048",
  "networkMode": "awsvpc",
  "volumes": [],
  "containerDefinitions": [
    {
      "image": "selenium/standalone-chrome:3.141",
      "name": "selenium",
      "logConfiguration": {
        "options": {
          "awslogs-group": "/ecs/selenium"
        }
      },
      "portMappings": [
        {
          "hostPort": 4444,
          "protocol": "tcp",
          "containerPort": 4444
        }
      ],
      "environment": [
        {
          "name": "SE_OPTS",
          "value": "-sessionTimeout 3600"
        }
      ],
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://localhost:4444/ || exit 1"
        ],
        "timeout": 2,
        "interval": 5,
        "retries": 3
      }
    }
  ]
}
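
If you would rather script this than click through the console, the same definition can be assembled as a Python dictionary and handed to the ECS API. The following is only a sketch: the helper names (`selenium_task_definition`, `register`) are made up for illustration, and the `region` and `execution_role_arn` arguments stand in for details that were abridged from the JSON above. It assumes a boto3 ECS client is passed in by the caller.

```python
def selenium_task_definition(region, execution_role_arn,
                             cpu_units=512, memory_mib=2048):
    """Build keyword arguments for the ECS RegisterTaskDefinition call.

    `region` and `execution_role_arn` are assumptions: they are not shown
    in the abridged JSON but are needed for awslogs logging on Fargate.
    """
    container = {
        "name": "selenium",
        "image": "selenium/standalone-chrome:3.141",
        "portMappings": [
            {"containerPort": 4444, "hostPort": 4444, "protocol": "tcp"}
        ],
        "environment": [
            {"name": "SE_OPTS", "value": "-sessionTimeout 3600"}
        ],
        "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost:4444/ || exit 1"],
            "timeout": 2,
            "interval": 5,
            "retries": 3,
        },
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/selenium",
                "awslogs-region": region,
                "awslogs-stream-prefix": "ecs",  # assumed prefix
            },
        },
    }
    return {
        "family": "selenium",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "executionRoleArn": execution_role_arn,
        # Task-level CPU and memory are passed as strings.
        "cpu": str(cpu_units),
        "memory": str(memory_mib),
        "containerDefinitions": [container],
    }


def register(ecs_client, definition):
    """Register the definition and return the ARN of the new revision."""
    response = ecs_client.register_task_definition(**definition)
    return response["taskDefinition"]["taskDefinitionArn"]


# Possible usage (needs boto3 and AWS credentials; the ARN is a placeholder):
# import boto3
# definition = selenium_task_definition("us-east-1", "arn:aws:iam::...")
# print(register(boto3.client("ecs"), definition))
```

Keeping the dictionary builder separate from the API call means you can inspect or tweak the CPU and memory values before anything is sent to AWS.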

Create a Selenium Service 

  1. Log in to AWS and go to the ECS Dashboard.
  2. Click on the link for a cluster.
  3. Select the Services tab (this is the default) and press the button to create a new service.
  4. Configure the service.
    • Select the Fargate launch type.
    • Choose the previously created Selenium task as the task definition family.
    • Specify a service name and the number of tasks (just 1 for the moment).
    • You can also tweak the capacity provider strategy. A capacity provider strategy specifies how tasks are distributed across a cluster’s capacity providers. You can either use the default strategy or click on Switch to capacity provider strategy to specify another strategy. These are the options:
      • FARGATE
      • FARGATE_SPOT: a more cost-effective option which uses spare compute capacity to run “interruption tolerant” tasks, which may occasionally be stopped.
    • Press the button.
  5. Configure the network.
    • Choose a cluster VPC and select all subnets.
    • If you want the service to have a public IP then enable it. This will ensure that a public IP address is assigned to each task launched by this service.
    • Edit the security group and allow all inbound traffic. You could be more specific, but this will do for the moment. 🚨 This is very important. If you don’t permit inbound access (at least on the required ports) then this task will be unreachable.
    • Check the Enable service discovery integration box.
      • If you have an existing Route 53 namespace, choose that, otherwise select the option to create a new private namespace.
      • Choose a service discovery name.
      • Specify A as the DNS record type and 60 seconds as the DNS TTL.
    • Press the button.
  6. Set auto scaling.
    • Press the button.
  7. Review.
    • Press the button.

It might take a few moments to create the service.

Press the button.
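
The console steps above correspond roughly to a single ECS `create_service` call. Here is a hedged sketch of what that request might look like from Python. The helper names are hypothetical, and it assumes that the subnets, the security group and the Cloud Map (service discovery) registry already exist, since the console wizard creates the registry for you but the API call does not.

```python
def selenium_service_request(cluster, subnets, security_groups, registry_arn,
                             task_definition="selenium", desired_count=1,
                             public_ip=True):
    """Build keyword arguments for the ECS CreateService call."""
    return {
        "cluster": cluster,
        "serviceName": "selenium",
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        # Use either launchType or a capacity provider strategy, not both.
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED" if public_ip else "DISABLED",
            }
        },
        # Links the service to an existing service discovery registry.
        "serviceRegistries": [{"registryArn": registry_arn}],
    }


def create_selenium_service(ecs_client, request):
    """Create the service and return its ARN."""
    response = ecs_client.create_service(**request)
    return response["service"]["serviceArn"]


# Possible usage (all identifiers below are placeholders):
# import boto3
# request = selenium_service_request(
#     "my-cluster", ["subnet-..."], ["sg-..."], "arn:aws:servicediscovery:...")
# print(create_selenium_service(boto3.client("ecs"), request))
```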

Find the Service Discovery Endpoint 

You’ll need to find the service discovery endpoint. Go back to the cluster view and select the Services tab. Click on the link to the service. Select the Details tab. The service discovery endpoint will be listed. It might be something like selenium.datawookie.dev.
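
Once you have the endpoint it can be worth confirming that the name actually resolves. The snippet below is a small sketch using only the standard library; `resolve` is a made-up helper name. Bear in mind that a private namespace will normally only resolve from inside the VPC, so run it from a container or instance there rather than from your laptop.

```python
import socket


def resolve(hostname, port=4444):
    """Return the sorted IPv4 addresses for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(
            hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


# Possible usage, with the example endpoint from above:
# print(resolve("selenium.datawookie.dev"))
```

An empty list suggests that either the service has no running tasks registered yet or you are querying from outside the namespace.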

Specify Service Discovery Endpoint 

We’ll update the crawler to optionally get the hostname and port for the Selenium service from environment variables.

import os, sys, logging
from time import sleep
from subprocess import Popen, DEVNULL, STDOUT
from selenium import webdriver

logging.basicConfig(
  level=logging.INFO,
  format='%(asctime)s [%(levelname)7s] %(message)s',
)

HOST = os.getenv("SELENIUM_HOST", "localhost")
PORT = os.getenv("SELENIUM_PORT", "4444")

SELENIUM_URL = f"{HOST}:{PORT}"

RETRY = 10
TIMEOUT = 5

# Check connection to host and port.
#
def check_connection():
  process = Popen(
    ['nc', '-zvw', str(TIMEOUT), HOST, PORT],
    stdout=DEVNULL,
    stderr=STDOUT
  )
  #
  if process.wait() != 0:
    logging.warning(f"⛔ Unable to communicate with {SELENIUM_URL}.")
    return False
  else:
    logging.info(f"✅ Can connect to {SELENIUM_URL}!")
    return True

for i in range(RETRY):
  if check_connection():
    break
  logging.info("Sleeping.")
  sleep(1)
else:
  logging.error(f"🚨 Failed to connect to {SELENIUM_URL}!")
  sys.exit(1)

browser = webdriver.Remote(
  f"http://{SELENIUM_URL}/wd/hub",
  {'browserName': 'chrome'}
)

browser.get("https://www.google.com")

logging.info(f"Retrieved URL: {browser.current_url}.")

# End the remote session so that it doesn't linger on the shared Selenium service.
browser.quit()
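
The connection check above shells out to nc, which means that the crawler image must have netcat installed. If that's a nuisance, roughly the same check can be done with Python's socket module. This is a sketch of an alternative, not what the crawler above uses, and `can_connect` is a made-up name.

```python
import socket


def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # The port may arrive as a string from an environment variable.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Possible usage in place of check_connection():
# if can_connect(HOST, PORT, TIMEOUT): ...
```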

After rebuilding and pushing the crawler image we need to update the crawler task. In the definition for the crawler container create a SELENIUM_HOST environment variable and set its value to the service discovery endpoint.

The task definition JSON might look something like this:

The configuration file below has been abridged and edited for clarity.

{
  "family": "crawler",
  "revision": 11,
  "status": "ACTIVE",
  "cpu": "256",
  "memory": "512",
  "networkMode": "awsvpc",
  "volumes": [],
  "containerDefinitions": [
    {
      "image": "datawookie/google-selenium",
      "name": "crawler",
      "logConfiguration": {
        "options": {
          "awslogs-group": "/ecs/crawler"
        }
      },
      "environment": [
        {
          "name": "SELENIUM_HOST",
          "value": "selenium.datawookie.dev"
        }
      ]
    }
  ]
}

Note the definition of the SELENIUM_HOST environment variable.

Run the task. If we got everything right then it will connect to the Selenium service. Let’s check the CloudWatch logs.

2021-04-28 04:52:08,210 ✅ Can connect to selenium.datawookie.dev:4444!
2021-04-28 04:52:19,385 Retrieved URL: https://www.google.com/.

🚀 Success!

Additional Notes 

Turn a Task into a Service 

Another way to create a service is from an existing task.

  1. Click on Task Definitions (menu on left).
  2. Follow the link to the task.
  3. Select the most recent revision.
  4. Click the Actions dropdown and select the Create Service option.
  5. Fill in the service details as above.

The Actions dropdown offers these options:

  • Create service
  • Update service
  • Deregister

Disabling a Service 

Suppose that you want to temporarily disable a service. No problem, just set the number of tasks to zero.
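
The same thing can be done from a script via the ECS `update_service` call. A minimal sketch, assuming a boto3 ECS client is passed in; `set_desired_count` and the cluster name in the usage comment are hypothetical.

```python
def set_desired_count(ecs_client, cluster, service, count):
    """Set the number of tasks for a service and return the new desired count."""
    if count < 0:
        raise ValueError("count must be zero or greater")
    response = ecs_client.update_service(
        cluster=cluster, service=service, desiredCount=count
    )
    return response["service"]["desiredCount"]


# Possible usage (needs boto3 and AWS credentials):
# import boto3
# ecs = boto3.client("ecs")
# set_desired_count(ecs, "my-cluster", "selenium", 0)  # disable
# set_desired_count(ecs, "my-cluster", "selenium", 1)  # enable again
```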