Mocking S3 from Python tests

Code that moves data to and from S3 can slow down testing. A lot. This post demonstrates how you can speed things up by mocking S3.

What are the advantages of mocking? These are the most important for me:

  • Speed (saves you time!).
  • Don’t need to clean up afterwards (don’t want testing data bloating your bucket!).
  • Don’t need to worry about clobbering valid data.

Tests

For the sake of illustration, these are the two functions that I want to test. They use the boto3 library to interact with S3.

import os
import boto3

S3_BUCKET = "datawookie-scratch"

AWS_REGION = os.getenv("AWS_REGION")
BOTO3_ENDPOINT_URL = os.getenv("BOTO3_ENDPOINT_URL")

s3 = boto3.client('s3', region_name=AWS_REGION, endpoint_url=BOTO3_ENDPOINT_URL)
  
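def store_put(path):
  """Upload the file at path to S3_BUCKET, using its base name as the key."""
  key = os.path.basename(path)

  s3.upload_file(path, S3_BUCKET, key)

  return key

def store_get(key, path=None):
  """Download key from S3_BUCKET to path (defaults to key in the working directory)."""
  if path is None:
    path = key

  s3.download_file(S3_BUCKET, key, path)

  return path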

The functions live in the store.py module. Here’s a simple test that copies the contents of /etc/passwd to and from S3, ensuring that the contents of the retrieved file match the original.

import hashlib
import pathlib
from unittest import TestCase

from store import s3, store_put, store_get, S3_BUCKET

# Need to create a bucket for testing.
#
s3.create_bucket(Bucket=S3_BUCKET)

def file_hash(path):
  return hashlib.blake2b(pathlib.Path(path).read_bytes()).hexdigest()

class TestStore(TestCase):
  def test_store(self):
    key = store_put("/etc/passwd")
    
    assert key == "passwd"
    
    path = store_get(key)

    assert path == "passwd"
    
    assert file_hash("/etc/passwd") == file_hash(path)

Even against real S3, a single test like this is pretty quick. But if you’re testing a lot of functions that access S3, or moving a lot of data back and forth, then it can get time-consuming.

Moto

Moto is a library for mocking AWS services. You can install it directly.

pip3 install moto

However, I prefer to use the Docker image, motoserver/moto.

Docker Setup

Either create a moto container from the command line.

docker run -p 3000:3000 -e MOTO_PORT=3000 motoserver/moto:4.1.13

Or launch it via Docker Compose.

docker-compose up

A simple docker-compose.yml would look something like this:

version: "3.9"

services:
  moto:
    image: motoserver/moto:4.1.13
    ports:
      - "3000:3000"
    environment:
      - MOTO_PORT=3000
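
Once the container is running, it can help to confirm that something is actually listening on the mapped host port before starting the tests. Here is a minimal sketch using only the standard library. Note that wait_for_port is a hypothetical helper (it is not part of Moto or boto3), and the host and port are assumed to match the 3000:3000 mapping above.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.25):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 3000) at the start of a test session gives a quick yes or no on whether the server is reachable. It only checks that the port accepts connections, not that the service behind it is Moto.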

The Moto server will be running on local port 3000. You need to tell the boto3 library to use the Moto server rather than S3. There are a few ways to avoid hitting real S3. You can use Moto’s decorator or context manager, which mock AWS in-process without a server. However, both of these approaches require you to modify existing tests. My preferred approach is to use the endpoint_url argument when creating the S3 client (take another look at the store.py module above). Set the BOTO3_ENDPOINT_URL environment variable either directly in the tests using os.environ (before store is imported, because the client is created at import time) or in the shell prior to running the tests.

export AWS_REGION="us-east-1"
export BOTO3_ENDPOINT_URL="http://localhost:3000"

Outside of testing, the BOTO3_ENDPOINT_URL variable won’t be defined, so os.getenv() returns None and boto3 falls back to the real S3 endpoint.
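
If you would rather configure this from Python than from the shell, a sketch along the following lines could sit at the top of the test module. The helper name use_moto_server and the MOTO_DEFAULTS dictionary are hypothetical, and the values are assumed to match the shell exports shown in this post. It uses setdefault so that anything already set in the shell wins.

```python
import os

# Assumed values, mirroring the shell exports used in this post.
MOTO_DEFAULTS = {
    "AWS_REGION": "us-east-1",
    "BOTO3_ENDPOINT_URL": "http://localhost:3000",
}


def use_moto_server(environ=None):
    """Fill in missing settings without overriding values that are already set."""
    if environ is None:
        environ = os.environ
    for name, value in MOTO_DEFAULTS.items():
        environ.setdefault(name, value)
    return environ


use_moto_server()

# Import the module under test only after this point, because store.py
# reads the environment and builds its client at import time:
# from store import s3, store_put, store_get, S3_BUCKET
```

The ordering matters. If store is imported first, the client has already been created against whatever endpoint was configured at that moment, and changing os.environ afterwards has no effect on it.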

Timing Statistics

We’ll use hyperfine to gather robust statistics to compare the execution time of the tests using S3 and Moto.

hyperfine -r 20 'pytest'

These are the numbers for S3.

Time (mean ± σ):      1.612 s ±  0.180 s    [User: 0.272 s, System: 0.030 s]
Range (min … max):    1.330 s …  1.944 s    20 runs

And these are the corresponding numbers using Moto.

Time (mean ± σ):     278.4 ms ±  34.4 ms    [User: 243.7 ms, System: 25.8 ms]
Range (min … max):   243.8 ms … 413.2 ms    20 runs

The local mocking server is, not surprisingly, much quicker: an average of 278.4 milliseconds versus 1.612 seconds.
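
To put those figures in perspective, here is a small calculation using only the hyperfine numbers reported above. Treating wall time minus CPU time (User plus System) as time spent waiting is an assumption, and only a rough proxy, but it suggests where the difference comes from.

```python
# Figures copied from the hyperfine output above, converted to seconds.
s3 = {"mean": 1.612, "user": 0.272, "system": 0.030}
moto = {"mean": 0.2784, "user": 0.2437, "system": 0.0258}

# Ratio of mean wall-clock times.
speedup = s3["mean"] / moto["mean"]


def waiting(stats):
    """Wall time not accounted for by CPU time (rough proxy for I/O wait)."""
    return stats["mean"] - (stats["user"] + stats["system"])


s3_wait = waiting(s3)
moto_wait = waiting(moto)

print(f"speedup: {speedup:.1f}x")          # speedup: 5.8x
print(f"S3 waiting: {s3_wait:.3f} s")      # S3 waiting: 1.310 s
print(f"Moto waiting: {moto_wait:.4f} s")  # Moto waiting: 0.0089 s
```

The CPU time is nearly the same in both cases (about 0.3 seconds), so almost all of the saving appears to come from no longer waiting on the network.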