Mocking S3 from Python tests

Code that moves data to and from S3 can slow down testing. A lot. This post demonstrates how you can speed things up by mocking S3.

What are the advantages of mocking? These are the most important for me:

Speed (saves you time!).
Don’t need to clean up afterwards (don’t want testing data bloating your bucket!).
Don’t need to worry about clobbering valid data.

Tests

For the sake of illustration, these are the two functions that I want to test. They use the boto3 library to interact with S3.

import os
import boto3

S3_BUCKET = "datawookie-scratch"

AWS_REGION = os.getenv("AWS_REGION")
BOTO3_ENDPOINT_URL = os.getenv("BOTO3_ENDPOINT_URL")

s3 = boto3.client('s3', region_name=AWS_REGION, endpoint_url=BOTO3_ENDPOINT_URL)
  
def store_put(path):
  key = os.path.basename(path)
  
  s3.upload_file(path, S3_BUCKET, key)

  return key

def store_get(key, path=None):
  if path is None:
    path = key

  s3.download_file(S3_BUCKET, key, path)

  return path
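
One detail worth noting in store_put is that the key is just the base name of the file, so the directory part of the path is discarded. The short sketch below uses only the standard library, and the two paths are made up for illustration. It shows what that means for files that share a name.

```python
import os

# Hypothetical paths, used only to illustrate how store_put derives its key.
paths = ["/etc/passwd", "/tmp/backup/passwd"]

# The same expression that store_put uses to build the S3 key.
keys = [os.path.basename(path) for path in paths]

print(keys)
```

Both paths map to the key passwd, so uploading the second file would replace the first in the bucket.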

The functions live in the store.py module. Here’s a simple test that copies the contents of /etc/passwd to and from S3, ensuring that the contents of the retrieved file match the original.

import hashlib
import pathlib
from unittest import TestCase

from store import s3, store_put, store_get, S3_BUCKET

# Need to create a bucket for testing.
#
s3.create_bucket(Bucket=S3_BUCKET)

def file_hash(path):
  return hashlib.blake2b(pathlib.Path(path).read_bytes()).hexdigest()

class TestStore(TestCase):
  def test_store(self):
    key = store_put("/etc/passwd")
    
    assert key == "passwd"
    
    path = store_get(key)

    assert path == "passwd"
    
    assert file_hash("/etc/passwd") == file_hash(path)
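
The test above downloads passwd into the current working directory and leaves it there. As a rough sketch of one way around that, the round trip can be written so that the download lands in a temporary directory. The check_round_trip helper and the fake_put and fake_get stand-ins are hypothetical: the fakes copy files between local directories so that the snippet runs without S3 or Moto.

```python
import hashlib
import pathlib
import shutil
import tempfile


def file_hash(path):
    # Same helper as in the test above.
    return hashlib.blake2b(pathlib.Path(path).read_bytes()).hexdigest()


def check_round_trip(put, get, source, workdir):
    """Upload source, download it into workdir and compare the two files."""
    key = put(str(source))
    target = pathlib.Path(workdir) / key
    path = get(key, str(target))
    return file_hash(source) == file_hash(path)


with tempfile.TemporaryDirectory() as root:
    root = pathlib.Path(root)
    src_dir = root / "src"
    bucket = root / "bucket"      # stands in for the S3 bucket
    download = root / "download"  # where the retrieved file lands
    for directory in (src_dir, bucket, download):
        directory.mkdir()

    def fake_put(path):
        # Stand-in for store_put: "upload" by copying into the bucket directory.
        key = pathlib.Path(path).name
        shutil.copyfile(path, bucket / key)
        return key

    def fake_get(key, path=None):
        # Stand-in for store_get: "download" by copying out of the bucket directory.
        path = key if path is None else path
        shutil.copyfile(bucket / key, path)
        return path

    source = src_dir / "sample.txt"
    source.write_text("round trip\n")

    round_trip_ok = check_round_trip(fake_put, fake_get, source, download)
```

In a real test the two fakes would be replaced by store_put and store_get. Everything under the temporary directory is deleted when the with block exits.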

Even against real S3, a single test like this is pretty quick. But if you’re testing a lot of functions that access S3, or moving a lot of data back and forth, then it can get time-consuming.

Moto

Moto is a library for mocking AWS services. You can install it directly.

pip3 install moto

However, I prefer to use the Docker image, motoserver/moto.

Docker Setup

Either create a moto container from the command line.

docker run -p 3000:3000 -e MOTO_PORT=3000 motoserver/moto:4.1.13

Or launch it via Docker Compose.

docker-compose up

A simple docker-compose.yml would look something like this:

version: "3.9"

services:
  moto:
    image: motoserver/moto:4.1.13
    ports:
      - "3000:3000"
    environment:
      - MOTO_PORT=3000
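
A container can take a moment to start answering requests, so tests launched straight after it may fail to connect. The following is a minimal sketch of a readiness check using only the standard library. The wait_for_http helper is hypothetical (it is not part of Moto or boto3) and it treats any HTTP response, including an error status, as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=10.0, interval=0.2, request_timeout=1.0):
    """Return True once url answers with any HTTP response, False on timeout."""
    # Ignore any proxy settings because the server is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            opener.open(url, timeout=request_timeout).close()
            return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except OSError:
            # Connection refused, reset or timed out: not ready yet.
            time.sleep(interval)
    return False
```

With the port mapping used here, calling wait_for_http("http://localhost:3000/") before the tests start would block until the server responds or ten seconds have passed.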

The Moto server will be running on local port 3000. You need to tell the boto3 library to use the Moto server rather than S3. There are a few ways to do this. Moto provides a decorator and a context manager, but both of these approaches require you to modify existing tests. My preferred approach is to use the endpoint_url argument when creating the S3 client (take another look at the store.py module above). Set the AWS_REGION and BOTO3_ENDPOINT_URL environment variables either directly in the tests using os.environ (before store.py is imported, since the client is created at import time) or in the shell prior to running the tests.

export AWS_REGION="us-east-1"
export BOTO3_ENDPOINT_URL="http://localhost:3000"
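
To set the variables from within the tests instead, they need to be in place before store.py is imported. Here is a minimal sketch, assuming the tests are run with pytest, which imports conftest.py before it collects the test modules. The conftest.py file and the MOTO_ENVIRONMENT name are hypothetical. The two credential entries are an assumption as well: boto3 generally needs some credentials to sign requests, and the values here are dummy placeholders.

```python
# conftest.py (hypothetical file placed alongside the tests)
import os

# Settings for a local Moto server. The credentials are dummy placeholders.
MOTO_ENVIRONMENT = {
    "AWS_REGION": "us-east-1",
    "BOTO3_ENDPOINT_URL": "http://localhost:3000",
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
}

for name, value in MOTO_ENVIRONMENT.items():
    # setdefault leaves any value already exported in the shell untouched.
    os.environ.setdefault(name, value)
```

Because setdefault is used, exporting a different BOTO3_ENDPOINT_URL in the shell still takes precedence over the value in this file.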

Outside of testing, the BOTO3_ENDPOINT_URL variable won’t be defined, so os.getenv() returns None and boto3 falls back to the real S3 endpoint.

Timing Statistics

We’ll use hyperfine to gather robust statistics to compare the execution time of the tests using S3 and Moto.

hyperfine -r 20 'pytest'

These are the numbers for S3.

Time (mean ± σ):      1.612 s ±  0.180 s    [User: 0.272 s, System: 0.030 s]
Range (min … max):    1.330 s …  1.944 s    20 runs

And these are the corresponding numbers using Moto.

Time (mean ± σ):     278.4 ms ±  34.4 ms    [User: 243.7 ms, System: 25.8 ms]
Range (min … max):   243.8 ms … 413.2 ms    20 runs

The local mocking server is, not surprisingly, much quicker: an average of 278.4 milliseconds versus 1.612 seconds, which makes it almost six times faster.