Code that moves data to and from S3 can slow down testing. A lot. This post demonstrates how you can speed things up by mocking S3.
What are the advantages of mocking? These are the most important for me:
- Speed (saves you time!).
- Don’t need to clean up afterwards (don’t want testing data bloating your bucket!).
- Don’t need to worry about clobbering valid data.
Tests
For the sake of illustration, these are the two functions that I want to test. They use the boto3 library to interact with S3.
import os
import boto3

S3_BUCKET = "datawookie-scratch"
AWS_REGION = os.getenv("AWS_REGION")
BOTO3_ENDPOINT_URL = os.getenv("BOTO3_ENDPOINT_URL")

s3 = boto3.client('s3', region_name=AWS_REGION, endpoint_url=BOTO3_ENDPOINT_URL)

def store_put(path):
    key = os.path.basename(path)
    s3.upload_file(path, S3_BUCKET, key)
    return key

def store_get(key, path=None):
    if path is None:
        path = key
    s3.download_file(S3_BUCKET, key, path)
    return path
The functions live in the store.py module. Here’s a simple test that copies the contents of /etc/passwd to and from S3, ensuring that the contents of the retrieved file match the original.
import hashlib
import pathlib
from unittest import TestCase

from store import s3, store_put, store_get, S3_BUCKET

# Need to create a bucket for testing.
s3.create_bucket(Bucket=S3_BUCKET)

def file_hash(path):
    return hashlib.blake2b(pathlib.Path(path).read_bytes()).hexdigest()

class TestStore(TestCase):
    def test_store(self):
        key = store_put("/etc/passwd")
        assert key == "passwd"
        path = store_get(key)
        assert path == "passwd"
        assert file_hash("/etc/passwd") == file_hash(path)
Even using S3, a single test like this is pretty quick. But if you’re testing a lot of functions that access S3, or moving a lot of data back and forth, then it can get time-consuming.
Moto
Moto is a library for mocking AWS services. You can install it directly.
pip3 install moto
However, I prefer to use the Docker image, which is documented here.
Docker Setup
Either create a moto container from the command line.
docker run -p 3000:3000 -e MOTO_PORT=3000 motoserver/moto:4.1.13
Or launch it via Docker Compose.
docker-compose up
A simple docker-compose.yml would look something like this:
version: "3.9"
services:
  moto:
    image: motoserver/moto:4.1.13
    ports:
      - "3000:3000"
    environment:
      - MOTO_PORT=3000
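Once the container is up it can be handy to confirm that something is actually listening on the mapped port before the tests start. The sketch below is one way to do that with only the standard library. The wait_for_port helper is a hypothetical name, not part of Moto or boto3, and it only checks that a TCP connection succeeds, not that the server behind it is Moto.

```python
import socket
import time


def wait_for_port(host="localhost", port=3000, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults assume the port mapping shown above.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() at the start of a test session (for example from a conftest.py) avoids confusing connection errors when the container is still starting.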
The Moto server will be running on local port 3000. You need to tell the boto3 library to use the Moto server rather than S3. There are a few ways to do this. You can use a decorator or context manager. However, both of these approaches require you to modify existing tests. My preferred approach is to use the endpoint_url argument when creating the S3 client (take another look at the store.py module above). Set the BOTO3_ENDPOINT_URL environment variable either in the shell prior to running the tests or directly in the tests using os.environ.
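If you go the os.environ route, the ordering matters: store.py reads both variables when it is imported, so they have to be set before that import happens. A minimal sketch, assuming the Moto server is on localhost:3000 and that this code sits at the top of a conftest.py or test module. The s3_client_kwargs helper is hypothetical and only mirrors how store.py builds its client arguments.

```python
import os

# setdefault keeps any value already exported in the shell.
# Assumption: Moto is listening on localhost:3000.
os.environ.setdefault("AWS_REGION", "us-east-1")
os.environ.setdefault("BOTO3_ENDPOINT_URL", "http://localhost:3000")


def s3_client_kwargs():
    """Hypothetical helper mirroring the arguments store.py passes to boto3.client."""
    return {
        "region_name": os.getenv("AWS_REGION"),
        "endpoint_url": os.getenv("BOTO3_ENDPOINT_URL"),
    }


# Only import the module under test after the environment is prepared:
# from store import s3, store_put, store_get, S3_BUCKET
```

Using setdefault rather than plain assignment means a value set in the shell still wins, which makes it easy to point the same tests at a different endpoint.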
export AWS_REGION="us-east-1"
export BOTO3_ENDPOINT_URL="http://localhost:3000"
Outside of testing the BOTO3_ENDPOINT_URL variable won’t be defined, so os.getenv returns None and boto3 falls back to the real S3 endpoint.
Timing Statistics
We’ll use hyperfine to gather robust statistics to compare the execution time of the tests using S3 and Moto.
hyperfine -r 20 'pytest'
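If hyperfine isn’t installed, the same kind of measurement can be roughly approximated in Python. This is only a sketch of the idea (run a command repeatedly, record wall-clock time, summarise), not a reimplementation of hyperfine. The time_command and summarise names are hypothetical.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=20):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a failing command is still timed.
        subprocess.run(cmd, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Return mean, sample standard deviation, min and max of the durations."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations),
        "min": min(durations),
        "max": max(durations),
    }
```

Something like summarise(time_command(["pytest"])) would then give comparable summary figures. The numbers reported in this post come from hyperfine itself.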
These are the numbers for S3.
Time (mean ± σ): 1.612 s ± 0.180 s [User: 0.272 s, System: 0.030 s]
Range (min … max): 1.330 s … 1.944 s 20 runs
And these are the corresponding numbers using Moto.
Time (mean ± σ): 278.4 ms ± 34.4 ms [User: 243.7 ms, System: 25.8 ms]
Range (min … max): 243.8 ms … 413.2 ms 20 runs
The local mocking server is, not surprisingly, much quicker: an average of 278.4 milliseconds versus 1.612 seconds.
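To put those two means in proportion, here is the arithmetic as a short snippet. The values are copied from the hyperfine output above and the variable names are just for illustration.

```python
# Mean run times from the hyperfine output, both in seconds.
s3_mean_s = 1.612
moto_mean_s = 278.4 / 1000

speedup = s3_mean_s / moto_mean_s   # ratio of the two means
saved_s = s3_mean_s - moto_mean_s   # time saved per test run

print(f"{speedup:.1f}x faster, {saved_s:.3f} s saved per run")
# prints: 5.8x faster, 1.334 s saved per run
```

That is for a single small test. The absolute saving should grow with the number of tests that touch S3.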