Code that moves data to and from S3 can slow down testing. A lot. This post demonstrates how you can speed things up by mocking S3.
What are the advantages of mocking? These are the most important for me:
- Speed (saves you time!).
- Don’t need to clean up afterwards (don’t want testing data bloating your bucket!).
- Don’t need to worry about clobbering valid data.
For the sake of illustration, these are the two functions that I want to test. They use the boto3 library to interact with S3.

    import os
    import boto3

    S3_BUCKET = "datawookie-scratch"
    AWS_REGION = os.getenv("AWS_REGION")
    BOTO3_ENDPOINT_URL = os.getenv("BOTO3_ENDPOINT_URL")

    s3 = boto3.client('s3', region_name=AWS_REGION, endpoint_url=BOTO3_ENDPOINT_URL)


    def store_put(path):
        key = os.path.basename(path)
        s3.upload_file(path, S3_BUCKET, key)
        return key


    def store_get(key, path=None):
        if path is None:
            path = key
        s3.download_file(S3_BUCKET, key, path)
        return path

The functions live in the store.py module. Here’s a simple test that copies the contents of /etc/passwd to and from S3, ensuring that the contents of the retrieved file match the original.

    import hashlib
    import pathlib
    from unittest import TestCase

    from store import s3, store_put, store_get, S3_BUCKET

    # Need to create a bucket for testing.
    # s3.create_bucket(Bucket=S3_BUCKET)


    def file_hash(path):
        return hashlib.blake2b(pathlib.Path(path).read_bytes()).hexdigest()


    class TestStore(TestCase):
        def test_store(self):
            key = store_put("/etc/passwd")
            assert key == "passwd"
            path = store_get(key)
            assert path == "passwd"
            assert file_hash("/etc/passwd") == file_hash(path)

Even using S3, a single test like this is pretty quick. But if you’re testing a lot of functions that access S3, or moving a lot of data back and forth, then it can get time-consuming.
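Moto is a library for mocking AWS services. You can install it directly.

    pip3 install moto

To run Moto as a server, either create a moto container from the command line (the MOTO_PORT variable makes the server listen on port 3000 inside the container rather than on its default port, 5000).

    docker run -p 3000:3000 -e MOTO_PORT=3000 motoserver/moto:4.1.13
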
Or launch it via Docker Compose. The docker-compose.yml would look something like this:

    version: "3.9"
    services:
      moto:
        image: motoserver/moto:4.1.13
        ports:
          - "3000:3000"
        environment:
          - MOTO_PORT=3000

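A container takes a moment to start, so tests launched straight after it may hit connection errors. As a minimal sketch (the helper name `wait_for_http` and its defaults are hypothetical, and port 3000 is taken from the port mapping above), one way to wait is to poll until the server answers any HTTP request at all:

```python
import time
import urllib.error
import urllib.request

# An empty ProxyHandler stops urllib from routing local requests through a proxy.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url="http://localhost:3000", timeout=10.0, interval=0.2):
    """Return True once the server at url sends any HTTP response, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=1.0):
                return True
        except urllib.error.HTTPError:
            # An error status still means that the server is up and answering.
            return True
        except OSError:
            # Connection refused, reset or timed out: not ready yet, so retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_http()` once before the test run (for example from a session-scoped fixture) is probably enough. It only shows that something is answering on that port, not that it is Moto.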
The Moto server will be running on local port 3000. You need to tell the boto3 library to use the Moto server rather than S3. There are a few ways to do this. You can use a decorator or context manager. However, both of these approaches require you to modify existing tests. My preferred approach is to use the endpoint_url argument when creating the S3 client (take another look at the store.py module above). Set the BOTO3_ENDPOINT_URL environment variable either in the shell prior to running the tests, as shown here, or directly in the tests.

    export AWS_REGION="us-east-1"
    export BOTO3_ENDPOINT_URL="http://localhost:3000"

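If you'd rather not depend on the shell, one possible way to set the variables from the test side is a conftest.py, which pytest loads before it imports the test modules. This is only a sketch under assumptions: the file layout is hypothetical, and the dummy credentials are there because boto3 needs some credentials to sign requests, while Moto does not check them. The variables must be set before store.py is imported, because that module reads them at import time.

```python
# conftest.py (hypothetical): loaded by pytest before the test modules are imported.
import os

# setdefault leaves any value already exported in the shell untouched.
os.environ.setdefault("AWS_REGION", "us-east-1")
os.environ.setdefault("BOTO3_ENDPOINT_URL", "http://localhost:3000")

# Assumption: placeholder credentials, so boto3 does not go looking for real ones.
os.environ.setdefault("AWS_ACCESS_KEY_ID", "testing")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "testing")
```

Because it uses `setdefault`, a value exported in the shell still wins, so the same tests can be pointed at a different endpoint without editing the file.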
Outside of testing, the BOTO3_ENDPOINT_URL variable won’t be defined, so endpoint_url will be None and boto3 will talk to the real S3 endpoint.
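The flip side is that a test run without the variable set will quietly hit the real bucket. A small guard can catch that. The sketch below is an assumption rather than part of store.py: the name `require_mock_endpoint` and the set of hosts treated as local are hypothetical, and you'd need to adjust the hosts if Moto is reached via a Compose service name.

```python
import os
from urllib.parse import urlparse

# Assumption: hosts that count as a local mock server.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def require_mock_endpoint(environ=os.environ):
    """Return the mock endpoint URL, or raise if tests would reach real AWS."""
    url = environ.get("BOTO3_ENDPOINT_URL")
    if not url:
        raise RuntimeError("BOTO3_ENDPOINT_URL is not set: tests would use real S3")
    host = urlparse(url).hostname
    if host not in LOCAL_HOSTS:
        raise RuntimeError(f"BOTO3_ENDPOINT_URL points at {host!r}, not a local mock")
    return url
```

Calling this at the top of the test module, before importing store, makes the tests fail fast instead of clobbering valid data.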
I used hyperfine to gather robust statistics comparing the execution time of the tests using S3 and Moto.

    hyperfine -r 20 'pytest'

These are the numbers for S3.

    Time (mean ± σ):      1.612 s ±  0.180 s    [User: 0.272 s, System: 0.030 s]
    Range (min … max):    1.330 s …  1.944 s    20 runs

And these are the corresponding numbers using Moto.

    Time (mean ± σ):     278.4 ms ±  34.4 ms    [User: 243.7 ms, System: 25.8 ms]
    Range (min … max):   243.8 ms … 413.2 ms    20 runs

The local mocking server is, not surprisingly, much quicker: an average of 278.4 milliseconds versus 1.612 seconds, which makes it roughly 5.8 times faster.
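The User and System figures hint at where the time goes. As a rough sketch (the variable and function names are mine, and treating wall-clock time minus CPU time as "waiting" is a simplification), the numbers from the two hyperfine runs can be split like this:

```python
# Mean figures copied from the two hyperfine runs above, in seconds.
s3_run = {"wall": 1.612, "user": 0.272, "system": 0.030}
moto_run = {"wall": 0.2784, "user": 0.2437, "system": 0.0258}


def summarise(run):
    """Split mean wall-clock time into CPU time and time spent waiting."""
    cpu = run["user"] + run["system"]
    return {"cpu": cpu, "waiting": run["wall"] - cpu}


for name, run in (("S3", s3_run), ("Moto", moto_run)):
    parts = summarise(run)
    print(f"{name}: cpu {parts['cpu']:.3f} s, waiting {parts['waiting']:.3f} s")
```

CPU time is about the same in both cases (around 0.3 s), whereas the waiting time drops from about 1.3 s with S3 to under 10 ms with Moto. In other words, nearly all of the saving comes from not going over the network.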