
Previous posts in this series used the responses and vcr packages to mock HTTP responses. Now we’re going to look at the mocking capabilities of the unittest package, which is part of the Python Standard Library. Relative to responses and vcr, this functionality is rather low-level. There’s more work required, but as a result there’s potential for greater control.
Mocks are (Deceptively) Simple
The unittest.mock module has two classes for mocking: Mock and MagicMock.
Mock Objects
The Mock class offers a number of constructor arguments. These are the most important ones:
- return_value: Specifies the value to be returned when the mock is called.
- side_effect: Can be a function, an iterable or an exception.
  - Function: The function is called each time the mock is invoked. The return value of the function is returned as the mock’s return value.
  - Iterable: Each time that the mock is called it will return the next value from the iterable.
  - Exception: The specified exception is raised when the mock is called.
There are a few other arguments (spec, spec_set, wraps, name and unsafe) that are a bit niche for the moment.
A Vanilla Mock
We’ll start by creating a plain Mock object.
from unittest.mock import Mock, MagicMock
rng = Mock()
rng
<Mock id='131922784248480'>
The ID in the __repr__() string is simply the result of the standard Python id() function and is not specific to the class.
We can call the Mock object like a function.
rng()
<Mock name='mock()' id='131922849355712'>
The result is another Mock object (it’s a distinct object because it has a different ID). We can provide arbitrary arguments.
rng(min=0, max=100, quantum_fluctuation=99, luck_modifier="rabbit foot")
<Mock name='mock()' id='131922849355712'>
The ID has not changed, so the arguments at present have no effect: we get back the same object as without arguments.
Seems flexible… but also somewhat meaningless. Why? How could this be remotely useful? Bear with me.
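Part of the answer is that a mock records how it was called, even when the arguments have no effect on the result. A minimal sketch, using made-up argument names in the spirit of the example above:

```python
from unittest.mock import Mock, call

rng = Mock()
rng(min=0, max=100)

# The mock remembers the arguments of its most recent call...
print(rng.call_args)  # call(min=0, max=100)

# ...and can assert on them. This raises AssertionError on a mismatch.
rng.assert_called_once_with(min=0, max=100)
```

That record is what lets a test check how the code under test used a dependency, not just what came back.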
Mocked Return Value
Add some real functionality by using the return_value argument to set the value returned by the mock.
rng = Mock(return_value=0.42)
rng()
0.42
And if we call it again?
rng()
0.42
Not at all random. But it does what we asked: it returns the specified value.
Mocked Side Effect
If we want a function to be called whenever we use the mock then use the side_effect argument. Here, for example, is a mock that counts the number of times that it’s been called. Forgive the heinous use of global. 😕
counter = 0
def mock_random():
    global counter
    counter += 1
    return 0.42
rng = Mock(side_effect=mock_random)
Call the mock a couple of times.
rng()
rng()
Then check on the counter.
counter
2
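The global counter is only there for illustration. Mock objects already keep track of how often they have been called via the call_count attribute, so a sketch of the same idea without global might look like this:

```python
from unittest.mock import Mock

rng = Mock(return_value=0.42)
rng()
rng()

# Number of times the mock has been called so far.
print(rng.call_count)  # 2
```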
As mentioned earlier, the side_effect argument can be used for a host of other purposes. Here are a couple of examples.
# Yields a series of values then raises StopIteration.
rng = Mock(side_effect=[0.135, 0.52, 0.9])
# Raising an exception.
rng = Mock(side_effect=RuntimeError("Randomness tank empty. Refill with chaos and try again!"))
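To see how these two mocks behave when called, here is a short sketch (the exception message is shortened for brevity):

```python
from unittest.mock import Mock

rng = Mock(side_effect=[0.135, 0.52, 0.9])
values = [rng(), rng(), rng()]  # one value per call, in order

# A fourth call finds the iterable exhausted.
exhausted = False
try:
    rng()
except StopIteration:
    exhausted = True

rng = Mock(side_effect=RuntimeError("Randomness tank empty."))
try:
    rng()
except RuntimeError as error:
    message = str(error)

print(values, exhausted, message)
```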
Mock Attributes & Methods
You can access arbitrary attributes and methods on the mock object.
rng.state
<Mock name='mock.state' id='131923072201504'>
rng.random()
<Mock name='mock.random()' id='131923193252848'>
But they don’t do anything meaningful until you define them, simply returning other Mock objects.
rng.state = 299792
rng.random = lambda: 0.13
Now let’s try them again.
rng.state
299792
rng.random()
0.13
You could also set these via constructor arguments.
rng = Mock(state = 299792, random = lambda: 0.13)
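Constructor keyword arguments can also configure child mocks by using dotted names, which have to be passed via dictionary unpacking. A small sketch, reusing the attribute names from above:

```python
from unittest.mock import Mock

# Dotted keys are not valid keyword names, so they are passed via ** unpacking.
rng = Mock(state=299792, **{"random.return_value": 0.13})

print(rng.state)     # 299792
print(rng.random())  # 0.13

# Unlike a lambda, rng.random is itself a Mock, so its calls are recorded.
rng.random.assert_called_once_with()
```

The trade-off relative to the lambda is that the method stays a mock, which keeps call tracking available.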
Magic Mock Objects 🧙
The MagicMock class is just like Mock except it can also mock dunder methods.
days = MagicMock()
For the purpose of illustration I’m going to need a sorted list of the days of the week.
import calendar
SORTED_DAYS_OF_WEEK = sorted(list(calendar.day_name))
SORTED_DAYS_OF_WEEK
['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']
Mock the __getitem__ method so that the mock can be indexed.
days.__getitem__.side_effect = lambda key: SORTED_DAYS_OF_WEEK[key]
days[2]
'Saturday'
Mock the __iter__ method so that the mock can be iterated over.
days.__iter__.return_value = iter(SORTED_DAYS_OF_WEEK)
for day in days:
    print(day)
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday
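One caveat worth knowing: an iterator assigned to return_value is consumed by the first loop, so a second loop over the same mock yields nothing. Assigning a list instead allows the mock to be iterated repeatedly. A sketch, using a separate mock (weekdays) and a shortened list so that it stands alone:

```python
from unittest.mock import MagicMock

weekdays = MagicMock()

# An iterator is consumed by the first pass.
weekdays.__iter__.return_value = iter(["Friday", "Monday"])
first = list(weekdays)
second = list(weekdays)
print(first, second)  # ['Friday', 'Monday'] []

# A list can be iterated any number of times.
weekdays.__iter__.return_value = ["Friday", "Monday"]
again = [list(weekdays), list(weekdays)]
print(again)
```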
Give a meaningful value for the str() function.
days.__str__.return_value = "Sorted Days of the Week"
str(days)
'Sorted Days of the Week'
Right, that’s quite enough background. Let’s get down to testing some scrapers.
The Scrapers
For continuity we’ll build tests for the Quotes to Scrape scraper considered in previous posts. However, in order to illustrate a wider range of capabilities we’ll introduce a second scraper with a different architecture. The scraper below extracts the paginated data from Books to Scrape. Feel free to skip over this code for the moment.
import logging
import re
from typing import Iterator
from urllib.parse import urljoin

import bs4
import pandas as pd
import requests

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)7s] %(message)s",
)

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

BASE_URL = "https://books.toscrape.com/catalogue/"

class BooksScraper:
    def __init__(self):
        self.client = requests.Session()

    def __del__(self):
        self.client.close()

    def download(self) -> Iterator[str]:
        page = 1
        while True:
            logging.info(f"Get page {page}.")
            url = urljoin(BASE_URL, f"page-{page}.html")
            response = self.client.get(url)
            if response.status_code != 200:
                break
            yield response.text
            page += 1

    def parse(self, html: str) -> list[dict]:
        soup = bs4.BeautifulSoup(html, "html.parser")

        def stars(tag):
            ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
            return ratings.get(tag.get("class")[1])

        return [
            {
                "title": book.select_one("h3 a")["title"],
                "url": urljoin(BASE_URL, book.select_one("h3 a")["href"]),
                "price": book.select_one(".product_price > p").text,
                "img": urljoin(BASE_URL, book.select_one("img")["src"]),
                "in_stock": book.select_one(".availability").text.strip() == "In stock",
                "stars": stars(book.select_one(".star-rating")),
            }
            for book in soup.select("article")
        ]

    def normalise(self, books: list[dict]) -> list[dict]:
        for book in books:
            # Remove currency.
            book["price"] = re.sub(r"^[^0-9]+", "", book["price"])
            # Convert to float.
            book["price"] = float(book["price"])
        return books

    def transform(self, books: list[dict]) -> pd.DataFrame:
        return pd.DataFrame(books).sort_values(by="title").reset_index(drop=True)

    def crawl(self) -> pd.DataFrame:
        parsed = []
        for html in self.download():
            parsed.extend(self.parse(html))
        normalised = self.normalise(parsed)
        return self.transform(normalised)

if __name__ == "__main__":
    scraper = BooksScraper()
    df = scraper.crawl()
    print(df.head())
The QuotesScraper downloader stored the HTML content in an object attribute, making the method a procedure with a side effect (but no return value) rather than a function. The BooksScraper downloader, by contrast, actually returns the HTML content.
The BooksScraper class has a crawl() method that orchestrates the scraping process. It retrieves HTML from download(), passing it to parse(), which extracts the required data and returns a list of dictionaries. That in turn is passed to normalise() for cleaning and another list of dictionaries is returned. Finally that’s passed to transform(), which returns a data frame. No data is stored in the object.
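As a small illustration of the cleaning step, the price handling in normalise() can be tried in isolation. The helper name clean_price and the sample price are assumptions made for this sketch; the regular expression is the one used in the scraper.

```python
import re

def clean_price(price: str) -> float:
    # Strip any leading non-digit characters (such as a currency symbol),
    # then convert what remains to a float.
    return float(re.sub(r"^[^0-9]+", "", price))

print(clean_price("£51.77"))  # 51.77
```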
Reference HTML
Both the responses and vcr packages provide functionality for storing the reference HTML response in a YAML file. We don’t have that luxury now and we need to do it ourselves. We’ll use curl to harvest copies of the HTML content from the target sites and redirect the output to files. You could equally do this by saving the page directly from your browser.
curl https://quotes.toscrape.com/ >quotes-to-scrape.html
curl https://books.toscrape.com/ >books-to-scrape.html
Tests with Mocking
Mocking a Return Value
Let’s test the BooksScraper class. In order to decouple our tests from the target site we need to ensure that the download() method doesn’t actually issue a network request. We’ll mock the download() method to return the content of the file we downloaded a moment ago.
from unittest.mock import Mock

import pandas as pd
import pytest

from scraper.books import BooksScraper

BOOKS = pd.read_csv("books-to-scrape.csv")
HTML = "books-to-scrape.html"

@pytest.fixture
def scraper():
    # Load the HTML from file.
    with open(HTML, "r") as f:
        html = f.read()
    bs = BooksScraper()
    # Mock the download() method, setting the return value to the HTML string.
    bs.download = Mock(return_value=html)
    return bs

def test_scraper(scraper):
    # This method is mocked.
    html = scraper.download()
    # The remaining methods use original implementation.
    parsed = scraper.parse(html)
    normalised = scraper.normalise(parsed)
    books = scraper.transform(normalised)
    assert books.equals(BOOKS)
Relative to the implementations using responses and vcr, the test itself, test_scraper(), is clean and simple: no decorators or code for loading data from a YAML file.
All of the action is in the scraper() fixture. First the HTML is loaded from the HTML file. Then a BooksScraper object is created. Normally the download() method on this object retrieves HTML from https://books.toscrape.com/. However, in the test it’s replaced with a mocked method via the Mock class. When creating the Mock() object the return_value parameter is set to the content of the HTML file. Now, rather than making a network request, the download() method simply returns the loaded HTML. It’s fast and robust.
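Assigning a Mock directly to the download attribute replaces the method for the lifetime of that object. When the replacement should only be temporary, patch.object() from the same module can apply it and then undo it. The sketch below uses a hypothetical stand-in class, FakeScraper, rather than the real BooksScraper, so that it runs without network access or third-party packages.

```python
from unittest.mock import patch

class FakeScraper:
    """Hypothetical stand-in for a scraper with a download() method."""

    def download(self) -> str:
        raise RuntimeError("No network access in tests!")

scraper = FakeScraper()

# Inside the block download() is a mock; on exit the original is restored.
with patch.object(scraper, "download", return_value="<html></html>") as mocked:
    html = scraper.download()
    mocked.assert_called_once_with()

print(html)  # <html></html>
```

Whether this is preferable to direct assignment depends on the test: in a fixture that builds a fresh object for each test, direct assignment is usually enough.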
Mocking with Side Effects
As discussed earlier, the QuotesScraper class works differently, so the approach used above won’t work. The download() method doesn’t return a value, but instead sets an attribute.
from unittest.mock import Mock

import pandas as pd
import pytest

from scraper.quotes import QuotesScraper

QUOTES = pd.read_csv("quotes-to-scrape.csv")
HTML = "quotes-to-scrape.html"

@pytest.fixture
def scraper():
    with open(HTML, "r") as f:
        html = f.read()

    def mock_download():
        bs.html = html

    bs = QuotesScraper()
    bs.download = Mock(side_effect=mock_download)
    return bs

def test_scraper(scraper):
    scraper.download()
    scraper.parse()
    scraper.normalise()
    quotes = scraper.transform()
    assert quotes.equals(QUOTES)
The implementation of the scraper() fixture is analogous to that in the previous example. However, rather than using the return_value parameter when creating the Mock object we use side_effect. The mocked download() method assigns the loaded HTML to the html attribute on the QuotesScraper object. The inner function, mock_download(), is required because the side_effect parameter expects a callable.
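The same pattern can be tried without the real scraper. In this sketch FakeQuotesScraper is a hypothetical stand-in that only carries an html attribute. It also shows that the mock returns whatever the side-effect function returns (here None, like the original procedure) and that the call is still recorded.

```python
from unittest.mock import Mock

class FakeQuotesScraper:
    """Hypothetical stand-in that stores downloaded HTML on the object."""

    def __init__(self):
        self.html = None

scraper = FakeQuotesScraper()

def mock_download():
    # Mimic the real method's side effect by setting the attribute.
    scraper.html = "<html>quotes</html>"

scraper.download = Mock(side_effect=mock_download)

result = scraper.download()

# The attribute has been set and the call has been recorded.
scraper.download.assert_called_once_with()
print(scraper.html, result)
```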
Mocking a Generator
There’s a weakness in the BooksScraper test above: the mock returns a single HTML document. If you look back at the code for the download() method then you’ll see that it’s actually a generator, yielding HTML for each of a series of pages. To properly test this class we should really mock this behaviour.
from unittest.mock import Mock

import pandas as pd
import pytest

from scraper.books import BooksScraper

BOOKS = pd.read_csv("books-to-scrape.csv")
HTML = "books-to-scrape.html"

@pytest.fixture
def scraper():
    # Load the HTML from file.
    with open(HTML, "r") as f:
        html = f.read()

    def mock_download():
        yield html
        yield html
        yield html

    bs = BooksScraper()
    # Mock the download() method, using a generator function as the side effect.
    bs.download = Mock(side_effect=mock_download)
    return bs

def test_scraper(scraper):
    parsed = []
    for html in scraper.download():
        parsed.extend(scraper.parse(html))
    assert len(parsed) == 60
Once again we use the side_effect argument to create a Mock object, this time passing a generator function which yields three copies of the HTML document. The test simply checks that the content from each of those documents is parsed. A more complete test would also check that the normalise() and transform() methods work as intended.
📌 This could have been implemented more concisely by setting return_value to a list of three HTML strings, but I prefer this approach because it’s explicit. Note that passing the list to side_effect would not be equivalent: the mock would then return one item per call rather than an iterable of pages.
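To make the behaviour of these options concrete, here is a sketch with placeholder page strings. The names pages and download are just for illustration.

```python
from unittest.mock import Mock

def pages():
    yield "page 1"
    yield "page 2"

# A generator function as side_effect: every call returns a fresh generator.
download = Mock(side_effect=pages)
first = list(download())
second = list(download())

# A list as side_effect is consumed one item per call.
download = Mock(side_effect=["page 1", "page 2"])
single = download()

# A list as return_value is handed back whole on every call.
download = Mock(return_value=["page 1", "page 2"])
whole = download()

print(first, second, single, whole)
```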
Mocks are Flexible
The examples above have just scratched the surface of what’s possible using the Mock class. Here are some other bells and whistles.
Mocking with Reckless Abandon
Suppose we wanted to mock the response from https://dummyjson.com/user/1. To keep things simple we’ll only implement part of the response payload. First create a mocked response object. Start with a plain vanilla Mock object and then add status_code and text attributes.
import json
from unittest.mock import Mock

emily = {
    "id": 1,
    "firstName": "Emily",
    "lastName": "Johnson",
    "age": 28,
    "gender": "female"
}

mock_response = Mock()
mock_response.status_code = 200
mock_response.text = json.dumps(emily)
Assign the mock to the requests.get() function.
import requests
# Replace the requests.get() function with a mock.
requests.get = Mock(return_value=mock_response)
📢 Due to the simple way that we have mocked requests.get() we’ll get the same response regardless of the provided URL, so https://www.example.com/ will yield the same response.
Now when you call requests.get(), it returns the mock_response object.
response = requests.get("https://dummyjson.com/user/1")
Check the status code.
response.status_code
200
Looks good. What about the text?
response.text
'{"id": 1, "firstName": "Emily", "lastName": "Johnson", "age": 28, "gender": "female"}'
Nice!
For future reference, let’s see what happens if we request another attribute on the mock object.
response.previous
<Mock name='mock.previous' id='131922784306816'>
It works but it doesn’t return anything meaningful. Hold this in your working memory for a short while.
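Returning briefly to the earlier note that every URL gets the same response: if that is too blunt, side_effect can take a function that inspects the arguments. In the sketch below fake_get, PAGES and mock_get are hypothetical names, the payload is abbreviated, and the mock is kept in a local variable rather than being assigned to requests.get().

```python
from unittest.mock import Mock

# Hypothetical canned payloads keyed by URL.
PAGES = {
    "https://dummyjson.com/user/1": '{"id": 1, "firstName": "Emily"}',
}

def fake_get(url):
    # Known URLs get a 200 with the canned text; everything else gets a 404.
    response = Mock()
    if url in PAGES:
        response.status_code = 200
        response.text = PAGES[url]
    else:
        response.status_code = 404
        response.text = ""
    return response

mock_get = Mock(side_effect=fake_get)

print(mock_get("https://dummyjson.com/user/1").status_code)  # 200
print(mock_get("https://www.example.com/").status_code)      # 404
```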
Mocking with Rules
The previous example illustrates just how flexible the Mock class can be. But perhaps you don’t want to be quite that flexible. Maybe you want to ensure that the mocked response object only has the attributes and methods of a “proper” response? Enter the spec parameter.
from requests import Response

mock_response = Mock(
    # The Response class is used as the specification template for the mocked object.
    spec=Response,
    # Mock a few attributes and methods.
    status_code=200,
    text=json.dumps(emily),
    json=lambda: emily
)
The text attribute works as before.
mock_response.text
'{"id": 1, "firstName": "Emily", "lastName": "Johnson", "age": 28, "gender": "female"}'
And I’ve added in a mocked json() method as well.
mock_response.json()
{'id': 1, 'firstName': 'Emily', 'lastName': 'Johnson', 'age': 28, 'gender': 'female'}
Both of these work because they’re part of the Response interface. What if we stray a little further?
# Will work because .next is part of the Response spec (won't return anything meaningful!).
mock_response.next
# Will not work because .previous is not part of the Response spec.
mock_response.previous
We can access the next attribute because it’s part of the Response interface, although it doesn’t return any useful information because it was not explicitly mocked. There is no previous attribute on a Response object, so trying to access it on the mock will now fail 🚧.
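The failure is an AttributeError. The sketch below reproduces it with a hypothetical stand-in class, FakeResponse, so that it runs without the requests package. It also shows the stricter spec_set argument, which rejects setting unknown attributes as well as reading them.

```python
from unittest.mock import Mock

class FakeResponse:
    """Hypothetical stand-in for a response class."""

    status_code = None
    text = None

mock_response = Mock(spec=FakeResponse, status_code=200)

# Reading an attribute that is not on the spec raises AttributeError.
try:
    mock_response.previous
except AttributeError as error:
    message = str(error)
    print(message)

# spec only guards reads; spec_set also rejects setting unknown attributes.
strict = Mock(spec_set=FakeResponse)
rejected = False
try:
    strict.previous = "oops"
except AttributeError:
    rejected = True
```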
Conclusion
The mocking capacity in unittest.mock is an excellent alternative to the responses and vcr packages if you want to have more granular control over what’s happening in your tests.
References:
- There are two relevant sections in Effective Python by Brett Slatkin (second edition), “Use Mocks to Test Code with Complex Dependencies” and “Encapsulate Dependencies to Facilitate Mocking and Testing”.
