Test a Web Scraper using Patching

The previous post in this series considered the mocking capabilities in the unittest package. Now we’ll look at what it offers for patching.

What’s the difference between “mocking” and “patching”? A perfectly reasonable question. The distinction is subtle, and TBH I’m not always 100% clear on the difference myself. These two terms are often used interchangeably. This is the way that I currently think about it:

mocking — creates an entire fake object that mimics the behaviour of a real object; while
patching — fakes a specific behaviour on an existing object.

Hopefully the difference will become clear as we work our way through the examples below.

Patchers

The unittest.mock module offers a selection of patchers:

patch()
patch.object()
patch.dict() and
patch.multiple().

Each of these can be used as a decorator or a context manager. We’ll start by importing the patch() function.

from unittest.mock import patch

💡 You only need to import patch(). The other three functions automatically come with it. Effectively patch() is the master function and the other functions are attributes attached to patch().
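
A quick way to confirm this for yourself. This is just a sanity check on the names listed above, nothing more:

```python
from unittest.mock import patch

# The other patchers are attributes hanging off the patch() function itself.
helpers = {
    name: callable(getattr(patch, name))
    for name in ("object", "dict", "multiple")
}
print(helpers)
```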

Patching

For the purpose of illustration, suppose that I have a randomiser module (source file) that implements a function, rng(), and a class, Random.

from randomiser import rng, Random

Call the rng() function. It’s not much of a RNG because it always returns the same number. However, the value that it consistently returns is a particularly fine example of a random number.

rng()

0.42

Create an instance of Random and call its random() method. This is also a fiendishly poor implementation of a RNG.

Random().random()

0.42

Patcher Objects

The patch() function is used for patching module-level attributes, such as functions, classes and variables. You supply it with a string giving the dotted path to the thing you want to patch. The example below creates a patcher object, patcher, for the rng() function in the randomiser module.

patcher = patch("randomiser.rng", return_value=0.13)

💡 The target is only imported when the patcher object is used, not when it is created. So you don’t need to have already imported the thing that you’re patching.
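
A small sketch of that lazy behaviour. The module name below is deliberately made up (an assumption for illustration only), so the import can only fail, and it fails when the patcher is started rather than when it is created:

```python
from unittest.mock import patch

# Hypothetical module name that does not exist anywhere.
patcher = patch("no_such_module_for_demo.rng", return_value=0.13)
created = True  # Creating the patcher did not try to import anything.

try:
    patcher.start()
    failed_on_start = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError.
    failed_on_start = True

print(created, failed_on_start)
```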

We specified the required return value as an argument to patch(). As we’ll see below, this is not the only way to do this. You could also assign to the return_value attribute on the mock object that the patcher creates (it’s what start() returns). The approach you take will depend on whether or not the required return value is available at the time that you create the patcher object.

You might imagine that you’d now be able to call the patcher object. Sadly, you’d be wrong: that’s not how it works. However, the reality might be better than you imagined.

First turn the patcher on by starting it.

patcher.start()

Now call the function.

import randomiser

randomiser.rng()

0.13

Bravo! We get the patched result. 🚀 Now turn the patcher off by stopping it.

patcher.stop()

Try the function again.

randomiser.rng()

0.42

Aha, the original (unpatched) behaviour is restored. So, with this patcher object you can use the patched version of the function whenever required.
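
The whole start/stop cycle can be sketched in one self-contained snippet. Since the randomiser module isn’t shown in this post, the sketch builds a stand-in with types.ModuleType (an assumption, purely so that the code runs on its own). It also sets return_value after the patcher has been created, and uses try/finally so that the patch is always removed:

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for the randomiser module.
randomiser = types.ModuleType("randomiser")
randomiser.rng = lambda: 0.42
sys.modules["randomiser"] = randomiser

patcher = patch("randomiser.rng")
mock_rng = patcher.start()    # start() returns the MagicMock that replaces rng().
mock_rng.return_value = 0.13  # Set the return value after the fact.
try:
    patched = randomiser.rng()
finally:
    patcher.stop()            # Always restore the original function.

restored = randomiser.rng()
print(patched, restored)
```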

Context Manager

If that feels like a bit too much work, then a context manager may be your jam.

with patch("randomiser.rng", return_value=0.13) as mocked:
  randomiser.rng()
  # Try out one of the assertions attached to the mock object.
  mocked.assert_called_once()

0.13

💡 When used as a context manager or a decorator (see below), patch() by default hands you a MagicMock object that stands in for the target. It’s bound by as in a with statement or passed as an argument to the decorated function. It’s not necessary to name this object, but doing so will unlock extra functionality, like checking that the patched object was called exactly once. More on these assertions below.

You can also use the previously created patcher object as a context manager.

with patcher:
  randomiser.rng()

It’s possible to patch global variables too. The randomiser module has a global SEED (with a value of 13) and a function rng_seed() that simply returns the value of SEED.

randomiser.rng_seed()

We can use patch() to temporarily change the value of SEED.

with patch("randomiser.SEED", 777):
    randomiser.rng_seed()
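
Here’s a self-contained sketch of the same idea. The stand-in module is an assumption: it mirrors the description above (SEED is 13 and rng_seed() returns SEED), not the actual source file.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for the randomiser module.
randomiser = types.ModuleType("randomiser")
randomiser.SEED = 13
randomiser.rng_seed = lambda: randomiser.SEED  # Looks up SEED on every call.
sys.modules["randomiser"] = randomiser

with patch("randomiser.SEED", 777):
    inside = randomiser.rng_seed()   # Sees the patched value.

outside = randomiser.rng_seed()      # Original value is back.
print(inside, outside)
```

Because a replacement value (777) is passed explicitly, no MagicMock is created here: the global is simply swapped for the duration of the with block.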

Decorator

Finally (and most relevant to the topic of testing), you can use patch() as a decorator.

@patch("randomiser.rng", return_value=0.13)
def test_rng(mock_rng):
    assert randomiser.rng() == 0.13

Another way to do the same thing is to assign return_value inside the decorated function, on the mock object that the decorator passes in.

@patch("randomiser.rng")
def test_rng(mock_rng):
    mock_rng.return_value = 0.13
    assert randomiser.rng() == 0.13
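
One detail worth spelling out: all of the examples above call randomiser.rng() via the module rather than using the rng name imported at the start. That matters, because patch() replaces the attribute on the module, while a name created with from ... import keeps pointing at the original function. A minimal sketch, again using a hypothetical stand-in for the randomiser module:

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for the randomiser module.
randomiser = types.ModuleType("randomiser")
randomiser.rng = lambda: 0.42
sys.modules["randomiser"] = randomiser

from randomiser import rng  # Binds a second name to the original function.

with patch("randomiser.rng", return_value=0.13):
    via_module = randomiser.rng()  # Looked up on the module: patched.
    via_import = rng()             # Local name still refers to the original.

print(via_module, via_import)
```

The usual advice is to patch the name where it is looked up, not where it is defined.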

Patching an Object

The patch.object() function is used to patch an attribute on a specific object. The name of this function implies that it’s for patching “objects”. This can be confusing. It can be used for patching either

a class (because in Python a class is also an object!) or
an instance of a class.

Both of the following are valid:

# Patch the method on the class.
patch.object(Random, "random", return_value=0.13)
# The patch will apply to all instances of the class.

r = Random()
# Patch the method on a single instance.
patch.object(r, "random", return_value=0.13)
# The patch will only apply to this specific instance of the class.
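
To see the difference in scope, here’s a minimal sketch. The Random class below is a stand-in written for this example (an assumption based on the behaviour described earlier), not the real implementation.

```python
from unittest.mock import patch


class Random:
    """Hypothetical stand-in for randomiser.Random."""

    def random(self):
        return 0.42


a, b = Random(), Random()

# Patching the class affects every instance.
with patch.object(Random, "random", return_value=0.13):
    class_patch = (a.random(), b.random())

# Patching one instance leaves the others alone.
with patch.object(a, "random", return_value=0.13):
    instance_patch = (a.random(), b.random())

# Both patches are undone once their with blocks end.
after = (a.random(), b.random())
print(class_patch, instance_patch, after)
```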

Patcher Object

As before we can create a patcher object. But now, rather than patching a function or method in a module, we’ll patch a method on a class.

patcher = patch.object(Random, "random", return_value=0.13)

As we’ll see below, the result can be used as a context manager or decorator. But first we’ll use it manually: start the patcher, call the patched method and then stop the patcher.

patcher.start()

r = Random()
r.random()

0.13

patcher.stop()

Voila! 🚀

Again, the object will revert to its original behaviour after the patcher is stopped.

Context Manager

Using the patcher object as a context manager means you don’t need to manually start and stop it. The object is only patched within the scope of the context manager.

with patch.object(Random, "random", return_value=0.13):
  r.random()

0.13

Decorator

Finally we can use it as a decorator. The object is only patched within the scope of the decorated function.

@patch.object(Random, "random", return_value=0.13)
def test_scraper(mock_random):
    r = Random()
    assert r.random() == 0.13
    mock_random.assert_called_once()

Patching a Dictionary

The patch.dict() function patches a dictionary, or any other object that implements the dictionary interface via the associated dunder methods. It temporarily sets or replaces entries and then restores the dictionary to its original state when the patch ends.
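
A minimal sketch of patch.dict() in action. The settings dictionary and its keys are made up for illustration:

```python
from unittest.mock import patch

# Hypothetical scraper settings, used purely for illustration.
settings = {"timeout": 30, "retries": 3}

with patch.dict(settings, {"timeout": 5, "user_agent": "test-agent"}):
    during = dict(settings)  # Snapshot while the patch is active.

after = dict(settings)       # Original entries restored, extra key removed.
print(during)
print(after)
```

A common use in tests is patching os.environ in the same way, so that environment variables set for one test don’t leak into the next.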

Multiple Patches

The patch.multiple() function makes it possible to patch more than one item at a time. This can be useful, but TBH if I don’t really need to do it in a single statement, then I’m inclined to simply use multiple calls to one of the other functions.
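
For completeness, a minimal sketch of patch.multiple(). The Random class is a stand-in and its randint() method is invented for this example. Passing DEFAULT asks for a MagicMock for each name, and the context manager then returns a dictionary of those mocks:

```python
from unittest.mock import DEFAULT, patch


class Random:
    """Hypothetical stand-in; randint() is invented for this example."""

    def random(self):
        return 0.42

    def randint(self):
        return 42


with patch.multiple(Random, random=DEFAULT, randint=DEFAULT) as mocks:
    mocks["random"].return_value = 0.13
    mocks["randint"].return_value = 13
    r = Random()
    patched = (r.random(), r.randint())

restored = (Random().random(), Random().randint())
print(patched, restored)
```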

Tests with Patching

Let’s apply these patching functions to tests for the BooksScraper and QuotesScraper classes.

Patching a Return Value

Many of the simple examples above used the return_value parameter to specify the return value required on the patched object. Let’s see how this would be used when testing a scraper. We’ll apply it to the BooksScraper class.

First we’ll use a context manager to apply the patch.

from unittest.mock import patch

import json
import pytest

from scraper.books import BooksScraper

BOOKS = json.load(open("books-to-scrape.json"))
HTML = "books-to-scrape.html"


@pytest.fixture
def scraper():
    return BooksScraper()


def test_scraper(scraper):
    with open(HTML, "r") as f:
        html = f.read()

    with patch.object(scraper, "download", return_value=html):
        html = scraper.download()

    books = scraper.parse(html)
    assert books == BOOKS

The context manager creates a scope in which the download() method is called. Within this scope the method is replaced with a patched version that simply returns the HTML read from a file.

Decorators are more commonly used for testing. Now the scope of the patch is the entire test.

import json
from unittest.mock import patch

from scraper.books import BooksScraper

BOOKS = json.load(open("books-to-scrape.json"))
HTML = "books-to-scrape.html"


@patch.object(BooksScraper, "download")
def test_scraper(patched_download):
    with open(HTML, "r") as f:
        html = f.read()

    # Because the required return value is only loaded inside the test function,
    # it's not available to the decorator. Instead we assign it here.
    #
    patched_download.return_value = html

    scraper = BooksScraper()
    html = scraper.download()
    books = scraper.parse(html)

    assert books == BOOKS

    # Use assertions on attributes of the patched object. (Non-essential meta-testing.)
    #
    assert patched_download.called
    assert patched_download.call_count == 1

    # Use assertion methods on the patched object itself. (Non-essential meta-testing.)
    #
    patched_download.assert_called_once()

💡 Since the required return value is only loaded inside the test function, it can’t be passed as the return_value argument to patch.object() in the decorator. Not a problem! It can be set later by assigning to the return_value attribute on the mock object that’s passed to the test.

Patching with Side Effects

The QuotesScraper test needs a different approach. Judging by how it’s used below, download() stores the HTML on the scraper rather than returning it, so return_value is no help and we must use side_effect instead. The side effect implemented in the mock_download inner function sets the html attribute on the scraper instance.

from unittest.mock import patch

import pandas as pd
import pytest

from scraper.quotes import QuotesScraper

QUOTES = pd.read_csv("quotes-to-scrape.csv")
HTML = "quotes-to-scrape.html"


@pytest.fixture
def scraper():
    return QuotesScraper()


@patch.object(QuotesScraper, "download")
def test_scraper(patched_download, scraper):
    def mock_download():
        with open(HTML, "r") as f:
            scraper.html = f.read()

    patched_download.side_effect = mock_download

    scraper.download()
    scraper.parse()
    scraper.normalise()
    data = scraper.transform()

    assert data.equals(QUOTES)
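
The pattern is easier to see in isolation. Below is a minimal sketch with a made-up TinyScraper class (an assumption; it is not the real QuotesScraper) whose download() stores its result on the instance instead of returning it:

```python
from unittest.mock import patch


class TinyScraper:
    """Hypothetical scraper: download() stores HTML on the instance."""

    def __init__(self):
        self.html = None

    def download(self):
        raise RuntimeError("no network access in tests")

    def parse(self):
        return self.html.strip().upper()


scraper = TinyScraper()


def fake_download():
    # Mimic the real method's effect on the instance; nothing is returned.
    scraper.html = " <p>quote</p> "


with patch.object(TinyScraper, "download", side_effect=fake_download) as patched_download:
    scraper.download()

parsed = scraper.parse()
patched_download.assert_called_once()
print(parsed)
```

Setting return_value here would achieve nothing, because parse() never looks at what download() returns.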

The side_effect attribute can be used to patch a generator too. The version of the mock_download inner function below yields the content of the HTML file.

import json
from unittest.mock import patch

from scraper.books import BooksScraper

BOOKS = json.load(open("books-to-scrape.json"))
HTML = "books-to-scrape.html"


@patch.object(BooksScraper, "download")
def test_scraper(patched_download):
    def mock_download():
        with open(HTML, "r") as f:
            yield f.read()

    patched_download.side_effect = mock_download

    scraper = BooksScraper()

    # Extract HTML from iterator.
    for html in scraper.download():
        pass
    # It will only be called once due to the way that the generator is implemented.
    patched_download.assert_called_once()

    books = scraper.parse(html)
    assert books == BOOKS
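
Functions and generators aren’t the only things that side_effect accepts. As a rough sketch (using a bare MagicMock called download rather than a real scraper), it can also be an iterable of successive return values or an exception to raise, which is handy for simulating a failed download:

```python
from unittest.mock import MagicMock

download = MagicMock()

# 1. A function: called with the mock's arguments; its return value is used.
download.side_effect = lambda url: f"<html>{url}</html>"
from_function = download("page-1")

# 2. An iterable: each call returns the next item.
download.side_effect = ["<html>1</html>", "<html>2</html>"]
from_iterable = [download(), download()]

# 3. An exception: raised when the mock is called.
download.side_effect = TimeoutError("simulated timeout")
try:
    download()
    raised = None
except TimeoutError as error:
    raised = str(error)

print(from_function, from_iterable, raised)
```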

Patcher Attribute Assertions

You might have noticed that some tests access the called and call_count attributes on the mock object. These are handy little utilities that can be used to ensure that (1) the patch is indeed called and (2) the patch is called the expected number of times.

Mock objects expose a number of assertions linked to these attributes that can be useful for meta-testing (checking that the tests are working as expected):

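assert_any_call(): checks that at least one call used the given arguments
assert_called(): checks that it was called at least once
assert_called_once(): checks that it was called exactly once
assert_called_once_with(): checks that it was called exactly once, with the given arguments
assert_called_with(): checks the arguments of the most recent call
assert_has_calls(): checks that the given sequence of calls was made and
assert_not_called(): checks that it was never called.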
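
Here’s a short sketch that exercises several of these on a standalone MagicMock. The URLs and the timeout argument are made up for illustration:

```python
from unittest.mock import MagicMock, call

download = MagicMock(return_value="<html></html>")
download("https://example.com/page/1")
download("https://example.com/page/2", timeout=5)

download.assert_called()
download.assert_any_call("https://example.com/page/1")
# assert_called_with() only looks at the most recent call.
download.assert_called_with("https://example.com/page/2", timeout=5)
download.assert_has_calls([
    call("https://example.com/page/1"),
    call("https://example.com/page/2", timeout=5),
])

# The mock was called twice, so assert_called_once() fails.
try:
    download.assert_called_once()
    once_failed = False
except AssertionError:
    once_failed = True

call_count = download.call_count
print(call_count, once_failed)
```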

Patches are Flexible

In the previous post we mocked the response from https://dummyjson.com/user/1. Let’s do the same thing with patching.

import json
from unittest.mock import patch

import requests

emily = {
  "id": 1,
  "firstName": "Emily",
  "lastName": "Johnson",
  "age": 28,
  "gender": "female"
}

@patch("requests.get")
def test_user_response(mock_get):
  mock_get.return_value.status_code = 200
  mock_get.return_value.text = json.dumps(emily)
  
  response = requests.get("https://dummyjson.com/user/1")
  
  assert response.status_code == 200
  assert response.text == json.dumps(emily)

This is not in any way a meaningful test, but it shows how functionality from third-party packages can easily be patched.
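
The same trick works on the standard library. The sketch below swaps in urllib.request.urlopen() in place of requests.get() (my substitution, so that it runs without any third-party packages) and uses a cut-down version of the user record. No network request is made because urlopen() is replaced by a mock.

```python
import json
import urllib.request
from unittest.mock import patch

emily = {"id": 1, "firstName": "Emily", "lastName": "Johnson"}

with patch("urllib.request.urlopen") as mock_urlopen:
    # Configure the object that the patched function will return.
    mock_urlopen.return_value.status = 200
    mock_urlopen.return_value.read.return_value = json.dumps(emily).encode()

    response = urllib.request.urlopen("https://dummyjson.com/user/1")
    status = response.status
    user = json.loads(response.read())

mock_urlopen.assert_called_once_with("https://dummyjson.com/user/1")
print(status, user)
```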

Conclusion

Patching is a useful alternative to mocking for web scraper tests. Often the two can be used interchangeably. Use one. Or the other. Or both. But, whatever you do, make use of patching and mocking to ensure that your web scrapers are tested quickly, robustly and predictably.