Zyte API Sessions

The Zyte API implements session management, which makes it possible to emulate a browser session when interacting with a site via the API.

What is a Session?

A session is a set of conditions that are applied to multiple requests. Those conditions will generally include:

  • IP address
  • cookies and
  • headers.

Requests sent from the same session should appear to be part of the same browsing session.

Session Management in the Zyte API

Zyte’s session management offers two approaches to handling sessions:

Client-Managed Sessions

Creating a client-managed session is simple: just provide a unique session ID. All subsequent requests that use the same session ID will be executed in the same session.
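Because the session ID is just a field in the request body, building the requests is straightforward. The sketch below assumes the `"session": {"id": ...}` format used in the script later in this post. `make_session_query` is a hypothetical helper of my own, not part of the `zyte_api` package.

```python
from uuid import uuid4


def make_session_query(url: str, session_id: str, **extra) -> dict:
    """Hypothetical helper: build a Zyte API request body bound to a client-managed session."""
    query = {"url": url, "session": {"id": session_id}}
    query.update(extra)  # e.g. screenshot=True or requestCookies=[...]
    return query


# One UUID per session; reusing it ties the requests to the same session.
session_id = str(uuid4())
first = make_session_query("https://www.woolworths.co.za/", session_id, screenshot=True)
second = make_session_query("https://www.woolworths.co.za/prod/A-20150501", session_id)
print(first["session"] == second["session"])  # True
```

Either dictionary can then be passed to `client.get()` exactly as in the full script.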

Client-managed sessions provide granular session management, making it possible to

  • create and manage multiple sessions simultaneously and
  • send specific requests through particular sessions.
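One way to juggle several sessions is to keep one session ID per logical key and route each request through the matching session. This is a minimal sketch: `SessionRegistry` and the `"store-a"`/`"store-b"` keys are hypothetical and only illustrate the idea (for example, one session per delivery address).

```python
from typing import Dict
from uuid import uuid4


class SessionRegistry:
    """Hypothetical helper: one client-managed session ID per logical key."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def session_id(self, key: str) -> str:
        # Create a new UUID the first time a key is seen, then reuse it.
        if key not in self._ids:
            self._ids[key] = str(uuid4())
        return self._ids[key]

    def query(self, key: str, url: str) -> dict:
        # Route the request through the session associated with `key`.
        return {"url": url, "session": {"id": self.session_id(key)}}


registry = SessionRegistry()
a1 = registry.query("store-a", "https://www.woolworths.co.za/")
a2 = registry.query("store-a", "https://www.woolworths.co.za/prod/A-20150501")
b1 = registry.query("store-b", "https://www.woolworths.co.za/")
print(a1["session"] == a2["session"], a1["session"] == b1["session"])  # True False
```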

🚨 Take care when using client-managed sessions in parallel. If several parallel requests use the same session ID before that session exists, each of them may be assigned a different IP address. It’s best to send the initial request that establishes a session on its own. Once it has completed, feel free to use the session ID in parallel.
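The "establish first, then parallelise" pattern might look something like this. It's a sketch under a few assumptions: `fetch_in_session` is a hypothetical helper, `send` is any callable that takes a request dictionary (for example `client.get` from the script later in this post), and threads are used purely for simplicity. I haven't checked whether the synchronous `ZyteAPI` client is safe to share between threads; the package's `AsyncZyteAPI` client may be a better fit in practice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List
from uuid import uuid4


def fetch_in_session(
    send: Callable[[dict], dict], urls: List[str], max_workers: int = 4
) -> List[dict]:
    """Establish a session with one request, then reuse its ID in parallel."""
    session_id = str(uuid4())

    def query(url: str) -> dict:
        return {"url": url, "session": {"id": session_id}}

    # Step 1: the first request runs on its own, so the session is created
    # (and bound to an IP address) before anything else uses the ID.
    first = send(query(urls[0]))

    # Step 2: the session now exists, so the remaining requests can share it.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rest = list(pool.map(lambda url: send(query(url)), urls[1:]))

    return [first, *rest]


# Demo with a stub in place of a real API call.
calls = []


def fake_send(q: dict) -> dict:
    calls.append(q)
    return {"url": q["url"]}


results = fetch_in_session(fake_send, ["a", "b", "c"])
print([r["url"] for r in results])  # ['a', 'b', 'c']
```

`pool.map()` returns results in input order, so the output lines up with `urls` even though the later requests run concurrently.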

Server-Managed Sessions

In principle, server-managed sessions delegate all of the session management to the Zyte API. However, they don’t align well with my existing workflows. I’d be interested to hear from anybody who has found this approach preferable to client-managed sessions.

Session to Manage Woolworths Cookies

Like Checkers in the previous post, Woolworths uses a cookie to persist the selected retail store. This is what the site looks like initially: no delivery address is selected.

Default Woolworths landing page.

The script below uses client-managed sessions. It does the following:

  1. Send an initial request in which I provide cookies that specify a delivery address and give cookie consent. I also provide a session ID, which initiates a session.
  2. Send a second request, this time without request cookies but with the ID of the session created in the previous step.
import os
from base64 import b64decode
from uuid import uuid4
from zyte_api import ZyteAPI

ZYTE_API_KEY = os.getenv("ZYTE_API_KEY")

SESSION_ID = str(uuid4())

URL_HOME = "https://www.woolworths.co.za/"
URL_PRODUCT = "https://www.woolworths.co.za/prod/A-20150501"

cookies = [
    # Specify delivery address, which sets store location.
    {
        "name": "location",
        "value": "CnC|false|340|4500018|||Ccol||false|false|115 Saint Andrews Drive|",
        "domain": ".woolworths.co.za",
    },
    # Provide cookie consent.
    {
        "name": "cookieconsent_status",
        "value": "dismiss",
        "domain": ".woolworths.co.za",
    },
]

client = ZyteAPI(api_key=ZYTE_API_KEY)

# Send a first request setting a session ID and providing cookies.
#
response = client.get(
    {
        "url": URL_HOME,
        "requestCookies": cookies,
        "screenshot": True,
        "session": {"id": SESSION_ID},
    }
)

with open("screenshot-woolworths-home.png", "wb") as file:
    file.write(b64decode(response["screenshot"]))

# Now send a second request using the same session but not sending cookies.
#
response = client.get(
    {
        "url": URL_PRODUCT,
        "screenshot": True,
        "session": {"id": SESSION_ID},
    }
)

with open("screenshot-woolworths-product.png", "wb") as file:
    file.write(b64decode(response["screenshot"]))

After the initial request it’s apparent that a specific store (La Lucia, at the top right of the window, next to the shopping cart) has been selected based on the delivery address provided.

Woolworths landing page with delivery address selected.

The second request goes to a specific product page. Although it doesn’t include any request cookies, the selected store persists because the cookies are retained in the session.
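The screenshots are one way to confirm that the cookie survived. Another, assuming the Zyte API `responseCookies` option (which asks for the cookies to be returned alongside the response as a list of name/value dictionaries), is to inspect the cookies directly. `cookie_value` is a hypothetical helper, and the response below is hand-built for illustration, not real API output.

```python
from typing import Optional


def cookie_value(response: dict, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a Zyte API response, or None."""
    # Assumes the query included "responseCookies": True, so the response
    # carries a list of {"name": ..., "value": ..., ...} dictionaries.
    for cookie in response.get("responseCookies", []):
        if cookie.get("name") == name:
            return cookie.get("value")
    return None


# Hand-built stand-in for a real response, for illustration only.
example_response = {
    "url": "https://www.woolworths.co.za/prod/A-20150501",
    "responseCookies": [
        {"name": "cookieconsent_status", "value": "dismiss", "domain": ".woolworths.co.za"},
    ],
}
print(cookie_value(example_response, "cookieconsent_status"))  # dismiss
print(cookie_value(example_response, "location"))  # None
```

In the script above, this would mean adding `"responseCookies": True` to the second query and then calling `cookie_value(response, "location")`.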

Woolworths product with selected delivery address populated via session.

Conclusion

If you are sending multiple requests to the same site and want to persist state between those requests then session management in the Zyte API is very likely to fit the bill.

If you need help building web crawlers or advice on data acquisition, then get in touch with Fathom Data. The team has over a decade’s experience building bespoke data solutions. As an AWS Partner they can also set up and maintain your cloud infrastructure.