SQLAlchemy: Efficient Counting

SQLAlchemy

SQL

How to count() efficiently with SQLAlchemy.

Published

9 Jan 2021 12:00

SQLAlchemy logo

I have a SQLAlchemy count() query which is being called fairly frequently in my API. The query itself is not terribly inefficient, but it’s being called with sufficient frequency that it has a performance impact.

This is the basic setup.

session = db.Session()

# Create Query object.
#
products = session.query(db.Product)

And here’s the corresponding SQL, basically just retrieving all columns from the product table.

SELECT
  product.id AS product_id,
  product.retailer_id AS product_retailer_id,
  product.url AS product_url,
  product.product AS product_product,
  product.brand AS product_brand,
  product.model AS product_model,
  product.sku AS product_sku 
FROM
  product;

Using the `count()` Method

Now if we want to get a count of results we could simple do

products.count()

This generates the following SQL:

SELECT count(*) AS count_1 
FROM (
  SELECT
    product.id AS product_id,
    product.retailer_id AS product_retailer_id,
    product.url AS product_url,
    product.product AS product_product,
    product.brand AS product_brand,
    product.model AS product_model,
    product.sku AS product_sku 
  FROM
    product
) AS anon_1;

This is inefficient from a couple of perspectives:

it involves a sub-query and
it counts all columns (via the * in count(*)).

A Function to Count

After a bit of research I found the following function at https://gist.github.com/hest/8798884:

from sqlalchemy import func

def query_count(query):
    counter = query.statement.with_only_columns([func.count()])
    counter = counter.order_by(None)
    return query.session.execute(counter).scalar()

Now, rather than calling the count() method I use the query_count() function.

query_count(products)

Let’s quickly dissect what’s going on inside that function.

The query argument is just the original query we saw above.
And query.statement is the query without aliases.

SELECT
  product.id,
  product.retailer_id,
  product.url,
  product.product,
  product.brand,
  product.model,
  product.sku 
FROM
  product;

The with_only_columns() method returns a new query with columns specified by its arguments.

SELECT count(*) AS count_1 FROM product;

Much better! But it’s still using all columns (because of the * in count(*)).

Counting a Constant

What if we replace func.count() with func.count(1)? Well then we get this:

SELECT count(%(count_2)s) AS count_1 
FROM dev.product
{'count_2': 1}

This is better because we’re now just counting a constant. But passing that constant as a query parameter seems to be overkill.

Droppuig the Parameter

Enter literal_column(), which takes us over the line.

from sqlalchemy import func, literal_column

def query_count(query):
    ONE = literal_column("1")
    counter = query.statement.with_only_columns([func.count(ONE)])
    counter = counter.order_by(None)
    return query.session.execute(counter).scalar()

And the final SQL query is now:

SELECT count(1) AS count_1 FROM product;

Can’t get simpler than that! The sub-query is gone and now, rather than retrieving all columns to count, we just count the number of occurrences of the literal 1.

Using the count() Method

A Function to Count

Counting a Constant

Droppuig the Parameter

Using the `count()` Method