SQLAlchemy: Efficient Counting

I have a SQLAlchemy count() query which is being called fairly frequently in my API. The query itself is not terribly inefficient, but it’s being called with sufficient frequency that it has a performance impact.

This is the basic setup.

session = db.Session()

# Create Query object.
#
products = session.query(db.Product)

And here’s the corresponding SQL, basically just retrieving all columns from the product table.

SELECT
  product.id AS product_id,
  product.retailer_id AS product_retailer_id,
  product.url AS product_url,
  product.product AS product_product,
  product.brand AS product_brand,
  product.model AS product_model,
  product.sku AS product_sku 
FROM
  product;

Using the count() Method

Now if we want to get a count of results we could simple do

products.count()

This generates the following SQL:

SELECT count(*) AS count_1 
FROM (
  SELECT
    product.id AS product_id,
    product.retailer_id AS product_retailer_id,
    product.url AS product_url,
    product.product AS product_product,
    product.brand AS product_brand,
    product.model AS product_model,
    product.sku AS product_sku 
  FROM
    product
) AS anon_1;

This is inefficient from a couple of perspectives:

  • it involves a sub-query and
  • it counts all columns (via the * in count(*)).

A Function to Count

After a bit of research I found the following function at https://gist.github.com/hest/8798884:

from sqlalchemy import func

def query_count(query):
    counter = query.statement.with_only_columns([func.count()])
    counter = counter.order_by(None)
    return query.session.execute(counter).scalar()

Now, rather than calling the count() method I use the query_count() function.

query_count(products)

Let’s quickly dissect what’s going on inside that function.

  1. The query argument is just the original query we saw above.
  2. And query.statement is the query without aliases.
SELECT
  product.id,
  product.retailer_id,
  product.url,
  product.product,
  product.brand,
  product.model,
  product.sku 
FROM
  product;
  1. The with_only_columns() method returns a new query with columns specified by its arguments.
SELECT count(*) AS count_1 FROM product;

Much better! But it’s still using all columns (because of the * in count(*)).

Counting a Constant

What if we replace func.count() with func.count(1)? Well then we get this:

SELECT count(%(count_2)s) AS count_1 
FROM dev.product
{'count_2': 1}

This is better because we’re now just counting a constant. But passing that constant as a query parameter seems to be overkill.

Droppuig the Parameter

Enter literal_column(), which takes us over the line.

from sqlalchemy import func, literal_column

def query_count(query):
    ONE = literal_column("1")
    counter = query.statement.with_only_columns([func.count(ONE)])
    counter = counter.order_by(None)
    return query.session.execute(counter).scalar()

And the final SQL query is now:

SELECT count(1) AS count_1 FROM product;

Can’t get simpler than that! The subquery is gone and now, rather than retrieving all columns to count, we just count the number of occurrences of the literal 1.