I have a SQLAlchemy count()
query which is being called fairly frequently in my API. The query itself is not terribly inefficient, but it’s being called with sufficient frequency that it has a performance impact.
This is the basic setup.
session = db.Session()
# Create Query object.
#
products = session.query(db.Product)
And here’s the corresponding SQL, basically just retrieving all columns from the product
table.
SELECT
product.id AS product_id,
product.retailer_id AS product_retailer_id,
product.url AS product_url,
product.product AS product_product,
product.brand AS product_brand,
product.model AS product_model,
product.sku AS product_sku
FROM
product;
Using the count()
Method
Now if we want to get a count of results we could simple do
products.count()
This generates the following SQL:
SELECT count(*) AS count_1
FROM (
SELECT
product.id AS product_id,
product.retailer_id AS product_retailer_id,
product.url AS product_url,
product.product AS product_product,
product.brand AS product_brand,
product.model AS product_model,
product.sku AS product_sku
FROM
product
) AS anon_1;
This is inefficient from a couple of perspectives:
- it involves a sub-query and
- it counts all columns (via the
*
incount(*)
).
A Function to Count
After a bit of research I found the following function at https://gist.github.com/hest/8798884:
from sqlalchemy import func
def query_count(query):
counter = query.statement.with_only_columns([func.count()])
counter = counter.order_by(None)
return query.session.execute(counter).scalar()
Now, rather than calling the count()
method I use the query_count()
function.
query_count(products)
Let’s quickly dissect what’s going on inside that function.
- The
query
argument is just the original query we saw above. - And
query.statement
is the query without aliases.
SELECT
product.id,
product.retailer_id,
product.url,
product.product,
product.brand,
product.model,
product.sku
FROM
product;
- The
with_only_columns()
method returns a new query with columns specified by its arguments.
SELECT count(*) AS count_1 FROM product;
Much better! But it’s still using all columns (because of the *
in count(*)
).
Counting a Constant
What if we replace func.count()
with func.count(1)
? Well then we get this:
SELECT count(%(count_2)s) AS count_1
FROM dev.product
{'count_2': 1}
This is better because we’re now just counting a constant. But passing that constant as a query parameter seems to be overkill.
Droppuig the Parameter
Enter literal_column()
, which takes us over the line.
from sqlalchemy import func, literal_column
def query_count(query):
ONE = literal_column("1")
counter = query.statement.with_only_columns([func.count(ONE)])
counter = counter.order_by(None)
return query.session.execute(counter).scalar()
And the final SQL query is now:
SELECT count(1) AS count_1 FROM product;
Can’t get simpler than that! The sub-query is gone and now, rather than retrieving all columns to count, we just count the number of occurrences of the literal 1.