Earnings Calendar

A few days ago I wrote about a scraper for gathering economic calendar data. Well, I’m back again to write about another aspect of the same project: acquiring earnings calendar data.

I split the task into two parts:

  1. a Python scraper for the calendar data, which dumps its output to CSV files; and
  2. a BASH script that consolidates the individual CSV files into a single master file.

In retrospect I’m not sure why I didn’t do the whole thing in Python. However, it works and I’m not going to mess with it.

Scraper

The data were scraped from the Company Earnings Calendar on Yahoo! Finance.

The data are divided by date and each page is accessible via a URL parameter. If there are many announcements on a specific day then the listing is paginated in batches of 100 announcements. Fortunately the pages are statically rendered, which means that I could do the scrape with just requests and BeautifulSoup. The data for each date were aggregated and written to a CSV file.
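To make the pagination concrete, here is a minimal sketch that builds the list of page URLs for a single date using only the standard library. The base URL and the `day`, `offset` and `size` query parameters are assumptions for illustration, and `page_urls` is a hypothetical helper rather than code from the actual scraper.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names; check them against the live site.
BASE_URL = "https://finance.yahoo.com/calendar/earnings"
PAGE_SIZE = 100  # Listings are paginated in batches of 100 announcements.


def page_urls(day: str, total: int, size: int = PAGE_SIZE) -> list[str]:
    """Build one URL per page needed to cover `total` announcements on `day`."""
    # Always request at least one page, even if no announcements are expected.
    offsets = range(0, max(total, 1), size)
    return [
        f"{BASE_URL}?{urlencode({'day': day, 'offset': offset, 'size': size})}"
        for offset in offsets
    ]


if __name__ == "__main__":
    # 250 announcements need three pages, at offsets 0, 100 and 200.
    for url in page_urls("2024-10-01", total=250):
        print(url)
```

Each URL would then be fetched with requests and the table rows extracted with BeautifulSoup.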

The scraper accepts two command line arguments:

  • --week — gather announcements for next week or
  • --month — gather announcements for next month.

In the absence of either argument it will just gather announcements for the current day.
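A command line interface like this might be sketched with argparse as follows. This is not the scraper's actual code: `parse_args` and `dates_to_scrape` are hypothetical names, the two options are assumed to be mutually exclusive, and "next week" and "next month" are assumed to mean the 7 or 30 days starting today.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    """Parse the --week and --month flags (assumed to be mutually exclusive)."""
    parser = argparse.ArgumentParser(description="Gather earnings announcements.")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--week", action="store_true",
                       help="gather announcements for next week")
    group.add_argument("--month", action="store_true",
                       help="gather announcements for next month")
    return parser.parse_args(argv)


def dates_to_scrape(args, today):
    """Return the list of dates to scrape for the chosen option."""
    if args.week:
        days = 7    # Assumption: a week is the 7 days starting today.
    elif args.month:
        days = 30   # Assumption: a month is the 30 days starting today.
    else:
        days = 1    # Default: just the current day.
    return [today + timedelta(days=n) for n in range(days)]


if __name__ == "__main__":
    for day in dates_to_scrape(parse_args(), date.today()):
        print(day.isoformat())
```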

Consolidator

This is a simple BASH script that gathers all of the individual CSV files and concatenates them. The only interesting component of the script is stripping off the header for all but the first file so that the header is not replicated.
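For comparison, the same header-stripping idea could be written in Python. This is a rough sketch rather than a translation of the actual script: `consolidate` is a hypothetical function and it assumes that every file starts with the same single-line header.

```python
from pathlib import Path


def consolidate(csv_dir, output):
    """Concatenate the CSV files in csv_dir into output, keeping one header.

    Returns the number of data lines written.
    """
    output = Path(output)
    written = 0
    header_done = False
    with open(output, "w", encoding="utf-8", newline="") as out:
        # Sort so that the files are combined in a predictable order.
        for path in sorted(Path(csv_dir).glob("*.csv")):
            if path.resolve() == output.resolve():
                continue  # Don't read the master file back into itself.
            with open(path, encoding="utf-8", newline="") as src:
                header = src.readline()
                if not header:
                    continue  # Skip empty files.
                if not header_done:
                    # Only the first file contributes its header.
                    out.write(header if header.endswith("\n") else header + "\n")
                    header_done = True
                for line in src:
                    # Guard against a missing newline at the end of a file.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

A call such as `consolidate("data", "earnings.csv")` would combine everything in a (hypothetical) `data` directory into a single master file.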

The resulting CSV file is available here and will be updated daily.

I’m sitting on the fence between Pandas and Polars right now. Here is how to read the file and print the first few records from October 2024 onwards in Pandas.

import pandas as pd

df = pd.read_csv("earnings.csv")

df["date"] = pd.to_datetime(df["date"])

df = df[df["date"] >= pd.Timestamp("2024-10-01")]

print(df[["date", "symbol", "event"]].head(10))
            date symbol                             event
24749 2024-10-01  AAUGF         Q1 2025  Earnings Release
24750 2024-10-01  ARLLF         Q3 2024  Earnings Release
24751 2024-10-01  ATVXF  Full Year 2024  Earnings Release
24752 2024-10-01  AWLIF         Q1 2024  Earnings Release
24753 2024-10-01    AYI                               NaN
24754 2024-10-01   CALM                               NaN
24755 2024-10-01  CDELF         Q1 2025  Earnings Release
24756 2024-10-01   CHEK         Q2 2024  Earnings Release
24757 2024-10-01  CKHGF  Half Year 2025  Earnings Release
24758 2024-10-01  CKHGY  Half Year 2025  Earnings Release

And something similar using Polars.

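import polars as pl
from datetime import date

df = (
    pl.read_csv("earnings.csv")
    .with_columns(
        # Parse the ISO formatted strings into a proper Date column.
        pl.col("date").str.strptime(pl.Date)
    )
    .filter(
        # Compare against a date (not a datetime) to match the column type.
        pl.col("date") >= date(2024, 10, 1)
    )
)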

print(df.select(["date", "symbol", "time"]).head(10))
shape: (10, 3)
┌────────────┬────────┬────────────────────┐
│ date       ┆ symbol ┆ time               │
│ ---        ┆ ---    ┆ ---                │
│ date       ┆ str    ┆ str                │
╞════════════╪════════╪════════════════════╡
│ 2024-10-01 ┆ AAUGF  ┆ After Market Close │
│ 2024-10-01 ┆ ARLLF  ┆ null               │
│ 2024-10-01 ┆ ATVXF  ┆ null               │
│ 2024-10-01 ┆ AWLIF  ┆ null               │
│ 2024-10-01 ┆ AYI    ┆ TAS                │
│ 2024-10-01 ┆ CALM   ┆ TAS                │
│ 2024-10-01 ┆ CDELF  ┆ After Market Close │
│ 2024-10-01 ┆ CHEK   ┆ null               │
│ 2024-10-01 ┆ CKHGF  ┆ TAS                │
│ 2024-10-01 ┆ CKHGY  ┆ TAS                │
└────────────┴────────┴────────────────────┘

Full list of columns in the CSV file:

  • date
  • symbol
  • company
  • event
  • time
  • eps_estimate
  • eps_reported
  • surprise

For around half of the records the estimated earnings per share (EPS) is included in the data. After the announcement has occurred the data are updated with the actual reported EPS. When both the estimated and reported values are present the percentage surprise is also added.
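The percentage surprise is conventionally the difference between reported and estimated EPS, expressed relative to the size of the estimate. The sketch below assumes that definition; it is not confirmed as the formula used by the data source, and `surprise_pct` is a hypothetical helper. Values recomputed from the rounded EPS figures in the file will not always match the published surprise exactly.

```python
from typing import Optional


def surprise_pct(estimate: Optional[float], reported: Optional[float]) -> Optional[float]:
    """Percentage by which reported EPS differs from estimated EPS."""
    # No surprise can be computed without both values or with a zero estimate.
    if estimate is None or reported is None or estimate == 0:
        return None
    # Divide by the absolute estimate so that a smaller loss than expected
    # still counts as a positive surprise.
    return round((reported - estimate) / abs(estimate) * 100, 2)


if __name__ == "__main__":
    print(surprise_pct(0.52, 0.70))    # Estimate and reported EPS for NKE.
    print(surprise_pct(-0.20, -0.02))  # Estimate and reported EPS for NRSN.
    print(surprise_pct(-0.55, None))   # No reported EPS yet.
```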

df = df.filter(
  pl.col("eps_estimate").is_not_null()
)

print(df.select(["date", "symbol", "eps_estimate", "eps_reported", "surprise"]).head(10))
shape: (10, 5)
┌────────────┬────────┬──────────────┬──────────────┬──────────┐
│ date       ┆ symbol ┆ eps_estimate ┆ eps_reported ┆ surprise │
│ ---        ┆ ---    ┆ ---          ┆ ---          ┆ ---      │
│ date       ┆ str    ┆ f64          ┆ f64          ┆ str      │
╞════════════╪════════╪══════════════╪══════════════╪══════════╡
│ 2024-10-01 ┆ AYI    ┆ 4.28         ┆ 4.3          ┆ +0.49    │
│ 2024-10-01 ┆ CALM   ┆ 2.33         ┆ 3.06         ┆ +31.16   │
│ 2024-10-01 ┆ LW     ┆ 0.72         ┆ 0.73         ┆ +1.67    │
│ 2024-10-01 ┆ MKC    ┆ 0.68         ┆ 0.83         ┆ +22.42   │
│ 2024-10-01 ┆ NKE    ┆ 0.52         ┆ 0.7          ┆ +34.62   │
│ 2024-10-01 ┆ NRSN   ┆ -0.2         ┆ -0.02        ┆ +90      │
│ 2024-10-01 ┆ PAYX   ┆ 1.14         ┆ 1.16         ┆ +1.58    │
│ 2024-10-01 ┆ PTN    ┆ -0.55        ┆ null         ┆ null     │
│ 2024-10-01 ┆ RGP    ┆ 0.03         ┆ null         ┆ -100     │
│ 2024-10-01 ┆ RNW    ┆ 0.01         ┆ null         ┆ -62.5    │
└────────────┴────────┴──────────────┴──────────────┴──────────┘

Since we’re talking about calendars, I wanted to mention the {qlcal} package for R.

library(qlcal)

What calendars are supported?

calendars
 [1] "TARGET"                         "UnitedStates"                   "UnitedStates/LiborImpact"       "UnitedStates/NYSE"             
 [5] "UnitedStates/GovernmentBond"    "UnitedStates/NERC"              "UnitedStates/FederalReserve"    "UnitedStates/SOFR"             
 [9] "Argentina"                      "Australia"                      "Australia/ASX"                  "Austria"                       
[13] "Austria/Exchange"               "Bespoke"                        "Botswana"                       "Brazil"                        
[17] "Brazil/Exchange"                "Canada"                         "Canada/TSX"                     "Chile"                         
[21] "China"                          "China/IB"                       "CzechRepublic"                  "Denmark"                       
[25] "Finland"                        "France"                         "France/Exchange"                "Germany"                       
[29] "Germany/FrankfurtStockExchange" "Germany/Xetra"                  "Germany/Eurex"                  "Germany/Euwax"                 
[33] "HongKong"                       "Hungary"                        "Iceland"                        "India"                         
[37] "Indonesia"                      "Israel"                         "Italy"                          "Italy/Exchange"                
[41] "Japan"                          "Mexico"                         "NewZealand"                     "Norway"                        
[45] "Null"                           "Poland"                         "Romania"                        "Russia"                        
[49] "SaudiArabia"                    "Singapore"                      "Slovakia"                       "SouthAfrica"                   
[53] "SouthKorea"                     "SouthKorea/KRX"                 "Sweden"                         "Switzerland"                   
[57] "Taiwan"                         "Thailand"                       "Turkey"                         "Ukraine"                       
[61] "UnitedKingdom"                  "UnitedKingdom/Exchange"         "UnitedKingdom/Metals"           "WeekendsOnly"                  

Select a specific calendar.

setCalendar("UnitedKingdom")

Get the 2024 holidays.

getHolidays(as.Date("2024-01-01"), as.Date("2024-12-31"))
[1] "2024-01-01" "2024-03-29" "2024-04-01" "2024-05-06" "2024-05-27" "2024-08-26" "2024-12-25" "2024-12-26"

Indeed, those are the 2024 UK bank holidays.