A few days ago I wrote about a scraper for gathering economic calendar data. Well, I’m back again to write about another aspect of the same project: acquiring earnings calendar data.
I split the task into two parts:
- a Python scraper for the calendar data, which dumps its output to CSV files; and
- a BASH script that consolidates the individual CSV files into a single master file.
In retrospect I’m not sure why I didn’t do the whole thing in Python. However, it works and I’m not going to mess with it.
Scraper
The data were scraped from the Company Earnings Calendar on Yahoo! Finance.
The data are divided by date and each page is accessible via a URL parameter. If there are many announcements on a specific day then the listing is paginated in batches of 100 announcements. Fortunately the pages are statically rendered, which means that I could do the scrape with just requests and BeautifulSoup. Data for each date were aggregated and written to a CSV file.
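Here is a minimal sketch of the pagination loop. It assumes that a day's listing is addressed by `day`, `offset` and `size` query parameters (an assumption; check them against the live page) and that a page with fewer than 100 rows is the last one. Fetching and parsing a page, which the real scraper does with requests and BeautifulSoup, is delegated to a caller-supplied `fetch_rows` function so the loop can be exercised without network access. `page_url`, `scrape_day` and `write_day_csv` are hypothetical names, not the author's code, and the column names come from the list further down.

```python
import csv
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Assumed endpoint and query parameter names; verify against the live site.
BASE_URL = "https://finance.yahoo.com/calendar/earnings"
PAGE_SIZE = 100  # announcements are paginated in batches of 100

# Column names as listed later in this post.
FIELDS = ["date", "symbol", "company", "event", "time",
          "eps_estimate", "eps_reported", "surprise"]

Row = Dict[str, str]


def page_url(day: str, offset: int, size: int = PAGE_SIZE) -> str:
    """Build the URL for one page of announcements on `day` (YYYY-MM-DD)."""
    query = urlencode({"day": day, "offset": offset, "size": size})
    return f"{BASE_URL}?{query}"


def scrape_day(day: str, fetch_rows: Callable[[str], List[Row]]) -> List[Row]:
    """Collect all announcements for `day` by walking through the pages.

    `fetch_rows` is a hypothetical callable that downloads one page (e.g.
    with requests) and parses its table (e.g. with BeautifulSoup).
    """
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_rows(page_url(day, offset))
        rows.extend(page)
        # A page with fewer than PAGE_SIZE rows is taken to be the last one.
        if len(page) < PAGE_SIZE:
            return rows
        offset += PAGE_SIZE


def write_day_csv(path: str, rows: List[Row]) -> None:
    """Write one day's announcements to its own CSV file, header included."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

If a day has an exact multiple of 100 announcements, this makes one extra request that comes back empty, which is harmless.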
The scraper accepts two command line arguments:
- --week: gather announcements for next week; or
- --month: gather announcements for next month.
In the absence of either argument it will just gather announcements for the current day.
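A sketch of how those options might be wired up with argparse. Putting them in a mutually exclusive group means `--week` and `--month` cannot be combined. How many days "next week" and "next month" cover is my assumption here (7 and 30 days starting today), and `parse_args` and `dates_to_scrape` are hypothetical names.

```python
import argparse
from datetime import date, timedelta
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Scrape earnings announcements.")
    # --week and --month are mutually exclusive; with neither, only today is scraped.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--week", action="store_true",
                       help="gather announcements for next week")
    group.add_argument("--month", action="store_true",
                       help="gather announcements for next month")
    return parser.parse_args(argv)


def dates_to_scrape(args: argparse.Namespace, today: date) -> List[date]:
    # Assumption: "next week" is the 7 days starting today and
    # "next month" the 30 days starting today.
    if args.week:
        n_days = 7
    elif args.month:
        n_days = 30
    else:
        n_days = 1
    return [today + timedelta(days=i) for i in range(n_days)]
```

For example, `dates_to_scrape(parse_args(["--week"]), date(2024, 10, 1))` gives the seven dates from 1 to 7 October 2024, while passing no arguments gives just the one date.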
Consolidator
This is a simple BASH script that gathers all of the individual CSV files and concatenates them. The only interesting part of the script is stripping the header from all but the first file so that it is not repeated.
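Since the whole thing could just as well have been done in Python, here is a sketch of what the consolidation step might look like there. It is not a translation of the actual BASH script. It assumes the daily files match a glob pattern such as `data/*.csv` (a hypothetical layout) and all share the same header. The header is copied from the first file only, and a missing trailing newline on any file is patched so that rows from consecutive files are never glued together.

```python
import glob
import os


def _with_newline(line: str) -> str:
    # Make sure every line ends with a newline so rows are never glued together.
    return line if line.endswith("\n") else line + "\n"


def consolidate(pattern: str, output: str) -> int:
    """Concatenate the CSV files matching `pattern` into `output`.

    The header is copied from the first file only. Returns the number of
    data lines written.
    """
    out_path = os.path.abspath(output)
    # Sorting gives date order if the file names are ISO dates (an assumption);
    # the master file itself is skipped so that re-runs do not include it.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_path)
    n_lines = 0
    with open(output, "w", newline="") as out:
        for i, path in enumerate(paths):
            with open(path, newline="") as f:
                header = f.readline()
                if i == 0 and header:
                    out.write(_with_newline(header))
                for line in f:
                    out.write(_with_newline(line))
                    n_lines += 1
    return n_lines
```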
The resulting CSV file is available here and will be updated daily.
I’m sitting on the fence between Pandas and Polars right now, so here’s the data loaded with both. First Pandas: read the file, keep records from 1 October 2024 onwards and print the first ten.
import pandas as pd
df = pd.read_csv("earnings.csv")
df["date"] = pd.to_datetime(df["date"])
df = df[df["date"] >= pd.Timestamp("2024-10-01")]
print(df[["date", "symbol", "event"]].head(10))
            date  symbol                            event
24749 2024-10-01   AAUGF         Q1 2025 Earnings Release
24750 2024-10-01   ARLLF         Q3 2024 Earnings Release
24751 2024-10-01   ATVXF  Full Year 2024 Earnings Release
24752 2024-10-01   AWLIF         Q1 2024 Earnings Release
24753 2024-10-01     AYI                              NaN
24754 2024-10-01    CALM                              NaN
24755 2024-10-01   CDELF         Q1 2025 Earnings Release
24756 2024-10-01    CHEK         Q2 2024 Earnings Release
24757 2024-10-01   CKHGF  Half Year 2025 Earnings Release
24758 2024-10-01   CKHGY  Half Year 2025 Earnings Release
And something similar using Polars.
import polars as pl
from datetime import date
df = (
    pl.read_csv("earnings.csv")
    # Parse the YYYY-MM-DD strings into a Date column.
    .with_columns(
        pl.col("date").str.strptime(pl.Date)
    )
    # Keep announcements from 1 October 2024 onwards (compare Date with date).
    .filter(
        pl.col("date") >= date(2024, 10, 1)
    )
)
print(df.select(["date", "symbol", "time"]).head(10))
shape: (10, 3)
┌────────────┬────────┬────────────────────┐
│ date       ┆ symbol ┆ time               │
│ ---        ┆ ---    ┆ ---                │
│ date       ┆ str    ┆ str                │
╞════════════╪════════╪════════════════════╡
│ 2024-10-01 ┆ AAUGF  ┆ After Market Close │
│ 2024-10-01 ┆ ARLLF  ┆ null               │
│ 2024-10-01 ┆ ATVXF  ┆ null               │
│ 2024-10-01 ┆ AWLIF  ┆ null               │
│ 2024-10-01 ┆ AYI    ┆ TAS                │
│ 2024-10-01 ┆ CALM   ┆ TAS                │
│ 2024-10-01 ┆ CDELF  ┆ After Market Close │
│ 2024-10-01 ┆ CHEK   ┆ null               │
│ 2024-10-01 ┆ CKHGF  ┆ TAS                │
│ 2024-10-01 ┆ CKHGY  ┆ TAS                │
└────────────┴────────┴────────────────────┘
Full list of columns in the CSV file:
- date
- symbol
- company
- event
- time
- eps_estimate
- eps_reported
- surprise
For around half of the records the estimated earnings per share (EPS) is included in the data. After the announcement has occurred the data are updated with the actual reported EPS. When both the estimated and reported values are present the percentage surprise is also added.
df = df.filter(
    pl.col("eps_estimate").is_not_null()
)
print(df.select(["date", "symbol", "eps_estimate", "eps_reported", "surprise"]).head(10))
shape: (10, 5)
┌────────────┬────────┬──────────────┬──────────────┬──────────┐
│ date       ┆ symbol ┆ eps_estimate ┆ eps_reported ┆ surprise │
│ ---        ┆ ---    ┆ ---          ┆ ---          ┆ ---      │
│ date       ┆ str    ┆ f64          ┆ f64          ┆ str      │
╞════════════╪════════╪══════════════╪══════════════╪══════════╡
│ 2024-10-01 ┆ AYI    ┆ 4.28         ┆ 4.3          ┆ +0.49    │
│ 2024-10-01 ┆ CALM   ┆ 2.33         ┆ 3.06         ┆ +31.16   │
│ 2024-10-01 ┆ LW     ┆ 0.72         ┆ 0.73         ┆ +1.67    │
│ 2024-10-01 ┆ MKC    ┆ 0.68         ┆ 0.83         ┆ +22.42   │
│ 2024-10-01 ┆ NKE    ┆ 0.52         ┆ 0.7          ┆ +34.62   │
│ 2024-10-01 ┆ NRSN   ┆ -0.2         ┆ -0.02        ┆ +90      │
│ 2024-10-01 ┆ PAYX   ┆ 1.14         ┆ 1.16         ┆ +1.58    │
│ 2024-10-01 ┆ PTN    ┆ -0.55        ┆ null         ┆ null     │
│ 2024-10-01 ┆ RGP    ┆ 0.03         ┆ null         ┆ -100     │
│ 2024-10-01 ┆ RNW    ┆ 0.01         ┆ null         ┆ -62.5    │
└────────────┴────────┴──────────────┴──────────────┴──────────┘
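In the output above, `surprise` comes through as `str` (values like "+0.49"). Here is a small plain-Python sketch for turning those strings into numbers and recomputing the surprise from the two EPS columns. The formula (difference relative to the magnitude of the estimate) is my assumption, not something documented by Yahoo! Finance. It reproduces the NKE (+34.62) and NRSN (+90) values but only approximately matches others: for CALM it gives about 31.33 against the listed +31.16, presumably because the site works from unrounded figures. Rows like RGP and RNW, which have a surprise but no reported EPS, cannot be recomputed at all. The function names are hypothetical.

```python
from typing import Optional


def parse_surprise(value: Optional[str]) -> Optional[float]:
    """Convert a surprise string such as "+31.16" or "-100" to a float.

    Missing values (None, empty string or the literal "null") become None.
    """
    if value is None or value.strip() in ("", "null"):
        return None
    # float() accepts an explicit leading "+" sign.
    return float(value)


def surprise_pct(estimate: float, reported: float) -> float:
    """Percentage surprise of the reported EPS relative to the estimate.

    Assumed formula: dividing by abs(estimate) keeps a beat positive even
    when the estimate is negative (e.g. NRSN: -0.2 expected, -0.02 reported).
    """
    if estimate == 0:
        raise ValueError("surprise is undefined for a zero estimate")
    return (reported - estimate) / abs(estimate) * 100


# A few rows copied from the table above: (symbol, estimate, reported, listed surprise).
rows = [
    ("NKE", 0.52, 0.7, "+34.62"),
    ("NRSN", -0.2, -0.02, "+90"),
    ("CALM", 2.33, 3.06, "+31.16"),
]
for symbol, est, rep, listed in rows:
    print(symbol, round(surprise_pct(est, rep), 2), parse_surprise(listed))
```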
Related
Since we’re talking about calendars, I wanted to mention the {qlcal} package for R.
library(qlcal)
What calendars are supported?
calendars
[1] "TARGET" "UnitedStates" "UnitedStates/LiborImpact" "UnitedStates/NYSE"
[5] "UnitedStates/GovernmentBond" "UnitedStates/NERC" "UnitedStates/FederalReserve" "UnitedStates/SOFR"
[9] "Argentina" "Australia" "Australia/ASX" "Austria"
[13] "Austria/Exchange" "Bespoke" "Botswana" "Brazil"
[17] "Brazil/Exchange" "Canada" "Canada/TSX" "Chile"
[21] "China" "China/IB" "CzechRepublic" "Denmark"
[25] "Finland" "France" "France/Exchange" "Germany"
[29] "Germany/FrankfurtStockExchange" "Germany/Xetra" "Germany/Eurex" "Germany/Euwax"
[33] "HongKong" "Hungary" "Iceland" "India"
[37] "Indonesia" "Israel" "Italy" "Italy/Exchange"
[41] "Japan" "Mexico" "NewZealand" "Norway"
[45] "Null" "Poland" "Romania" "Russia"
[49] "SaudiArabia" "Singapore" "Slovakia" "SouthAfrica"
[53] "SouthKorea" "SouthKorea/KRX" "Sweden" "Switzerland"
[57] "Taiwan" "Thailand" "Turkey" "Ukraine"
[61] "UnitedKingdom" "UnitedKingdom/Exchange" "UnitedKingdom/Metals" "WeekendsOnly"
Select a specific calendar.
setCalendar("UnitedKingdom")
Get the 2024 holidays.
getHolidays(as.Date("2024-01-01"), as.Date("2024-12-31"))
[1] "2024-01-01" "2024-03-29" "2024-04-01" "2024-05-06" "2024-05-27" "2024-08-26" "2024-12-25" "2024-12-26"
Indeed, those are the 2024 UK bank holidays.
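With a holiday list in hand, business-day arithmetic is straightforward. {qlcal} has its own helpers for this, so the following is just a plain-Python illustration rather than anything from the package. It hard-codes the eight 2024 UK holidays returned above and treats Saturday and Sunday as the weekend. `is_business_day`, `business_days_between` and `next_business_day` are hypothetical helper names.

```python
from datetime import date, timedelta
from typing import Set

# The 2024 UK holidays returned by getHolidays() above, hard-coded here.
UK_HOLIDAYS_2024: Set[date] = {
    date.fromisoformat(d)
    for d in ["2024-01-01", "2024-03-29", "2024-04-01", "2024-05-06",
              "2024-05-27", "2024-08-26", "2024-12-25", "2024-12-26"]
}


def is_business_day(d: date, holidays: Set[date]) -> bool:
    # weekday() is 0 for Monday ... 4 for Friday, so 5 and 6 are the weekend.
    return d.weekday() < 5 and d not in holidays


def business_days_between(start: date, end: date, holidays: Set[date]) -> int:
    """Count business days in the inclusive range [start, end]."""
    count = 0
    d = start
    while d <= end:
        if is_business_day(d, holidays):
            count += 1
        d += timedelta(days=1)
    return count


def next_business_day(d: date, holidays: Set[date]) -> date:
    """Return the first business day strictly after `d`."""
    d += timedelta(days=1)
    while not is_business_day(d, holidays):
        d += timedelta(days=1)
    return d


print(business_days_between(date(2024, 1, 1), date(2024, 12, 31), UK_HOLIDAYS_2024))
print(next_business_day(date(2024, 12, 24), UK_HOLIDAYS_2024))
```

For 2024 this counts 254 business days (262 weekdays less the eight holidays, all of which fall on weekdays), and rolling forward from Christmas Eve skips both holidays and lands on Friday 27 December.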