Downloading content from SharePoint can be tricky. It might appear that a Microsoft login is required. You might attempt to automate the login process but run into other challenges.
If you are lucky, though, it might not be all that hard.
For the purpose of illustration I’ll document the process I went through for downloading a CSV document. The URL for the document is stored as URL in a module called const.py.
Simple Static
You can try simply retrieving the page via a static request. It’s important to follow redirects because the SharePoint URL will likely result in a 302 redirect.
import httpx
from const import URL
response = httpx.get(URL, follow_redirects=True)
response.raise_for_status()
But this is not useful because the page is a heap of JavaScript, which of course is not executed. You don’t get the actual content of the CSV document. The JavaScript needs to be executed. Perhaps browser automation with Playwright or Selenium would help?
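One way to confirm this diagnosis in code is to inspect what came back before trying to parse it. The following is only a minimal sketch: `looks_like_html` is a hypothetical helper (not part of httpx or SharePoint), and its heuristics (the media type in the Content-Type header plus a peek at the start of the body) are assumptions that may need adjusting.

```python
def looks_like_html(content_type: str, body: str) -> bool:
    """Heuristically decide whether a response is an HTML page rather than a data file.

    Hypothetical helper for illustration; the checks below are assumptions.
    """
    # The media type is the part of the header before any parameters (e.g. charset).
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in {"text/html", "application/xhtml+xml"}:
        return True
    # Fall back to sniffing the first characters of the body.
    head = body.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


if __name__ == "__main__":
    print(looks_like_html("text/html; charset=utf-8", "<!DOCTYPE html><html>..."))
    print(looks_like_html("text/csv", "a;b;c\n1;2;3\n"))
```

With the response from the static request, this could be called as `looks_like_html(response.headers.get("content-type", ""), response.text)`.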
Full Dynamic
The CSV file is presented as a spreadsheet in Excel on SharePoint. Since this is clearly a dynamic page (because of the JavaScript mentioned before), one could use Playwright.
from playwright.sync_api import sync_playwright

from const import CHROME_PATH, URL


def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False, executable_path=CHROME_PATH)
        context = browser.new_context()
        # Extend timeout (currently on a slow connection).
        context.set_default_timeout(60000)
        page = context.new_page()
        page.goto(URL)
        # Do the actual download here...
        browser.close()


if __name__ == "__main__":
    main()
That code simply (and successfully) opens the page. You’d then need to automate the clicking of some buttons to export the data as CSV (or ODS or PDF). This might work. But it would probably be fragile. And you might get kicked out and confronted with a login page.
Diversion: SharePoint Parameters
You might notice that there are some seemingly random query parameters appended to the end of the SharePoint URLs. For example:
?e=5cyLBC
A short-lived sharing token created when someone shares an anonymous (no login required) link from SharePoint or OneDrive. A new token is generated each time that the content is shared. The token identifies the specific share request and determines the access level. This parameter can safely be omitted.

?rtime=J1sP10YU3kg
Runtime metadata generated by SharePoint or OneDrive. It’s simply for performance and diagnostic logging and can also be omitted.
Those parameters normally simply appear while browsing the content on SharePoint. They are not terribly useful to us. However, there are some other parameters that don’t just spontaneously appear, but which can be remarkably useful!
download=1
This parameter will force download of the underlying document. It’s just what we need!

web=1
This does the opposite of the download parameter, forcing the document to open in an online viewer. So it’s really only useful with a browser.
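If the URL you copied already carries some of these parameters, simply gluing `?download=1` onto the end would produce a malformed query string. A possible way around that, sketched here with only the standard library, is to rebuild the query. The function name `with_download` and the example host are hypothetical, and dropping `e` and `rtime` relies on the observation above that they can be omitted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that are dropped: "e" and "rtime" were described above as safe
# to omit; "download" and "web" are removed so that exactly one download=1 remains.
DROPPED = {"e", "rtime", "download", "web"}


def with_download(url: str) -> str:
    """Return the URL with download=1 set, keeping any other query parameters."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in DROPPED
    ]
    params.append(("download", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical sharing link, for illustration only.
    print(with_download("https://example.sharepoint.com/:x:/s/site/abc?e=5cyLBC"))
```

For the hypothetical link above this prints the same URL with `?e=5cyLBC` replaced by `?download=1`.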
Downloading Done Right!
Appending the download parameter to the URL does the job.
from io import StringIO
import httpx
import polars as pl
from const import URL
url = f"{URL}?download=1"
response = httpx.get(url, follow_redirects=True)
response.raise_for_status()
df = pl.read_csv(StringIO(response.text), separator=";")
print(df.shape)
(29, 22)
Great success.