Downloading Files with Selenium

If you use Selenium for browser automation, then at some stage you will probably need to download a file by clicking a button or link on a website. Sometimes this just works. Other times it doesn't.

When I encounter a stubborn download, I have found that adding some specific preferences when I launch Chrome through Selenium can help.

These are the preferences I apply:

import os

prefs = {
  "download.default_directory": os.getcwd(),
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": True,
  "profile.default_content_settings.popups": 0,
  "profile.content_settings.exceptions.automatic_downloads.*.setting": 1,
  "profile.default_content_setting_values.automatic_downloads": 1,
  "profile.default_content_settings.mimetype_overrides": {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}

What does each of those do?

  • download.default_directory — Sets the download directory. Not strictly necessary, but it's useful to control where files end up. Defaults to ~/Downloads.
  • download.prompt_for_download — Prevents the browser from asking where to save the file.
  • download.directory_upgrade — Allows the browser to change the download directory.
  • safebrowsing.enabled — Enables the Safe Browsing feature, which protects against phishing, malware, and other malicious content. Again, not strictly necessary, but good to have.
  • profile.default_content_settings.popups — Blocks popups. This refers to browser popup windows, not in-page dialogs or popups.
  • profile.content_settings.exceptions.automatic_downloads.*.setting — Allows multiple automatic downloads without requiring user intervention.
  • profile.default_content_setting_values.automatic_downloads — Allows automatic downloads.
  • profile.default_content_settings.mimetype_overrides — Overrides MIME type handling for specific file types.

Of these, the final preference, which specifies how the XLSX MIME type should be handled, is probably the most important. Where does the MIME type come from? It is given in the content-type header of the server's response to the download request (so crack open Developer Tools and look at the response headers in the Network tab). Without this setting, the browser might fall back to a generic MIME type (like application/octet-stream), which can cause it to prompt the user for how to handle the downloaded file.
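If you'd rather not dig through Developer Tools, the content-type header can also be checked from Python. This is a minimal standard-library sketch. It assumes that you know the download URL and that the server answers HEAD requests; some servers don't, and downloads that depend on a logged-in session may respond differently. The helper name content_type_of is hypothetical.

```python
import urllib.request


def content_type_of(url: str, timeout: float = 10.0) -> str:
    """Return the MIME type from the Content-Type header of a HEAD response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers is an email.message.Message subclass, so
        # get_content_type() returns a lowercase "maintype/subtype" string with
        # parameters like "; charset=..." stripped. If the header is missing,
        # it falls back to "text/plain".
        return response.headers.get_content_type()
```

The returned string is the value to use as both the key and the value in the mimetype_overrides entry.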

Take a look at a complete Python script that downloads an XLS file from here. In the interests of full disclosure, this script will work fine without those extra preferences, but it does illustrate what needs to be done for a more stubborn site. The server headers for this download are included below.

HTTP/2 200 
last-modified: Tue, 22 Mar 2022 12:47:49 GMT
content-length: 8704
content-type: application/vnd.ms-excel
date: Sat, 05 Oct 2024 04:15:52 GMT
cache-control: max-age=0
expires: Sat, 05 Oct 2024 04:15:52 GMT
server: Apache

Clearly the browser already knows to save files with the application/vnd.ms-excel MIME type given in the content-type header, so no override is needed. For comparison, here are the server headers for a download from here:

HTTP/2 200 
last-modified: Thu, 27 Jan 2022 17:47:57 GMT
content-length: 9487759
content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
date: Sat, 05 Oct 2024 04:19:11 GMT
cache-control: max-age=86400
expires: Sun, 06 Oct 2024 04:19:11 GMT
server: nginx/1.25.5

Note that this download uses a different MIME type (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) for an XLSX file. This is the type covered by the mimetype_overrides preference above.
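If a site serves both formats, the override can list both MIME types, each mapped to itself in the same way as the XLSX entry in the prefs. Here is a small sketch. The helper add_mime_overrides is hypothetical and only builds the dictionary, reusing the key exactly as it appears in the prefs above. Whether Chrome acts on a given entry is something to confirm against the stubborn site itself.

```python
from typing import Iterable

# Preference key copied verbatim from the prefs dictionary in this post.
MIME_OVERRIDES_KEY = "profile.default_content_settings.mimetype_overrides"


def add_mime_overrides(prefs: dict, mime_types: Iterable[str]) -> dict:
    """Return a copy of prefs with each MIME type mapped to itself under the overrides key."""
    updated = dict(prefs)  # shallow copy so the caller's dict is not modified
    overrides = dict(updated.get(MIME_OVERRIDES_KEY, {}))  # copy any existing overrides
    for mime in mime_types:
        overrides[mime] = mime
    updated[MIME_OVERRIDES_KEY] = overrides
    return updated
```

For example, passing ["application/vnd.ms-excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"] adds entries for both the XLS and XLSX types seen in the headers above.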