Fathoming Email Headers

If you pull back the curtain and take a look at what a naked email looks like, then you might be astonished.

For a start, despite the fact that an email can embed binary data, the message itself is generally transformed to 7-bit ASCII (a character set with just 128 symbols). This means that you can easily read the contents (although not all of it might make much sense). At the top of the message are the headers. Your email client will normally extract some pertinent information from these headers (like the contents of the From, To and Date fields) and display it above the message body. But there’s a lot more information embedded in the headers. In this post we’re going to take a look at what some of those headers mean and what information can be gleaned from them.

Typical Headers

Below is the raw content of a simple (text) email sent from Alice (alice@gmail.com) to Bob (bob@yahoo.com).

Delivered-To: bob@yahoo.com
Received: from 10.214.167.142
by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000
Return-Path: <alice@gmail.com>
X-Originating-Ip: [209.85.221.51]
Authentication-Results: atlas103.free.mail.gq1.yahoo.com;
dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112;
spf=none smtp.mailfrom=gmail.com;
dmarc=unknown header.from=gmail.com;
X-Apparently-To: bob@yahoo.com; Fri, 8 Oct 2021 03:34:29 +0000
Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)
by 10.214.167.142 with SMTPs
(version=TLS1_2 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256);
Fri, 08 Oct 2021 03:34:29 +0000
Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3
for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail-com.20210112.gappssmtp.com; s=20210112;
h=message-id:date:mime-version:to:from:subject:content-disposition
:content-transfer-encoding:content-md5;
bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
b=8J0ymoL07NMNgk/0NXGCujWtAZ62KdnEk3HwxZpQS99M4PD4/MKKYhjrJxzt5QJGUq
erS+1nXOeHZD5k7IVlUo7rDJZbDQdt4FFh1wOEaWc8CUPqBu3hJDSgDdWmQRVlsntnnc
CB6tqF/VC3C4jdoBXX39npp+FFJSBNWcVsZLHdqj1dxhHWbIed3Q98Lfkh+rrb7xHBy4
cKzdloNNisVPRKQXnNENWRxAF+22fS6DuvfsFyZLctlvgRg8WXGDQACt6WR4prfuV1R3
thP7NoM7DIIYn3PnepZ3zAbN8P5GG+VOqv64L3sJNUkxrEcNczLdqDDwnyhfUtPKB0wP
hrig==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20210112;
h=x-gm-message-state:message-id:date:mime-version:to:from:subject
:content-disposition:content-transfer-encoding:content-md5;
bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
b=ul2T8UXULfKNYI7WHtEG1+zImSks02kwY4jormij3ejAZpdUnYQeNuiBFWAZV+2SWU
qfLLEEPLcU4Wp9CwX+CbdOZZVM2UKo00EDUjJ3eyOnv0MQy0aBajV27x6L5gKWcURuP4
rPwGEOInoRE3scYbEwmimebQC8xNQD2kCVJH2HfEII3g5L5U8EVk4UH2HxT7LOwwYfx6
Gwlhqf4z7xdJc56ywP7UjRXepIEKo2GRwUxs7BrM/v22380Yge6enehyfJCvRmQe+34z
6Qh0eowKDRbKawIl9RHE6/h1OWBw9mC3I/VzHoXV7+DoXUUYJ883TuTSR47cQ7sqx3+Z
24KQ==
Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])
by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28
for <bob@yahoo.com>
  (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
Message-ID: <615fbc44.1c69fb81.55d71.5951@mx.google.com>
Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
X-Google-Original-Date: Fri, 08 Oct 2021 03:34:27 GMT
From:                      alice@gmail.com
To:                        bob@yahoo.com
Subject:                   Hello
MIME-Version:              1.0
Content-Type:              text/plain; charset=utf-8; format=flowed
Content-Disposition:       inline
Content-Transfer-Encoding: 7bit
Content-MD5:               ZajifYh5KDgxtmS9i38K1A==
Content-Length: 14
Content-Language: en-GB

Hello, World!

If you skimmed through the raw message contents then (hopefully before your eyes glazed over) you would have noted that the headers consist of key-value pairs. For example, Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT) has key Date and value Thu, 07 Oct 2021 20:34:28 -0700 (PDT).
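To make the key-value structure concrete, here is a minimal sketch that parses a cut-down copy of the headers with Python's standard `email` package. Python is my choice for illustration (the original message was built in R), and the `raw` string below is heavily abbreviated.

```python
from email import message_from_string

# Abbreviated copy of the delivered message (most headers omitted).
raw = (
    "Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT)\n"
    "From: alice@gmail.com\n"
    "To: bob@yahoo.com\n"
    "Subject: Hello\n"
    "X-Originating-Ip: [209.85.221.51]\n"
    "\n"
    "Hello, World!\n"
)

msg = message_from_string(raw)

# Each header is a (key, value) pair; lookups by key are case-insensitive.
for key, value in msg.items():
    print(f"{key!r} -> {value!r}")

# Keys starting with "X-" are the custom headers.
custom = [key for key in msg.keys() if key.lower().startswith("x-")]
print(custom)
```

Everything before the first blank line is treated as headers and everything after it as the message body.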

You might also have noticed that a number of the header keys start with X-. These are custom headers inserted by the various processes that handle the mail while it’s being delivered.

Original Message Content

For reference, here is the content of the message that was actually sent by Alice.

Date:                      Fri, 08 Oct 2021 03:34:27 GMT
From:                      alice@gmail.com
To:                        bob@yahoo.com
Subject:                   Hello
MIME-Version:              1.0
Content-Type:              text/plain; charset=utf-8; format=flowed
Content-Disposition:       inline
Content-Transfer-Encoding: 7bit
Content-MD5:               ZajifYh5KDgxtmS9i38K1A==

Hello, World!

The message was generated and sent using the {emayili} package. Clearly a lot of headers are introduced in the process of delivering the message!

Number of Headers

Some of the header fields occur more than once. For example, the Received field appears four times in the message above. This is not uncommon. However, there are some rules about which headers may be repeated. These rules also indicate which headers are optional and which are mandatory. The table below (extracted from RFC 5322) indicates the limits on the number of times each type of header should appear in an email. The range of headers included in the table is only a subset of all possible headers, but it gives a fair idea of what the rules are. For example, the From field must be specified, as must the Date field. The To, Cc and Bcc fields are all optional though. In principle, you can create a valid email with no recipients.

Permitted number of values for a range of email header fields.
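As a rough sketch of how these limits could be checked, the snippet below counts header fields and compares them with a handful of the limits from RFC 5322 Section 3.6. The `LIMITS` dictionary and the `check_counts()` helper are my own illustrative names and cover only a few fields.

```python
from collections import Counter
from email import message_from_string

# (min, max) occurrences for a few fields, following RFC 5322 Section 3.6.
# None means "no upper limit". Only a small subset of fields is listed.
LIMITS = {
    "date": (1, 1),
    "from": (1, 1),
    "to": (0, 1),
    "cc": (0, 1),
    "bcc": (0, 1),
    "subject": (0, 1),
    "message-id": (0, 1),
    "received": (0, None),
}

def check_counts(raw):
    """Return a list of complaints about header counts (empty if none)."""
    msg = message_from_string(raw)
    counts = Counter(key.lower() for key in msg.keys())
    problems = []
    for field, (lo, hi) in LIMITS.items():
        n = counts[field]
        if n < lo:
            problems.append(f"{field}: missing")
        elif hi is not None and n > hi:
            problems.append(f"{field}: appears {n} times (max {hi})")
    return problems

# No recipients at all, yet nothing to complain about.
no_recipients = "Date: Fri, 08 Oct 2021 03:34:27 GMT\nFrom: alice@gmail.com\n\nHi\n"
# The mandatory Date field is missing here.
no_date = "From: alice@gmail.com\nTo: bob@yahoo.com\n\nHi\n"

print(check_counts(no_recipients))
print(check_counts(no_date))
```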

Source Header Fields

First we’ll take a look at the header fields which are part of the original (source) message as sent. These fields are all populated by the email client.

Date

The Date field specifies the date and time at which the message was sent.

You might note that the date specified in the source message was Fri, 08 Oct 2021 03:34:27 GMT (the time in Alice’s timezone), while the date in the delivered message was Thu, 07 Oct 2021 20:34:28 -0700 (PDT) (the time in the timezone of the email server). The format of the date and time was originally defined in RFC 822 Section 5 and subsequently clarified in RFC 2822 Section 3.3. However, you will find that there is quite a variety in the formats used in practice.

The custom X-Google-Original-Date field retains the date and time in their original source format.
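The two dates look quite different, but they refer to almost the same instant. Here is a small sketch (using Python's standard library, which is my choice rather than anything used for the original message) that parses both and compares them.

```python
from email.utils import parsedate_to_datetime

source = parsedate_to_datetime("Fri, 08 Oct 2021 03:34:27 GMT")
delivered = parsedate_to_datetime("Thu, 07 Oct 2021 20:34:28 -0700 (PDT)")

# Both values are timezone-aware, so they can be subtracted directly.
gap = delivered - source
print(gap.total_seconds())
```

The trailing `(PDT)` is just a comment as far as the parser is concerned. It's the numeric offset that matters.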

From and To

The From and To fields specify the email addresses of the sender and recipient of the message. The format for the addresses is detailed in RFC 822 Section 6.

In addition to these fields you can also have Cc (carbon copy) and Bcc (blind carbon copy) fields, which specify additional recipients who are not the main intended recipient. The details of those listed under Bcc are not visible to any of the recipients.
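Addresses often carry a display name as well as the bare address. The sketch below shows how such values can be pulled apart with Python's `email.utils`. The display names and the address carol@example.com are made up for the illustration.

```python
from email.utils import getaddresses, parseaddr

# A single address with a (hypothetical) display name.
name, address = parseaddr("Alice <alice@gmail.com>")

# A field with several recipients, as might appear in To or Cc.
recipients = getaddresses(["Bob <bob@yahoo.com>, carol@example.com"])

print(name, address)
print(recipients)
```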

Subject

The Subject field doesn’t really require much explanation: it tells you what the message is about.

MIME-Version

Multipurpose Internet Mail Extensions (MIME) is a format which allows email messages to contain content beyond mere text. MIME was originally described in RFC 1341, which has since been superseded by RFCs 2045 to 2049. The version of MIME is typically 1.0 because it appears that’s the only version available!

Content-Type

The Content-Type field specifies the media type (or MIME type) of data in the message. The type is specified as a two-part identifier with form <type>/<subtype>. For example, text/plain or text/html for plain text or HTML content, and image/png for a PNG image. An extensive list of media types can be found in the IANA media types registry.

The media type is often followed by one or more parameters. For example, in Alice’s email the Content-Type field is text/plain; charset=utf-8; format=flowed, where the charset and format parameters indicate UTF-8 encoding of the text and flowed format (the receiving client is allowed to re-wrap the lines, see RFC 3676).

In a multi-part message (for example, a message with both plain and HTML text as well as attachments) there will be multiple Content-Type fields, one for each part of the message.
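To see those multiple Content-Type fields, here is a minimal sketch that builds a multi-part message with Python's `email.message.EmailMessage` and lists the type of each part. The attachment bytes are only the PNG file signature (not a real image) and the filename `cat.png` is invented.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@gmail.com"
msg["To"] = "bob@yahoo.com"
msg["Subject"] = "Hello"

msg.set_content("Hello, World!")                             # text/plain part
msg.add_alternative("<p>Hello, World!</p>", subtype="html")  # text/html part
msg.add_attachment(
    b"\x89PNG\r\n\x1a\n",        # placeholder bytes, not a valid image
    maintype="image",
    subtype="png",
    filename="cat.png",          # hypothetical filename
)

# Walk the tree of parts and collect the media type of each one.
types = [part.get_content_type() for part in msg.walk()]
print(types)
```

The container parts (multipart/mixed and multipart/alternative) have a Content-Type of their own, in addition to the parts that carry actual content.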

Content-Disposition

The Content-Disposition header indicates how content is to be displayed, and is generally either inline or attachment. RFC 2183 is an extended discussion of this field. The range of permitted values is shown in the table below.

# A tibble: 16 × 2
   value                  reference
   <chr>                  <chr>    
 1 inline                 RFC 2183 
 2 attachment             RFC 2183 
 3 form-data              RFC 7578 
 4 signal                 RFC 3204 
 5 alert                  RFC 3261 
 6 icon                   RFC 3261 
 7 render                 RFC 3261 
 8 recipient-list-history RFC 5364 
 9 session                RFC 3261 
10 aib                    RFC 3893 
11 early-session          RFC 3959 
12 recipient-list         RFC 5363 
13 notification           RFC 5438 
14 by-reference           RFC 5621 
15 info-package           RFC 6086 
16 recording-session      RFC 7866 

There are also various parameters which can be specified for this field. The registered parameters are listed in the table below.

# A tibble: 10 × 2
   name              reference
   <chr>             <chr>    
 1 filename          RFC 2183 
 2 creation-date     RFC 2183 
 3 modification-date RFC 2183 
 4 read-date         RFC 2183 
 5 size              RFC 2183 
 6 name              RFC 7578 
 7 voice             RFC 2421 
 8 handling          RFC 3204 
 9 preview-type      RFC 7763 
10 reaction          RFC 9078 
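As a small illustration of a disposition value with parameters, the sketch below parses the headers of a single (made-up) attachment part using Python's standard library. The filename and size are invented for the example.

```python
from email import message_from_string

# Headers of a hypothetical attachment part (no body).
raw = (
    "Content-Type: image/png\n"
    'Content-Disposition: attachment; filename="cat.png"; size=1024\n'
    "\n"
)
part = message_from_string(raw)

disposition = part.get_content_disposition()
filename = part.get_filename()
size = part.get_param("size", header="content-disposition")

print(disposition, filename, size)
```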

Content-Transfer-Encoding

The Content-Transfer-Encoding field is documented in RFC 2045 Section 6. It specifies the way that the content has been encoded. Why is encoding necessary? RFC 821, which established SMTP, specified that data would be represented by 7-bit ASCII characters (a set of only 128 characters).

"The TCP connection supports the transmission of 8-bit bytes. The SMTP data is 7-bit ASCII characters. Each character is transmitted as an 8-bit byte with the high-order bit cleared to zero." (RFC 821)

Furthermore, the maximum length of a line was limited to 1000 characters.

"The maximum total length of a text line including the CRLF is 1000 characters (but not counting the leading dot duplicated for transparency)." (RFC 821)

The majority of data that you’d want to include in an email doesn’t comply with either of these requirements. In order for you to be able to use fancy Unicode characters and attach cute cat pictures, those data need to be encoded in such a way that they can still be transmitted over SMTP.

The most common values for this field are

  • 7bit (default): Content is already in 7-bit ASCII (no encoding).
  • quoted-printable: Quoted-Printable encoding uses the equal sign (=) as an escape character to transform non-ASCII characters into something that can be represented by ASCII. For example, the character é (e-acute), which is represented in UTF-8 encoding by the bytes 0xC3 0xA9, is encoded as =C3=A9. This encoding also limits line length to 76 characters. Longer lines are split with a soft line break, which is marked by a trailing =. Quoted-Printable encoding is most often used to encode text data.
  • base64: Base64 encoding encodes data into 6-bit digits which are packed in groups of four, so that 4 Base64 digits represent 24 bits or 3 bytes. Base64 is most often used to encode binary data.

To illustrate the difference between these various coding mechanisms, suppose that you wanted to send the following message in an email:

J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font vite pédagogues et t'enseignent comme but ce qui n'est par essence qu'un moyen, et te trompant ainsi sur la route à suivre les voilà bientôt qui te dégradent, car si leur musique est vulgaire ils te fabriquent pour te la vendre une âme vulgaire.

Using Quoted-Printable encoding the email would look like this:

Content-Type:                 text/plain; 
                              charset=utf-8; 
                              format=flowed
Content-Transfer-Encoding:    quoted-printable
Content-MD5:                  XjpBVdSoL+frxc/IjptLkA==

J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font=
 vite p=C3=A9dagogues et t'enseignent comme but ce qui n'est par essence qu=
'un moyen, et te trompant ainsi sur la route =C3=A0 suivre les voil=C3=A0 b=
ient=C3=B4t qui te d=C3=A9gradent, car si leur musique est vulgaire ils te =
fabriquent pour te la vendre une =C3=A2me vulgaire.

All non-ASCII characters have been encoded and the text has been wrapped to a width of only 76 characters. Most of the content is still legible though. By contrast, Base64 encoding would look like this:

Content-Type:                 text/plain; 
                              charset=utf-8; 
                              format=flowed
Content-Transfer-Encoding:    base64
Content-MD5:                  XjpBVdSoL+frxc/IjptLkA==

SidpbnRlcmRpcyBhdXggbWFyY2hhbmRzIGRlIHZhbnRlciB0cm9wIGxldXJzIG1hcmNoYW5kaXNl
cy4gQ2FyIGlscyBzZSBmb250IHZpdGUgcMOpZGFnb2d1ZXMgZXQgdCdlbnNlaWduZW50IGNvbW1l
IGJ1dCBjZSBxdWkgbidlc3QgcGFyIGVzc2VuY2UgcXUndW4gbW95ZW4sIGV0IHRlIHRyb21wYW50
IGFpbnNpIHN1ciBsYSByb3V0ZSDDoCBzdWl2cmUgbGVzIHZvaWzDoCBiaWVudMO0dCBxdWkgdGUg
ZMOpZ3JhZGVudCwgY2FyIHNpIGxldXIgbXVzaXF1ZSBlc3QgdnVsZ2FpcmUgaWxzIHRlIGZhYnJp
cXVlbnQgcG91ciB0ZSBsYSB2ZW5kcmUgdW5lIMOibWUgdnVsZ2FpcmUu

Again it’s all ASCII characters with a line width of only 76 characters. But now the content is completely illegible (unless you can Base64 decode in your head!).
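If you'd like to experiment with the two encodings yourself, here is a minimal sketch using Python's standard `quopri` and `base64` modules (my choice of tooling, not what generated the examples above) on a short fragment of the text.

```python
import base64
import quopri

text = "vite pédagogues"
raw = text.encode("utf-8")       # é becomes the bytes 0xC3 0xA9

qp = quopri.encodestring(raw)    # only the non-ASCII bytes are escaped
b64 = base64.b64encode(raw)      # every 3 bytes become 4 characters

print(qp.decode("ascii"))
print(b64.decode("ascii"))

# Decoding either form recovers the original bytes.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

The Quoted-Printable output is the same length as the input apart from the escaped characters, whereas the Base64 output is always about a third longer.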

Content-MD5

MD5 is an algorithm that produces 128-bit hash values. For the purposes of email it’s used as a checksum to verify data integrity. The Content-MD5 header field is a Base64 encoded representation of the MD5 hash of the message contents.

In the two examples above (for Quoted-Printable and Base64 content encoding) you’ll notice that the value of the Content-MD5 field, XjpBVdSoL+frxc/IjptLkA==, is the same. This is because the content of the message is the same, even though it has been encoded in different ways. Decoding the Base64 yields the raw MD5 hash, 5e3a4155d4a82fe7ebc5cfc88e9b4b90. Check this against the results from the MD5 Hash Generator.

Checking hash results using the MD5 Hash Generator.
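You can also do the check locally. The sketch below (Python standard library, my choice) rebuilds the Content-MD5 value for Alice's first message, assuming the hashed content is exactly the bytes of Hello, World! with no trailing line break, and then decodes the value from the two examples above into hexadecimal.

```python
import base64
import hashlib

# Forward direction: body -> MD5 digest -> Base64.
body = b"Hello, World!"                       # assumed: no trailing newline
digest = hashlib.md5(body).digest()           # 16 raw bytes (128 bits)
content_md5 = base64.b64encode(digest).decode("ascii")
print(content_md5)

# Reverse direction: Base64 header value -> hexadecimal digest.
hex_digest = base64.b64decode("XjpBVdSoL+frxc/IjptLkA==").hex()
print(hex_digest)
```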

Transit Header Fields

Now let’s look at the header fields introduced during delivery.

Message-ID

The Message-ID field uniquely identifies a specific message.

Message-ID: <615fbc44.1c69fb81.55d71.5951@mx.google.com>

The format of the identifier is similar to that of an email address, consisting of two components: a unique identifier (for example, 615fbc44.1c69fb81.55d71.5951) and the domain name of the mail server (for example, mx.google.com).

The Message-ID is important for linking messages together into threads, where the In-Reply-To and References headers are used to reference earlier messages via their Message-ID.
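Here is a rough sketch of how threading hangs together, using Python's standard library. The reply from Bob is hypothetical, and its Message-ID is generated on the fly with `make_msgid()`.

```python
from email.message import EmailMessage
from email.utils import make_msgid

original_id = "<615fbc44.1c69fb81.55d71.5951@mx.google.com>"

# Split the identifier into its two components.
local_part, domain = original_id.strip("<>").split("@")

# A hypothetical reply from Bob that points back at Alice's message.
reply = EmailMessage()
reply["From"] = "bob@yahoo.com"
reply["To"] = "alice@gmail.com"
reply["Subject"] = "Re: Hello"
reply["Message-ID"] = make_msgid(domain="yahoo.com")
reply["In-Reply-To"] = original_id
reply["References"] = original_id

print(local_part, domain)
print(reply["In-Reply-To"])
```

A mail client that has both messages can match the In-Reply-To value of the reply against the Message-ID of the original and display them as a single thread.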

Received

There are multiple Received entries in the header. These trace the route that the message took from Alice to Bob.

Received: from 10.214.167.142
by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000
Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)
by 10.214.167.142 with SMTPs
(version=TLS1_2 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256);
Fri, 08 Oct 2021 03:34:29 +0000
Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3
for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)
Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])
by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28
for <bob@yahoo.com>
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Thu, 07 Oct 2021 20:34:28 -0700 (PDT)

The message was sent initially to the Gmail SMTP server smtp.gmail.com (IP 64.233.184.109). This is the server configured by Alice for her outgoing email. Where is that server? Good question. I used https://ipgeolocation.io/ to find out (slightly abridged output from their API below).

{
  "ip": "64.233.184.109",
  "continent_name": "North America",
  "country_name": "United States",
  "state_prov": "California",
  "district": "Old Mountain View",
  "city": "Mountain View",
  "zipcode": "94041-1238",
  "latitude": "37.39500",
  "longitude": "-122.08167",
  "isp": "Google LLC",
  "organization": "Google LLC",
  "time_zone": {
    "name": "America/Los_Angeles",
    "offset": -8,
    "is_dst": true,
    "dst_savings": 1
  }
}

Alice’s message went from her computer in the UK to a server in Mountain View, California on the west coast of the USA. That’s quite a big leap! We can use the other Received fields to track its progress from there. Next stop was mail-wr1-f51.google.com (IP 209.85.221.51), also located in Mountain View, California. From there it went to a machine on a private network (IP 10.214.167.142) before finally being delivered to atlas103.free.mail.gq1.yahoo.com, presumably another server located on a private network.
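The same journey can be reconstructed programmatically. The sketch below is a simplified illustration in Python: the Received fields are abbreviated (the TLS details are dropped), and the regular expression for the receiving host is a rough assumption that happens to work for these particular fields rather than a general parser.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Abbreviated Received fields, in the order they appear in the message.
raw = (
    "Received: from 10.214.167.142\n"
    " by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000\n"
    "Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)\n"
    " by 10.214.167.142 with SMTPs; Fri, 08 Oct 2021 03:34:29 +0000\n"
    "Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3\n"
    " for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)\n"
    "Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])\n"
    " by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28\n"
    " for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:28 -0700 (PDT)\n"
    "\n"
)
msg = message_from_string(raw)

hops = []
# Each server adds its field at the top, so reverse to get travel order.
for value in reversed(msg.get_all("Received")):
    info, _, stamp = value.rpartition(";")       # the timestamp follows the last ";"
    by = re.search(r"\bby\s+(\S+)", info)        # host that received the message
    hops.append((by.group(1), parsedate_to_datetime(stamp.strip())))

for host, when in hops:
    print(when.isoformat(), host)
```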

The Received field can also include information on the encryption (see references to TLS) and the protocol (see ESMTPSA, SMTP and SMTPs) being used.

# A tibble: 18 × 2
   protocol   description                                        
   <chr>      <chr>                                              
 1 SMTP       "Simple Mail Transfer Protocol"                    
 2 ESMTP      "SMTP with Service Extensions"                     
 3 ESMTPA     "ESMTP with AUTH"                                  
 4 ESMTPS     "ESMTP with STARTTLS"                              
 5 ESMTPSA    "ESMTP with both STARTTLS and AUTH"                
 6 LMTP       "Local Mail Transfer Protocol"                     
 7 LMTPA      "LMTP with AUTH"                                   
 8 LMTPS      "LMTP with STARTTLS"                               
 9 LMTPSA     "LMTP with both STARTTLS and AUTH"                 
10 MMS        "Multimedia Messaging Service"                     
11 UTF8SMTP   "ESMTP with SMTPUTF8"                              
12 UTF8SMTPA  "ESMTP with SMTPUTF8 and AUTH"                     
13 UTF8SMTPS  "ESMTP with SMTPUTF8 and STARTTLS"                 
14 UTF8SMTPSA "ESMTP with SMTPUTF8 and both STARTTLS and AUTH"
15 UTF8LMTP   "LMTP with SMTPUTF8"                               
16 UTF8LMTPA  "LMTP with SMTPUTF8 and AUTH"                      
17 UTF8LMTPS  "LMTP with SMTPUTF8 and STARTTLS"                  
18 UTF8LMTPSA "LMTP with SMTPUTF8 and both STARTTLS and AUTH"

There can also be an X-Received field which is a custom field containing information similar to the Received field.

X-Originating-Ip

The X-Originating-Ip field identifies the IP address of the sender. This is relevant, for example, when you send an email using a web email client. In this case the web client communicates with the SMTP server, but the content of this field will be your actual IP address.

Delivered-To and Return-Path

The Delivered-To field records the email address to which the message was actually delivered. Somewhat surprisingly, this can be different to the addresses specified in the To, Cc or Bcc fields. How does this happen? An email message generally passes through numerous servers and processes between the sender and the recipient. At various stages in this process rules might be applied which modify the delivery of the message. Suppose, for example, that Bob (bob@yahoo.com) is one of many clients with whom Alice communicates. If she sends out a general email to all of her clients then she will probably use a mailing list rather than adding the address of each client individually. The message To field would then contain the name of the mailing list, but this would be expanded en route to a list of individual email addresses, and these would appear in the Delivered-To field.

Sometimes an email doesn’t get delivered. Maybe there’s a problem with the recipient’s email server? Or perhaps the address of the recipient has changed? In this case the email will bounce and a bounce message will be returned which informs the sender that the email did not reach its destination. The Return-Path field specifies the address to which the bounce message should be delivered. If, for example, you’re sending email to a large mailing list and some of the addresses on that list are unreliable then it can be useful to specify a different Return-Path address so that bounce messages don’t end up cluttering your inbox. Although you can specify a value for Return-Path it’s possible that your email server will override this.

Address not found message.

Authentication-Results

The Authentication-Results header contains information on what authentication methods have been applied to a message. It will often include information on the following protocols, all of which are intended to detect forged sender addresses (also known as “email spoofing”):

  • DKIM (DomainKeys Identified Mail)
  • SPF (Sender Policy Framework) and
  • DMARC (Domain-based Message Authentication, Reporting and Conformance).

Authentication-Results: atlas103.free.mail.gq1.yahoo.com;
 dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112;
 spf=none smtp.mailfrom=gmail.com;
 dmarc=unknown header.from=gmail.com;

In the example above the results for each of these methods are listed. Valid results for each method are:

  • DKIM — pass, fail or none
  • SPF — pass, fail, softfail, neutral, none, temperror or permerror
  • DMARC — pass, fail, bestguesspass, none or unknown.
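Pulling the individual results out of this header is mostly a matter of splitting on semicolons. The sketch below is a deliberately naive Python illustration that works for the example above. It makes no attempt to handle comments or quoted strings, which a real header may contain.

```python
# Value of the Authentication-Results field from the example, unfolded.
header = (
    "atlas103.free.mail.gq1.yahoo.com; "
    "dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112; "
    "spf=none smtp.mailfrom=gmail.com; "
    "dmarc=unknown header.from=gmail.com;"
)

# The first element names the server that did the checking.
authserv_id, *clauses = [part.strip() for part in header.split(";") if part.strip()]

results = {}
for clause in clauses:
    method, _, rest = clause.partition("=")
    results[method] = rest.split()[0]      # keep the result, drop the properties

print(authserv_id)
print(results)
```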

DKIM-Signature

DKIM (DomainKeys Identified Mail) is an authentication protocol. A DKIM Signature verifies the DNS domain of the email sender. The DNS entry for the signing domain is associated with a public key, which is published and freely available. The DKIM Signature is signed with the corresponding private key. Using the public key, the receiving server can then verify that the message really was signed by that domain and that the signed headers and body were not altered in transit.

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail-com.20210112.gappssmtp.com; s=20210112;
        h=message-id:date:mime-version:to:from:subject:content-disposition
         :content-transfer-encoding:content-md5;
        bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
        b=8J0ymoL07NMNgk/0NXGCujWtAZ62KdnEk3HwxZpQS99M4PD4/MKKYhjrJxzt5QJGUq
         erS+1nXOeHZD5k7IVlUo7rDJZbDQdt4FFh1wOEaWc8CUPqBu3hJDSgDdWmQRVlsntnnc
         CB6tqF/VC3C4jdoBXX39npp+FFJSBNWcVsZLHdqj1dxhHWbIed3Q98Lfkh+rrb7xHBy4
         cKzdloNNisVPRKQXnNENWRxAF+22fS6DuvfsFyZLctlvgRg8WXGDQACt6WR4prfuV1R3
         thP7NoM7DIIYn3PnepZ3zAbN8P5GG+VOqv64L3sJNUkxrEcNczLdqDDwnyhfUtPKB0wP
         hrig==

The DKIM Signature field consists of a number of elements:

  • v — the DKIM version
  • a — the signing algorithm
  • c — algorithm used to canonicalise the header and body (optional)
  • d — Signing Domain Identifier (SDID) is the domain used to sign the email
  • s — a selector
  • h — list of header fields that were signed
  • bh — a hash of the message body
  • b — signature of the header fields and body.
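The elements are simply tag=value pairs separated by semicolons, so they are easy to pick apart. The Python sketch below does this for the signature above (with the long b value left out for brevity) and also builds the DNS name under which, following the DKIM specification (RFC 6376), the public key is published as a TXT record. It does not verify the signature.

```python
# DKIM-Signature value from the example, unfolded, with the b tag omitted.
value = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; "
    "d=gmail-com.20210112.gappssmtp.com; s=20210112; "
    "h=message-id:date:mime-version:to:from:subject:content-disposition"
    ":content-transfer-encoding:content-md5; "
    "bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;"
)

tags = {}
for item in value.split(";"):
    if "=" in item:
        # Split on the first "=" only, since Base64 values may end with "=".
        name, _, val = item.partition("=")
        tags[name.strip()] = "".join(val.split())   # drop any folding whitespace

signed_headers = tags["h"].split(":")

# The public key lives in DNS at <selector>._domainkey.<domain>.
dns_name = f"{tags['s']}._domainkey.{tags['d']}"

print(signed_headers)
print(dns_name)
```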

The custom X-Google-DKIM-Signature field is another DKIM signature added by Gmail.

Conclusion

There’s obviously a lot of complexity lurking in those email headers. I’ve really learned a lot while researching this post and I hope that this information will be useful to you too.