If you pull back the curtain and take a look at what a naked email looks like, then you might be astonished.
For a start, despite the fact that an email can embed binary data, the message itself is generally transformed to 7-bit ASCII (a set of just 128 characters). This means that you can easily read the contents (although not all of it might make much sense). At the top of the message are the headers. Your email client will normally extract some pertinent information from these headers (like the contents of the From, To and Date fields) and display it above the message body. But there’s a lot more information embedded in the headers. In this post we’re going to take a look at what some of those headers mean and what information can be gleaned from them.
Typical Headers
Below is the raw content of a simple (text) email sent from Alice (alice@gmail.com) to Bob (bob@yahoo.com).
Delivered-To: bob@yahoo.com
Received: from 10.214.167.142
        by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000
Return-Path: <alice@gmail.com>
X-Originating-Ip: [209.85.221.51]
Authentication-Results: atlas103.free.mail.gq1.yahoo.com;
        dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112;
        spf=none smtp.mailfrom=gmail.com;
        dmarc=unknown header.from=gmail.com;
X-Apparently-To: bob@yahoo.com; Fri, 8 Oct 2021 03:34:29 +0000
Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)
        by 10.214.167.142 with SMTPs
        (version=TLS1_2 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256);
        Fri, 08 Oct 2021 03:34:29 +0000
Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3
        for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail-com.20210112.gappssmtp.com; s=20210112;
        h=message-id:date:mime-version:to:from:subject:content-disposition
         :content-transfer-encoding:content-md5;
        bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
        b=8J0ymoL07NMNgk/0NXGCujWtAZ62KdnEk3HwxZpQS99M4PD4/MKKYhjrJxzt5QJGUq
         erS+1nXOeHZD5k7IVlUo7rDJZbDQdt4FFh1wOEaWc8CUPqBu3hJDSgDdWmQRVlsntnnc
         CB6tqF/VC3C4jdoBXX39npp+FFJSBNWcVsZLHdqj1dxhHWbIed3Q98Lfkh+rrb7xHBy4
         cKzdloNNisVPRKQXnNENWRxAF+22fS6DuvfsFyZLctlvgRg8WXGDQACt6WR4prfuV1R3
         thP7NoM7DIIYn3PnepZ3zAbN8P5GG+VOqv64L3sJNUkxrEcNczLdqDDwnyhfUtPKB0wP
         hrig==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:message-id:date:mime-version:to:from:subject
         :content-disposition:content-transfer-encoding:content-md5;
        bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
        b=ul2T8UXULfKNYI7WHtEG1+zImSks02kwY4jormij3ejAZpdUnYQeNuiBFWAZV+2SWU
         qfLLEEPLcU4Wp9CwX+CbdOZZVM2UKo00EDUjJ3eyOnv0MQy0aBajV27x6L5gKWcURuP4
         rPwGEOInoRE3scYbEwmimebQC8xNQD2kCVJH2HfEII3g5L5U8EVk4UH2HxT7LOwwYfx6
         Gwlhqf4z7xdJc56ywP7UjRXepIEKo2GRwUxs7BrM/v22380Yge6enehyfJCvRmQe+34z
         6Qh0eowKDRbKawIl9RHE6/h1OWBw9mC3I/VzHoXV7+DoXUUYJ883TuTSR47cQ7sqx3+Z
         24KQ==
Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])
        by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28
        for <bob@yahoo.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
Message-ID: <615fbc44.1c69fb81.55d71.5951@mx.google.com>
Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
X-Google-Original-Date: Fri, 08 Oct 2021 03:34:27 GMT
From: alice@gmail.com
To: bob@yahoo.com
Subject: Hello
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-MD5: ZajifYh5KDgxtmS9i38K1A==
Content-Length: 14
Content-Language: en-GB

Hello, World!
If you skimmed through the raw message contents then (hopefully before your eyes glazed over) you would have noted that the headers consist of key-value pairs. For example, Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT) has key Date and value Thu, 07 Oct 2021 20:34:28 -0700 (PDT).
You might also have noticed that a number of the header keys start with X-. These are custom headers inserted by the various processes that handle the mail while it’s being delivered.
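To make the key-value structure concrete, here is a minimal sketch using Python's standard email package (Python is my choice for illustration; it is not what was used to send the message). It parses an abridged copy of the delivered message and picks out the custom X- headers.

```python
from email import message_from_string

# Abridged copy of the delivered message (only a few of the headers).
RAW = """\
Delivered-To: bob@yahoo.com
Return-Path: <alice@gmail.com>
X-Originating-Ip: [209.85.221.51]
X-Apparently-To: bob@yahoo.com; Fri, 8 Oct 2021 03:34:29 +0000
Date: Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
X-Google-Original-Date: Fri, 08 Oct 2021 03:34:27 GMT
From: alice@gmail.com
To: bob@yahoo.com
Subject: Hello

Hello, World!
"""

msg = message_from_string(RAW)

# The headers behave like an ordered list of (key, value) pairs.
headers = msg.items()

# Lookups by key are case-insensitive.
date_value = msg["date"]

# Custom headers are conventionally prefixed with "X-".
custom = [key for key, _ in headers if key.lower().startswith("x-")]

# Everything after the first blank line is the message body.
body = msg.get_payload()
```

The blank line is what separates the headers from the body, which is why the parser needs it in the raw text.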
Original Message Content
For reference, here is the content of the message that was actually sent by Alice.
Date: Fri, 08 Oct 2021 03:34:27 GMT
From: alice@gmail.com
To: bob@yahoo.com
Subject: Hello
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-MD5: ZajifYh5KDgxtmS9i38K1A==

Hello, World!
The message was generated and sent using the {emayili} package. Clearly a lot of headers are introduced in the process of delivering the message!
Number of Headers
Some of the header fields occur more than once. For example, the Received field appears four times in the message above. This is not uncommon. However, there are some rules about which headers may be repeated. These rules also indicate which headers are optional and which are mandatory. The table below (extracted from RFC 5322) indicates the limits on the number of times each type of header should appear in an email. The range of headers included in the table is only a subset of all possible headers. But it gives a fair idea of what the rules are. For example, the From field must be specified, as must the Date field. The To, Cc and Bcc fields are all optional though. In principle, you can create a valid email with no recipients.
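As a rough illustration of how these limits could be checked, here is a small Python sketch. The LIMITS dictionary covers only a handful of fields (with bounds as I read them from RFC 5322 Section 3.6) and check_header_counts is a hypothetical helper, not part of any library.

```python
from collections import Counter
from email import message_from_string

# (minimum, maximum) occurrences for a few fields, as given in
# RFC 5322 Section 3.6. None means "no upper limit".
LIMITS = {
    "date": (1, 1),
    "from": (1, 1),
    "to": (0, 1),
    "cc": (0, 1),
    "bcc": (0, 1),
    "subject": (0, 1),
    "message-id": (0, 1),
    "received": (0, None),
}


def check_header_counts(raw):
    """Return a list of violations found in a raw message (empty if none)."""
    msg = message_from_string(raw)
    counts = Counter(key.lower() for key in msg.keys())
    problems = []
    for field, (low, high) in LIMITS.items():
        n = counts.get(field, 0)
        if n < low:
            problems.append(f"{field}: missing")
        elif high is not None and n > high:
            problems.append(f"{field}: appears {n} times")
    return problems


# A valid message with no recipients at all.
ok = "Date: Fri, 08 Oct 2021 03:34:27 GMT\nFrom: alice@gmail.com\n\nHello\n"
# No Date field and two To fields (the addresses are made up).
bad = "From: alice@gmail.com\nTo: a@example.com\nTo: b@example.com\n\nHello\n"

ok_problems = check_header_counts(ok)
bad_problems = check_header_counts(bad)
```

Note that several recipients belong in a single comma-separated To field rather than in repeated To fields.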
Source Header Fields
First we’ll take a look at the header fields which are part of the original (source) message as sent. These fields are all populated by the email client.
Date
The Date field specifies the date and time at which the message was sent.
You might note that the date specified in the source message was Fri, 08 Oct 2021 03:34:27 GMT (the time at which Alice’s client created the message, expressed in GMT), while the date in the delivered message was Thu, 07 Oct 2021 20:34:28 -0700 (PDT) (the time in the timezone of the email server). Once the timezones are taken into account the two values are just one second apart. The format of the date and time was originally defined in RFC 822 Section 5 and subsequently clarified in RFC 2822 Section 3.3. However you will find that there is quite a variety in the formats used in practice.
The custom X-Google-Original-Date field retains the date and time in their original source format.
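The two date strings look quite different, but they are easy to compare once parsed. A minimal sketch using parsedate_to_datetime from Python's standard library:

```python
from email.utils import parsedate_to_datetime

# Date from the source message and from the delivered message.
original = parsedate_to_datetime("Fri, 08 Oct 2021 03:34:27 GMT")
delivered = parsedate_to_datetime("Thu, 07 Oct 2021 20:34:28 -0700 (PDT)")

# Both results are timezone-aware, so they can be subtracted directly.
gap = delivered - original
```

Here gap comes out as a timedelta of one second, despite the dates appearing to fall on different days.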
From and To
The From and To fields specify the email addresses of the sender and recipient of the message. The format for the addresses is detailed in RFC 822 Section 6.
In addition to these fields you can also have Cc (carbon copy) and Bcc (blind carbon copy) fields, which specify additional recipients who are not the main intended recipient. The addresses listed under Bcc are not visible to any of the recipients.
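Address fields can carry a display name as well as the address itself, and a single field can list several addresses. A short sketch with Python's email.utils shows how they are pulled apart (the display names and the carol@example.com address are invented for illustration).

```python
from email.utils import getaddresses, parseaddr

# A single address with a (hypothetical) display name.
name, address = parseaddr("Alice <alice@gmail.com>")

# A field value listing two recipients; Carol is a made-up extra recipient.
recipients = getaddresses(["bob@yahoo.com, Carol <carol@example.com>"])
```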
Subject
The Subject field doesn’t really require much explanation: it tells you what the message is about.
MIME-Version
Multipurpose Internet Mail Extensions (MIME) is a format which allows email messages to contain content beyond mere text. MIME was originally described in great detail in RFC 1341, which has since been superseded by RFC 2045 through RFC 2049. The version of MIME is typically 1.0 because it appears that’s the only version available!
Content-Type
The Content-Type field specifies the media type (or MIME type) of data in the message. The type is specified as a two-part identifier with form <type>/<subtype>. For example, text/plain or text/html for plain text or HTML content, and image/png for a PNG image. An extensive list of media types can be found in the IANA media types registry.
The media type is often followed by one or more parameters. For example, in Alice’s email the Content-Type field is text/plain; charset=utf-8; format=flowed, where the charset and format parameters indicate UTF-8 encoding of the text and flowed format (described in RFC 3676: a line ending in a space is a soft break, so the receiving client is free to re-wrap paragraphs to fit the display).
In a multi-part message (for example, a message with both plain and HTML text as well as attachments) there will be multiple Content-Type fields, one for each part of the message.
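If you want to get at the media type and its parameters programmatically, Python's email package will split them out for you. A minimal sketch using the Content-Type value from Alice's message:

```python
from email import message_from_string

raw = (
    "Content-Type: text/plain; charset=utf-8; format=flowed\n"
    "\n"
    "Hello, World!\n"
)
msg = message_from_string(raw)

# The two-part media type, together and separately.
media_type = msg.get_content_type()
maintype = msg.get_content_maintype()
subtype = msg.get_content_subtype()

# Parameters following the media type.
charset = msg.get_param("charset")
text_format = msg.get_param("format")
```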
Content-Disposition
The Content-Disposition header indicates how content is to be displayed, and is generally either inline or attachment. RFC 2183 is an extended discussion of this field. The range of permitted values is shown in the table below.
# A tibble: 16 × 2
   value                  reference
   <chr>                  <chr>
 1 inline                 RFC 2183
 2 attachment             RFC 2183
 3 form-data              RFC 7578
 4 signal                 RFC 3204
 5 alert                  RFC 3261
 6 icon                   RFC 3261
 7 render                 RFC 3261
 8 recipient-list-history RFC 5364
 9 session                RFC 3261
10 aib                    RFC 3893
11 early-session          RFC 3959
12 recipient-list         RFC 5363
13 notification           RFC 5438
14 by-reference           RFC 5621
15 info-package           RFC 6086
16 recording-session      RFC 7866
There are also various parameters which can be specified for this field. The available parameters are listed in the table below.
# A tibble: 10 × 2
   name              reference
   <chr>             <chr>
 1 filename          RFC 2183
 2 creation-date     RFC 2183
 3 modification-date RFC 2183
 4 read-date         RFC 2183
 5 size              RFC 2183
 6 name              RFC 7578
 7 voice             RFC 2421
 8 handling          RFC 3204
 9 preview-type      RFC 7763
10 reaction          RFC 9078
Content-Transfer-Encoding
The Content-Transfer-Encoding field is documented in RFC 2045 Section 6. It specifies the way that the content has been encoded. Why is encoding necessary? RFC 821, which established SMTP, specified that data would be represented by 7-bit ASCII characters (the 128 characters of the ASCII character set).
“The TCP connection supports the transmission of 8-bit bytes. The SMTP data is 7-bit ASCII characters. Each character is transmitted as an 8-bit byte with the high-order bit cleared to zero.” (RFC 821)
Furthermore, the maximum length of a line was limited to 1000 characters.
“The maximum total length of a text line including the CRLF is 1000 characters (but not counting the leading dot duplicated for transparency).” (RFC 821)
The majority of data that you’d want to include in an email doesn’t comply with either of these requirements. In order for you to be able to use fancy Unicode characters and attach cute cat pictures, those data need to be encoded in such a way that they can still be transmitted over SMTP.
The most common values for this field are
- 7bit (default): Content is already in 7-bit ASCII (no encoding).
- quoted-printable: Quoted-Printable encoding uses the equal sign (=) as an escape character to transform non-ASCII characters into something that can be represented by ASCII. For example, the character é (e-acute), which is represented in UTF-8 encoding by the bytes 0xC3 0xA9, is encoded as =C3=A9. This encoding also limits line length to 76 characters, where lines are terminated by a soft line break represented by =. Quoted-Printable encoding is most often used to encode text data.
- base64: Base64 encoding encodes data into 6-bit digits which are packed in groups of four, so that 4 Base64 digits represent 24 bits or 3 bytes. Base64 is most often used to encode binary data.
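You can try both encodings on the é example with Python's standard quopri and base64 modules. This is just a quick sketch of the two transformations, not a description of what any particular mail client does internally.

```python
import base64
import quopri

# The UTF-8 representation of é is the two bytes 0xC3 0xA9.
raw = "é".encode("utf-8")

# Quoted-Printable escapes each non-ASCII byte as = followed by two hex digits.
qp = quopri.encodestring(raw)

# Base64 packs the bits into 6-bit digits and pads the final group with =.
b64 = base64.b64encode(raw)

# Three bytes (24 bits) map onto exactly four Base64 digits, with no padding.
three_bytes = base64.b64encode(b"abc")
```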
To illustrate the difference between these various coding mechanisms, suppose that you wanted to send the following message in an email:
J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font vite pédagogues et t'enseignent comme but ce qui n'est par essence qu'un moyen, et te trompant ainsi sur la route à suivre les voilà bientôt qui te dégradent, car si leur musique est vulgaire ils te fabriquent pour te la vendre une âme vulgaire.
Using Quoted-Printable encoding the email would look like this:
Content-Type: text/plain;
        charset=utf-8;
        format=flowed
Content-Transfer-Encoding: quoted-printable
Content-MD5: XjpBVdSoL+frxc/IjptLkA==

J'interdis aux marchands de vanter trop leurs marchandises. Car ils se font=
vite p=C3=A9dagogues et t'enseignent comme but ce qui n'est par essence qu=
'un moyen, et te trompant ainsi sur la route =C3=A0 suivre les voil=C3=A0 b=
ient=C3=B4t qui te d=C3=A9gradent, car si leur musique est vulgaire ils te =
fabriquent pour te la vendre une =C3=A2me vulgaire.
All non-ASCII characters have been encoded and the text has been wrapped to a width of only 76 characters. Most of the content is still legible though. By contrast, Base64 encoding would look like this:
Content-Type: text/plain;
        charset=utf-8;
        format=flowed
Content-Transfer-Encoding: base64
Content-MD5: XjpBVdSoL+frxc/IjptLkA==

SidpbnRlcmRpcyBhdXggbWFyY2hhbmRzIGRlIHZhbnRlciB0cm9wIGxldXJzIG1hcmNoYW5kaXNl
cy4gQ2FyIGlscyBzZSBmb250IHZpdGUgcMOpZGFnb2d1ZXMgZXQgdCdlbnNlaWduZW50IGNvbW1l
IGJ1dCBjZSBxdWkgbidlc3QgcGFyIGVzc2VuY2UgcXUndW4gbW95ZW4sIGV0IHRlIHRyb21wYW50
IGFpbnNpIHN1ciBsYSByb3V0ZSDDoCBzdWl2cmUgbGVzIHZvaWzDoCBiaWVudMO0dCBxdWkgdGUg
ZMOpZ3JhZGVudCwgY2FyIHNpIGxldXIgbXVzaXF1ZSBlc3QgdnVsZ2FpcmUgaWxzIHRlIGZhYnJp
cXVlbnQgcG91ciB0ZSBsYSB2ZW5kcmUgdW5lIMOibWUgdnVsZ2FpcmUu
Again it’s all ASCII characters with a line width of only 76 characters. But now the content is completely illegible (unless you can Base64 decode in your head!).
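Fortunately you don't need to decode anything in your head. The sketch below wraps fragments of the two encoded bodies above in a tiny message and lets Python's email package undo the transfer encoding, guided by the Content-Transfer-Encoding header. The decode_body function is a hypothetical helper written for this illustration.

```python
from email import message_from_string, policy


def decode_body(encoding, body):
    """Build a minimal message around an encoded body and return its text."""
    raw = (
        "Content-Type: text/plain; charset=utf-8\n"
        f"Content-Transfer-Encoding: {encoding}\n"
        "\n"
        f"{body}\n"
    )
    msg = message_from_string(raw, policy=policy.default)
    # get_content() reverses the transfer encoding, then decodes the charset.
    return msg.get_content()


# Fragment of the Quoted-Printable body, including one soft line break.
qp_text = decode_body(
    "quoted-printable",
    "vite p=C3=A9dagogues et t'enseignent comme but ce qui n'est par essence qu=\n"
    "'un moyen",
)

# First line of the Base64 body.
b64_text = decode_body(
    "base64",
    "SidpbnRlcmRpcyBhdXggbWFyY2hhbmRzIGRlIHZhbnRlciB0cm9wIGxldXJzIG1hcmNoYW5kaXNl",
)
```

The soft line break in the Quoted-Printable fragment disappears on decoding, so that qu'un is stitched back together.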
Content-MD5
MD5 is an algorithm that produces 128-bit hash values. For the purposes of email it’s used as a checksum to verify data integrity. The Content-MD5 header field is a Base64 encoded representation of the MD5 hash of the message contents.
In the two examples above (for Quoted-Printable and Base64 content encoding) you’ll notice that the value of the Content-MD5 field, XjpBVdSoL+frxc/IjptLkA==, is the same. This is because the hash is calculated on the content before the transfer encoding is applied, so it doesn’t depend on how the content is subsequently encoded. Decoding the Base64 yields the raw MD5 hash, 5e3a4155d4a82fe7ebc5cfc88e9b4b90. Check this against the results from the MD5 Hash Generator.
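Both directions are easy to reproduce in Python. In the sketch below content_md5 is a hypothetical helper. Applying it to the 13 bytes of Hello, World! (with no trailing line break, which is an assumption about how the hash was calculated) gives the Content-MD5 value from Alice's first message.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 encoding of the raw (binary) MD5 digest of the content."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


# Header value to hexadecimal digest.
hex_digest = base64.b64decode("XjpBVdSoL+frxc/IjptLkA==").hex()

# Content to header value (assumes the hash covers the text without a newline).
hello_md5 = content_md5(b"Hello, World!")
```

Note that it's the binary digest which is Base64 encoded, not its hexadecimal representation.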
Transit Header Fields
Now let’s look at the header fields introduced during delivery.
Message-ID
The Message-ID field uniquely identifies a specific message.
Message-ID: <615fbc44.1c69fb81.55d71.5951@mx.google.com>
The format of the identifier is similar to that of an email address, consisting of two components: a unique identifier (for example, 615fbc44.1c69fb81.55d71.5951) and the domain name of the mail server (for example, mx.google.com).
The Message-ID is important for linking messages together into threads, where the In-Reply-To and References headers are used to reference earlier messages via their Message-ID.
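To show how threading hangs together, here is a sketch of a reply that Bob might send, built with Python's email package. The reply is invented for illustration: make_msgid generates a fresh identifier for it, while In-Reply-To and References point back at Alice's message.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Message-ID of the message being replied to (from the example above).
original_id = "<615fbc44.1c69fb81.55d71.5951@mx.google.com>"

# A hypothetical reply from Bob to Alice.
reply = EmailMessage()
reply["From"] = "bob@yahoo.com"
reply["To"] = "alice@gmail.com"
reply["Subject"] = "Re: Hello"
# A new unique identifier in the same <unique-part@domain> form.
reply["Message-ID"] = make_msgid(domain="yahoo.com")
# Links which allow a mail client to place the reply in the right thread.
reply["In-Reply-To"] = original_id
reply["References"] = original_id
reply.set_content("Hello to you too!")
```

In a longer thread the References field would accumulate the identifiers of all the earlier messages, while In-Reply-To names only the immediate parent.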
Received
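There are multiple Received entries in the header. These trace the route that the message took from Alice to Bob. Each server that handles the message adds its own entry above the existing ones, so the earliest hop is at the bottom and the most recent is at the top.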
Received: from 10.214.167.142
        by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000
Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)
        by 10.214.167.142 with SMTPs
        (version=TLS1_2 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256);
        Fri, 08 Oct 2021 03:34:29 +0000
Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3
        for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)
Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])
        by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28
        for <bob@yahoo.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 07 Oct 2021 20:34:28 -0700 (PDT)
The message was sent initially to the Gmail SMTP server smtp.gmail.com (IP 64.233.184.109). This is the server configured by Alice for her outgoing email. Where is that server? Good question. I used https://ipgeolocation.io/ to find out (slightly abridged output from their API below).
{
  "ip": "64.233.184.109",
  "continent_name": "North America",
  "country_name": "United States",
  "state_prov": "California",
  "district": "Old Mountain View",
  "city": "Mountain View",
  "zipcode": "94041-1238",
  "latitude": "37.39500",
  "longitude": "-122.08167",
  "isp": "Google LLC",
  "organization": "Google LLC",
  "time_zone": {
    "name": "America/Los_Angeles",
    "offset": -8,
    "is_dst": true,
    "dst_savings": 1
  }
}
Alice’s message went from her computer in the UK to a server in Mountain View, California on the west coast of the USA. That’s quite a big leap! We can use the other Received fields to track its progress from there. Next stop was mail-wr1-f51.google.com (IP 209.85.221.51), also located in Mountain View, California. From there it went to a machine on a private network (IP 10.214.167.142) before finally being delivered to atlas103.free.mail.gq1.yahoo.com, presumably another server located on a private network.
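The route can be recovered programmatically too. This sketch uses Python's email package to collect the Received fields from the message and reverse them into chronological order, then uses the ipaddress module to confirm which of the addresses mentioned above sit on a private network.

```python
import ipaddress
from email import message_from_string

# The four Received fields from the delivered message, newest first.
RAW = """\
Received: from 10.214.167.142
        by atlas103.free.mail.gq1.yahoo.com with HTTPS; Fri, 8 Oct 2021 03:34:29 +0000
Received: from 209.85.221.51 (EHLO mail-wr1-f51.google.com)
        by 10.214.167.142 with SMTPs
        (version=TLS1_2 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256);
        Fri, 08 Oct 2021 03:34:29 +0000
Received: by mail-wr1-f51.google.com with SMTP id o20so25174853wro.3
        for <bob@yahoo.com>; Thu, 07 Oct 2021 20:34:29 -0700 (PDT)
Received: from allieyoo (host-92-12-241-137.as13285.net. [92.12.241.137])
        by smtp.gmail.com with ESMTPSA id u5sm1069389wrg.57.2021.10.07.20.34.28
        for <bob@yahoo.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 07 Oct 2021 20:34:28 -0700 (PDT)

"""

msg = message_from_string(RAW)

# get_all() returns every occurrence of a field, in the order they appear.
# Folded values are collapsed onto a single line for readability.
hops = [" ".join(value.split()) for value in msg.get_all("Received")]

# Reverse the list to follow the message from sender to recipient.
route = list(reversed(hops))

# Which of the addresses seen along the way are on a private network?
private = {
    ip: ipaddress.ip_address(ip).is_private
    for ip in ["92.12.241.137", "209.85.221.51", "10.214.167.142"]
}
```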
The Received field can also include information on the encryption (see references to TLS) and the protocol (see ESMTPSA, SMTP and SMTPs) being used. The registered protocol keywords are listed below.
# A tibble: 18 × 2
   protocol   description
   <chr>      <chr>
 1 SMTP       "Simple Mail Transfer Protocol"
 2 ESMTP      "SMTP with Service Extensions"
 3 ESMTPA     "ESMTP with AUTH"
 4 ESMTPS     "ESMTP with STARTTLS"
 5 ESMTPSA    "ESMTP with both STARTTLS and AUTH"
 6 LMTP       "Local Mail Transfer Protocol"
 7 LMTPA      "LMTP with AUTH"
 8 LMTPS      "LMTP with STARTTLS"
 9 LMTPSA     "LMTP with both STARTTLS and AUTH"
10 MMS        "Multimedia Messaging Service"
11 UTF8SMTP   "ESMTP with SMTPUTF8"
12 UTF8SMTPA  "ESMTP with SMTPUTF8 and AUTH"
13 UTF8SMTPS  "ESMTP with SMTPUTF8 and STARTTLS"
14 UTF8SMTPSA "ESMTP with SMTPUTF8 and both STARTTLS and AUTH"
15 UTF8LMTP   "LMTP with SMTPUTF8"
16 UTF8LMTPA  "LMTP with SMTPUTF8 and AUTH"
17 UTF8LMTPS  "LMTP with SMTPUTF8 and STARTTLS"
18 UTF8LMTPSA "LMTP with SMTPUTF8 and both STARTTLS and AUTH"
There can also be an X-Received field which is a custom field containing information similar to the Received field.
X-Originating-Ip
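The X-Originating-Ip field identifies the IP address of the sender. This is relevant, for example, when you send an email using a web email client. In this case the web client communicates with the SMTP server, but the content of this field will be your actual IP address. In the example above, however, the field holds 209.85.221.51, which is the address of the Gmail server that handed the message to Yahoo rather than the address of Alice’s computer.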
Delivered-To and Return-Path
The Delivered-To field records the email address to which the message was actually delivered. Somewhat surprisingly, this can be different to the addresses specified in the To, Cc or Bcc fields. How does this happen? An email message generally passes through numerous servers and processes between the sender and the recipient. At various stages in this process rules might be applied which modify the delivery of the message. Suppose, for example, that Bob (bob@yahoo.com) is one of many clients with whom Alice communicates. If she sends out a general email to all of her clients then she will probably use a mailing list rather than adding the address of each client individually. The message To field would then contain the name of the mailing list, but this would be expanded en route to a list of individual email addresses, and these would appear in the Delivered-To field.
Sometimes an email doesn’t get delivered. Maybe there’s a problem with the recipient’s email server? Or perhaps the address of the recipient has changed? In this case the email will bounce and a bounce message will be returned which informs the sender that the email did not reach its destination. The Return-Path field specifies the address to which the bounce message should be delivered. If, for example, you’re sending email to a large mailing list and some of the addresses on that list are unreliable then it can be useful to specify a different Return-Path address so that bounce messages don’t end up cluttering your inbox. Although you can specify a value for Return-Path it’s possible that your email server will override this.
Authentication-Results
The Authentication-Results header contains information on what authentication methods have been applied to a message. It will often include information on the following protocols, all of which are intended to detect forged sender addresses (also known as “email spoofing”):
- DKIM (DomainKeys Identified Mail)
- SPF (Sender Policy Framework) and
- DMARC (Domain-based Message Authentication, Reporting and Conformance).
Authentication-Results: atlas103.free.mail.gq1.yahoo.com;
        dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112;
        spf=none smtp.mailfrom=gmail.com;
        dmarc=unknown header.from=gmail.com;
In the example above the results for each of these methods are listed. Results that you are likely to see for each method are:
- DKIM: pass, fail or none
- SPF: pass, fail, softfail, neutral, none, temperror or permerror
- DMARC: pass, fail, bestguesspass, none or unknown.
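Pulling the individual results out of this header doesn't take much code. The Python function below is a deliberately simplified sketch: parse_auth_results is a hypothetical helper which assumes the straightforward layout seen in the example and makes no attempt to handle the comments or quoted strings that the full syntax allows.

```python
def parse_auth_results(value):
    """Split an Authentication-Results value into (server, {method: result})."""
    # The first clause names the server which performed the checks.
    server, *clauses = [part.strip() for part in value.split(";")]
    results = {}
    for clause in clauses:
        if not clause:
            continue  # Skip the empty clause after a trailing semicolon.
        method, _, rest = clause.partition("=")
        # The result is the first word; the remainder are supporting properties.
        results[method] = rest.split()[0]
    return server, results


VALUE = """atlas103.free.mail.gq1.yahoo.com;
        dkim=pass header.i=@gmail-com.20210112.gappssmtp.com header.s=20210112;
        spf=none smtp.mailfrom=gmail.com;
        dmarc=unknown header.from=gmail.com;"""

server, results = parse_auth_results(VALUE)
```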
DKIM-Signature
DKIM (DomainKeys Identified Mail) is an authentication protocol. A DKIM Signature ties a message to the DNS domain which signed it. The signing domain publishes a public key in a DNS record, where it is freely available. The DKIM Signature is signed with the corresponding private key. Using the public key it’s then possible to verify that the message really was signed by that domain and that the signed content has not been altered along the way.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail-com.20210112.gappssmtp.com; s=20210112;
        h=message-id:date:mime-version:to:from:subject:content-disposition
         :content-transfer-encoding:content-md5;
        bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU=;
        b=8J0ymoL07NMNgk/0NXGCujWtAZ62KdnEk3HwxZpQS99M4PD4/MKKYhjrJxzt5QJGUq
         erS+1nXOeHZD5k7IVlUo7rDJZbDQdt4FFh1wOEaWc8CUPqBu3hJDSgDdWmQRVlsntnnc
         CB6tqF/VC3C4jdoBXX39npp+FFJSBNWcVsZLHdqj1dxhHWbIed3Q98Lfkh+rrb7xHBy4
         cKzdloNNisVPRKQXnNENWRxAF+22fS6DuvfsFyZLctlvgRg8WXGDQACt6WR4prfuV1R3
         thP7NoM7DIIYn3PnepZ3zAbN8P5GG+VOqv64L3sJNUkxrEcNczLdqDDwnyhfUtPKB0wP
         hrig==
The DKIM Signature field is made up of a number of elements:
- v: the DKIM version
- a: the signing algorithm
- c: algorithm used to canonicalise the header and body (optional)
- d: Signing Domain Identifier (SDID), which is the domain used to sign the email
- s: a selector
- h: list of header fields that were signed
- bh: a hash of the message body
- b: signature of the header fields and body.
The custom X-Google-DKIM-Signature field is another DKIM signature added by Gmail.
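The signature is just a list of tag=value pairs separated by semicolons, so it's simple to pick apart. In the Python sketch below parse_tag_list is a hypothetical helper and the b tag has been left out of the sample to keep it short. It also builds the DNS name under which the public key is published, which (per RFC 6376) combines the selector and the signing domain.

```python
import base64


def parse_tag_list(value):
    """Parse a DKIM tag=value list into a dictionary."""
    tags = {}
    for item in value.split(";"):
        # Split on the first = only, since Base64 values can end with =.
        name, sep, val = item.partition("=")
        if sep:
            # Whitespace inside a value comes from line folding, so drop it.
            tags[name.strip()] = "".join(val.split())
    return tags


# DKIM-Signature value from the example (b tag omitted for brevity).
VALUE = """v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail-com.20210112.gappssmtp.com; s=20210112;
        h=message-id:date:mime-version:to:from:subject:content-disposition
         :content-transfer-encoding:content-md5;
        bh=krdyOAo/jiepPlfm3uymwB2gf1qtzni7L7sg3hCmaSU="""

tags = parse_tag_list(VALUE)

# Header fields covered by the signature.
signed_headers = tags["h"].split(":")

# DNS name at which the public key is published: <selector>._domainkey.<domain>.
dns_name = f"{tags['s']}._domainkey.{tags['d']}"

# The body hash is Base64 encoded. For rsa-sha256 it is a SHA-256 digest.
body_hash = base64.b64decode(tags["bh"])
```

The body hash decodes to 32 bytes, which is what you'd expect from SHA-256.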
Conclusion
There’s obviously a lot of complexity lurking in those email headers. I’ve really learned a lot while researching this post and I hope that this information will be useful to you too.