How can you be sure that the contents of an email haven’t been tampered with? The best approach would probably be to have a digital signature on each component of the message. Perhaps I’ll look at integrating that into {emayili} some time in the future. However, today I’m writing about the first step in that direction: MD5 checksums.
The Content-MD5 Header Field
RFC 1864 describes the Content-MD5 header field. The MD5 algorithm is used to generate a hash for each component of the message and that hash is included in the Content-MD5 header field.
Sounds pretty simple, right? Well, the devil’s in the details (although, in this case, it’s not particularly devilish). The MD5 algorithm produces a 128 bit digest. These bits are translated into 16 bytes (or octets). And those bytes are then Base64 encoded. The final result is a 24 character string (including padding).
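The length arithmetic is easy to check with a quick sketch in the shell. Base64 emits 4 characters for every group of 3 bytes, padding the final group if it is short. The variable names here are purely illustrative.

```shell
# A 128 bit digest expressed as a number of bytes.
bits=128
bytes=$((bits / 8))

# Base64 emits 4 characters per 3 byte group, rounding the group count up.
chars=$(( (bytes + 2) / 3 * 4 ))

echo "$bytes bytes -> $chars Base64 characters"
```

Sixteen bytes need six groups (the last one padded), which gives the 24 characters.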
First let’s explore this in the shell.
MD5 & Base64 in the Shell
The md5sum command-line tool can be used to generate an MD5 hash. What’s the MD5 hash for a simple “Hello, World!” message?
echo "Hello, World!" | md5sum
bea8252ff4e80f41719ea13cdf007273 -
Superficially this looks good, but there’s a subtlety that could trip us up: echo will implicitly append a line feed character to the end of the message. And that also gets factored into the hash. We don’t want that. The -n flag will suppress the trailing line feed.
echo -n "Hello, World!" | md5sum
65a8e27d8879283831b664bd8b7f0ad4 -
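To see the extra byte, it can help to count what each command actually sends down the pipe. This is a small sketch that assumes a POSIX shell with wc available. It uses printf '%s' for the second case because echo -n is not handled identically by every shell.

```shell
# Count the bytes emitted in each case (tr strips any padding from wc).
with_newline=$(echo "Hello, World!" | wc -c | tr -d ' ')
without_newline=$(printf '%s' "Hello, World!" | wc -c | tr -d ' ')

echo "echo:   $with_newline bytes"      # 13 characters plus a line feed
echo "printf: $without_newline bytes"   # just the 13 characters
```

That single extra byte is enough to produce a completely different hash.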
Now we can Base64 encode the result.
echo -n "Hello, World!" | md5sum | base64
NjVhOGUyN2Q4ODc5MjgzODMxYjY2NGJkOGI3ZjBhZDQgIC0K
Okay, hold on. The result was supposed to be only 24 characters long. Something’s not right.
The problem is that we are Base64 encoding the characters in the hexadecimal representation of the bytes rather than the bytes themselves. We’re going to need different tools.
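One way to confirm this diagnosis is to decode the 48 character result and look at what comes back. This sketch assumes GNU base64, where -d decodes.

```shell
# Decode the over-long Base64 string produced by md5sum | base64.
decoded=$(printf '%s' "NjVhOGUyN2Q4ODc5MjgzODMxYjY2NGJkOGI3ZjBhZDQgIC0K" | base64 -d)

# The result is the text printed by md5sum (hex digits, two spaces and a
# dash), not the 16 raw bytes of the digest.
echo "$decoded"
echo "${#decoded} characters"
```

The decoded text is 35 characters long (the trailing line feed is dropped by the command substitution), so what was encoded was the printed output of md5sum rather than the digest itself.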
echo -n "Check Integrity!" | base64
Q2hlY2sgSW50ZWdyaXR5IQ==
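The point of that detour: “Check Integrity!” is exactly 16 characters (16 bytes) long, and it Base64 encodes to 24 characters. That’s the length we should expect for a 16 byte digest.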
The openssl tool can also be used to generate an MD5 hash. And it can do the Base64 encoding too. Let’s start with the MD5 hash.
echo -n "Hello, World!" | openssl dgst -md5
MD5(stdin)= 65a8e27d8879283831b664bd8b7f0ad4
That’s consistent with the earlier result. Now, we’ll use the -binary flag to get binary output (a series of bytes rather than the hexadecimal representation of those bytes). We’ll pipe that into openssl again to do the Base64 encoding.
echo -n "Hello, World!" | openssl dgst -md5 -binary | openssl enc -base64
ZajifYh5KDgxtmS9i38K1A==
Count the characters: there are 24, just as required.
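As a cross-check, the Base64 string can be decoded back into bytes and printed as hexadecimal, which should reproduce the digest reported by md5sum. This sketch assumes GNU base64 and the standard od utility.

```shell
# Decode the Base64 digest and dump the raw bytes as hex pairs.
hex=$(printf '%s' "ZajifYh5KDgxtmS9i38K1A==" | base64 -d | od -An -tx1 | tr -d ' \n')

echo "$hex"
echo "$(( ${#hex} / 2 )) bytes"
```

The 32 hex digits correspond to the 16 bytes of the digest.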
Now Repeat in R
How about repeating the process now in R? The {digest} package has a function for producing an MD5 hash (along with a bunch of other digest types).
library(digest)
library(base64enc)
library(magrittr)  # Provides the %>% pipe used below.
Let’s generate the MD5 hash for the same message.
digest("Hello, World!", algo = "md5", serialize = FALSE)
[1] "65a8e27d8879283831b664bd8b7f0ad4"
Let’s take the long way around getting the required hash. First, we’ll break the hash down into a series of two-digit hexadecimal numbers.
hash <- digest("Hello, World!", algo = "md5", serialize = FALSE) %>%
  substring(
    first = seq(1, nchar(.), 2),
    last = seq(2, nchar(.), 2)
  )
[1] "65" "a8" "e2" "7d" "88" "79" "28" "38" "31" "b6" "64" "bd" "8b" "7f" "0a" "d4"
Now convert each of those to an integer. The 16 is for base 16 (hexadecimal).
hash <- strtoi(hash, 16)
[1] 101 168 226 125 136 121 40 56 49 182 100 189 139 127 10 212
Finally, convert those integers to raw bytes and Base64 encode them!
base64encode(as.raw(hash))
[1] "ZajifYh5KDgxtmS9i38K1A=="
As edifying as that was, it was a most circuitous route. Fortunately, we can get there more directly.
digest("Hello, World!", algo = "md5", serialize = FALSE, raw = TRUE) %>%
  base64encode()
[1] "ZajifYh5KDgxtmS9i38K1A=="
MD5 in {emayili}
This functionality has now been baked into {emayili}.
library(emayili)
options(envelope.details = TRUE)
options(envelope.invisible = FALSE)
packageVersion("emayili")
[1] '0.6.1'
We’ll try it out using the same simple message.
envelope() %>%
text("Hello, World!")
Date: Sun, 08 Sep 2024 06:49:01 GMT
X-Mailer: {emayili}-0.9.1
MIME-Version: 1.0
Content-Type: text/plain;
    charset=utf-8;
    format=flowed
Content-Transfer-Encoding: 7bit
Content-MD5: ZajifYh5KDgxtmS9i38K1A==

Hello, World!
The Content-MD5 header field contains the Base64 encoded MD5 hash of the message body. With that you can verify that the message content has not been modified (although you’ll probably leave this to your mail client).
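If you did want to check by hand, a rough sketch in the shell might look like the one below. The helpers content_md5 and verify_content_md5 are hypothetical names invented for this illustration, not part of {emayili} or any mail client. It uses only md5sum, cut, fold and base64, and it assumes that the body is hashed exactly as given, with no trailing line feed. A real message would first need to be parsed into its MIME parts, and RFC 1864 has more to say about the canonical form of the data that is hashed.

```shell
# Hypothetical helper: Base64 encoded MD5 digest of a string (no trailing newline).
content_md5() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1 | fold -w 2 |
    while read -r pair; do
      # Turn each hex pair into an octal escape, then emit the raw byte.
      printf '%b' "\\0$(printf '%03o' "0x$pair")"
    done | base64
}

# Hypothetical helper: succeed if the body hashes to the given Content-MD5 value.
verify_content_md5() {
  [ "$(content_md5 "$2")" = "$1" ]
}

if verify_content_md5 "ZajifYh5KDgxtmS9i38K1A==" "Hello, World!"; then
  echo "body matches Content-MD5"
else
  echo "body has been modified"
fi
```

Changing even a single character of the body makes the comparison fail.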