Email in PDF – Pushing the (email) envelope.

https://catalog.archives.gov/id/45022

When NARA released its revised Format Guidance for the Transfer of Permanent Electronic Records in 2014, we identified the file formats acceptable for use by Federal agencies when transferring permanent email messages to NARA. These formats include EML, MBOX, MSG, and PST. Notable for its omission is PDF. This might seem surprising, since whatever application you use to send and receive email almost certainly lets you save a message as a PDF.

Like an email printed on a piece of paper, the PDF may be useful for knowing what the author said. However, it will likely lack the metadata and the interactive features of an email client that it would need to serve as an authentic representation of the original.

PDF is a commonly used and highly flexible format that supports a wide variety of data types, including text, images, audio, and even video. But migration between formats is complicated. PDF has many places where metadata about email messages and the accounts they belong to could be stored, and no single agreed-upon one. Important header fields that ensure messages are delivered to the correct mailboxes, equivalent to the To and From addresses on a paper envelope, are often lost. Message threads and the connections between messages and their attachments are often broken. At the moment, it is up to each vendor to decide how to approach these issues, since no one has defined what the migration from an email format to a PDF should look like, until now.
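To make the header problem concrete, here is a minimal sketch in Python that parses a made-up EML message with the standard library's email package and separates the headers a typical "save as PDF" rendering might show from the ones it would likely drop. The sample message, the DISPLAYED set, and the split_headers function are all assumptions made for illustration; actual email clients differ in what they carry into a PDF.

```python
from email import message_from_string

# Hypothetical sample message, invented for this illustration.
SAMPLE_EML = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.org>
Subject: Re: Quarterly report
Date: Mon, 04 Jan 2021 09:30:00 -0500
Message-ID: <reply-2@example.org>
In-Reply-To: <original-1@example.org>
References: <original-1@example.org>
Received: from mail.example.org by mx.example.org; Mon, 04 Jan 2021 09:30:02 -0500
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Thanks, the figures look right to me.
"""

# Assumption: the headers a printed or PDF view of a message usually displays.
DISPLAYED = {"from", "to", "cc", "subject", "date"}


def split_headers(raw):
    """Return (shown, hidden) dicts mapping header names to lists of values."""
    msg = message_from_string(raw)
    shown, hidden = {}, {}
    for name, value in msg.items():
        target = shown if name.lower() in DISPLAYED else hidden
        # A header such as Received can repeat, so collect values in a list.
        target.setdefault(name, []).append(value)
    return shown, hidden


shown, hidden = split_headers(SAMPLE_EML)
print("Likely visible in a PDF rendering:", sorted(shown))
print("Likely lost unless stored as metadata:", sorted(hidden))
```

In this sketch, the headers that end up in hidden include Message-ID, In-Reply-To, and References, which are the fields that tie a reply to its thread, along with the Received line that records how the message was delivered.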

In January 2021, the EA-PDF Working Group, through the University of Illinois at Urbana-Champaign, published A Specification for Using PDF to Package and Represent Email. Building on the Task Force on Technical Approaches for Email Archives’ report The Future of Email Archives, this report represents input from organizations with expertise in preserving email, including NARA and the PDF vendor community. It aims to define exactly how a PDF of an email message should be constructed so it can be considered an authentic representation of the original message worthy of preservation for the future.

Publication of this report alone does not make PDF a great format for email, and follow-on work is planned to provide vendors with a specification they can implement in future versions of email clients.