This is a major revision of an article that I wrote in 2013. I’m developing a paperless workflow for my home and office. I want to save all my documents in the PDF/A-2b archival format so that I will be able to open them for years to come. The PDFs should be searchable, meaning they contain not only images of the pages but also the underlying text. This allows the documents to be indexed, so I can quickly find them by typing in Windows Explorer’s search box.

Note The 2013 approach created PDF/A-1b files, which do not support transparency. Increasingly, the files that I receive include transparent fonts; when these were converted to PDF/A-1b, the fonts were rasterized, the files were no longer searchable, and the files were very large. PDF/A-2b, with its support for transparency, solves all that.

There are basically three types of documents that need to be archived:

Set up the Batch Components

Caveat This approach should create valid PDF/A documents, but even among experts, there is some disagreement about the PDF/A standard. Use this approach at your own risk. If you have Adobe Acrobat Professional, you can use its “pre-flight” validation to check the output. Or you may want to try a free online validator like the one at PDF-Tools.com. For more background on the process, see this superuser article and this Ghostscript bug report. In brief testing with PDF/A-2b, I found that one of three test files failed validation, with a complaint about a CMYK colorspace even though I specified RGB. Maybe the source document (an AT&T Internet bill) had some reference to CMYK.

The underlying technology for this batch file is the same as for the CutePDF process, so if you have already followed the other post, you can skip the identical steps.

1. Download the latest GNU Affero-licensed version of Ghostscript here (version 9.54 as of April 22, 2021). I found that the 32-bit version works fine even under 64-bit Windows 7 or 10. Install Ghostscript, but customize the install directory so that it doesn’t change when you upgrade to a later version; I use C:\Program Files (x86)\gs\latest. At the end of the install, go ahead and let it Generate cidfmap for Windows CJK TrueType fonts.

2. Create an empty folder on your C: drive called C:\GS_PDFA (Ghostscript PDF/A).

3. Go to Control Panel > System and Security > System. Click on Advanced system settings then Environment Variables. Under System variables, highlight Path, click Edit and add C:\GS_PDFA to the Path (shown here in Windows 10):

[Screenshots: adding C:\GS_PDFA to the Path system variable in Windows 10]
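To confirm that the Path edit took effect, open a new command prompt and run a quick check. The following is only a sketch in Python and is not part of the download; the function name folder_on_path is made up for illustration, and it assumes Windows-style, semicolon-separated Path entries compared case-insensitively.

```python
import os


def folder_on_path(folder, path_value=None):
    """Return True if `folder` is one of the entries in a Windows-style PATH string.

    Hypothetical helper for illustration; not part of the PDFAbatch download.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Ignore surrounding spaces/quotes, trailing slashes, and letter case.
        return entry.strip().strip('"').rstrip("\\/").lower()

    target = norm(folder)
    return any(
        norm(entry) == target
        for entry in path_value.split(";")
        if entry.strip()
    )


if __name__ == "__main__":
    # Checks the PATH of the current process (open a new prompt after editing it).
    print(folder_on_path(r"C:\GS_PDFA"))
```

If this prints False in a freshly opened command prompt, the Path entry was probably not saved under System variables.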

4. Download PDFAbatch_1.5.zip and unzip it into C:\GS_PDFA. This will give you four files:

Note that PDFA_def.ps is the same file described in the CutePDF post, so it’s okay to overwrite it.

Update April 22, 2021 Some updates for compatibility with Ghostscript 9.54. See Release Notes.txt.

5. Locate the path to Ghostscript’s gswin32c.exe on your system. pdfa.cmd assumes it is in C:\Program Files (x86)\gs\latest\bin\. If it is somewhere else, update line 66 of pdfa.cmd to point to the correct path.
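If you are not sure where the executable ended up, a small search like the following can help. This is a sketch under assumptions: the function name find_ghostscript is hypothetical, the default location is the one pdfa.cmd expects, and the fallback simply looks for Ghostscript's 32-bit or 64-bit console executables on the Path.

```python
import os
import shutil

# Location that pdfa.cmd assumes, per the step above.
DEFAULT_GS = r"C:\Program Files (x86)\gs\latest\bin\gswin32c.exe"


def find_ghostscript(candidates=(DEFAULT_GS,), names=("gswin32c", "gswin64c")):
    """Return the first Ghostscript console executable found, or None.

    Hypothetical helper: checks explicit candidate paths first, then the PATH.
    """
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript not found; check the install directory")
```

Whatever path this reports is the one to put into pdfa.cmd.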

6. Download the Adobe ICC profiles here. An ICC profile describes a “color space.” We’ll use the simplest one, Adobe RGB (1998). From the downloaded zip archive, extract AdobeRGB1998.icc to the C:\GS_PDFA folder. Again, this is the same file used in the CutePDF post so it’s okay to overwrite it. (You can use a different profile, e.g. sRGB_IEC61966-2-1_no black_scaling.icc from www.color.org; you’ll need to modify PDFA_def.ps accordingly.)

That’s it! You’re now ready to convert PDF files to PDF/A.

Use the Batch File

Since the batch file is in your path, you should be able to open a command prompt anywhere on your system, type pdfa <filename>, and watch it convert the file to PDF/A. Some notes and advanced usage:

Usage

pdfa file1 [file2|-sb] [file3|-sb] [file4|-sb] [file5|-sb]
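To make the usage line concrete, here is a rough Python sketch of how such arguments could be interpreted. It is not taken from pdfa.cmd: the function name parse_pdfa_args is hypothetical, and it assumes that -sb is a switch that may appear in any position after the first file and that at most five arguments are accepted in total.

```python
def parse_pdfa_args(args, max_args=5):
    """Split command-line arguments into (files, show_bookmarks).

    Illustrative only; the real pdfa.cmd may handle its arguments differently.
    """
    if not args:
        raise ValueError("at least one file name is required")
    if len(args) > max_args:
        raise ValueError("at most %d arguments are accepted" % max_args)
    if args[0].lower() == "-sb":
        raise ValueError("the first argument must be a file name")

    files = []
    show_bookmarks = False
    for arg in args:
        if arg.lower() == "-sb":
            show_bookmarks = True   # assumed meaning: open with the bookmarks panel
        else:
            files.append(arg)
    return files, show_bookmarks


if __name__ == "__main__":
    print(parse_pdfa_args(["Tax Return", "-sb"]))
```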

Usage Examples

1. If you have a PDF utility bill, open a command prompt where the PDF file resides and use this command:

pdfa "Utility Bill"

Output

Utility Bill.pdf – the PDF/A document
Utility Bill.old.pdf – the original PDF document

2. If you have a credit card statement with two reconciliation reports to attach, use the following command:

pdfa CCstatement recon1 recon2

Output

CCstatement.pdf – the combined PDF/A document
CCstatement.old.pdf
recon1.old.pdf
recon2.old.pdf

3. If you have a tax return that includes bookmarks, use the following command:

pdfa "Tax Return" -sb

Output

Tax Return.pdf – the PDF/A document, should open with bookmarks panel in Adobe Reader
Tax Return.old.pdf
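For readers curious about what happens behind these examples, the following Python sketch plans the file renames and builds a Ghostscript command line for a PDF/A-2b conversion. It is an approximation and not the contents of pdfa.cmd: the function plan_conversion and its return format are hypothetical, the ordering (rename the originals to .old.pdf, then read from them) is assumed from the outputs listed above, and the bookmark handling via a /DOCVIEW pdfmark is my guess at one way -sb could work. The Ghostscript switches themselves are standard ones from its PDF/A documentation.

```python
import ntpath

GS_EXE = r"C:\Program Files (x86)\gs\latest\bin\gswin32c.exe"
PDFA_DIR = r"C:\GS_PDFA"


def with_pdf_ext(name):
    """Append .pdf when the caller left the extension off."""
    return name if name.lower().endswith(".pdf") else name + ".pdf"


def plan_conversion(files, show_bookmarks=False):
    """Return the renames, output name, and Ghostscript command (nothing is run)."""
    inputs = [with_pdf_ext(f) for f in files]
    output = inputs[0]                                  # result takes the first file's name
    backups = [p[:-4] + ".old.pdf" for p in inputs]     # originals are kept as *.old.pdf

    command = [
        GS_EXE,
        "-dPDFA=2",                        # request PDF/A-2
        "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",
        "-dPDFACompatibilityPolicy=1",
        # Assumption: needed on Ghostscript 9.50+ so PDFA_def.ps may read the profile.
        "--permit-file-read=" + ntpath.join(PDFA_DIR, "AdobeRGB1998.icc"),
        "-sOutputFile=" + output,
        ntpath.join(PDFA_DIR, "PDFA_def.ps"),
    ] + backups                            # several inputs are concatenated in order

    if show_bookmarks:
        # Assumed approach: ask viewers to open with the bookmarks (outline) panel.
        command += ["-c", "[ /PageMode /UseOutlines /DOCVIEW pdfmark"]

    return {"renames": list(zip(inputs, backups)), "output": output, "command": command}


if __name__ == "__main__":
    plan = plan_conversion(["CCstatement", "recon1", "recon2"])
    for old, new in plan["renames"]:
        print("rename", old, "->", new)
    print(" ".join(plan["command"]))
```

The sketch only prints the plan. To run it for real, you would perform the renames and pass the command list to subprocess.run, but the supplied batch file already does the equivalent work.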

Add a File Explorer Context Menu

I use this so much that I needed a way to run the batch directly from File Explorer without having to open a command prompt. This turns out to be pretty simple to set up.

1. In File Explorer, go to %AppData%\Microsoft\Windows\SendTo.

2. Add a shortcut to C:\GS_PDFA\pdfa.cmd. Name it “PDFA Batch File”. (While you’re here, you might want to remove Send To items that you’ll never use.)

3. Now, in File Explorer, Ctrl-click to select up to five PDF documents in the order in which you want to concatenate them. Right-click on the first one and choose Send to > PDFA Batch File:

[Screenshot: the Send to > PDFA Batch File menu item]

A command window will appear briefly as it converts the file(s), then the completed file will open in your default PDF viewer:

[Screenshot: the converted file open in the PDF viewer]

Reference

A few notes for future reference:

Official document on creating PDF/A:
https://www.ghostscript.com/doc/current/Ps2pdf.htm#PDFA

Notes on parameters to use for creating PDF/A:
https://bugs.ghostscript.com/show_bug.cgi?id=699582#c2

Notes on why transparent fonts produce non-searchable PDFs:
https://bugs.ghostscript.com/show_bug.cgi?id=692773#c3

In general, the best way to see if a Ghostscript problem has been reported and solved is to search the bug tracker at https://bugs.ghostscript.com/query.cgi. Change the Status to All to see open and closed bugs. Then search for the string “PDFA” (no slash). You can sort the results by Change date to see the more recent issues.