Concatenate/Combine PDF Files in Linux

I know there are lots of ways to do this. This is not a HOW-TO; I'm just sharing a script I made for doing this. It's a decent example of how to write a command-line utility in Python.

Since I write my cover letters and resumes in LaTeX, I always need to concatenate the two before sending them to an employer. This matters because the person on the other end can then print both at once without having to keep track of two files. For the longest time I just had a simple script that looked like this:

#Concatenate two PDF files and output to the third argument
echo "**** Converting PDF files to PostScript"
acroread -toPostScript -size letter -pairs $1 /tmp/$
acroread -toPostScript -size letter -pairs $2 /tmp/$
echo "**** Concatenating PostScript files"
a2ps -q --columns=1 -o /tmp/$ /tmp/$ /tmp/$
echo "**** Converting PostScript file to PDF"
ps2pdf /tmp/$ $3
echo "**** File saved to:" $3
echo "**** Cleaning up"
rm -f /tmp/$
rm -f /tmp/$
rm -f /tmp/$

I am including that code only because it shows clearly what is going on. The input PDF files are converted to PostScript files using acroread, the PostScript files are concatenated using a2ps, and then the output PostScript file is converted to PDF using ps2pdf.

I decided to make it into a full-blown Python script since I don't know bash/sh very well and I wanted it to be able to handle errors and an unlimited number of input files. There was a case in the past where I had to concatenate more than two input files, which involved running this script multiple times. I know there are many simpler ways of doing this, but I wanted more practice writing a Unix command-line utility in Python using subprocess and optparse (and making the file importable so it can be used by other scripts, which many people forget to do). The script is a lot more complicated now but it's so much better functionality-wise. Enjoy!
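For readers who want to see the shape of such a utility, here is a minimal sketch of the same three-stage pipeline using subprocess and optparse. It is not the actual script: the function names (build_commands, concatenate, main), the -o/--output option, and the temporary file names are all assumptions made for illustration. The acroread, a2ps and ps2pdf arguments are copied from the shell version above.

```python
"""Sketch of a PDF concatenation utility (illustrative, not the original script)."""
import os
import shutil
import subprocess
import sys
import tempfile
from optparse import OptionParser


def build_commands(inputs, output, workdir):
    """Return the argv lists for every pipeline stage, in execution order."""
    # Hypothetical temp names: one PostScript file per input, plus the joined file.
    ps_files = [os.path.join(workdir, "in%d.ps" % i) for i in range(len(inputs))]
    joined = os.path.join(workdir, "joined.ps")
    # Stage 1: convert each PDF to PostScript (same flags as the shell script).
    commands = [
        ["acroread", "-toPostScript", "-size", "letter", "-pairs", pdf, ps]
        for pdf, ps in zip(inputs, ps_files)
    ]
    # Stage 2: concatenate all PostScript files into one.
    commands.append(["a2ps", "-q", "--columns=1", "-o", joined] + ps_files)
    # Stage 3: convert the joined PostScript file back to PDF.
    commands.append(["ps2pdf", joined, output])
    return commands


def concatenate(inputs, output):
    """Run the pipeline; raises CalledProcessError if any stage fails."""
    workdir = tempfile.mkdtemp(prefix="pdfcat-")
    try:
        for argv in build_commands(inputs, output, workdir):
            subprocess.check_call(argv)
    finally:
        # Clean up the temporary files even when a stage fails.
        shutil.rmtree(workdir, ignore_errors=True)


def main(argv=None):
    """Command-line entry point; returns a process exit status."""
    parser = OptionParser(usage="%prog -o OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]")
    parser.add_option("-o", "--output", dest="output",
                      help="path of the combined PDF")
    options, inputs = parser.parse_args(argv)
    if not options.output or not inputs:
        parser.error("an output file and at least one input file are required")
    try:
        concatenate(inputs, options.output)
    except (OSError, subprocess.CalledProcessError) as exc:
        sys.stderr.write("error: %s\n" % exc)
        return 1
    return 0
```

Saved as a hypothetical pdfcat.py with the usual `if __name__ == "__main__": sys.exit(main())` guard at the bottom, this could be run as `python pdfcat.py -o application.pdf cover.pdf resume.pdf`, while other scripts could still `import pdfcat` and call `pdfcat.concatenate(...)` directly without triggering the command-line parsing.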




pdfjoin does this out of the box in a single command line. It's included in the pdfjam package on Ubuntu, and its home page is here.

For example:

  pdfjoin fileA.pdf fileB.pdf --outfile fileC.pdf

Yeah, I realized this after I wrote the script. The advantage to the way I do it is that I know it works as I've used it for many years. It's important that it works because I use it for resumes and cover letters and I can't afford the PDF to not work on certain PDF viewers or for some strange artifacts to appear in my resume or cover letter. Joining my cover letter to my resume is something I do late at night and I'm not always keen on reading through both again after I've joined them! I'll give it a try though.

The best part about pdfjoin and pdfpages is that they are easily installed and updated on Ubuntu.

Interesting that the command-line syntax is identical.

I just tried pdfjoin on a bunch of PDFs and it produced bad output. The first page worked, but the rest of the pages failed to open in Adobe Acrobat for Linux. So pdfjoin is officially out and my solution is starting to look a lot better.

To join

gs -q -sDEVICE={ps,pdf}write -sOutputFile=out.pdf -dBATCH -dNOPAUSE in0.{ps,pdf} in1.{ps,pdf} ...

To split

gs -sPAPERSIZE=a4 -sDEVICE={ps,pdf}write -sOutputFile='out%03d.{ps,pdf}' -dNOPAUSE in.{ps,pdf} -c quit

Just my .02$
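In keeping with the Python theme of the post, the join command above could be wrapped roughly like this. It is only a sketch: the function names gs_join_command and gs_join are made up for illustration, the pdfwrite default is just one of the two devices mentioned above, and the flags are exactly the ones from the join command line.

```python
import subprocess


def gs_join_command(inputs, output, device="pdfwrite"):
    """Build the Ghostscript argv that joins `inputs` into `output`."""
    if not inputs:
        raise ValueError("need at least one input file")
    return ["gs", "-q", "-sDEVICE=" + device, "-sOutputFile=" + output,
            "-dBATCH", "-dNOPAUSE"] + list(inputs)


def gs_join(inputs, output, device="pdfwrite"):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(gs_join_command(inputs, output, device))
```

Passing the arguments as a list rather than a single shell string means file names with spaces need no extra quoting.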

Yes, I noticed that as well. :-) I can't remember if I ever tried that command or not.

I use pyPdf to do this. Pure Python library, no need to execute external programs.
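A rough sketch of that approach, with one substitution stated plainly: it uses pypdf, the maintained successor to pyPdf, so the class and method names differ from the original pyPdf API. It assumes pypdf is installed, and the function name merge_pdfs is hypothetical.

```python
def merge_pdfs(inputs, output):
    """Append every page of each input PDF, in order, and write `output`."""
    if not inputs:
        raise ValueError("need at least one input file")
    # Third-party dependency, imported here so this module still loads
    # on systems where pypdf is not installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in inputs:
        writer.append(path)  # appends all pages of the file at `path`
    with open(output, "wb") as handle:
        writer.write(handle)
    writer.close()
```

A call such as `merge_pdfs(["cover.pdf", "resume.pdf"], "application.pdf")` would then replace the whole external pipeline, with no temporary PostScript files involved.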

I know you began the post by saying you know there are a lot of ways to do this, but I couldn't resist pointing out pdfpages just in case you don't already know about it.

It was meant to do what you're doing and it's just a latex package.

Thanks for the post. Python always rocks.

I realize this discussion is solely about the use of Python, but for those who are not inclined to do any sort of programming, in particular on the Mac, I've written a small GUI for batch concatenation of PDFs: Batch PDF Merger for Mac.

Thanks for posting...
