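A fairly simple approach is PyPDF2; plenty of tutorials are available online.

However, PyPDF2 does not optimize or compress the resulting file.

So a way to compress PDFs is needed. (If you later compile PDFs with LaTeX, you can use the same method to compress them; otherwise the raw LaTeX output can be very large.)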

http://blog.sciencenet.cn/blog-467089-773990.htm

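I finally finished the first draft of my doctoral thesis in LaTeX, and the compiled file turned out to be a full 94.1 MB! At that size, how could I send it to my advisors for revisions? It had to be compressed. A quick search turned up Ghostscript, a tool available on Linux that can compress PDF files.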

ghostscript -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

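After compression, the image resolution is noticeably lower and the images are a little hard to read. For sharper output, change one parameter: -dPDFSETTINGS=/printer. With that change the file size was 24 MB. Other settings are also available: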

https://www.ghostscript.com/doc/current/VectorDevices.htm

-dPDFSETTINGS=configuration

Presets the “distiller parameters” to one of the following predefined settings: /screen selects low-resolution output similar to the Acrobat Distiller “Screen Optimized” setting.

/ebook selects medium-resolution output similar to the Acrobat Distiller “eBook” setting.

/printer selects output similar to the Acrobat Distiller “Print Optimized” setting.

/prepress selects output similar to Acrobat Distiller “Prepress Optimized” setting.

/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.
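
The compression command and its presets can be wrapped in a small Python helper. This is only a minimal sketch: it assumes Ghostscript is installed and reachable as `gs` on the PATH, and the names `build_compress_command` and `compress_pdf` are made up for illustration, not part of any library.

```python
import subprocess

# Preset names accepted by -dPDFSETTINGS, as listed above.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_compress_command(src, dst, preset="ebook", gs="gs"):
    """Return the Ghostscript argument list for recompressing a PDF."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        gs,  # assumption: the executable is called "gs" on this system
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_compress_command(src, dst, preset), check=True)
```

For example, `compress_pdf("input.pdf", "output.pdf", preset="printer")` would run the same command as above with the /printer preset. Building the argument list separately from running it makes it easy to print and inspect the command first.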


Ghostscript tips:

http://milan.kupcevic.net/ghostscript-ps-pdf/

PDF Creation and Manipulation

Basic Usage

Convert PostScript to PDF:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=fileout.pdf filein.ps

Merge/combine PDF and/or PostScript files:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=fileout.pdf filein.ps filein2.pdf

Extract a page from a PostScript or a PDF document:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=3 -sOutputFile=fileout.pdf filein.ps
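
The merge and page-extraction commands above differ only in a few arguments, so they can be generated from one shared base. The following is a sketch under the assumption that `gs` is on the PATH; `merge_command` and `extract_command` are hypothetical helper names.

```python
import subprocess

# Options shared by all the basic commands above.
BASE = ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]


def merge_command(dst, sources):
    """Arguments for merging PDF and/or PostScript files into dst."""
    if not sources:
        raise ValueError("at least one input file is required")
    return BASE + [f"-sOutputFile={dst}", *sources]


def extract_command(src, dst, first, last=None):
    """Arguments for extracting pages first..last (1-based, inclusive)."""
    last = first if last is None else last
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    return BASE + [
        f"-dFirstPage={first}",
        f"-dLastPage={last}",
        f"-sOutputFile={dst}",
        src,
    ]


def run(command):
    """Execute a generated command; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

Calling `run(extract_command("filein.ps", "fileout.pdf", 3))` corresponds to the single-page extraction shown above.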

Additional Options

PDF optimization level selection options

-dPDFSETTINGS=/screen   (screen-view-only quality, 72 dpi images)
-dPDFSETTINGS=/ebook    (medium quality, 150 dpi images)
-dPDFSETTINGS=/printer  (high quality, 300 dpi images)
-dPDFSETTINGS=/prepress (high quality, color preserving, 300 dpi images)
-dPDFSETTINGS=/default  (general-purpose output, possibly a larger file)

Paper size selection options

-sPAPERSIZE=letter
-sPAPERSIZE=a4
-dDEVICEWIDTHPOINTS=w -dDEVICEHEIGHTPOINTS=h (point=1/72 of an inch)
-dFIXEDMEDIA (force paper size over the PostScript defined size)
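
Since the custom size options take points (1/72 of an inch), sizes given in millimetres have to be converted first. A small sketch of that arithmetic follows; the helper names are illustrative, and rounding to whole points is a simplifying assumption.

```python
def mm_to_points(mm):
    """Convert millimetres to points (1 inch = 25.4 mm = 72 points)."""
    return round(mm / 25.4 * 72)


def paper_size_options(width_mm, height_mm, fixed=True):
    """Build the custom paper size options for a Ghostscript command line."""
    options = [
        f"-dDEVICEWIDTHPOINTS={mm_to_points(width_mm)}",
        f"-dDEVICEHEIGHTPOINTS={mm_to_points(height_mm)}",
    ]
    if fixed:
        # Force this size over the size defined in the PostScript input.
        options.append("-dFIXEDMEDIA")
    return options


# A4 is 210 mm x 297 mm, which comes out at 595 x 842 points.
print(paper_size_options(210, 297))
```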

Other options

-dEmbedAllFonts=true
-dSubsetFonts=false
-dFirstPage=pagenumber
-dLastPage=pagenumber
-dAutoRotatePages=/PageByPage
-dAutoRotatePages=/All
-dAutoRotatePages=/None
-r1200 (resolution for pattern fills and fonts converted to bitmaps)
-sPDFPassword=password

Embedding PDFmarks

Create a file named “pdfmarks” with this content:

[ /Title (Document title)
  /Author (Author name)
  /Subject (Subject description)
  /Keywords (comma, separated, keywords)
  /ModDate (D:20061204092842)
  /CreationDate (D:20061204092842)
  /Creator (application name or creator note)
  /Producer (PDF producer name or note)
  /DOCINFO pdfmark

then combine the file with a PostScript or a PDF file

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=withmarks.pdf \
    nomarks.ps pdfmarks
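
The “pdfmarks” file can also be generated rather than written by hand. The sketch below assumes plain ASCII metadata values; `ps_string` and `docinfo_pdfmark` are hypothetical helpers. Parentheses and backslashes are escaped so the values stay valid PostScript string literals.

```python
def ps_string(text):
    """Wrap text as a PostScript string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def docinfo_pdfmark(fields):
    """Render a dict such as {"Title": "..."} as a /DOCINFO pdfmark entry."""
    parts = [f"/{key} {ps_string(value)}" for key, value in fields.items()]
    parts.append("/DOCINFO pdfmark")
    return "[ " + "\n  ".join(parts) + "\n"


if __name__ == "__main__":
    info = {"Title": "Thesis (draft)", "Author": "Author name"}
    # Write the file that is then passed to gs after the input document.
    with open("pdfmarks", "w", encoding="ascii") as handle:
        handle.write(docinfo_pdfmark(info))
```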

You can also add a couple of named destinations to the “pdfmarks” file

[ /Dest /NamedDest1 /Page 1 /View [/XYZ 20 620 1.8] /DEST pdfmark
[ /Dest /NamedDest2 /Page 2 /View [/FitH 15] /DEST pdfmark

or a few bookmarks

[/Count -2 /Dest /NamedDest1 /Title (Preface) /OUT pdfmark
[ /Action /GoTo /Dest /NamedDest1 /Title (Audience) /OUT pdfmark
[ /Action /GoTo /Dest /NamedDest2 /Title (Content) /OUT pdfmark
[/Count 3 /Page 2 /View [/XYZ 10 160 1.0] /Title (Part 1) /OUT pdfmark
[ /Page 2 /View [/XYZ 10 160 1.0] /Title (A first one) /OUT pdfmark
[ /Page 3 /View [/XYZ 0 500 null] /Title (The second one) /OUT pdfmark
[ /Page 6 /View [/FitH 220] /Title (The third thing) /OUT pdfmark
[ /PageMode /UseOutlines /DOCVIEW pdfmark
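
In the bookmark entries above, /Count gives the number of entries nested directly under an item, with a negative value meaning the item starts out collapsed. A nested outline can therefore be flattened into /OUT lines mechanically. This is a rough sketch: the `(title, page, children)` tuple layout and the function name are assumptions, and every bookmark uses the simple /Fit view instead of explicit coordinates.

```python
def outline_pdfmarks(entries, is_open=False):
    """Flatten nested (title, page, children) tuples into /OUT pdfmark lines."""
    lines = []
    for title, page, children in entries:
        # Escape characters that are special inside PostScript strings.
        safe = title.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        count = ""
        if children:
            n = len(children)
            # Positive count: children shown; negative: collapsed.
            count = f"/Count {n if is_open else -n} "
        lines.append(
            f"[ {count}/Page {page} /View [/Fit] /Title ({safe}) /OUT pdfmark"
        )
        # Children must follow their parent immediately.
        lines.extend(outline_pdfmarks(children, is_open))
    return lines


outline = [
    ("Part 1", 2, [
        ("A first one", 2, []),
        ("The second one", 3, []),
    ]),
]
print("\n".join(outline_pdfmarks(outline)))
```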

For more information about pdfmarks, see the pdfmark Reference Manual.