Python script to remove blank pages with pyPdf

I am trying to write a couple of Python scripts using pyPdf. They should split each PDF page into six separate pages and put those pages in the correct order. The originals are usually printed front and back, so every other page needs its sub-pages ordered differently. Finally, the scripts should remove the blank pages that end up at the end of the output document.
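The front/back ordering rule can be sketched on its own, independent of pyPdf. Assume the sheets are printed duplex and flipped on the long edge. Then the back of each sheet is mirrored left to right, so the piece printed behind front cell `[row][col]` is back cell `[row][1 - col]`. The helper name `duplex_order` and the string labels are hypothetical. In a real script the grid entries would be cropped page objects.

```python
def duplex_order(front, back):
    """Interleave two grids of sub-pages given as grid[row][col],
    with row 0 at the top of the sheet and col 0 on the left.

    Assumes long-edge duplex printing: the back side is mirrored
    left to right, so the piece behind front[row][col] is
    back[row][last_col - col]. Each front piece is followed by its back.
    """
    ordered = []
    for row in range(len(front)):
        last_col = len(back[row]) - 1
        for col in range(len(front[row])):
            ordered.append(front[row][col])
            ordered.append(back[row][last_col - col])
    return ordered


if __name__ == "__main__":
    # Hypothetical labels: F = front sheet cells, B = back sheet cells.
    front = [["F1", "F2"], ["F3", "F4"], ["F5", "F6"]]
    back = [["B1", "B2"], ["B3", "B4"], ["B5", "B6"]]
    print(duplex_order(front, back))
    # ['F1', 'B2', 'F2', 'B1', 'F3', 'B4', 'F4', 'B3', 'F5', 'B6', 'F6', 'B5']
```

This is the same interleaving that the script below writes out by hand with twelve `addPage` calls.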

I wrote the following script to cut the PDF pages up and reorder them. It cuts each page into two columns and each column into three rows, giving six sub-pages. I am not very experienced with Python, so please excuse anything I'm not doing correctly.

#!/usr/bin/env python
# Python 2 / pyPdf: cut every page into two columns and three rows and
# interleave the six pieces of each front/back pair of pages.
import copy, sys
from pyPdf import PdfFileWriter, PdfFileReader
from pyPdf.generic import RectangleObject

reader = PdfFileReader(sys.stdin)  # the reader seeks, so redirect a file (< in.pdf) instead of piping
output = PdfFileWriter()

def cut(page):
    """Return grid[row][col] of cropped copies of page; row 0 is the top third."""
    x0, y0 = [float(v) for v in page.mediaBox.lowerLeft]
    x1, y1 = [float(v) for v in page.mediaBox.upperRight]
    third = (y1 - y0) / 3
    xs = [x0, (x0 + x1) / 2, x1]                # column edges, left to right
    ys = [y1, y1 - third, y1 - 2 * third, y0]   # row edges, top to bottom
    grid = []
    for row in range(3):
        cells = []
        for col in range(2):
            sub = copy.copy(page)
            # give each copy its own box object so the crops don't overwrite each other
            sub.mediaBox = RectangleObject([xs[col], ys[row + 1], xs[col + 1], ys[row]])
            cells.append(sub)
        grid.append(cells)
    return grid

# Pages come in front/back pairs, so an even page count is assumed.
for i in range(0, reader.getNumPages(), 2):
    front = cut(reader.getPage(i))
    back = cut(reader.getPage(i + 1))
    for row in range(3):
        for col in range(2):
            output.addPage(front[row][col])
            output.addPage(back[row][1 - col])  # the back side is mirrored left/right

output.write(sys.stdout)
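The crop boxes are the cells of a two-column, three-row grid over the page's MediaBox. Here is a small pure-Python sketch of that arithmetic. The helper name `grid_boxes` is hypothetical. It works from the box corners, so it also handles a MediaBox whose lower-left corner is not at (0, 0). PDF y coordinates grow upwards, so row 0 is computed from the top edge down.

```python
def grid_boxes(lower_left, upper_right, cols=2, rows=3):
    """Split a MediaBox into rows x cols cells.

    Returns grid[row][col] = (x0, y0, x1, y1), i.e. lower-left and
    upper-right corners, with row 0 at the top of the page.
    """
    x0, y0 = lower_left
    x1, y1 = upper_right
    cell_w = (x1 - x0) / float(cols)
    cell_h = (y1 - y0) / float(rows)
    grid = []
    for row in range(rows):
        top = y1 - row * cell_h
        grid.append([
            (x0 + col * cell_w, top - cell_h, x0 + (col + 1) * cell_w, top)
            for col in range(cols)
        ])
    return grid


if __name__ == "__main__":
    # US Letter page (612 x 792 pt) with its origin at (0, 0)
    print(grid_boxes((0, 0), (612, 792))[0])
    # [(0.0, 528.0, 306.0, 792.0), (306.0, 528.0, 612.0, 792.0)]
```

Each tuple could then be wrapped in a `RectangleObject` and assigned to a copied page's `mediaBox`.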

Then I use the following script to remove the blank pages. It keeps only the pages whose extracted text is longer than 10 characters.

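#!/usr/bin/env python
# Python 2 / pyPdf: keep only pages with more than 10 characters of
# extracted text and treat the rest as blank.
import sys
from pyPdf import PdfFileWriter, PdfFileReader

reader = PdfFileReader(sys.stdin)  # needs a seekable file, e.g. < in.pdf
output = PdfFileWriter()

for i in range(reader.getNumPages()):
    p = reader.getPage(i)

    # extractText() reads every text operator in the content stream,
    # including text that lies outside the cropped MediaBox
    text = p.extractText()

    if len(text) > 10:
        output.addPage(p)

output.write(sys.stdout)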

The problem seems to be that while the pages are visibly cropped down, the text-drawing commands for the whole original page are still in every cropped copy. As a result, `extractText()` finds text even on pages that look empty. None of these pages are scanned, so if they are blank, they are really blank. Does anyone have any thoughts on something I could do differently, or an entirely different approach to removing the blank pages? I would really appreciate any help.
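One possible direction is to decide blankness from where the text sits rather than whether any text exists. Under this approach, a cell counts as blank when no text run starts inside its box. The sketch below assumes that text runs have already been collected as `(x, y, text)` tuples in page coordinates. pyPdf's `extractText()` does not report positions. Newer pypdf releases accept a `visitor_text` callback in `extract_text()` that receives the transformation and text matrices. Taking the text matrix's translation as `(x, y)` is an approximation that ignores the CTM, so treat that wiring as an assumption and check the pypdf docs. The function names are hypothetical. The 10-character threshold mirrors the check above, except that this version ignores whitespace.

```python
def visible_text(items, box):
    """Keep the text runs whose origin lies inside box = (x0, y0, x1, y1).

    items is a list of (x, y, text) tuples in page coordinates
    (assumed to be gathered elsewhere, e.g. by a text-extraction callback).
    """
    x0, y0, x1, y1 = box
    return [t for (x, y, t) in items if x0 <= x <= x1 and y0 <= y <= y1]


def looks_blank(items, box, min_chars=10):
    """Treat a cropped cell as blank if at most min_chars non-whitespace
    characters start inside its box."""
    text = "".join(visible_text(items, box))
    return len("".join(text.split())) <= min_chars


if __name__ == "__main__":
    # Hypothetical text runs on one uncropped 612 x 792 pt page
    items = [(72, 700, "Name: Alice Example"), (72, 100, "footer")]
    print(looks_blank(items, (0, 528, 306, 792)))  # False: top-left cell has text
    print(looks_blank(items, (306, 0, 612, 264)))  # True: bottom-right cell is empty
```

Because the test runs on the uncropped page's text, it can be applied to each grid cell before writing anything out.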

7 votes | asked by rpeck1682, 10 … 2011, 21:53:36 (Europe/Moscow)

0 answers
