In Python, PDF files can be processed with several libraries. Below are a few commonly used ones and what each is for:
PyPDF2
Used to read PDF files and to merge pages from one or more files into a new PDF.
Installation: `pip install PyPDF2`
Example:
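```python
# PdfReader/PdfWriter replace PdfFileReader/PdfFileWriter, which no longer work in PyPDF2 3.x
from PyPDF2 import PdfReader, PdfWriter

# Read the PDF file
reader = PdfReader('example.pdf')

# Copy every page into a writer; calling add_page for the pages of
# several readers in turn merges those files into one
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

# Write the collected pages to a new file
with open('newfile.pdf', 'wb') as fout:
    writer.write(fout)
```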
ReportLab
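Used to generate new PDF files and draw text and graphics on their pages.
Installation: `pip install reportlab`
Example: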

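```python
from reportlab.pdfgen import canvas

# Create a PDF canvas that will be written to example.pdf
pdf_file = canvas.Canvas('example.pdf')

# Draw text; coordinates are in points from the bottom-left corner of the page
pdf_file.drawString(100, 750, 'Hello, World!')

# Save the file; save() takes no arguments (the file name was given to Canvas)
pdf_file.save()
```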
pdfminer3k
Used to extract the text content of PDF files.
Installation: `pip install pdfminer3k`
Example:
```python
from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, process_pdf


def read_pdf(path):
    """Extract the text of a PDF file and return it as a list of lines."""
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    # process_pdf expects an open binary file object, not a file name
    with open(path, 'rb') as pdf:
        process_pdf(rsrcmgr, device, pdf)
    device.close()
    content = retstr.getvalue()
    retstr.close()
    return content.split('\n')


if __name__ == '__main__':
    pdf_content = read_pdf('example.pdf')
    for line in pdf_content:
        print(line)
```
