Convert PDF to HTML with Python

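HTML is the standard markup language for web pages. Converting a PDF document to HTML makes it easy to embed the document directly in a web page, so it can be viewed in any web browser without additional software or plugins. This article demonstrates how to convert PDF to HTML in a Python program using Spire.PDF for Python.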

Install Spire.PDF for Python

This tutorial requires Spire.PDF for Python and plum-dispatch v1.7.4. They can be installed in VS Code with the following pip command.

pip install Spire.PDF

If you are unsure how to install it, see: How to Install Spire.PDF for Python in VS Code
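After running pip, it can be useful to confirm that both packages are visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for this illustration, and the distribution names passed to it are assumed to match the names used with pip above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions based on the pip command above.
    for name in ("Spire.PDF", "plum-dispatch"):
        version = installed_version(name)
        print(f"{name}: {version or 'not installed'}")
```

If either line reports "not installed", the pip command was probably run against a different Python environment than the one selected in VS Code.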

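Convert a PDF Document to HTML with Python

The PdfDocument.SaveToFile() method provided by Spire.PDF for Python converts a PDF document to HTML. The detailed steps are as follows:

Create an object of the PdfDocument class.
Load a PDF document using the PdfDocument.LoadFromFile() method.
Save the document in HTML format using the PdfDocument.SaveToFile() method.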

python

from spire.pdf.common import *
from spire.pdf import *
# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF document
doc.LoadFromFile("Sample.pdf")
# Save the document as an HTML file
doc.SaveToFile("output/PdfToHtml.html", FileFormat.HTML)
doc.Close()
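The example writes into an output folder and assumes the input file exists. The sketch below wraps the same three calls (LoadFromFile, SaveToFile, Close) in a function that creates the folder first and always closes the document. The function names html_output_path and convert_pdf_to_html are hypothetical helpers, not part of Spire.PDF, and the named import from spire.pdf is assumed to expose the same classes as the star import used above.

```python
from pathlib import Path


def html_output_path(pdf_path: str, out_dir: str = "output") -> Path:
    """Build <out_dir>/<pdf name>.html and create out_dir if it is missing."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / (Path(pdf_path).stem + ".html")


def convert_pdf_to_html(pdf_path: str, out_dir: str = "output") -> Path:
    """Convert one PDF to HTML with the calls shown in the tutorial."""
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(pdf_path)

    # Imported lazily so html_output_path() is usable without Spire.PDF installed.
    # Assumption: these names are importable directly from spire.pdf.
    from spire.pdf import PdfDocument, FileFormat

    target = html_output_path(pdf_path, out_dir)
    doc = PdfDocument()
    try:
        doc.LoadFromFile(pdf_path)
        doc.SaveToFile(str(target), FileFormat.HTML)
    finally:
        doc.Close()  # release the document even if the conversion fails
    return target
```

Calling convert_pdf_to_html("Sample.pdf") would then return the path of the generated HTML file, for example output/Sample.html.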

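Convert PDF to HTML with Conversion Options

The SetPdfToHtmlOptions() method of the PdfConvertOptions class sets the options used when converting a PDF file to HTML. The method accepts the following parameters:

useEmbeddedSvg (bool): Indicates whether to embed SVG in the resulting HTML file.
useEmbeddedImg (bool): Indicates whether to embed images in the resulting HTML file. This option applies only when useEmbeddedSvg is set to False.
maxPageOneFile (int): Specifies the maximum number of pages contained in each HTML file. This option applies only when useEmbeddedSvg is set to False.
useHighQualityEmbeddedSvg (bool): Indicates whether to use high-quality embedded SVG in the resulting HTML file. This option applies when useEmbeddedSvg is set to True.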
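Because some of these parameters only take effect for one value of useEmbeddedSvg, it is easy to pass a setting that is silently ignored. The sketch below models the four parameters as a small dataclass and reports which ones have no effect under the rules listed above. The class HtmlOptions and its methods are hypothetical and not part of Spire.PDF; treating useHighQualityEmbeddedSvg as ignored when useEmbeddedSvg is False is an inference from the description.

```python
from dataclasses import dataclass


@dataclass
class HtmlOptions:
    """Hypothetical container for the four SetPdfToHtmlOptions() arguments."""
    use_embedded_svg: bool
    use_embedded_img: bool
    max_page_one_file: int
    use_high_quality_embedded_svg: bool

    def as_args(self):
        """Return the values in the positional order the method expects."""
        return (
            self.use_embedded_svg,
            self.use_embedded_img,
            self.max_page_one_file,
            self.use_high_quality_embedded_svg,
        )

    def ignored(self):
        """Names of settings that have no effect for this combination."""
        if self.use_embedded_svg:
            # Image and page-splitting options apply only without embedded SVG.
            return ["use_embedded_img", "max_page_one_file"]
        # Assumption: the high-quality SVG flag matters only with embedded SVG.
        return ["use_high_quality_embedded_svg"]
```

With a loaded document, the values could be passed on as doc.ConvertOptions.SetPdfToHtmlOptions(*options.as_args()).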

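The steps for setting conversion options when converting PDF to HTML are as follows:

Create an object of the PdfDocument class.
Load a PDF document using the PdfDocument.LoadFromFile() method.
Get the PdfConvertOptions object through the PdfDocument.ConvertOptions property.
Specify the PDF to HTML conversion options using the PdfConvertOptions.SetPdfToHtmlOptions() method.
Save the document in HTML format using the PdfDocument.SaveToFile() method.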

python

from spire.pdf.common import *
from spire.pdf import *
# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF document
doc.LoadFromFile("Sample.pdf")
# Set the conversion options: embed images in the HTML and write at most one page per HTML file
pdfToHtmlOptions = doc.ConvertOptions
pdfToHtmlOptions.SetPdfToHtmlOptions(False, True, 1, False)
# Save the document in HTML format
doc.SaveToFile("output/PdfToHtmlWithOptions.html", FileFormat.HTML)
doc.Close()

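Convert a PDF Document to an HTML Stream with Python

Besides converting a PDF document to an HTML file, you can also save it to an HTML stream using the PdfDocument.SaveToStream() method. The steps are as follows:

Create an object of the PdfDocument class.
Load a PDF document using the PdfDocument.LoadFromFile() method.
Create an object of the Stream class.
Save the PDF document to the HTML stream using the PdfDocument.SaveToStream() method.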

python

from spire.pdf.common import *
from spire.pdf import *
# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF document
doc.LoadFromFile("Sample.pdf")
# Save the document to an HTML stream
fileStream = Stream("output/PdfToHtmlStream.html")
doc.SaveToStream(fileStream, FileFormat.HTML)
fileStream.Close()
doc.Close()
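The stream in this example is created from a file path, so the HTML appears to end up on disk. If the markup is needed in memory afterwards, for instance to return it from a web handler, one simple option is to read the file back once the stream is closed. This is a minimal sketch with a hypothetical helper name, read_html; the UTF-8 encoding is an assumption about the generated file, which is why undecodable bytes are replaced rather than raising an error.

```python
from pathlib import Path


def read_html(path: str, encoding: str = "utf-8") -> str:
    """Return the converted HTML as a string.

    The encoding is an assumption; bytes that cannot be decoded are replaced.
    """
    return Path(path).read_bytes().decode(encoding, errors="replace")
```

For the example above, read_html("output/PdfToHtmlStream.html") would return the generated markup as a single string.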

Apply for a Temporary License

If you want to remove the evaluation message from the resulting documents or get rid of the functional limitations, please request a temporary license valid for 30 days.
