Convert HTML to PDF in Python

I've been experimenting quite a lot with HTML to PDF libraries at Transformy. Since we're a Python shop (Transformy was built with Django ❤️) I've mostly looked at HTML to PDF libraries built with Python. Or at the very least libraries that have a Python wrapper.
This might be interesting to a lot of people who want to add HTML to PDF functionality to their products, so I'm doing a writeup of the main Python HTML to PDF tools out there, how to use them, and their pros and cons.
Spoiler: there's not really one solution that stands out, and it will mostly depend on your use case and needs which one you should go for.
tl;dr pdfkit and pyppeteer are the obvious choices
The obvious pick is pdfkit. It's a wrapper on top of wkhtmltopdf and is a good solution for a lot of use cases. However, it is quite old and not actively maintained anymore, so it might come with some quirks.
If you need something really robust that can handle pages with complicated CSS or JavaScript heavy pages, you should probably go with pyppeteer. This one runs a headless browser to render the pages. It's very capable, but it's also a heavier solution (headless or not, you'll still be running a browser for every HTML to PDF conversion.)
pdfkit
pdfkit is a Python wrapper around wkhtmltopdf. I've written an in-depth tutorial on wkhtmltopdf before: it's a command line tool for HTML to PDF generation and it is pretty much the go-to tool for the job.
Getting pdfkit set up takes a bit of work. Since it's a wrapper around a command line tool, you'll first need to make sure that wkhtmltopdf is installed on your system. You can read how you can do that for Windows/Ubuntu/macOS here. Once wkhtmltopdf is installed, you can install pdfkit with pip.
pip install pdfkit
Next, import the library in your project:
import pdfkit
Once installed, it's straightforward. You can pass a URL and convert it to a PDF:
pdfkit.from_url('https://transformy.io', 'transformy.pdf')
Or you can provide an html file to convert:
pdfkit.from_file('invoice.html', 'invoice.pdf')
Even a string works:
# Convert HTML string
html = '<h1>Test Document</h1><p>This is a test.</p>'
pdfkit.from_string(html, 'test.pdf')
What is also pretty nice: pdfkit allows you to pass any of the options/parameters that come with wkhtmltopdf out of the box. I mention this because there are other Python wrappers out there that haven't implemented all options, only the ones that whoever made the wrapper found useful. You can find a full list of the available options here.
options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'encoding': "UTF-8",
    'no-outline': None
}
pdfkit.from_url('https://transformy.io', 'transformy.pdf', options=options)
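Since pdfkit only works when the wkhtmltopdf binary can be found, it can help to check for it up front. Here's a minimal sketch of that idea. The helper names `find_wkhtmltopdf` and `html_to_pdf_bytes` are my own and not part of pdfkit. The sketch relies on two things pdfkit does provide: `pdfkit.configuration(wkhtmltopdf=...)` to point at a specific binary, and passing `False` as the output path to get the PDF back as bytes instead of writing a file.

```python
import shutil


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the full path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


def html_to_pdf_bytes(html, options=None):
    """Render an HTML string to PDF bytes (hypothetical helper, not a pdfkit function)."""
    import pdfkit  # imported lazily so the check above works without pdfkit installed

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")

    # Point pdfkit at the binary we found
    config = pdfkit.configuration(wkhtmltopdf=binary)
    # Passing False as the output path makes pdfkit return the PDF as bytes
    return pdfkit.from_string(html, False, options=options, configuration=config)
```

Getting bytes back is handy when you want to return the PDF from a web view or upload it somewhere without touching the disk.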
The verdict
pdfkit/wkhtmltopdf is a solid solution. It might be the IBM of the HTML to PDF libraries, no one gets fired for choosing it! (Don't quote me on this.) It's fast, renders most websites well and comes with a lot of options for customization.
The downsides however are that it's not as easy as some of the other options to install. It's also not actively maintained anymore, nor is the web engine on top of which it relies (Qt Webkit.)
Another thing to watch out for: wkhtmltopdf comes with an LGPLv3 license. Depending on where and how you want to integrate it, this comes with a few risks, so be sure to look into that.
Pyppeteer
Pyppeteer is a headless Chromium automation library. If you need to generate PDF files from modern, JavaScript heavy websites, it will be your best option.
Installation with pip is straightforward:
pip install pyppeteer
The library comes with a command to download and install Chromium:
pyppeteer-install
Since this is running a full fledged browser behind the scenes, there's a lot that you can do with it. We're interested in generating PDF files specifically.
Start by importing the asyncio library and pyppeteer:
import asyncio
from pyppeteer import launch
Let's first create a PDF from a URL:
async def generate_pdf():
    browser = await launch()
    page = await browser.newPage()
    # Navigate to URL
    await page.goto('https://transformy.io/guides')
    # Generate PDF
    await page.pdf({'path': 'transformy_guides.pdf', 'format': 'A4'})
    # Close the browser so the Chromium process doesn't linger
    await browser.close()
You can also provide the HTML instead of making it navigate to a URL:
async def generate_pdf():
    browser = await launch()
    page = await browser.newPage()
    await page.setContent('<h1>Hello Transformy</h1>')  # bring your own HTML
    await page.pdf({
        'path': 'transformy.pdf',
        'format': 'A4',
        'printBackground': True,
        'margin': {
            'top': '1in',
        }
    })
    await browser.close()
And then we run it:
asyncio.run(generate_pdf())
Pyppeteer is a really good option for more advanced use cases. It's the best option for websites built on top of JavaScript frameworks.
It comes with a cost though: you're literally running Chrome under the hood. It's bound to be slower and more resource intensive than other options. And while installation is slightly easier than for wkhtmltopdf, it can still be tricky to get Chromium itself running. The API is also async, which might add some complexity to your codebase if you're not currently using asyncio.
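If the rest of your codebase is synchronous, one way to contain the async part is a small blocking wrapper. The sketch below is my own take, not something Pyppeteer ships: `url_to_pdf` and `url_to_pdf_sync` are hypothetical names, and the `launcher` parameter only exists so the browser can be swapped out (for example in tests). It uses the `waitUntil: 'networkidle0'` option of `page.goto` to give JavaScript heavy pages time to finish loading, and a `try`/`finally` so the browser gets closed even when rendering fails.

```python
import asyncio


async def url_to_pdf(url, path, launcher=None):
    """Render a URL to a PDF file (hypothetical helper built on pyppeteer)."""
    if launcher is None:
        # Imported lazily; by default we use pyppeteer's own launch()
        from pyppeteer import launch as launcher

    browser = await launcher()
    try:
        page = await browser.newPage()
        # Wait until the network has gone quiet so client-side rendering can finish
        await page.goto(url, {'waitUntil': 'networkidle0'})
        await page.pdf({'path': path, 'format': 'A4'})
    finally:
        # Always close the browser, even if goto() or pdf() raised
        await browser.close()


def url_to_pdf_sync(url, path, launcher=None):
    """Blocking wrapper so synchronous code never has to touch asyncio."""
    asyncio.run(url_to_pdf(url, path, launcher))
```

You'd call it as `url_to_pdf_sync('https://transformy.io/guides', 'transformy_guides.pdf')`. Keep in mind that `asyncio.run` can't be called from code that is already running inside an event loop, so this wrapper is only meant for plain synchronous callers.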
WeasyPrint
WeasyPrint is great because it's 100% a Python library and not a wrapper. For one, this makes installation (and deployment) much easier. Here's how to install it with pip:
pip install weasyprint
Here's how to generate your first PDF from a URL:
from weasyprint import HTML
HTML('https://transformy.io').write_pdf('output.pdf')
And here's how you can generate a PDF from an HTML string. A really cool feature is that you can add custom CSS before generating a PDF and do any styling customization you want.
html_content = """
<html>
<head><style>body { font-family: Arial; }</style></head>
<body><h1>Invoice #2025-123</h1><p>Total: $99.99</p></body>
</html>
"""
HTML(string=html_content).write_pdf('invoice.pdf')
# custom CSS styling (CSS needs to be imported alongside HTML)
from weasyprint import CSS
css = CSS(string='@page { size: A4; margin: 1cm; }')
HTML(string=html_content).write_pdf('styled_invoice.pdf', stylesheets=[css])
I really like the WeasyPrint API and it's an easy option to use. What I saw during testing though, is that it's definitely not the fastest library. Generating large PDFs in particular is quite slow, and SPAs aren't its strong suit either, since WeasyPrint doesn't run JavaScript. For smaller tasks like generating PDFs for your invoices, it can be a good fit.
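For the invoice use case, the HTML usually gets built from data first. Here's a rough sketch of how that could look. The functions `build_invoice_html` and `invoice_pdf_bytes` are names I made up for illustration. The WeasyPrint part only relies on `HTML(string=...)`, `CSS(string=...)` and on `write_pdf()` returning the PDF as bytes when you don't pass a target.

```python
from html import escape


def build_invoice_html(number, total):
    """Build a tiny invoice page, escaping the values so they can't break the markup."""
    return (
        "<html><body>"
        f"<h1>Invoice #{escape(number)}</h1>"
        f"<p>Total: {escape(total)}</p>"
        "</body></html>"
    )


def invoice_pdf_bytes(number, total):
    """Render the invoice to PDF bytes (hypothetical helper on top of WeasyPrint)."""
    from weasyprint import HTML, CSS  # imported lazily

    css = CSS(string='@page { size: A4; margin: 1cm; }')
    # Without a target, write_pdf() returns the document as bytes
    return HTML(string=build_invoice_html(number, total)).write_pdf(stylesheets=[css])
```

Keeping the HTML building separate from the rendering also makes it easy to test the template logic without generating a PDF every time.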
xhtml2pdf
Another library that is fully written in Python and has been around for a long time. It's really easy to install on all platforms.
xhtml2pdf uses ReportLab under the hood, which needs a rendering backend to work. The xhtml2pdf docs recommend using cairo for the rendering, which I needed to download and install separately. You can get it from the official website. Once you've installed cairo, you will also need to add the pycairo dependency with pip.
pip install xhtml2pdf
pip install "xhtml2pdf[pycairo]"
Here's how you convert a string of HTML into a PDF file with xhtml2pdf:
from xhtml2pdf import pisa
html = """
<html>
<head>
<style>
@page { size: A4; margin: 1cm; }
body { font-family: Helvetica; }
</style>
</head>
<body>
<h1>Report</h1>
<p>This is a simple PDF generated with xhtml2pdf.</p>
</body>
</html>
"""
# Create PDF
with open("htmlpage.pdf", "wb") as pdf_file:
    pisa_status = pisa.CreatePDF(html, dest=pdf_file)
You can also use a file as input:
# From file
with open("transformy.html", "r") as source_html:
    with open("transformy.pdf", "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(source_html.read(), dest=pdf_file)
And it even comes with a command-line interface:
xhtml2pdf "https://transformy.io/guides/" transformy_guides.pdf
This is another library that is a good pick for lightweight tasks. If you're using Django, it's really easy to integrate too: xhtml2pdf allows you to define a link_callback function which you can pass to the pisa.CreatePDF method so that it builds correct URLs for your generated PDFs. It also comes with some powerful tools that give you control over headers, footers and other elements you'd want to add to pages.
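To make the link_callback idea a bit more concrete, here's a minimal sketch that doesn't depend on Django. The `STATIC_URL` and `STATIC_ROOT` values are placeholders I picked for the example, and `render` is a hypothetical helper. What comes from xhtml2pdf is the `link_callback(uri, rel)` signature, the `link_callback` argument of `pisa.CreatePDF`, and the `err` attribute on the returned status.

```python
import os

STATIC_URL = "/static/"          # hypothetical URL prefix used in the HTML
STATIC_ROOT = "/var/www/static"  # hypothetical directory holding those files


def link_callback(uri, rel):
    """Map a URI found in the HTML to a local file path that xhtml2pdf can read."""
    if uri.startswith(STATIC_URL):
        return os.path.join(STATIC_ROOT, uri[len(STATIC_URL):])
    # Leave absolute URLs and everything else untouched
    return uri


def render(html, output_path):
    """Write the PDF and report whether xhtml2pdf finished without errors."""
    from xhtml2pdf import pisa  # imported lazily

    with open(output_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html, dest=pdf_file, link_callback=link_callback)
    return not status.err
```

In a Django project you would typically read the two constants from your settings instead of hardcoding them.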
The main downsides however are CSS support which is pretty much limited to CSS 2.1 with only some of the functionalities of CSS 3.0 available (here's the list of supported CSS properties). And it comes with some constraints from ReportLab in regards to how images can be positioned.
FPDF2
FPDF2 is the odd one in this list. It's not an HTML to PDF converter, but more of a general library for PDF creation written in Python. It does come with some basic HTML rendering support though, so I think it's interesting to take a look at this one if you think you might need more than just conversion from HTML.
Here's how to install it:
pip install fpdf2
You can generate a PDF from HTML using the write_html function:
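from fpdf import FPDF
# In recent fpdf2 releases write_html is available on FPDF directly,
# so the deprecated HTMLMixin is no longer needed
pdf = FPDF()
pdf.add_page()
# Helvetica is one of the built-in core fonts
pdf.set_font("Helvetica", size=12)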
# Add HTML
html = """
<h1>Transformy</h1>
<p>This is the <b>transformy blog</b></p>
"""
pdf.write_html(html)
pdf.output("transformy_blog.pdf")
FPDF2 also makes it easy to globally restyle HTML tags:
from fpdf import FPDF
from fpdf.fonts import FontFace
pdf = FPDF()
pdf.add_page()
pdf.write_html("""
# html goes here
""", tag_styles={
    "h1": FontFace(color="#948b8b", size_pt=32),
    "h2": FontFace(color="#948b8b", size_pt=24),
})
pdf.output("html_styled.pdf")
Or do even more with TextStyle instead of FontFace:
TextStyle(color="#ccc", font_style="I", t_margin=5, b_margin=5, l_margin=10),
It has a straightforward API which I enjoyed using. Although I didn't create any benchmarks, from my limited tests it performed really well, in particular for custom PDF creation, where it comes with a lot of features.
HTML support however is limited to only a subset of HTML tags and the same goes for CSS. And it might go without saying, but don't use this one for JavaScript websites.
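Where FPDF2 gets interesting is when you mix its native drawing calls with a bit of HTML. Here's a small sketch of that, assuming a recent fpdf2 release where write_html lives on FPDF and handles basic tables. The functions `rows_to_html_table` and `write_report` are my own names, not part of the library.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Build a plain HTML table using only basic tags (table, thead, tbody, tr, th, td)."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def write_report(path, title, headers, rows):
    """Write a one-page report: a natively drawn title followed by an HTML table."""
    from fpdf import FPDF  # imported lazily

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)
    # Native text drawing for the title, then move to the start of the next line
    pdf.cell(0, 10, title, new_x="LMARGIN", new_y="NEXT")
    # Basic HTML rendering for the table
    pdf.write_html(rows_to_html_table(headers, rows))
    pdf.output(path)
```

A call like `write_report("report.pdf", "Sales", ["Item", "Price"], [["Widget", 10]])` would then produce a single page with the title and the table.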
Conclusion
Here's a recap of all the libraries I tested and reviewed in this tutorial.
Again, there's no one-size-fits-all solution, but hopefully one of these tools works for your use case.
Library | Speed | Good use cases | Learn More |
---|---|---|---|
pdfkit | Fast | General purpose PDFs, business reports, invoices, static websites, documentation | pdfkit docs |
Pyppeteer | Slow | JavaScript-heavy sites, pages with modern or complex CSS | Pyppeteer docs |
WeasyPrint | Medium | Simple documents, custom styling, CSS customization needs | WeasyPrint docs |
xhtml2pdf | Fast | Lightweight PDFs | xhtml2pdf docs |
FPDF2 | Fast | Custom PDF creation with basic HTML | FPDF2 docs |
Don't hesitate to shoot me a message and tell me which one you end up going with, or let me know if I missed anything important.