HTML to PDF in JavaScript with Puppeteer - Complete Guide
I’ll be honest - when I first tried to generate PDFs from HTML with Puppeteer, I thought it would be a five-minute task. Three hours later, I was still wrestling with page breaks, headers that wouldn’t align, and PDFs that looked nothing like what I saw in the browser.
Puppeteer is a powerful JavaScript library that can be used to control headless browsers like Chrome or Firefox. You can use those headless browsers to do pretty much anything a regular browser can do, including converting HTML to PDF.
And that’s exactly what we’re going to do in this tutorial. There’s a bit of a learning curve, but it really is the best solution for generating pixel-perfect PDF files.
Why Puppeteer for HTML to PDF?
There are other options out there for converting HTML to PDF, and some are actually pretty good depending on your use case. I’ve covered the most popular [libraries for HTML to PDF conversion in JavaScript](/guides/html-to-pdf-javascript/) in a separate guide.
The truth is, if you want pixel-perfect PDFs, your best bet is always going to be a headless browser. It’s the only option that will give you full CSS and modern JavaScript support.
The trade-off, of course, is that you’re running a full browser “just” to print a PDF. It’s bound to be a resource hog and harder to scale.
Installation and Setup
With all that said, let’s get to installing and actually using Puppeteer. I’m going to assume you already have Node.js and npm installed on your machine.
npm install puppeteer
For production, you might want to install puppeteer-core instead. It doesn’t download a browser, so you can decide in your Docker setup which version of Chromium to install:
npm install puppeteer-core
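Because puppeteer-core ships without a browser, you have to tell it where yours lives. Here’s a minimal sketch of how that could look. The CHROME_BIN variable name, the /usr/bin/chromium fallback, and both helper names are assumptions I made up for illustration, so adjust them to whatever your image actually uses.

```javascript
// Hypothetical helper: decide which browser binary puppeteer-core should start.
// CHROME_BIN and the fallback path are assumptions, not Puppeteer defaults.
function buildLaunchOptions(env = process.env) {
  return {
    executablePath: env.CHROME_BIN || '/usr/bin/chromium',
    headless: true,
  };
}

async function launchBrowser() {
  // Required lazily so the helper above works even without the package installed.
  const puppeteer = require('puppeteer-core');
  return puppeteer.launch(buildLaunchOptions());
}
```

The rest of the examples in this guide work the same way with either package once the browser is launched.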
Quick Start: Your First PDF
Now, let’s print our first PDF with a headless browser.
const puppeteer = require('puppeteer');
async function generateSimplePDF() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Basic HTML content
const htmlContent = `
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body { font-family: Arial, sans-serif; margin: 40px; }
h1 { color: #333; }
</style>
</head>
<body>
<h1>My First Puppeteer PDF</h1>
<p>This PDF was generated with Puppeteer!</p>
</body>
</html>
`;
await page.setContent(htmlContent);
await page.pdf({
path: 'simple-example.pdf',
format: 'A4',
});
await browser.close();
console.log('PDF generated successfully!');
}
generateSimplePDF();
And that’s it! You should now have a PDF file called simple-example.pdf in the folder where you ran the script from.
Converting from URL
In the first example we used an HTML string, but you can do the same thing with a URL. We just need to tell the browser to navigate to the page first.
async function convertUrlToPDF(url, outputPath) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Set viewport for consistent rendering
await page.setViewport({
width: 1280,
height: 720,
deviceScaleFactor: 1
});
try {
// Navigate + timeout
await page.goto(url, {
waitUntil: ['networkidle0', 'domcontentloaded'],
timeout: 30000
});
// Add an extra delay to make sure any dynamic content finishes loading
// (page.waitForTimeout() was removed in recent Puppeteer versions)
await new Promise(resolve => setTimeout(resolve, 2000));
const pdf = await page.pdf({
path: outputPath,
format: 'A4',
});
console.log(`PDF saved to ${outputPath}`);
return pdf;
} catch (error) {
console.error('Error generating PDF from URL:', error);
throw error;
} finally {
await browser.close();
}
}
// Usage
convertUrlToPDF('https://transformy.io/guides/', 'transformy-guides.pdf');
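The waitUntil options are crucial for making sure the page has finished loading before you print it. I usually use networkidle0 (waits until there have been no network connections for at least 500 ms) for JavaScript-heavy sites, and domcontentloaded for simpler pages. When you pass an array, as in the example above, Puppeteer waits until all of the listed events have fired.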
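If a fixed delay feels too fragile, one alternative is to wait for a specific element instead. The sketch below picks a waitUntil value per page type and then optionally waits for a selector. The 'dynamic'/'static' labels, the readySelector option, and the function names are hypothetical; only page.goto() and page.waitForSelector() are Puppeteer’s own.

```javascript
// Hypothetical mapping from "kind of page" to a waitUntil value.
function pickWaitUntil(pageKind) {
  return pageKind === 'dynamic' ? 'networkidle0' : 'domcontentloaded';
}

// `page` is a Puppeteer Page; `readySelector` is an optional CSS selector
// that only appears once the content you care about has rendered.
async function loadForPdf(page, url, { pageKind = 'static', readySelector } = {}) {
  await page.goto(url, { waitUntil: pickWaitUntil(pageKind), timeout: 30000 });
  if (readySelector) {
    await page.waitForSelector(readySelector, { timeout: 10000 });
  }
}
```

Call it as loadForPdf(page, url, { pageKind: 'dynamic', readySelector: '#report' }) and then run page.pdf() as usual.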
Advanced Options and Configuration
Now let’s get into the configurations that separate amateur from professional PDF generation:
async function advancedPDFGeneration(htmlContent) {
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-background-timer-throttling',
'--disable-backgrounding-occluded-windows',
'--disable-renderer-backgrounding'
]
});
const page = await browser.newPage();
// Load the HTML to render
await page.setContent(htmlContent);
const pdfOptions = {
// Paper format
format: 'A4', // or 'Letter', 'Legal', 'Tabloid', 'Ledger'
// Or custom dimensions
// width: '8.5in',
// height: '11in',
// Margins
margin: {
top: '1in',
right: '0.5in',
bottom: '1in',
left: '0.5in'
},
// Print options
printBackground: true,
landscape: false,
// Page ranges (useful for large documents)
pageRanges: '1-5,8,11-13',
// Scale (0.1 to 2)
scale: 1,
// Prefer CSS page size
preferCSSPageSize: false,
// Tagged PDF for accessibility
tagged: true,
// Outline (PDF bookmarks)
outline: true
};
const pdf = await page.pdf(pdfOptions);
await browser.close();
return pdf;
}
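Two of these options are easy to get wrong: scale has to stay between 0.1 and 2, and format takes priority over width and height, so custom dimensions are ignored while format is set. Here’s a small sketch of a helper that guards against both. The name normalizePdfOptions and its defaults are my own, not part of Puppeteer.

```javascript
// Hypothetical helper: merge caller options with defaults and fix two common pitfalls.
function normalizePdfOptions(options = {}) {
  const normalized = { format: 'A4', printBackground: true, ...options };

  // `format` wins over width/height, so drop it when both custom dimensions are given.
  if (normalized.width && normalized.height) {
    delete normalized.format;
  }

  // Keep scale inside the 0.1 to 2 range that page.pdf() accepts.
  if (normalized.scale !== undefined) {
    normalized.scale = Math.min(2, Math.max(0.1, normalized.scale));
  }

  return normalized;
}
```

You would then call page.pdf(normalizePdfOptions({ width: '8.5in', height: '11in' })) instead of passing the raw options.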
CSS Print Media Queries
One thing that caught me off guard initially: page.pdf() renders the page with the print media type by default, so your @media print rules apply and screen-only styles don’t. You can leverage this:
/* Screen styles */
.sidebar { display: block; }
/* Print styles - will be used in PDF */
@media print {
.sidebar { display: none; }
.page-break { page-break-before: always; }
.no-print { display: none; }
/* Force exact page sizes */
@page {
size: A4;
margin: 0.5in;
}
}
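If you’d rather have the PDF look like the on-screen page, you can switch the media type before printing with page.emulateMediaType(). A minimal sketch, assuming page is a Puppeteer Page that already has its content loaded (the wrapper name pdfWithMediaType is hypothetical):

```javascript
// `page` is a Puppeteer Page that already has content loaded.
async function pdfWithMediaType(page, mediaType = 'print', pdfOptions = {}) {
  // 'screen' makes page.pdf() use your normal styles instead of @media print rules.
  await page.emulateMediaType(mediaType);
  return page.pdf({ printBackground: true, ...pdfOptions });
}
```

For example, pdfWithMediaType(page, 'screen', { format: 'A4' }) keeps the sidebar from the CSS above visible in the PDF.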
Headers and Footer Templates
This is where Puppeteer gets really powerful. You can add dynamic headers and footers:
async function pdfWithHeaderFooter(htmlContent) {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setContent(htmlContent);
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
margin: {
top: '100px', // Make room for header
bottom: '100px', // Make room for footer
left: '20mm',
right: '20mm'
},
// Header template
displayHeaderFooter: true,
headerTemplate: `
<div style="width: 100%; padding: 0 20mm; font-size: 10px; display: flex; justify-content: space-between; align-items: center; border-bottom: 1px solid #ccc;">
<div>My Company Report</div>
<div>Generated on <span class="date"></span></div>
</div>
`,
// Footer template
footerTemplate: `
<div style="width: 100%; padding: 0 20mm; font-size: 10px; display: flex; justify-content: space-between; align-items: center; border-top: 1px solid #ccc;">
<div>Confidential</div>
<div>Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>
</div>
`
});
await browser.close();
return pdf;
}
Important notes about headers/footers:
- Use inline styles only (external CSS won’t work)
- Available variables: .date, .title, .url, .pageNumber, .totalPages
- Headers/footers are separate HTML documents, so they don’t inherit styles from your main content
- Make sure your top/bottom margins are large enough to accommodate them
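To make those notes concrete, here’s a rough sketch of a footer builder that keeps all styles inline and escapes any dynamic text before it goes into the template. Both function names (escapeHtml, buildFooterTemplate) are hypothetical helpers, not Puppeteer APIs.

```javascript
// Escape text so it can't break the template markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a footer with inline styles only; Puppeteer fills in the
// pageNumber and totalPages spans when it prints each page.
function buildFooterTemplate(label) {
  return `
    <div style="width: 100%; padding: 0 20mm; font-size: 10px; display: flex; justify-content: space-between;">
      <span>${escapeHtml(label)}</span>
      <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
    </div>`;
}
```

Pass the result as footerTemplate together with displayHeaderFooter: true and a bottom margin that leaves room for it.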
More sophisticated header/footer example:
const headerTemplate = `
<div style="width: 100%; padding: 10px 20mm 0; font-family: Arial, sans-serif; font-size: 9px; color: #666;">
<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="text-align: left; width: 33%;">
<img src="data:image/png;base64,iVBORw0KG..." style="height: 20px;" />
</td>
<td style="text-align: center; width: 34%; font-weight: bold;">
INVOICE REPORT
</td>
<td style="text-align: right; width: 33%;">
<span class="date"></span>
</td>
</tr>
</table>
<hr style="margin: 8px 0; border: none; border-top: 1px solid #ddd;">
</div>
`;
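The logo in that header is inlined as a base64 data URI because the header is its own little document and external resources generally don’t load there. Here’s a minimal sketch of producing such a URI from a file on disk. The helper names and the idea of reading the logo from a local path are assumptions for illustration.

```javascript
const fs = require('node:fs/promises');

// Turn raw image bytes into a data URI that can be used in an <img src>.
function toDataUri(bytes, mimeType = 'image/png') {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Hypothetical helper: read a logo file and return an inline <img> tag for a header template.
async function logoImgTag(logoPath, mimeType = 'image/png') {
  const bytes = await fs.readFile(logoPath);
  return `<img src="${toDataUri(bytes, mimeType)}" style="height: 20px;" />`;
}
```

Build the tag once at startup and reuse it, since the logo doesn’t change between PDFs.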
Performance Considerations and Optimization
Here’s where my trial-and-error experience really pays off. These optimizations made the difference between a system that could barely handle 5 concurrent PDFs and one that processes hundreds:
1. Browser Instance Management
Don’t create a new browser for every PDF:
class PDFGenerator {
constructor() {
this.browser = null;
this.pagePool = [];
this.maxPages = 5; // Adjust based on your memory constraints
}
async initialize() {
this.browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-background-timer-throttling',
'--disable-backgrounding-occluded-windows',
'--disable-renderer-backgrounding',
'--memory-pressure-off',
'--js-flags=--max-old-space-size=4096' // V8 flags only reach Chrome through --js-flags
]
});
// Pre-create pages
for (let i = 0; i < this.maxPages; i++) {
const page = await this.browser.newPage();
this.pagePool.push(page);
}
}
async getPage() {
if (this.pagePool.length > 0) {
return this.pagePool.pop();
}
// If pool is empty, create a new page
return await this.browser.newPage();
}
async releasePage(page) {
// Clear the page content to free memory
await page.goto('about:blank');
if (this.pagePool.length < this.maxPages) {
this.pagePool.push(page);
} else {
await page.close();
}
}
async generatePDF(htmlContent, options = {}) {
const page = await this.getPage();
try {
await page.setContent(htmlContent, {
waitUntil: 'networkidle0',
timeout: 15000
});
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
...options
});
return pdf;
} finally {
await this.releasePage(page);
}
}
async close() {
// Close all pages in pool
await Promise.all(this.pagePool.map(page => page.close()));
this.pagePool = [];
if (this.browser) {
await this.browser.close();
}
}
}
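// Usage (wrapped in an async function because CommonJS has no top-level await)
async function main() {
const pdfGen = new PDFGenerator();
await pdfGen.initialize();
try {
// Generate multiple PDFs efficiently; htmlContent1 and htmlContent2 are your own HTML strings
const pdf1 = await pdfGen.generatePDF(htmlContent1);
const pdf2 = await pdfGen.generatePDF(htmlContent2);
} finally {
// Clean up when done
await pdfGen.close();
}
}
main().catch(console.error);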
2. Memory Optimization Tips
async function memoryOptimizedPDF(htmlContent) {
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-extensions',
'--disable-gpu',
'--disable-default-apps',
'--memory-pressure-off',
'--disable-background-timer-throttling',
'--disable-backgrounding-occluded-windows',
'--disable-renderer-backgrounding',
'--disable-features=TranslateUI',
'--disable-ipc-flooding-protection'
]
});
const page = await browser.newPage();
// Optimize page settings
await page.setViewport({
width: 1280,
height: 720,
deviceScaleFactor: 1
});
// Disable images and CSS if not needed for faster processing
// await page.setRequestInterception(true);
// page.on('request', (req) => {
// if(req.resourceType() == 'stylesheet' || req.resourceType() == 'image'){
// req.abort();
// } else {
// req.continue();
// }
// });
try {
await page.setContent(htmlContent, {
waitUntil: 'domcontentloaded', // Faster than networkidle0 for static content
timeout: 10000
});
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
// Omit the 'path' option to get the PDF bytes back instead of writing to disk
});
return pdf;
} finally {
// Closing the browser also closes its pages, even if rendering failed
await browser.close();
}
}
3. Error Handling and Timeouts
Robust error handling is crucial in production:
async function robustPDFGeneration(htmlContent, maxRetries = 3) {
let browser;
let attempt = 0;
while (attempt < maxRetries) {
try {
browser = await puppeteer.launch({
headless: true,
timeout: 30000,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
// Set timeout for the page
page.setDefaultTimeout(20000);
page.setDefaultNavigationTimeout(20000);
await page.setContent(htmlContent, {
waitUntil: 'domcontentloaded',
timeout: 15000
});
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
timeout: 30000
});
return pdf;
} catch (error) {
console.error(`PDF generation attempt ${attempt + 1} failed:`, error);
attempt++;
if (attempt >= maxRetries) {
throw new Error(`PDF generation failed after ${maxRetries} attempts: ${error.message}`);
}
// Wait before retrying
await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
} finally {
// Close the browser on success as well as on failure so it never leaks
if (browser) {
await browser.close().catch(console.error);
browser = null;
}
}
}
}
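The retry logic doesn’t have to live inside the PDF function. Here’s a sketch of the same idea pulled out into a generic wrapper, using the same linear backoff as above (delay grows with the attempt number). The name withRetries and its option names are hypothetical.

```javascript
// Run an async task up to maxRetries times, waiting baseDelayMs * attempt between tries.
async function withRetries(task, { maxRetries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Wait a little longer after each failed attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Keep the original error reachable through `cause`.
  throw new Error(`Failed after ${maxRetries} attempts: ${lastError.message}`, { cause: lastError });
}
```

Any function that returns a promise can be wrapped, for example withRetries(() => memoryOptimizedPDF(htmlContent)).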
Performance Benchmarks (from my experience)
Here’s what I’ve found in real-world usage:
- Simple HTML (1 page): ~200-500ms
- Complex HTML with CSS (5 pages): ~1-3 seconds
- JavaScript-heavy content: ~3-10 seconds
- Large documents (50+ pages): ~10-30 seconds
Memory usage scales with document complexity and concurrent operations. Plan for ~50-200MB per active browser instance.
Putting It All Together
Here’s a complete, production-ready example that incorporates everything we’ve covered:
const puppeteer = require('puppeteer');
class ProductionPDFService {
constructor(options = {}) {
this.browser = null;
this.maxConcurrent = options.maxConcurrent || 3;
this.activeJobs = 0;
this.defaultOptions = {
format: 'A4',
printBackground: true,
margin: {
top: '20mm',
bottom: '20mm',
left: '15mm',
right: '15mm'
}
};
}
async initialize() {
if (!this.browser) {
this.browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--memory-pressure-off'
]
});
}
}
async generatePDF(htmlContent, options = {}) {
// Wait if too many concurrent jobs
while (this.activeJobs >= this.maxConcurrent) {
await new Promise(resolve => setTimeout(resolve, 100));
}
this.activeJobs++;
let page;
try {
await this.initialize();
page = await this.browser.newPage();
await page.setContent(htmlContent, {
waitUntil: 'networkidle0',
timeout: 15000
});
const pdfOptions = { ...this.defaultOptions, ...options };
return await page.pdf(pdfOptions);
} finally {
// Close the page even when rendering fails, so failed jobs don't leak tabs
if (page) {
await page.close().catch(console.error);
}
this.activeJobs--;
}
}
async close() {
if (this.browser) {
await this.browser.close();
this.browser = null;
}
}
}
// Usage example
const pdfService = new ProductionPDFService({ maxConcurrent: 5 });
// Generate a professional invoice
async function generateInvoice(invoiceData) {
const htmlTemplate = `
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body { font-family: Arial, sans-serif; margin: 0; }
.container { max-width: 800px; margin: 0 auto; padding: 40px; }
/* Add your styles here */
</style>
</head>
<body>
<div class="container">
<!-- Your invoice template here -->
</div>
</body>
</html>
`;
return await pdfService.generatePDF(htmlTemplate, {
headerTemplate: `<div style="font-size: 10px; padding: 5px;">Invoice #${invoiceData.number}</div>`,
footerTemplate: `<div style="font-size: 10px; padding: 5px; text-align: center;">Page <span class="pageNumber"></span></div>`,
displayHeaderFooter: true,
margin: { top: '60px', bottom: '40px', left: '20mm', right: '20mm' }
});
}
// Close the service on shutdown, then exit (a signal handler disables Node's default exit)
const shutdown = () => pdfService.close().finally(() => process.exit(0));
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
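The service above limits concurrency by polling activeJobs every 100 ms. That works, but a promise-based semaphore wakes waiting jobs as soon as a slot frees up. Here’s a minimal sketch of one. The Semaphore class is my own illustration, not something Puppeteer provides.

```javascript
// Minimal promise-based semaphore: at most `limit` tasks run at the same time.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.waiters = [];
  }

  async acquire() {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // No free slot: wait until release() hands one over.
    await new Promise((resolve) => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next(); // Pass the slot straight to the next waiter.
    } else {
      this.active--;
    }
  }

  async run(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

With this in place, a job could be queued as semaphore.run(() => pdfService.generatePDF(html)), and the polling loop becomes unnecessary.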
Final Thoughts
Puppeteer is the Swiss Army knife of HTML to PDF conversion. Yes, it’s resource-intensive, and yes, it has a learning curve. But once you understand how to use it properly, you can generate PDFs that look exactly like you want them to.
The key lessons I’ve learned:
- Always reuse browser instances in production
- Use proper error handling and timeouts
- Leverage CSS print media queries
- Test your PDFs with real data, not just “Hello World”
- Monitor memory usage in production
Start with the simple examples in this guide, then gradually add the advanced features as you need them. And remember - when in doubt, test it in an actual Chrome browser first. What works there will work in Puppeteer.
Got questions or run into issues? The Puppeteer community is pretty helpful, and the documentation is solid once you get the hang of it.