HTML to PDF in JavaScript with Puppeteer - Complete Guide

9 August 2025

I’ll be honest - when I first tried to generate PDFs from HTML with Puppeteer, I thought it would be a five-minute task. Three hours later, I was still wrestling with page breaks, headers that wouldn’t align, and PDFs that looked nothing like what I saw in the browser.

Puppeteer is a powerful JavaScript library for controlling headless browsers like Chrome or Firefox. You can use those headless browsers to do pretty much anything a regular browser can do, including converting HTML to PDF.

And that’s exactly what we’re going to do with it in this tutorial. There’s a bit of a learning curve, but it really is the best solution for generating pixel-perfect PDF files.

Why Puppeteer for HTML to PDF?

There are other options out there for converting HTML to PDF, and some are actually pretty good depending on your use case. I’ve covered the most popular [libraries for HTML to PDF conversion in JavaScript](/guides/html-to-pdf-javascript/) in a separate guide.

The truth is, if you want pixel-perfect PDFs, your best bet is always going to be a headless browser. It’s the only option that will give you full CSS and modern JavaScript support.

The trade-off, of course, is that you’re running a full browser “just” to print a PDF. It’s bound to be a resource hog and harder to scale.

Installation and Setup

With all that said, let’s get to installing and actually using Puppeteer. I’m going to assume you already have Node.js and npm installed on your machine.

npm install puppeteer

For production, you might want to install puppeteer-core instead. It doesn’t download a browser, so you can decide in your Docker setup which version of Chromium to install:

npm install puppeteer-core
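
Because puppeteer-core does not ship with a browser, launch() has to be told where one lives through the executablePath option. Here is a minimal sketch, assuming the path is supplied through an environment variable called CHROME_PATH (a name made up for this example). buildLaunchOptions is a hypothetical helper, not part of Puppeteer:

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch()
// when using puppeteer-core. CHROME_PATH is an assumed variable name.
function buildLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error('CHROME_PATH must point at a Chrome or Chromium binary');
  }
  return { executablePath, headless: true };
}

// Usage (needs puppeteer-core and a browser installed at that path):
// const puppeteer = require('puppeteer-core');
// const browser = await puppeteer.launch(buildLaunchOptions());
```

Failing early with a clear message tends to be easier to debug in a container than the launch error you get when the binary is missing.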

Quick Start: Your First PDF

Now, let’s print our first PDF with a headless browser.

const puppeteer = require('puppeteer');

async function generateSimplePDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
  // Basic HTML content
  const htmlContent = `
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="UTF-8">
      <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        h1 { color: #333; }
      </style>
    </head>
    <body>
      <h1>My First Puppeteer PDF</h1>
      <p>This PDF was generated with Puppeteer!</p>
    </body>
    </html>
  `;
  
  await page.setContent(htmlContent);
  
  await page.pdf({
    path: 'simple-example.pdf',
    format: 'A4',
  });
  
  await browser.close();
  console.log('PDF generated successfully!');
}

generateSimplePDF();

And that’s it! You should now have a PDF file called simple-example.pdf in the folder where you ran the script.

Converting from URL

In the first example we used an HTML string. But you can do the same thing with a URL. We just need to tell the browser to navigate to the page first.

async function convertUrlToPDF(url, outputPath) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
  // Set viewport for consistent rendering
  await page.setViewport({
    width: 1280,
    height: 720,
    deviceScaleFactor: 1
  });
  
  try {
    // Navigate + timeout
    await page.goto(url, {
      waitUntil: ['networkidle0', 'domcontentloaded'],
      timeout: 30000
    });
    
    // Add an extra delay to give any dynamic content time to finish loading
    // (page.waitForTimeout() was removed in recent Puppeteer versions)
    await new Promise(resolve => setTimeout(resolve, 2000));
    
    const pdf = await page.pdf({
      path: outputPath,
      format: 'A4',
    });
    
    console.log(`PDF saved to ${outputPath}`);
    return pdf;
    
  } catch (error) {
    console.error('Error generating PDF from URL:', error);
    throw error;
  } finally {
    await browser.close();
  }
}

// Usage
convertUrlToPDF('https://transformy.io/guides/', 'transformy-guides.pdf');

The waitUntil options are crucial for making sure the entire page has finished loading. I usually use networkidle0 (waits until no network requests for 500ms) for JavaScript-heavy sites, and domcontentloaded for simpler pages.
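
A fixed delay is either too long or too short. When you know what "ready" looks like on the page, waiting for it explicitly is usually more reliable. Here is a sketch, assuming the page signals completion with an element you can select (the #report-ready selector in the usage comment is hypothetical). waitForRender is an illustrative helper, not a Puppeteer API:

```javascript
// Illustrative helper: waits for a known "ready" element and for web fonts
// instead of sleeping for a fixed time. `page` is a Puppeteer Page.
async function waitForRender(page, { selector, timeout = 10000 } = {}) {
  if (selector) {
    // Resolves as soon as the element exists; rejects after `timeout` ms
    await page.waitForSelector(selector, { timeout });
  }
  // document.fonts.ready settles once font loading has finished.
  // Return a plain boolean so the result can be serialized back to Node.
  await page.evaluate(() => document.fonts.ready.then(() => true));
}

// Usage, after page.goto(url):
// await waitForRender(page, { selector: '#report-ready' });
```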

Advanced Options and Configuration

Now let’s get into the configurations that separate amateur from professional PDF generation:

async function advancedPDFGeneration(htmlContent) {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-background-timer-throttling',
      '--disable-backgrounding-occluded-windows',
      '--disable-renderer-backgrounding'
    ]
  });
  
  const page = await browser.newPage();
  
  // Load the HTML to render
  await page.setContent(htmlContent);
  
  const pdfOptions = {
    // Paper format
    format: 'A4', // or 'Letter', 'Legal', 'Tabloid', 'Ledger'
    
    // Or custom dimensions
    // width: '8.5in',
    // height: '11in',
    
    // Margins
    margin: {
      top: '1in',
      right: '0.5in',
      bottom: '1in',
      left: '0.5in'
    },
    
    // Print options
    printBackground: true,
    landscape: false,
    
    // Page ranges (useful for large documents)
    pageRanges: '1-5,8,11-13',
    
    // Scale (0.1 to 2)
    scale: 1,
    
    // Prefer CSS page size
    preferCSSPageSize: false,
    
    // Tagged PDF for accessibility
    tagged: true,
    
    // Outline (PDF bookmarks)
    outline: true
  };
  
  const pdf = await page.pdf(pdfOptions);
  
  await browser.close();
  return pdf;
}

CSS Print Media Queries

One thing that caught me off guard initially: page.pdf() renders the page with the print CSS media type, not screen, so any @media print rules you have will apply. You can leverage this:

/* Screen styles */
.sidebar { display: block; }

/* Print styles - will be used in PDF */
@media print {
  .sidebar { display: none; }
  .page-break { page-break-before: always; }
  .no-print { display: none; }
  
  /* Force exact page sizes */
  @page {
    size: A4;
    margin: 0.5in;
  }
}
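
If you would rather have the PDF match what you see on screen, you can switch the media type before printing with page.emulateMediaType(). A small sketch, where the wrapper name pdfWithScreenStyles is made up for illustration:

```javascript
// Illustrative wrapper: print the page using its screen styles
// rather than its @media print rules. `page` is a Puppeteer Page.
async function pdfWithScreenStyles(page, pdfOptions = {}) {
  await page.emulateMediaType('screen');
  return page.pdf({
    printBackground: true, // backgrounds are omitted by default when printing
    ...pdfOptions
  });
}

// Usage, after page.setContent() or page.goto():
// const pdf = await pdfWithScreenStyles(page, { format: 'A4' });
```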

This is where Puppeteer gets really powerful. You can add dynamic headers and footers:

async function pdfWithHeaderFooter(htmlContent) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
  await page.setContent(htmlContent);
  
  const pdf = await page.pdf({
    format: 'A4',
    printBackground: true,
    margin: {
      top: '100px',    // Make room for header
      bottom: '100px', // Make room for footer
      left: '20mm',
      right: '20mm'
    },
    
    // Header template
    displayHeaderFooter: true,
    headerTemplate: `
      <div style="width: 100%; padding: 0 20mm; font-size: 10px; display: flex; justify-content: space-between; align-items: center; border-bottom: 1px solid #ccc;">
        <div>My Company Report</div>
        <div>Generated on <span class="date"></span></div>
      </div>
    `,
    
    // Footer template
    footerTemplate: `
      <div style="width: 100%; padding: 0 20mm; font-size: 10px; display: flex; justify-content: space-between; align-items: center; border-top: 1px solid #ccc;">
        <div>Confidential</div>
        <div>Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>
      </div>
    `
  });
  
  await browser.close();
  return pdf;
}

Important notes about headers/footers:

  • Use inline styles only (external CSS won’t work)
  • Available variables: .date, .title, .url, .pageNumber, .totalPages
  • Headers/footers are separate HTML documents, so they don’t inherit styles from your main content
  • Make sure your top/bottom margins are large enough to accommodate them

More sophisticated header/footer example:

const headerTemplate = `
<div style="width: 100%; padding: 10px 20mm 0; font-family: Arial, sans-serif; font-size: 9px; color: #666;">
  <table style="width: 100%; border-collapse: collapse;">
    <tr>
      <td style="text-align: left; width: 33%;">
        <img src="data:image/png;base64,iVBORw0KG..." style="height: 20px;" />
      </td>
      <td style="text-align: center; width: 34%; font-weight: bold;">
        INVOICE REPORT
      </td>
      <td style="text-align: right; width: 33%;">
        <span class="date"></span>
      </td>
    </tr>
  </table>
  <hr style="margin: 8px 0; border: none; border-top: 1px solid #ddd;">
</div>
`;
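
Since these templates are plain HTML strings, any dynamic value you interpolate into them (a customer name, an invoice number) should be escaped first. A minimal sketch follows. escapeHtml and buildHeader are illustrative helpers, not part of Puppeteer:

```javascript
// Escape the characters that have special meaning in HTML
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build a header template with an escaped title and Puppeteer's date placeholder
function buildHeader(title) {
  return `<div style="width: 100%; padding: 0 20mm; font-size: 10px;">` +
    `${escapeHtml(title)} <span class="date"></span></div>`;
}

// Usage:
// await page.pdf({ displayHeaderFooter: true, headerTemplate: buildHeader(report.title) });
```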

Performance Considerations and Optimization

Here’s where my trial-and-error experience really pays off. These optimizations made the difference between a system that could barely handle 5 concurrent PDFs and one that processes hundreds:

1. Browser Instance Management

Don’t create a new browser for every PDF:

class PDFGenerator {
  constructor() {
    this.browser = null;
    this.pagePool = [];
    this.maxPages = 5; // Adjust based on your memory constraints
  }
  
  async initialize() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-background-timer-throttling',
        '--disable-backgrounding-occluded-windows',
        '--disable-renderer-backgrounding',
        '--memory-pressure-off',
        '--js-flags=--max-old-space-size=4096' // V8 heap limit has to be passed to Chromium via --js-flags
      ]
    });
    
    // Pre-create pages
    for (let i = 0; i < this.maxPages; i++) {
      const page = await this.browser.newPage();
      this.pagePool.push(page);
    }
  }
  
  async getPage() {
    if (this.pagePool.length > 0) {
      return this.pagePool.pop();
    }
    
    // If pool is empty, create a new page
    return await this.browser.newPage();
  }
  
  async releasePage(page) {
    // Clear the page content to free memory
    await page.goto('about:blank');
    
    if (this.pagePool.length < this.maxPages) {
      this.pagePool.push(page);
    } else {
      await page.close();
    }
  }
  
  async generatePDF(htmlContent, options = {}) {
    const page = await this.getPage();
    
    try {
      await page.setContent(htmlContent, {
        waitUntil: 'networkidle0',
        timeout: 15000
      });
      
      const pdf = await page.pdf({
        format: 'A4',
        printBackground: true,
        ...options
      });
      
      return pdf;
    } finally {
      await this.releasePage(page);
    }
  }
  
  async close() {
    // Close all pages in pool
    await Promise.all(this.pagePool.map(page => page.close()));
    this.pagePool = [];
    
    if (this.browser) {
      await this.browser.close();
    }
  }
}

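// Usage (wrapped in an async function because CommonJS has no top-level await)
async function main() {
  const pdfGen = new PDFGenerator();
  await pdfGen.initialize();

  try {
    // Generate multiple PDFs with the same browser
    const pdf1 = await pdfGen.generatePDF(htmlContent1);
    const pdf2 = await pdfGen.generatePDF(htmlContent2);
  } finally {
    // Clean up when done
    await pdfGen.close();
  }
}

main().catch(console.error);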
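
The usage above generates one PDF at a time. To run several in parallel without opening an unbounded number of pages, you can put a small concurrency limiter in front of generatePDF. A sketch in plain JavaScript, where createLimiter is a hypothetical helper and the limit of 3 in the usage comment is an arbitrary starting point:

```javascript
// Hypothetical helper: returns a function that runs async tasks
// with at most `maxConcurrent` of them in flight at once.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)              // a synchronous throw in task becomes a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();                // start the next queued task, if any
      });
  };

  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    next();
  });
}

// Usage (htmlDocs is an assumed array of HTML strings):
// const limit = createLimiter(3);
// const pdfs = await Promise.all(
//   htmlDocs.map((html) => limit(() => pdfGen.generatePDF(html)))
// );
```

Queued tasks start in the order they were submitted, and a failed task frees its slot just like a successful one.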

2. Memory Optimization Tips

async function memoryOptimizedPDF(htmlContent) {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-extensions',
      '--disable-gpu',
      '--disable-default-apps',
      '--memory-pressure-off',
      '--disable-background-timer-throttling',
      '--disable-backgrounding-occluded-windows',
      '--disable-renderer-backgrounding',
      '--disable-features=TranslateUI',
      '--disable-ipc-flooding-protection'
    ]
  });
  
  const page = await browser.newPage();
  
  // Optimize page settings
  await page.setViewport({
    width: 1280,
    height: 720,
    deviceScaleFactor: 1
  });
  
  // Disable images and CSS if not needed for faster processing
  // await page.setRequestInterception(true);
  // page.on('request', (req) => {
  //   if(req.resourceType() == 'stylesheet' || req.resourceType() == 'image'){
  //     req.abort();
  //   } else {
  //     req.continue();
  //   }
  // });
  
  await page.setContent(htmlContent, {
    waitUntil: 'domcontentloaded', // Faster than networkidle0 for static content
    timeout: 10000
  });
  
  const pdf = await page.pdf({
    format: 'A4',
    printBackground: true,
    // Omit the 'path' option to get buffer instead of writing to disk
  });
  
  await page.close();
  await browser.close();
  
  return pdf;
}
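
When path is omitted, page.pdf() resolves with the PDF bytes: a Buffer in older Puppeteer releases, a Uint8Array in newer ones. Here is a sketch of handling both the same way. savePdf is an illustrative helper, and where the bytes end up (disk, an HTTP response, object storage) is up to you:

```javascript
const fs = require('node:fs/promises');

// Illustrative helper: accepts either a Buffer or a Uint8Array
async function savePdf(pdfBytes, outputPath) {
  const buffer = Buffer.from(pdfBytes); // normalizes both types to a Buffer
  await fs.writeFile(outputPath, buffer);
  return buffer.length; // number of bytes written
}

// Usage:
// const pdf = await memoryOptimizedPDF(htmlContent);
// await savePdf(pdf, 'report.pdf');
```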

3. Error Handling and Timeouts

Robust error handling is crucial in production:

async function robustPDFGeneration(htmlContent, maxRetries = 3) {
  let browser;
  let attempt = 0;
  
  while (attempt < maxRetries) {
    try {
      browser = await puppeteer.launch({
        headless: true,
        timeout: 30000,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
      });
      
      const page = await browser.newPage();
      
      // Set timeout for the page
      page.setDefaultTimeout(20000);
      page.setDefaultNavigationTimeout(20000);
      
      await page.setContent(htmlContent, {
        waitUntil: 'domcontentloaded',
        timeout: 15000
      });
      
      const pdf = await page.pdf({
        format: 'A4',
        printBackground: true,
        timeout: 30000
      });
      
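      // Close the browser on success too, so each call doesn't leave Chromium running
      await browser.close().catch(console.error);
      return pdf;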
      
    } catch (error) {
      console.error(`PDF generation attempt ${attempt + 1} failed:`, error);
      
      if (browser) {
        await browser.close().catch(console.error);
      }
      
      attempt++;
      
      if (attempt >= maxRetries) {
        throw new Error(`PDF generation failed after ${maxRetries} attempts: ${error.message}`);
      }
      
      // Wait before retrying
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
}
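
Whatever retry strategy you use, the browser needs to be closed on every path, success included, or each call leaves a Chromium process behind. One way to make that hard to forget is a small try/finally wrapper. In this sketch, withBrowser is a hypothetical helper and launch is whatever function starts your browser:

```javascript
// Hypothetical helper: runs `work(browser)` and always closes the browser,
// whether `work` resolves or throws.
async function withBrowser(launch, work) {
  const browser = await launch();
  try {
    return await work(browser);
  } finally {
    await browser.close();
  }
}

// Usage:
// const pdf = await withBrowser(
//   () => puppeteer.launch({ headless: true }),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.setContent(htmlContent);
//     return page.pdf({ format: 'A4' });
//   }
// );
```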

Performance Benchmarks (from my experience)

Here’s what I’ve found in real-world usage:

  • Simple HTML (1 page): ~200-500ms
  • Complex HTML with CSS (5 pages): ~1-3 seconds
  • JavaScript-heavy content: ~3-10 seconds
  • Large documents (50+ pages): ~10-30 seconds

Memory usage scales with document complexity and concurrent operations. Plan for ~50-200MB per active browser instance.
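
These numbers depend heavily on hardware and content, so it is worth measuring your own. A minimal timing sketch, where measure is a hypothetical helper and the wrapped function is whichever PDF call you want to time:

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: times an async function and logs the duration
async function measure(label, fn) {
  const start = performance.now();
  const result = await fn();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(0)}ms`);
  return { result, ms };
}

// Usage:
// const { result: pdf } = await measure('invoice PDF', () => pdfGen.generatePDF(html));
```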

Putting It All Together

Here’s a complete, production-ready example that incorporates everything we’ve covered:

const puppeteer = require('puppeteer');
const fs = require('fs').promises;

class ProductionPDFService {
  constructor(options = {}) {
    this.browser = null;
    this.maxConcurrent = options.maxConcurrent || 3;
    this.activeJobs = 0;
    this.defaultOptions = {
      format: 'A4',
      printBackground: true,
      margin: {
        top: '20mm',
        bottom: '20mm',
        left: '15mm',
        right: '15mm'
      }
    };
  }
  
  async initialize() {
    if (!this.browser) {
      this.browser = await puppeteer.launch({
        headless: true,
        args: [
          '--no-sandbox',
          '--disable-setuid-sandbox',
          '--disable-dev-shm-usage',
          '--memory-pressure-off'
        ]
      });
    }
  }
  
  async generatePDF(htmlContent, options = {}) {
    // Wait if too many concurrent jobs
    while (this.activeJobs >= this.maxConcurrent) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
    
    this.activeJobs++;
    
    try {
      await this.initialize();
      
      const page = await this.browser.newPage();
      
      try {
        await page.setContent(htmlContent, {
          waitUntil: 'networkidle0',
          timeout: 15000
        });
        
        const pdfOptions = { ...this.defaultOptions, ...options };
        return await page.pdf(pdfOptions);
      } finally {
        // Close the page even when rendering fails, so failed jobs don't leak tabs
        await page.close();
      }
      
    } finally {
      this.activeJobs--;
    }
  }
  
  async close() {
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
    }
  }
}

// Usage example
const pdfService = new ProductionPDFService({ maxConcurrent: 5 });

// Generate a professional invoice
async function generateInvoice(invoiceData) {
  const htmlTemplate = `
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="UTF-8">
      <style>
        body { font-family: Arial, sans-serif; margin: 0; }
        .container { max-width: 800px; margin: 0 auto; padding: 40px; }
        /* Add your styles here */
      </style>
    </head>
    <body>
      <div class="container">
        <!-- Your invoice template here -->
      </div>
    </body>
    </html>
  `;
  
  return await pdfService.generatePDF(htmlTemplate, {
    headerTemplate: `<div style="font-size: 10px; padding: 5px;">Invoice #${invoiceData.number}</div>`,
    footerTemplate: `<div style="font-size: 10px; padding: 5px; text-align: center;">Page <span class="pageNumber"></span></div>`,
    displayHeaderFooter: true,
    margin: { top: '60px', bottom: '40px', left: '20mm', right: '20mm' }
  });
}

// Don't forget to close the service. close() is async, so wait for it before exiting.
const shutdown = () => pdfService.close().finally(() => process.exit(0));
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

Final Thoughts

Puppeteer is the Swiss Army knife of HTML to PDF conversion. Yes, it’s resource-intensive, and yes, it has a learning curve. But once you understand how to use it properly, you can generate PDFs that look exactly like you want them to.

The key lessons I’ve learned:

  • Always reuse browser instances in production
  • Use proper error handling and timeouts
  • Leverage CSS print media queries
  • Test your PDFs with real data, not just “Hello World”
  • Monitor memory usage in production

Start with the simple examples in this guide, then gradually add the advanced features as you need them. And remember - when in doubt, test it in Chrome’s print preview first. What works there will generally work in Puppeteer.

Got questions or run into issues? The Puppeteer community is pretty helpful, and the documentation is solid once you get the hang of it.
