HTML to PDF Conversion with Puppeteer Sharp

9 August 2025

I’ll be honest—I thought PDF generation would be straightforward until I tried it. After bouncing between libraries and wrestling with rendering issues, I finally discovered PuppeteerSharp. It’s like having Chrome do the heavy lifting while you sit back and watch.

Here’s everything I learned about using PuppeteerSharp for HTML to PDF conversion, including the gotchas I wish someone had warned me about.

Why PuppeteerSharp?

PuppeteerSharp is the .NET port of Google’s Puppeteer library. Think of it as Chrome automation—it spins up a headless browser and does exactly what a real browser would do. That means modern CSS (flexbox, grid, animations), JavaScript frameworks, custom fonts—everything just works.

The trade-off? It’s heavier on resources than other solutions. But when you need pixel-perfect PDFs that match what users see in their browser, nothing else comes close.

Installation and Setup

First, install the NuGet package:

dotnet add package PuppeteerSharp

PuppeteerSharp doesn’t bundle a browser, and LaunchAsync won’t fetch one for you. Before the first launch, download one with BrowserFetcher:

using PuppeteerSharp;

// Downloads Chromium if not already present
await new BrowserFetcher().DownloadAsync();

Pro tip: In production environments, consider downloading Chromium during your build process rather than at runtime. It’s about 100MB, so you don’t want users waiting for that on first use.
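One way to act on that tip is to install the browser while building the image and point LaunchOptions.ExecutablePath at it, falling back to BrowserFetcher only when nothing was installed. This is a minimal sketch. The CHROME_PATH environment variable and the BrowserLaunch class are hypothetical names you would define yourself; PuppeteerSharp does not read that variable.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class BrowserLaunch
{
    // Builds launch options, preferring a browser that was installed at build time.
    public static LaunchOptions CreateOptions(string? chromePath)
    {
        var options = new LaunchOptions
        {
            Headless = true,
            Args = new[] { "--no-sandbox", "--disable-setuid-sandbox" }
        };

        if (!string.IsNullOrWhiteSpace(chromePath))
        {
            // Point PuppeteerSharp at the pre-installed executable; no download needed.
            options.ExecutablePath = chromePath;
        }

        return options;
    }

    public static async Task<IBrowser> LaunchAsync()
    {
        // CHROME_PATH is a hypothetical variable you set yourself (e.g. in the Dockerfile).
        var options = CreateOptions(Environment.GetEnvironmentVariable("CHROME_PATH"));

        if (options.ExecutablePath is null)
        {
            // Nothing pre-installed: fall back to downloading at runtime.
            await new BrowserFetcher().DownloadAsync();
        }

        return await Puppeteer.LaunchAsync(options);
    }
}
```

Keeping CreateOptions separate means you can test the decision without starting a browser.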

Quick Start: URL to PDF

Let’s start simple. Here’s how to convert any webpage to PDF:

using PuppeteerSharp;
using PuppeteerSharp.Media; // PaperFormat, MarginOptions

public async Task<byte[]> ConvertUrlToPdf(string url)
{
    // Ensure Chromium is available
    await new BrowserFetcher().DownloadAsync();
    
    // Launch headless browser
    using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Headless = true,
        Args = new[] { "--no-sandbox", "--disable-setuid-sandbox" } // Needed for some hosting environments
    });
    
    // Create new page
    using var page = await browser.NewPageAsync();
    
    // Navigate to URL
    await page.GoToAsync(url, WaitUntilNavigation.Networkidle0);
    
    // Generate PDF (PdfDataAsync returns the bytes; PdfAsync writes to a file path)
    var pdfBytes = await page.PdfDataAsync(new PdfOptions
    {
        Format = PaperFormat.A4,
        PrintBackground = true,
        MarginOptions = new MarginOptions
        {
            Top = "1in",
            Bottom = "1in",
            Left = "0.5in",
            Right = "0.5in"
        }
    });
    
    return pdfBytes;
}

WaitUntilNavigation.Networkidle0 matters here. Navigation only counts as finished once there have been no network connections for at least 500 ms, which gives images and async content time to load.
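Networkidle0 can stall on pages that never go completely quiet, such as pages that hold a websocket open or poll an analytics endpoint. Networkidle2 (no more than two connections for at least 500 ms) is the usual fallback. An explicit timeout also stops one bad page from hanging the conversion. Here is a sketch with a hypothetical NavigationPresets helper. The 60-second default is an arbitrary choice.

```csharp
using PuppeteerSharp;

public static class NavigationPresets
{
    // Wait settings for pages that keep a connection or two open indefinitely.
    public static NavigationOptions Tolerant(int timeoutMs = 60_000) => new NavigationOptions
    {
        // Networkidle2: at most 2 network connections for at least 500 ms.
        WaitUntil = new[] { WaitUntilNavigation.Networkidle2 },
        // Milliseconds before navigation gives up with an exception.
        Timeout = timeoutMs
    };
}
```

Use it in place of the single wait condition: await page.GoToAsync(url, NavigationPresets.Tolerant());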

Converting HTML Strings

Often you’ll want to convert HTML you’ve generated in memory rather than a live URL:

public async Task<byte[]> ConvertHtmlToPdf(string html)
{
    await new BrowserFetcher().DownloadAsync();
    
    using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
    using var page = await browser.NewPageAsync();
    
    // Set the HTML content
    await page.SetContentAsync(html);
    
    // Optional: extra time for client-side scripts to finish rendering
    await Task.Delay(1000);
    
    var pdfBytes = await page.PdfDataAsync(new PdfOptions
    {
        Format = PaperFormat.A4,
        PrintBackground = true,
        PreferCSSPageSize = true // A CSS @page size, if declared, takes priority over Format
    });
    
    return pdfBytes;
}

Here’s a more complete example with styling:

var html = @"
<!DOCTYPE html>
<html>
<head>
    <meta charset='utf-8'>
    <style>
        body { 
            font-family: 'Arial', sans-serif; 
            margin: 0; 
            padding: 20px;
            color: #333;
        }
        .invoice-header { 
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 20px;
            margin: -20px -20px 20px -20px;
            border-radius: 8px;
        }
        .invoice-table { 
            width: 100%; 
            border-collapse: collapse; 
            margin-top: 20px;
        }
        .invoice-table th, .invoice-table td { 
            border: 1px solid #ddd; 
            padding: 12px; 
            text-align: left; 
        }
        .invoice-table th { 
            background-color: #f8f9fa; 
            font-weight: bold;
        }
        .total-row { 
            background-color: #e9ecef; 
            font-weight: bold; 
        }
    </style>
</head>
<body>
    <div class='invoice-header'>
        <h1>Invoice #INV-2025-001</h1>
        <p>Date: " + DateTime.Now.ToString("MMMM dd, yyyy") + @"</p>
    </div>
    
    <h2>Bill To:</h2>
    <p>Acme Corporation<br>
    123 Business St<br>
    Business City, BC 12345</p>
    
    <table class='invoice-table'>
        <thead>
            <tr>
                <th>Description</th>
                <th>Quantity</th>
                <th>Unit Price</th>
                <th>Total</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Website Development</td>
                <td>1</td>
                <td>$2,500.00</td>
                <td>$2,500.00</td>
            </tr>
            <tr>
                <td>Monthly Hosting</td>
                <td>12</td>
                <td>$25.00</td>
                <td>$300.00</td>
            </tr>
            <tr class='total-row'>
                <td colspan='3'>Total</td>
                <td>$2,800.00</td>
            </tr>
        </tbody>
    </table>
</body>
</html>";

var pdfBytes = await ConvertHtmlToPdf(html);
await File.WriteAllBytesAsync("invoice.pdf", pdfBytes);
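The invoice above concatenates values straight into the markup. That works for hard-coded text, but it is risky for anything a user typed, because a stray < or & in a description can break the layout or inject markup. The following is a minimal sketch that renders the rows from data and HTML-encodes the free text. InvoiceLine and InvoiceHtml are hypothetical names, and the "$" plus N2 formatting simply mirrors the hard-coded amounts.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical line-item type; the post's invoice hard-codes its rows.
public record InvoiceLine(string Description, int Quantity, decimal UnitPrice)
{
    public decimal Total => Quantity * UnitPrice;
}

public static class InvoiceHtml
{
    // "$" plus invariant N2 formatting, e.g. 2800m -> "$2,800.00".
    private static string Money(decimal amount) =>
        "$" + amount.ToString("N2", CultureInfo.InvariantCulture);

    // Renders the <tbody> rows, HTML-encoding the free-text description.
    public static string RenderRows(IEnumerable<InvoiceLine> lines)
    {
        var items = lines.ToList();
        var sb = new StringBuilder();

        foreach (var line in items)
        {
            sb.Append("<tr>")
              .Append("<td>").Append(WebUtility.HtmlEncode(line.Description)).Append("</td>")
              .Append("<td>").Append(line.Quantity.ToString(CultureInfo.InvariantCulture)).Append("</td>")
              .Append("<td>").Append(Money(line.UnitPrice)).Append("</td>")
              .Append("<td>").Append(Money(line.Total)).Append("</td>")
              .Append("</tr>");
        }

        // Grand total row, using the same class as the hand-written template.
        sb.Append("<tr class='total-row'><td colspan='3'>Total</td><td>")
          .Append(Money(items.Sum(l => l.Total)))
          .Append("</td></tr>");

        return sb.ToString();
    }
}
```

The result replaces the hard-coded rows inside the template's <tbody>.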

Advanced PDF Options

PuppeteerSharp gives you fine-grained control over PDF generation:

var pdfOptions = new PdfOptions
{
    // Page format
    Format = PaperFormat.A4, // A0-A6, Letter, Legal, Tabloid, Ledger
    // Width = "8.5in",      // Custom size: only used when Format is not set,
    // Height = "11in",      // because Format takes priority over Width/Height
    
    // Orientation
    Landscape = false,
    
    // Margins
    MarginOptions = new MarginOptions
    {
        Top = "1in",
        Bottom = "1in", 
        Left = "0.75in",
        Right = "0.75in"
    },
    
    // Content options
    PrintBackground = true, // Include background colors and images
    PreferCSSPageSize = true, // Use CSS @page rules
    
    // Page ranges
    PageRanges = "1-3", // Only generate specific pages
    
    // Scale
    Scale = 1.0m, // 0.1 to 2.0 (a decimal, hence the m suffix)
    
    // Display options
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size: 10px; margin: auto;'>Custom Header</div>",
    FooterTemplate = "<div style='font-size: 10px; margin: auto;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>"
};
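PageRanges takes the same syntax as Chrome’s print dialog, such as "1-5, 8, 11-13", and an empty string means every page. If the pages you want are computed in code, a small hypothetical helper can collapse them into that form:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageRangeFormatter
{
    // Collapses page numbers into a PageRanges string, e.g. {1,2,3,5} -> "1-3, 5".
    public static string ToPageRanges(IEnumerable<int> pages)
    {
        var sorted = pages.Where(p => p > 0).Distinct().OrderBy(p => p).ToList();
        var parts = new List<string>();

        int i = 0;
        while (i < sorted.Count)
        {
            int start = sorted[i];
            int end = start;

            // Extend the run while the next page number is consecutive.
            while (i + 1 < sorted.Count && sorted[i + 1] == end + 1)
            {
                i++;
                end = sorted[i];
            }

            parts.Add(start == end ? start.ToString() : $"{start}-{end}");
            i++;
        }

        return string.Join(", ", parts);
    }
}
```

For example, PageRanges = PageRangeFormatter.ToPageRanges(pagesToKeep).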

Custom Headers and Footers

Headers and footers deserve special attention because they’re tricky to get right:

public async Task<byte[]> CreatePdfWithHeaderFooter(string html)
{
    await new BrowserFetcher().DownloadAsync();
    
    using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
    using var page = await browser.NewPageAsync();
    
    await page.SetContentAsync(html);
    
    var headerTemplate = @"
        <div style='font-size: 10px; padding: 5px; width: 100%; 
                    display: flex; justify-content: space-between; align-items: center;
                    border-bottom: 1px solid #ddd; color: #666;'>
            <span>Company Confidential</span>
            <span>Generated on " + DateTime.Now.ToString("MM/dd/yyyy") + @"</span>
        </div>";
    
    var footerTemplate = @"
        <div style='font-size: 10px; padding: 5px; width: 100%; 
                    display: flex; justify-content: center; align-items: center;
                    border-top: 1px solid #ddd; color: #666;'>
            <span>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span>
        </div>";
    
    var pdfBytes = await page.PdfDataAsync(new PdfOptions
    {
        Format = PaperFormat.A4,
        DisplayHeaderFooter = true,
        HeaderTemplate = headerTemplate,
        FooterTemplate = footerTemplate,
        MarginOptions = new MarginOptions
        {
            Top = "0.8in", // Needs space for header
            Bottom = "0.8in", // Needs space for footer
            Left = "0.5in",
            Right = "0.5in"
        },
        PrintBackground = true
    });
    
    return pdfBytes;
}

Key things about headers and footers:

  • They only render if DisplayHeaderFooter = true
  • You need adequate top/bottom margins
  • Use class='pageNumber' and class='totalPages' for page numbers
  • The page’s stylesheets don’t apply inside the templates, so style them inline (or with a <style> element inside the template itself)
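Because the templates are plain HTML strings, it can help to build them in one place. The sketch below uses a hypothetical PdfTemplates helper. It HTML-encodes the caller’s text, sets an explicit font size, and relies on the documented placeholder classes (pageNumber and totalPages here; date, title and url also exist), which are filled in at print time.

```csharp
using System.Net;

public static class PdfTemplates
{
    // Footer: caller-supplied label on the left, "Page X of Y" on the right.
    // The pageNumber/totalPages spans are filled in when the PDF is printed.
    // All styling is inline because the page's stylesheets don't reach the template.
    public static string Footer(string label) =>
        "<div style='font-size: 10px; width: 100%; box-sizing: border-box; padding: 0 0.5in; " +
        "display: flex; justify-content: space-between; color: #666;'>" +
        "<span>" + WebUtility.HtmlEncode(label) + "</span>" +
        "<span>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span>" +
        "</div>";
}
```

Pass the result as FooterTemplate, for example FooterTemplate = PdfTemplates.Footer("Company Confidential"), and keep the bottom margin large enough to hold it.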

Handling JavaScript and Dynamic Content

One of PuppeteerSharp’s biggest advantages is handling JavaScript-rendered content:

public async Task<byte[]> ConvertSpaPageToPdf(string url)
{
    await new BrowserFetcher().DownloadAsync();
    
    using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
    using var page = await browser.NewPageAsync();
    
    // Navigate to the page
    await page.GoToAsync(url);
    
    // Wait for specific content to load
    await page.WaitForSelectorAsync("#dashboard-charts");
    
    // Or wait for a specific condition
    await page.WaitForFunctionAsync(@"
        () => document.querySelectorAll('.chart-container canvas').length >= 3
    ");
    
    // Optional: give animations a moment to settle
    await Task.Delay(2000);
    
    // Make sure web fonts have finished loading before printing
    await page.EvaluateExpressionHandleAsync("document.fonts.ready");
    
    var pdfBytes = await page.PdfDataAsync(new PdfOptions
    {
        Format = PaperFormat.A4,
        PrintBackground = true
    });
    
    return pdfBytes;
}
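Counting canvas elements ties the wait to one page’s markup. When you control the page, a sturdier option is to have the page announce that it has finished rendering. This sketch assumes a hypothetical contract in which your page script runs window.__pdfReady = true once its charts are drawn. RenderSignal is also a hypothetical name.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

public static class RenderSignal
{
    // Hypothetical contract: the page's own script sets window.__pdfReady = true
    // after its charts and data have finished rendering.
    public const string ReadyPredicate = "() => window.__pdfReady === true";

    // Polls the predicate until it returns true; throws if timeoutMs elapses first.
    public static async Task WaitUntilReadyAsync(IPage page, int timeoutMs = 30_000)
    {
        await page.WaitForFunctionAsync(
            ReadyPredicate,
            new WaitForFunctionOptions { Timeout = timeoutMs });
    }
}
```

With this in place, await RenderSignal.WaitUntilReadyAsync(page) replaces both the selector wait and the fixed delay.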

You can also inject custom JavaScript before generating the PDF:

// Hide elements you don't want in the PDF
await page.EvaluateExpressionAsync(@"
    document.querySelectorAll('.no-print, #sidebar').forEach(el => el.style.display = 'none');
");

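// PDF output already renders with print media (@media print rules apply);
// emulating it explicitly just makes the intent clear. Use MediaType.Screen
// if you want the on-screen styles instead.
await page.EmulateMediaTypeAsync(MediaType.Print);

var pdfBytes = await page.PdfDataAsync(new PdfOptions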
{
    Format = PaperFormat.A4,
    PrintBackground = true
});

Performance Optimization

PuppeteerSharp can be resource-intensive. Here are the techniques I use to keep it performant:

Browser Reuse

Don’t create a new browser instance for every PDF:

public class PdfService : IDisposable
{
    private IBrowser _browser;
    private readonly SemaphoreSlim _semaphore = new(Environment.ProcessorCount);
    
    public async Task InitializeAsync()
    {
        await new BrowserFetcher().DownloadAsync();
        _browser = await Puppeteer.LaunchAsync(new LaunchOptions 
        { 
            Headless = true,
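            Args = new[] 
            {
                "--no-sandbox",
                "--disable-setuid-sandbox",
                "--disable-dev-shm-usage", // Docker's default /dev/shm is only 64MB; use /tmp instead
                "--disable-gpu",
                "--no-first-run",
                "--no-zygote",
                // Avoid "--single-process" with a shared browser: browser and renderer share
                // one process, so a single crashing page takes down every PDF in flight
            }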
        });
    }
    
    public async Task<byte[]> GeneratePdfAsync(string html)
    {
        await _semaphore.WaitAsync();
        
        try
        {
            using var page = await _browser.NewPageAsync();
            await page.SetContentAsync(html);
            return await page.PdfDataAsync(new PdfOptions { Format = PaperFormat.A4 });
        }
        finally
        {
            _semaphore.Release();
        }
    }
    
    public void Dispose()
    {
        _browser?.Dispose();
        _semaphore?.Dispose();
    }
}
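PdfService expects InitializeAsync to run before the first request. If you would rather launch on first use without two requests racing to start two browsers, a Lazy<Task<T>> wrapper is one common pattern. The sketch below is generic so the idea can be seen without Chrome. SharedAsync is a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Runs an async factory at most once, however many callers ask for the value.
public sealed class SharedAsync<T>
{
    private readonly Lazy<Task<T>> _value;

    public SharedAsync(Func<Task<T>> factory)
    {
        // ExecutionAndPublication: only one thread ever invokes the factory;
        // every caller then awaits the same Task.
        _value = new Lazy<Task<T>>(factory, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public Task<T> GetAsync() => _value.Value;
}
```

In PdfService, the factory would be something like () => Puppeteer.LaunchAsync(launchOptions). The pattern has one caveat: a launch that throws leaves a faulted task cached, so real code needs a way to reset it.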

Memory Management

// A modest viewport keeps on-screen rendering small; the PDF page size still comes from PdfOptions
await page.SetViewportAsync(new ViewPortOptions
{
    Width = 1024,
    Height = 768
});

// Disable images if not needed
await page.SetRequestInterceptionAsync(true);
page.Request += async (sender, e) =>
{
    if (e.Request.ResourceType == ResourceType.Image)
        await e.Request.AbortAsync();
    else
        await e.Request.ContinueAsync();
};
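Interception can do more than drop images. The following is a sketch of a hypothetical filter. It aborts media and any request that isn’t a data: URL or aimed at one allowed host, which covers analytics and other third-party scripts. The host name is a placeholder.

```csharp
using System;
using PuppeteerSharp;

public static class RequestFilter
{
    // Decides whether an intercepted request should continue.
    public static bool ShouldContinue(ResourceType type, string url, string allowedHost)
    {
        if (type == ResourceType.Media)
            return false; // audio/video never ends up in a PDF

        if (url.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            return true; // inline resources need no network

        // Everything else must target the allowed host.
        return Uri.TryCreate(url, UriKind.Absolute, out var uri)
            && string.Equals(uri.Host, allowedHost, StringComparison.OrdinalIgnoreCase);
    }
}

// Wiring it up, mirroring the handler above ("app.example.com" is a placeholder):
// page.Request += async (sender, e) =>
// {
//     if (RequestFilter.ShouldContinue(e.Request.ResourceType, e.Request.Url, "app.example.com"))
//         await e.Request.ContinueAsync();
//     else
//         await e.Request.AbortAsync();
// };
```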

Caching Strategies

For repeated conversions, consider caching:

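using Microsoft.Extensions.Caching.Memory;

public class CachedPdfService
{
    private readonly IMemoryCache _cache;
    private readonly PdfService _pdfService;
    
    public CachedPdfService(IMemoryCache cache, PdfService pdfService)
    {
        _cache = cache;
        _pdfService = pdfService;
    }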
    
    public async Task<byte[]> GetOrCreatePdfAsync(string cacheKey, string html)
    {
        if (_cache.TryGetValue(cacheKey, out byte[] cachedPdf))
            return cachedPdf;
        
        var pdf = await _pdfService.GeneratePdfAsync(html);
        
        _cache.Set(cacheKey, pdf, TimeSpan.FromMinutes(30));
        
        return pdf;
    }
}
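The cache only pays off if equal inputs map to the same key. One option is to hash the HTML together with anything else that changes the output, such as paper size or orientation. The sketch below uses a hypothetical PdfCacheKey helper, and the settings string is whatever description of your PdfOptions you choose.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PdfCacheKey
{
    // Same HTML + same settings => same key; any change produces a different key.
    public static string For(string html, string settings)
    {
        // Length-prefixing the settings keeps ("ab", "c") and ("a", "bc") distinct.
        var input = $"{settings.Length}:{settings}{html}";
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(input));
        return "pdf:" + Convert.ToHexString(hash);
    }
}
```

Usage: await cachedService.GetOrCreatePdfAsync(PdfCacheKey.For(html, "A4"), html);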

Troubleshooting Common Issues

Fonts Not Loading

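Wait for web fonts to finish loading before printing, and make sure the font URLs actually resolve:

// Resolves once the page's font loading has finished
await page.EvaluateExpressionHandleAsync("document.fonts.ready");

var pdfOptions = new PdfOptions
{
    Format = PaperFormat.A4,
    PrintBackground = true
};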

For custom fonts, use absolute URLs or base64 encoding in your CSS.
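For the base64 route, a small hypothetical helper can turn a font file into an @font-face rule that you paste into the page’s <style> block. It assumes a WOFF2 file. The family name and path are yours to choose.

```csharp
using System;
using System.IO;

public static class FontEmbedding
{
    // Builds an @font-face rule with the font inlined as a data: URI, so it
    // loads even when the HTML was set in memory and has no base URL.
    public static string FontFaceCss(string family, byte[] woff2Bytes) =>
        "@font-face { " +
        $"font-family: '{family}'; " +
        $"src: url(data:font/woff2;base64,{Convert.ToBase64String(woff2Bytes)}) format('woff2'); " +
        "}";

    // Convenience overload for a font file on disk.
    public static string FontFaceCssFromFile(string family, string path) =>
        FontFaceCss(family, File.ReadAllBytes(path));
}
```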

Missing Images

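Images with relative URLs often break, because HTML passed to SetContentAsync has no base URL to resolve them against. Use absolute URLs or base64 data URIs, and wait for the network to go idle so the images have loaded before you print: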

await page.SetContentAsync(html, new NavigationOptions
{
    WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
});
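If rewriting every image path is impractical, another option is to give the document a base URL so that relative paths resolve. The following is a rough sketch with a hypothetical WithBaseHref helper. It inserts a <base> element right after the opening <head> tag and assumes the HTML has one.

```csharp
using System;
using System.Net;

public static class HtmlBase
{
    // Inserts <base href="..."> right after the opening <head> tag so relative
    // URLs (img src, link href) resolve against baseUrl. If no <head> tag is
    // found, the HTML is returned unchanged.
    public static string WithBaseHref(string html, string baseUrl)
    {
        var baseTag = $"<base href=\"{WebUtility.HtmlEncode(baseUrl)}\">";
        int index = 0;

        while ((index = html.IndexOf("<head", index, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            int after = index + "<head".Length;

            // Skip look-alikes such as <header>.
            if (after < html.Length && (html[after] == '>' || char.IsWhiteSpace(html[after])))
            {
                int close = html.IndexOf('>', after);
                return close < 0 ? html : html.Insert(close + 1, baseTag);
            }

            index = after;
        }

        return html;
    }
}
```

Call it on the HTML before SetContentAsync, for example with your app’s asset root as baseUrl.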

Memory Issues in Production

Monitor memory usage, and consider trimming Chrome’s footprint with launch flags like these:

var launchOptions = new LaunchOptions
{
    Headless = true,
    Args = new[]
    {
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-dev-shm-usage",
        "--disable-accelerated-2d-canvas",
        "--no-first-run",
        "--no-zygote",
        "--disable-gpu",
        "--memory-pressure-off"
    }
};

Page Breaking Issues

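Use CSS to control page breaks. Note that the @page size only takes priority over the PdfOptions paper size when PreferCSSPageSize = true: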

@media print {
    .page-break {
        page-break-before: always;
    }
    
    .no-break {
        page-break-inside: avoid;
    }
    
    @page {
        margin: 1in;
        size: A4;
    }
}

Docker Deployment

Running PuppeteerSharp in Docker requires additional dependencies:

FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app

# Install Chrome dependencies
RUN apt-get update && apt-get install -y \
    wget \
    gnupg \
    libnss3 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libxkbcommon0 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    libgbm1 \
    libxss1 \
    libasound2 \
    libatspi2.0-0 \
    libgtk-3-0 \
    libcups2 \
    fonts-liberation \
    && rm -rf /var/lib/apt/lists/*

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY ["YourApp.csproj", "."]
RUN dotnet restore "YourApp.csproj"
COPY . .
RUN dotnet build "YourApp.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "YourApp.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "YourApp.dll"]

When Not to Use PuppeteerSharp

PuppeteerSharp isn’t always the right choice:

  • High-volume conversions: The memory overhead adds up quickly
  • Simple HTML: If you’re just converting basic HTML without modern CSS or JavaScript, lighter alternatives like DinkToPdf might be better
  • Tight resource constraints: Each browser instance uses significant memory
  • Real-time generation: Launching Chrome takes noticeable time, so per-request latency suffers unless you keep a warm browser instance around

Wrapping Up

PuppeteerSharp has become my go-to solution when I need PDFs that match exactly what users see in their browsers. Yes, it’s heavier than other options, but when you’re dealing with complex layouts, modern CSS, or JavaScript-rendered content, it’s often the only thing that works reliably.

The key is understanding its resource requirements and planning accordingly. Reuse browser instances, monitor memory usage, and consider caching for repeated conversions.

For most business applications generating reports, invoices, or documentation, the trade-off is worth it. You get pixel-perfect results without having to wrestle with CSS compatibility issues or debug why your flexbox layout looks broken in the PDF.
