HTML to PDF in Java: A Practical Guide for 2026
You already have the HTML. The Java service is in place. The product team assumes PDF output is a solved problem.
Then the real problems show up.
Your invoice breaks a table across two pages. The exported report drops a custom font. A dashboard rendered by client-side code turns into a blank page. The PDF looks fine in one sample file and falls apart in the first real customer document. That's the normal starting point for HTML to PDF in Java.
The hard part isn't getting a PDF. It's getting a PDF that survives production input, pagination, branding, and modern frontend behavior. Some Java approaches stay fully inside the JVM and keep deployment simple. Others lean on a browser engine because that's the only practical way to render JavaScript-heavy pages. And sometimes the right move is to stop owning the renderer at all.
Table of Contents
- Starting Your Java HTML to PDF Journey
- Comparing Java PDF Generation Methods
- Pure Java Rendering with OpenHTMLToPDF
- High-Fidelity Conversion with iText and pdfHTML
- Harnessing Headless Browsers with Selenium
- Advanced PDF Control and Performance Tuning
- When to Use a Dedicated API: Integrating Transformy.io
Starting Your Java HTML to PDF Journey
Many organizations land here for the same reason. They need invoices, statements, tickets, labels, or internal reports, and HTML is already the easiest format to produce from templates. Converting that HTML into PDF sounds straightforward until fidelity, pagination, and frontend rendering get involved.
The first decision usually has nothing to do with PDF itself. It starts with the source document. If your HTML is mostly server-rendered content with predictable CSS, a JVM-native path can work well. If the page depends on client-side rendering, flex layouts, grid layouts, or asynchronous data, the choice changes quickly.
A broader comparison of HTML to PDF conversion methods frames the choice well. Java teams should decide based on whether the source relies on client-side rendering, flex or grid layouts, or asynchronous data. Native engines are fast, but they aren't full browsers like Chrome, so browser-based conversion becomes necessary when JavaScript support matters.
That's why “best library” is the wrong question. The practical questions are these:
- Static document input: Are you converting controlled HTML from your own templates?
- Dynamic application views: Does the page only become complete after scripts run?
- Operational limits: Can you ship browser binaries, or must everything stay pure Java?
- Document behavior: Do you need clean page breaks, embedded fonts, and repeatable output?
Practical rule: Pick the rendering model that matches how your HTML becomes complete. Don't try to force a non-browser renderer to act like a browser.
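The questions above can be turned into a small decision helper. The sketch below is only an illustration of the rule, not a prescribed algorithm: the enum names, the `RendererChooser` class, and the fallback to an API when browser binaries are not allowed are all assumptions made for this example.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical traits of a source document; the names are illustrative only. */
enum SourceTrait { NEEDS_JAVASCRIPT, USES_FLEX_OR_GRID, LOADS_ASYNC_DATA }

/** The three rendering models compared in this guide. */
enum RenderingModel { PURE_JAVA, HEADLESS_BROWSER, DEDICATED_API }

final class RendererChooser {

    /**
     * Picks a rendering model from how the HTML becomes complete.
     * Assumption: any browser-dependent trait rules out the in-process renderer,
     * and an external API is the fallback when browser binaries cannot be shipped.
     * Whether a given API actually runs JavaScript depends on the service.
     */
    static RenderingModel choose(Set<SourceTrait> traits, boolean browserBinariesAllowed) {
        if (traits.isEmpty()) {
            return RenderingModel.PURE_JAVA;
        }
        return browserBinariesAllowed
                ? RenderingModel.HEADLESS_BROWSER
                : RenderingModel.DEDICATED_API;
    }

    public static void main(String[] args) {
        // Controlled invoice template, nothing browser-specific.
        System.out.println(choose(EnumSet.noneOf(SourceTrait.class), false));
        // Hydrated dashboard on a platform that allows browser binaries.
        System.out.println(choose(EnumSet.of(SourceTrait.NEEDS_JAVASCRIPT), true));
    }
}
```

Encoding the rule this way mostly helps in code review: the traits of each document type are written down once instead of being rediscovered when a render fails.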
There's another step many teams skip. Before you render anything, make the source HTML readable and stable. If the markup is hard to inspect, layout bugs take longer to isolate. A quick formatter helps when you're debugging nested tables, long inline styles, or template output. This guide on mastering readable HTML is useful for that exact cleanup pass.
If you're building a broader document pipeline and want to see how related transformation guides are organized, the Transformy guide index is a practical directory of HTML-to-PDF topics across stacks.
Comparing Java PDF Generation Methods
The fastest way to get stuck with HTML to PDF in Java is to optimize for the wrong thing. Teams often pick the lightest dependency first, then discover they needed browser rendering. Or they start with a browser-driven setup, then realize the operational overhead is too high for simple documents.

The decision table
| Method | Rendering Engine | JS Support | External Dependencies | Best For |
|---|---|---|---|---|
| Pure Java libraries | In-process HTML and CSS renderer | Limited or none | None beyond JVM dependencies | Controlled templates, reports, invoices |
| Headless browsers | Full browser engine | Strong | Browser binary and driver/runtime | JavaScript-heavy pages, SPA views, app screens |
| Dedicated APIs | Remote managed renderer | Depends on service behavior | Network call to external service | Teams that want less renderer maintenance |
The core trade-off is about where rendering runs and what has to ship with it. A pure Java renderer usually keeps deployment cleaner and fits well in backend services. A browser-based approach renders what users see in modern web apps, but the runtime is heavier and operationally fussier. An API cuts local complexity, but moves rendering outside your process.
What usually drives the choice
For most Java backends, these are the primary selection criteria:
- Input complexity: Static templates and predictable markup fit in-process rendering better than script-driven pages.
- Deployment tolerance: If your platform team doesn't want browser binaries in containers, that immediately narrows the field.
- Failure mode preference: Some teams prefer local conversion failures they can debug in-process. Others prefer isolated HTTP failures over maintaining rendering infrastructure.
- Output expectations: Searchable text, accessibility features, and print-style structure matter more than screenshot-like output for business documents.
A common mistake is treating all HTML as if it were the same class of input. It isn't. A shipping label template and a hydrated analytics view may both be “HTML,” but they behave like different products at render time.
When the source is your own template, control the HTML and simplify the CSS. When the source is a live application view, assume you'll need a browser unless proven otherwise.
Licensing and long-term maintenance also matter, but they usually show up after rendering fidelity and deployment constraints. If the PDF must match the app view, fidelity decides first. If the document comes from backend-owned templates, simplicity often wins.
Pure Java Rendering with OpenHTMLToPDF
If you want HTML to PDF in Java without dragging in external binaries, a pure JVM library is the cleanest place to start. OpenHTMLToPDF describes itself as a pure-Java library for the JVM based on Flying Saucer and Apache PDFBox 2, with support for CSS 2.1, SVG images, and accessibility features including WCAG, Section 508, and PDF/UA, as described in the OpenHTMLToPDF project documentation.

That positioning tells you exactly where it fits. It's for teams that want in-process rendering, a Java-native deployment model, and document-oriented HTML rather than browser-accurate execution of client-side apps.
A minimal example
A basic conversion flow looks like this:
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class HtmlToPdfExample {
    public static void main(String[] args) throws Exception {
        // Inline template with simple, print-oriented styles.
        String html = """
            <html>
              <head>
                <style>
                  body { font-family: sans-serif; }
                  h1 { color: #1f2937; }
                  table { width: 100%; border-collapse: collapse; }
                  th, td { border: 1px solid #ccc; padding: 8px; }
                </style>
              </head>
              <body>
                <h1>Invoice</h1>
                <table>
                  <tr><th>Item</th><th>Amount</th></tr>
                  <tr><td>Service</td><td>$100</td></tr>
                </table>
              </body>
            </html>
            """;

        // Render inside the JVM and stream the PDF straight to a file.
        try (OutputStream os = new FileOutputStream("output.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            // The second argument is the base URI; null is fine here because
            // the template references no relative assets.
            builder.withHtmlContent(html, null);
            builder.toStream(os);
            builder.run();
        }
    }
}
This style of API is what many Java teams want. No browser process. No external command. No platform-specific installation. The application renders and writes the PDF inside the same JVM.
Where it fits well
OpenHTMLToPDF works best when you own the markup and can keep it print-oriented.
- Transactional documents: Invoices, statements, receipts, certificates, and internal reports are usually a good match.
- Stable layouts: Content that can be expressed with document-style CSS is easier to keep consistent.
- Pure Java deployments: It's useful when infrastructure policies favor standard Maven dependencies over external runtime components.
Where it gets difficult is equally important.
- Client-side rendering: It doesn't execute browser JavaScript.
- Modern app layouts: Complex frontend behavior and browser-specific layout details may not map cleanly.
- Screen-first HTML: Pages designed for responsive app views often need a print-specific version before conversion.
If you use this route, don't feed it the same HTML your frontend ships to the browser and assume that's enough. Build PDF-oriented templates or at least a print-oriented variant. That's where pure Java rendering becomes reliable instead of frustrating.
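Part of owning the markup is checking that it is well-formed before it reaches the renderer. Renderers in the Flying Saucer family are built around XML-style input, so a cheap pre-flight check with the JDK's own XML parser can catch unclosed tags early. The sketch below is one way to do that, assuming your templates are written as XHTML; the `MarkupPreflight` class and its method name are made up for this example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class MarkupPreflight {

    /** Returns null when the markup parses as XML, otherwise a description of the first error. */
    static String firstError(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<html><body><p>ok</p></body></html>"));
        System.out.println(firstError("<html><body><p>unclosed</body></html>"));
    }
}
```

This is a strict XML check, so it will also flag things a browser tolerates, such as `&nbsp;` (not a predefined XML entity) or unclosed `<br>` tags. If your templates are loose HTML5, a tidy-up step with an HTML parser is the more realistic pre-processing route.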
High-Fidelity Conversion with iText and pdfHTML
Some Java stacks need more than “good enough.” They need standards-aware PDF output, a well-defined conversion API, and tighter control over resources. That's where iText's pdfHTML sits. iText describes pdfHTML as an add-on for Java and C# (.NET) that converts HTML and CSS into standards-compliant PDFs that are accessible, searchable, and usable for indexing. Its pdfHTML product overview demonstrates the modern conversion path with HtmlConverter.convertToPdf(...).
That matters because it reflects a broader shift in Java PDF generation. Instead of hand-building paragraphs, tables, and coordinates one object at a time, the application streams HTML through a converter that preserves more document semantics.
A direct conversion example
A minimal file-based example looks like this:
import com.itextpdf.html2pdf.HtmlConverter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class ITextHtmlToPdfExample {
    public static void main(String[] args) {
        try (InputStream html = new FileInputStream("input.html");
             OutputStream pdf = new FileOutputStream("output.pdf")) {
            HtmlConverter.convertToPdf(html, pdf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
For real applications, you'll usually move beyond the basic call and configure conversion properties, resource lookup, and fonts more explicitly.
import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class ITextConfiguredExample {
    public static void main(String[] args) {
        String html = """
            <html>
              <body>
                <h1>Monthly Report</h1>
                <p>Generated from Java.</p>
              </body>
            </html>
            """;

        ConverterProperties properties = new ConverterProperties();
        // Base location for relative CSS, image, and font references.
        // The path is a placeholder; point it at your own template resources.
        properties.setBaseUri("src/main/resources/templates/");

        try (FileOutputStream out = new FileOutputStream("report.pdf")) {
            HtmlConverter.convertToPdf(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)),
                out,
                properties
            );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
What tends to break first
The most common production issue isn't the converter call itself. It's resource handling.
- Fonts: If you don't package required fonts with the application, fallback rendering can distort alignment and table layout.
- Malformed HTML: Broken tags and invalid structure can trigger failures that look mysterious unless you catch and log conversion errors properly.
- Relative assets: CSS, images, and font files need predictable resolution paths in server environments.
Field note: If the PDF looks “mostly right” except for spacing, table alignment, or text wrapping, check fonts before you touch the layout code.
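Relative assets are easier to reason about once you see how a base URI combines with the references in your HTML. Converters generally take a base location for this purpose (iText through `ConverterProperties.setBaseUri`, OpenHTMLToPDF through the second argument of `withHtmlContent`). The sketch below uses only `java.net.URI` to show the resolution rules; the host `assets.example.com` and the file names are placeholders.

```java
import java.net.URI;

final class AssetResolution {

    /** Resolves a reference from the HTML against the base URI given to the converter. */
    static URI resolve(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference);
    }

    public static void main(String[] args) {
        String withSlash = "https://assets.example.com/templates/invoice/";
        String withoutSlash = "https://assets.example.com/templates/invoice";

        // https://assets.example.com/templates/invoice/css/print.css
        System.out.println(resolve(withSlash, "css/print.css"));

        // https://assets.example.com/templates/css/print.css
        // Without the trailing slash, the last path segment is replaced.
        System.out.println(resolve(withoutSlash, "css/print.css"));

        // https://assets.example.com/templates/fonts/Brand.ttf
        System.out.println(resolve(withSlash, "../fonts/Brand.ttf"));
    }
}
```

The missing trailing slash in the second case is a frequent cause of stylesheets and fonts that resolve on a developer machine but silently go missing on a server with a different directory layout.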
This approach is usually strongest when the document is backend-owned, structured, and expected to remain searchable and semantically meaningful. It's less about pretending to be a browser and more about producing controlled PDF output from HTML with a conversion pipeline that fits enterprise Java applications.
Harnessing Headless Browsers with Selenium
When the input page depends on JavaScript to become complete, a non-browser renderer often isn't enough. That's the dividing line. If the content appears only after frontend code runs, then the practical answer for HTML to PDF in Java is often to automate a browser and print the rendered page.

This isn't elegant in the same way a pure Java dependency is elegant. It is, however, aligned with reality. A browser can execute scripts, wait for async data, apply modern layout behavior, and print the final DOM state.
Printing a rendered page from Java
A common pattern is Selenium driving a headless Chromium session, then using the browser's print capability.
import org.openqa.selenium.PrintsPage;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.print.PrintOptions;
import java.io.FileOutputStream;
import java.util.Base64;

public class SeleniumPdfExample {
    public static void main(String[] args) throws Exception {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://example.com/report");
            // In a real app, wait until the page is actually ready.
            Thread.sleep(3000);
            PrintOptions printOptions = new PrintOptions();
            String base64Pdf = ((PrintsPage) driver).print(printOptions).getContent();
            byte[] pdfBytes = Base64.getDecoder().decode(base64Pdf);
            try (FileOutputStream fos = new FileOutputStream("browser-output.pdf")) {
                fos.write(pdfBytes);
            }
        } finally {
            driver.quit();
        }
    }
}
In production, replace Thread.sleep(...) with an explicit wait that checks for a page-ready signal. That can be a DOM element, a JavaScript flag, or the disappearance of a loading state.
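The idea behind an explicit wait is a bounded polling loop: check a readiness condition, stop as soon as it holds, and give up at a deadline. Selenium ships this as `WebDriverWait`; the sketch below shows the same mechanism with only the JDK so the logic is visible. `ReadyWait` is a made-up helper for this example, not part of Selenium.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class ReadyWait {

    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier ready, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Stand-in for a page that becomes ready on the third check.
        boolean ready = waitUntil(() -> ++polls[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println("ready=" + ready + " after " + polls[0] + " polls");
    }
}
```

With a real driver, the condition would query the page, for example by evaluating a script that returns a flag your frontend sets when rendering is done (a hypothetical `window.reportReady`), or by checking that a loading indicator is gone. Printing only after the wait returns true keeps half-rendered pages out of the PDF.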
Why this approach changes the trade-off
The upside is straightforward:
- Rendered JavaScript: The browser sees the page after hydration and async content loading.
- Modern layout behavior: Browser engines handle the CSS features frontend teams already use.
- Closer visual parity: If users can print it from the browser correctly, headless rendering often follows that same path.
The cost is operational.
- Runtime weight: You have to manage browser installation and compatible automation setup.
- Infrastructure complexity: Containers, CI, and server environments need more careful configuration.
- Resource usage: Browser-driven rendering is heavier than in-process document conversion.
Don't treat browser PDF generation like a utility method. Treat it like a managed subsystem with startup, health, cleanup, and isolation concerns.
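One way to read "managed subsystem" is that render calls go through a gate that limits concurrency and enforces a time budget, rather than being invoked directly from request handlers. The sketch below illustrates that with JDK concurrency primitives. `RenderGate` is a hypothetical class, and the job passed in stands for whatever actually drives the browser.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RenderGate implements AutoCloseable {
    private final Semaphore slots;
    private final ExecutorService workers;
    private final Duration timeout;

    RenderGate(int maxConcurrent, Duration timeout) {
        this.slots = new Semaphore(maxConcurrent);
        this.workers = Executors.newFixedThreadPool(maxConcurrent);
        this.timeout = timeout;
    }

    /** Runs one render job with a bounded wait for a slot and a bounded run time. */
    byte[] render(Callable<byte[]> job) throws Exception {
        // Fail fast instead of letting callers pile up behind a stuck browser.
        if (!slots.tryAcquire(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("no render slot available");
        }
        Future<byte[]> result = workers.submit(job);
        try {
            return result.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the job; it must cooperate to actually stop
            throw e;
        } finally {
            slots.release();
        }
    }

    /** Cleanup hook: stop worker threads when the service shuts down. */
    @Override
    public void close() {
        workers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        try (RenderGate gate = new RenderGate(2, Duration.ofSeconds(1))) {
            byte[] pdf = gate.render(() -> new byte[] {'%', 'P', 'D', 'F'});
            System.out.println("rendered " + pdf.length + " bytes");
        }
    }
}
```

A real subsystem would add more than this, such as recycling browser processes and reporting health, but a concurrency cap and a timeout already prevent the common failure where one hung page starves every other request.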
This route is usually the right one when the PDF is effectively a printed web application view. If the page is a SPA screen, a dashboard, or a workflow summary assembled in the client, trying to avoid the browser usually creates more work than it saves.
Advanced PDF Control and Performance Tuning
The problems that consume the most time in production aren't usually “how do I create a PDF file.” They're “why did this heading land at the bottom of a page,” “why did the table split in the middle,” and “why does the font look different on one server.”
One issue comes up more than any other: page breaks. Accurate page-break control for real-world HTML in Java remains a persistent pain point, and support discussions repeatedly show that page breaks often don't match developer expectations, as reflected in this discussion of HTML to PDF conversion issues.
Page breaks that behave better
Start with print-oriented CSS. Even when a library supports HTML and CSS well, pagination still needs help.
@page {
  margin: 20mm;
}
h1, h2, h3 {
  page-break-after: avoid;
}
table, img, .card {
  page-break-inside: avoid;
}
tr {
  page-break-inside: avoid;
}
.section {
  page-break-before: auto;
}
.page-break {
  page-break-before: always;
}
These rules won't solve every layout edge case, but they improve the baseline. The most useful habits are simple:
- Keep headings with content: Avoid a heading at the bottom of a page with its body on the next one.
- Protect table rows when possible: Splitting a row usually makes the document feel broken.
- Create print wrappers: Group related blocks in a parent container and mark that block as break-sensitive.
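The print-wrapper habit can be applied at template-assembly time in Java. The sketch below wraps each section fragment in a break-sensitive container and injects a small print stylesheet. The `.keep-together` class name and the `PrintTemplate` helper are assumptions for this example; how strictly a given renderer honors `page-break-inside` still has to be verified against your own documents.

```java
import java.util.List;

final class PrintTemplate {

    // Minimal print rules; .keep-together is a hypothetical class name.
    private static final String PRINT_CSS = """
        @page { margin: 20mm; }
        h1, h2, h3 { page-break-after: avoid; }
        .keep-together { page-break-inside: avoid; }
        """;

    /** Wraps one fragment so the renderer tries to keep it on a single page. */
    static String keepTogether(String fragment) {
        return "<div class=\"keep-together\">" + fragment + "</div>";
    }

    /** Builds a full document from section fragments, each in its own wrapper. */
    static String page(List<String> sections) {
        StringBuilder body = new StringBuilder();
        for (String section : sections) {
            body.append(keepTogether(section)).append('\n');
        }
        return "<html><head><style>\n" + PRINT_CSS + "</style></head><body>\n"
                + body + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(page(List.of(
                "<h2>Summary</h2><p>Totals for the period.</p>",
                "<h2>Details</h2><p>Line items follow.</p>")));
    }
}
```

Keeping a heading and its first paragraph inside the same wrapper is what prevents the orphaned-heading case described above.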
Operational habits that prevent ugly output
Font handling is the next major source of surprises. If a document depends on brand fonts, package them with the application and reference them explicitly. Don't rely on host-installed fonts being present everywhere.
Headers and footers also need an explicit strategy. Depending on the rendering path, that may mean CSS for repeated content, template-level layout wrappers, or browser-print options. The key is to decide whether headers are part of the HTML content or part of the PDF rendering layer. Mixing both usually creates duplicate or misaligned output.
For performance, a few habits pay off consistently:
- Use streams instead of temp files: Keep conversion in memory where practical.
- Reuse stable assets: Fonts, CSS, logos, and templates shouldn't be reloaded unnecessarily.
- Separate document types: A simple invoice pipeline shouldn't share the same rendering path as a heavy app snapshot if their needs differ.
- Validate input HTML: Broken markup wastes debugging time later in the renderer.
Small HTML fixes often produce bigger PDF gains than renderer-level tuning. Cleaner markup, explicit widths, and print CSS usually beat heroic post-processing.
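The first two performance habits are easy to show in code. The sketch below renders into memory instead of a temp file and loads each stable asset only once. `AssetCache`, `StreamRenderer`, and `InMemoryPdf` are hypothetical names; the renderer interface stands in for any converter call that accepts an `OutputStream`, which both library examples earlier in this guide do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Loads each named asset (font, CSS, logo) once and reuses the bytes afterwards. */
final class AssetCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader;

    AssetCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached bytes; callers must treat the array as read-only. */
    byte[] get(String name) {
        return cache.computeIfAbsent(name, loader);
    }
}

/** Stand-in for a converter call that writes a PDF to an output stream. */
interface StreamRenderer {
    void render(String html, OutputStream out) throws IOException;
}

final class InMemoryPdf {

    /** Renders to a byte array, avoiding temp files on the request path. */
    static byte[] render(StreamRenderer renderer, String html) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            renderer.render(html, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In-memory rendering suits documents of modest size. For very large reports, streaming directly to the HTTP response or to object storage avoids holding the whole PDF in heap.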
The practical takeaway is that polished PDFs come from document discipline, not just library choice. The renderer matters, but the source HTML and print rules matter just as much.
When to Use a Dedicated API: Integrating Transformy.io
Self-managed rendering gives you control, but it also gives you ownership of every failure mode. Browser setup, font packaging, resource resolution, queueing, retries, and output consistency all become your problem. For some teams that's acceptable. For others, PDF generation is a support burden attached to a product that does something else.
That's when a dedicated API becomes a reasonable architecture choice. Instead of embedding the renderer in the Java service, the application sends HTML to an external endpoint and receives PDF bytes back. The Java side stays simple: build the payload, authenticate, send, and store the result.
The case for offloading rendering
This model makes sense when any of these are true:
- Your team doesn't want renderer maintenance: You'd rather own document inputs than browser infrastructure.
- Multiple services need PDFs: A central API can reduce duplicated conversion logic.
- You need a narrow Java integration surface: HTTP is easier to standardize than per-service rendering stacks.
There's also a workflow angle. Teams that convert content into multiple publishable formats often centralize transformation tasks instead of keeping them in each application. If you also process scraped or cleaned web content before document generation, an efficient HTML to Markdown API can fit into the same kind of pipeline.
If you're documenting internal integrations or browsing related site resources, the Transformy sitemap index is the relevant entry point.
A simple Java HTTP example
A plain Java client can post HTML and write the binary response to disk:
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ApiPdfExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        String json = """
            {
              "html": "<html><body><h1>Report</h1><p>Generated from Java</p></body></html>"
            }
            """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://api.example.com/html-to-pdf"))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        HttpResponse<byte[]> response = client.send(
            request,
            HttpResponse.BodyHandlers.ofByteArray()
        );

        if (response.statusCode() == 200) {
            Files.write(Path.of("api-output.pdf"), response.body());
        } else {
            throw new RuntimeException("PDF generation failed with status: " + response.statusCode());
        }
    }
}
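The hand-written JSON above works because the HTML is a fixed string with no quotes or line breaks. Real templates contain attributes, newlines, and backslashes, all of which must be escaped before they go into a JSON string. The sketch below shows a minimal escaper using only the JDK. The `JsonPayload` class is made up for this example, and the `html` field name simply mirrors the placeholder request above; a real provider's schema may differ.

```java
final class JsonPayload {

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters need a numeric escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Builds the request body for the placeholder endpoint used in this guide. */
    static String htmlPayload(String html) {
        return "{\"html\": \"" + escape(html) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(htmlPayload("<p class=\"total\">Due:\n$100</p>"));
    }
}
```

If the project already depends on a JSON library, building the payload with it is the safer choice; the point here is only that string concatenation without escaping breaks on the first real document.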
In practice, you'd replace the placeholder endpoint with your actual provider and add normal production concerns: timeout settings, retry policy, response validation, and structured error logging. One option in this category is transformy.io, which provides an HTML-to-PDF API model rather than an embedded Java renderer.
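Those production concerns can be sketched with the JDK HTTP client alone. The example below sets connect and request timeouts, retries on retryable status codes with doubling backoff, and checks that the response body starts with the `%PDF-` signature before it is stored. The `PdfApiHardening` class, the timeout values, and the choice of 429 and 5xx as retryable are assumptions; the endpoint is the same placeholder used above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

final class PdfApiHardening {

    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5)) // assumed value
                .build();
    }

    static HttpRequest request(String json, String apiKey) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/html-to-pdf")) // placeholder endpoint
                .timeout(Duration.ofSeconds(30)) // assumed per-request budget
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** Assumption: rate limiting and server errors are worth retrying. */
    static boolean isRetryable(int status) {
        return status == 429 || status >= 500;
    }

    /** PDF files start with the ASCII signature "%PDF-". */
    static boolean looksLikePdf(byte[] body) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (body == null || body.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (body[i] != magic[i]) return false;
        }
        return true;
    }

    /** Repeats the call while the result is retryable, doubling the pause each time. */
    static <T> T retry(Callable<T> call, Predicate<T> shouldRetry,
                       int maxAttempts, Duration initialBackoff) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Duration pause = initialBackoff;
        T last = call.call();
        for (int attempt = 2; attempt <= maxAttempts && shouldRetry.test(last); attempt++) {
            Thread.sleep(pause.toMillis());
            pause = pause.multipliedBy(2);
            last = call.call();
        }
        return last;
    }
}
```

A call site could then look like `retry(() -> client.send(request, HttpResponse.BodyHandlers.ofByteArray()), r -> isRetryable(r.statusCode()), 3, Duration.ofMillis(500))`, followed by a `looksLikePdf(response.body())` check. Note that this sketch retries only on status codes; exceptions such as connection failures propagate immediately and would need their own policy.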
The trade-off is clear. You give up some local control, but you remove a lot of renderer-specific maintenance from the application. If PDF generation isn't a core competency for the team, that can be the most sensible choice.
If you need HTML to PDF in Java for controlled templates, start with a pure Java renderer. If your pages depend on client-side execution, use a headless browser. If you're tired of owning the rendering stack, use an API.
There isn't a universal winner. There's only the approach that matches your HTML, your deployment rules, and the amount of PDF infrastructure your team wants to maintain.