How to Create PDF from URL/HTML with Puppeteer

In this article, we’ll demonstrate how to create a PDF file from a website URL or HTML file.

We’ll cover two scenarios:

Convert website URL to PDF file
Convert HTML to PDF file

We’ll use Puppeteer for PDF generation. Puppeteer is a feature-rich Node.js library for controlling a Chromium browser, which by default runs headless (without a visible window).

Puppeteer has a built-in page.pdf() method that exports a browser page to PDF. Let’s walk through both scenarios.

Convert Website URL to PDF

The following Node.js code converts a website URL to PDF using Puppeteer.

const puppeteer = require("puppeteer");
 
(async () => {
 
    // Create browser instance
    const browser = await puppeteer.launch();
 
    // Create a new page
    const page = await browser.newPage();
 
    // Website URL to export as pdf
    const website_url = 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Keyed_collections';
 
    // Open URL in current page
    await page.goto(website_url, {
        waitUntil: 'networkidle2'
    });
 
    // Save PDF File
    await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
 
    // Close browser instance
    await browser.close();
})();

Running the script saves the rendered page as result_from_url.pdf in the directory the script is run from.

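Here, the target address is stored in the website_url variable and passed to page.goto(), which navigates the page to it. The waitUntil: 'networkidle2' option tells Puppeteer to treat navigation as finished once there have been no more than two open network connections for at least 500 ms, which gives most pages time to load their content.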

// Open URL in current page
await page.goto(website_url, {
    waitUntil: 'networkidle2'
});
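One caveat worth knowing: page.goto() does not throw when the server answers with an HTTP error such as 404 or 500, so an error page would be exported to PDF just like real content. It resolves with the main document’s response (or null in a few cases, such as same-document navigations), which a script can check before calling page.pdf(). The following is a minimal sketch of that check; ensureOk and openOrFail are hypothetical helper names, not part of Puppeteer, and page is assumed to be an existing Puppeteer page.

```javascript
// Hypothetical helper: throw unless navigation returned a 2xx response.
// `response` is the HTTPResponse returned by page.goto(), which may be null.
function ensureOk(response, url) {
    if (!response) {
        throw new Error(`No response received for ${url}`);
    }
    if (!response.ok()) {
        throw new Error(`Failed to load ${url}: HTTP ${response.status()}`);
    }
    return response;
}

// Hypothetical wrapper around page.goto() that refuses to continue on errors.
// `timeout` is in milliseconds; 30000 matches Puppeteer's default navigation timeout.
async function openOrFail(page, url, timeout = 30000) {
    const response = await page.goto(url, { waitUntil: 'networkidle2', timeout });
    return ensureOk(response, url);
}
```

In the script above, `await openOrFail(page, website_url);` could stand in for the plain page.goto() call, so the PDF step is skipped whenever the page fails to load.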

Lastly, we’re exporting the current page to PDF using the page.pdf method.

// Save PDF File
await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
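page.pdf() accepts more options than path and format. The sketch below merges a few commonly useful ones into a set of defaults: printBackground (CSS backgrounds are left out unless this is true), margin, and any overrides such as landscape. The default values and the helper names buildPdfOptions and pageToPdfBytes are illustrative assumptions. When path is omitted, page.pdf() still resolves with the PDF data, which is handy when the PDF is sent over HTTP instead of saved to disk.

```javascript
// Hypothetical defaults for page.pdf(); adjust to your layout.
const DEFAULT_PDF_OPTIONS = {
    format: 'A4',
    printBackground: true, // include CSS background colors and images
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
};

// Merge caller overrides into the defaults; `margin` is merged key by key.
function buildPdfOptions(overrides = {}) {
    return {
        ...DEFAULT_PDF_OPTIONS,
        ...overrides,
        margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
    };
}

// Export `page` without writing a file: with no `path`, page.pdf()
// resolves with the PDF bytes instead.
async function pageToPdfBytes(page, overrides = {}) {
    const options = buildPdfOptions(overrides);
    delete options.path; // make sure nothing is written to disk
    return page.pdf(options);
}
```

For example, `await page.pdf(buildPdfOptions({ path: './result_from_url.pdf', landscape: true }));` would save a landscape A4 PDF with backgrounds and 1 cm margins.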

Convert HTML to PDF

Below is the Node.js source code for converting an HTML file to PDF with Puppeteer.

const puppeteer = require("puppeteer");
const fs = require("fs");
 
(async () => {
 
    // Create browser instance
    const browser = await puppeteer.launch();
 
    // Create a new page
    const page = await browser.newPage();
 
    // Get HTML content
    const html = fs.readFileSync('./sample.html', 'utf-8');
 
    // Set HTML as page content
    await page.setContent(html, { waitUntil: 'domcontentloaded' });
 
    // Save PDF File
    await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
 
    // Close browser instance
    await browser.close();
})();

Running the script saves the rendered HTML as result_from_html.pdf in the directory the script is run from.

Here, sample.html contains the HTML we want to export to PDF. The script first reads the file into the html variable with fs.readFileSync(), then passes that string to page.setContent() to use it as the page’s content. In effect, this is similar to opening a local HTML file in the browser.

// Get HTML content
const html = fs.readFileSync('./sample.html', 'utf-8');

// Set HTML as page content
await page.setContent(html, { waitUntil: 'domcontentloaded' });
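Since page.setContent() accepts any HTML string, the markup does not have to come from a file; it can also be generated in code, for example from data. The sketch below builds a small report page from an array of rows. escapeHtml and renderReport, and the { name, qty } row shape, are hypothetical names chosen for illustration. Escaping matters here because the values are inserted directly into markup.

```javascript
// Escape characters that have special meaning in HTML text and attributes.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Build a complete HTML document from an array of { name, qty } rows
// (a hypothetical data shape used only for illustration).
function renderReport(title, rows) {
    const body = rows
        .map((r) => `<tr><td>${escapeHtml(r.name)}</td><td>${escapeHtml(r.qty)}</td></tr>`)
        .join('\n');
    return `<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>
<body>
<h1>${escapeHtml(title)}</h1>
<table>
<tr><th>Item</th><th>Qty</th></tr>
${body}
</table>
</body>
</html>`;
}
```

The result can be passed straight to the same call used above, e.g. `await page.setContent(renderReport('Orders', [{ name: 'Widget', qty: 3 }]), { waitUntil: 'domcontentloaded' });`.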

In the end, we’re exporting the browser page to PDF using the page.pdf method.

// Save PDF File
await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
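One difference from opening the file in a browser: page.setContent() loads the HTML into a page that has no file:// address, so relative links in sample.html to local images or stylesheets will not resolve. If the HTML depends on such files, one alternative is to navigate to the file itself with page.goto() and a file:// URL, as sketched below. The helper names toFileUrl and htmlFileToPdf are hypothetical; pathToFileURL comes from Node’s built-in url module.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Turn a local path such as './sample.html' into an absolute file:// URL.
function toFileUrl(filePath) {
    return pathToFileURL(path.resolve(filePath)).href;
}

// Hypothetical helper: load a local HTML file by URL (so relative links to
// images and stylesheets resolve against its folder), then save it as a PDF.
async function htmlFileToPdf(page, htmlPath, pdfPath) {
    await page.goto(toFileUrl(htmlPath), { waitUntil: 'networkidle0' });
    await page.pdf({ path: pdfPath, format: 'A4' });
}
```

With an open page, `await htmlFileToPdf(page, './sample.html', './result_from_html.pdf');` would replace the readFileSync and setContent steps.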

More about the Puppeteer-based approach

Puppeteer is a great library for browser automation and web scraping. It’s easy to set up locally and has a rich API.

However, running Puppeteer in a local development environment and deploying it to a production server involve very different levels of difficulty.

It can be tricky to set up on a production server because the Node.js package depends on a headless Chromium browser (which is usually downloaded along with the package), and Chromium in turn needs system libraries that minimal server images may not include. Rendering pages is also CPU- and memory-intensive, so it often requires a dedicated server with ample CPU resources.

Additionally, Puppeteer is a Node.js library, so it can’t be used directly in projects written in other languages or running in other environments.
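Launching Chromium is itself expensive, so a service that converts many pages usually launches one browser and reuses it, opening a fresh page per conversion and closing it afterwards. The sketch below assumes a list of URLs to convert; convertAll and pdfNameFor are hypothetical helper names, and puppeteer is required inside convertAll only so that the naming helper can be used without it.

```javascript
// Hypothetical naming scheme: index plus hostname, e.g. "0-example.com.pdf".
function pdfNameFor(url, index) {
    return `${index}-${new URL(url).hostname}.pdf`;
}

// Convert several URLs with a single browser instance instead of one per URL.
async function convertAll(urls, outDir = '.') {
    const puppeteer = require("puppeteer"); // loaded here so pdfNameFor works without it
    const path = require("path");
    const browser = await puppeteer.launch();
    try {
        for (const [index, url] of urls.entries()) {
            const page = await browser.newPage();
            try {
                await page.goto(url, { waitUntil: 'networkidle2' });
                await page.pdf({ path: path.join(outDir, pdfNameFor(url, index)), format: 'A4' });
            } finally {
                await page.close(); // release the tab's memory before the next URL
            }
        }
    } finally {
        await browser.close(); // always shut Chromium down, even on errors
    }
}
```

The try/finally blocks make sure pages and the browser are closed even when a single URL fails, which matters more on a long-running server than in a one-off script.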

Alternative PDF.co-based approach

PDF.co provides REST endpoints for various PDF-related tasks, one of which is HTML to PDF conversion. It can convert either a URL or raw HTML to PDF.

Because PDF.co is a REST service, it can be called from any language or environment that can send HTTP requests. It’s also hosted on PDF.co’s own servers, so we don’t need to worry about expensive hosting.

These are the PDF.co endpoints for HTML to PDF generation.

URL to PDF
HTML to PDF
PDF from HTML Template (PDF Report Generation based on HTML)

Let’s look at a cURL example of URL to PDF conversion with PDF.co. Put your PDF.co API key in the x-api-key header before running it.

curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/from/url' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://wikipedia.org/wiki/Wikipedia:Contact_us",
    "async": false
}'
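Because the endpoint is plain HTTP, the same request can be sent from Node.js without Puppeteer. The sketch below mirrors the cURL call above using the global fetch available in Node.js 18 and later. buildUrlToPdfRequest and convertUrlToPdf are hypothetical helper names, and the response handling assumes the JSON reply contains an error flag, an optional message, and a url field pointing to the generated PDF; check PDF.co’s API reference for the exact response format.

```javascript
const ENDPOINT = 'https://api.pdf.co/v1/pdf/convert/from/url';

// Build the same request as the cURL example: POST with a JSON body and API key header.
function buildUrlToPdfRequest(apiKey, pageUrl) {
    return {
        endpoint: ENDPOINT,
        init: {
            method: 'POST',
            headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
            body: JSON.stringify({ url: pageUrl, async: false }),
        },
    };
}

// Send the request with Node.js 18+ global fetch.
// ASSUMPTION: the JSON reply has an `error` flag, an optional `message`,
// and a `url` pointing to the generated PDF.
async function convertUrlToPdf(apiKey, pageUrl) {
    const { endpoint, init } = buildUrlToPdfRequest(apiKey, pageUrl);
    const res = await fetch(endpoint, init);
    const data = await res.json();
    if (!res.ok || data.error) {
        throw new Error(`PDF.co request failed: ${data.message || res.status}`);
    }
    return data.url;
}
```

A call such as `await convertUrlToPdf(process.env.PDFCO_API_KEY, 'https://wikipedia.org/wiki/Wikipedia:Contact_us')` would return the link to the PDF; the PDFCO_API_KEY environment variable is a hypothetical way to keep the key out of source code.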

Summary

In this article, we’ve covered two ways to generate a PDF from a URL or HTML: locally with Puppeteer, and remotely with the PDF.co REST API. Try these samples on your own machine to get hands-on experience.
Thank you!
