In this article, we’ll demonstrate how to create a PDF file from a website URL or HTML file.
We’ll cover two scenarios here: converting a website URL to PDF and converting a local HTML file to PDF.
We’ll be using Puppeteer for PDF generation. Puppeteer is a feature-rich Node.js library for controlling a headless Chromium browser.
Puppeteer has a built-in function to export a browser page to a PDF. Let’s review both scenarios here.
Convert Website URL to PDF
The following Node.js code converts a URL to PDF using Puppeteer.
const puppeteer = require("puppeteer");

(async () => {
  // Create browser instance
  const browser = await puppeteer.launch();

  // Create a new page
  const page = await browser.newPage();

  // Website URL to export as pdf
  const website_url = 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Keyed_collections';

  // Open URL in current page
  await page.goto(website_url, { waitUntil: 'networkidle2' });

  // Save PDF File
  await page.pdf({ path: './result_from_url.pdf', format: 'a4' });

  // Close browser instance
  await browser.close();
})();
Output is as follows.
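Here, we store the target URL in the website_url variable and pass it to page.goto, which navigates the page to that address. The waitUntil: 'networkidle2' option tells Puppeteer to treat navigation as finished once there have been no more than two network connections for at least 500 ms.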
// Open URL in current page
await page.goto(website_url, { waitUntil: 'networkidle2' });
Lastly, we’re exporting the current page to PDF using the page.pdf method.
// Save PDF File
await page.pdf({ path: './result_from_url.pdf', format: 'a4' });
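The page.pdf method accepts more options than path and format. As a minimal sketch, the hypothetical helper buildPdfOptions below (not part of Puppeteer) collects a few commonly used ones: printBackground, landscape, and margin. The default values are assumptions chosen for illustration only.

```javascript
// Hypothetical helper (not part of Puppeteer): builds an options object for page.pdf().
function buildPdfOptions(outputPath, overrides = {}) {
  const defaults = {
    path: outputPath,
    format: "a4",
    printBackground: true, // include CSS background colors and images
    landscape: false,
    margin: { top: "20mm", right: "15mm", bottom: "20mm", left: "15mm" },
  };
  // Merge margins separately so a partial override keeps the other sides.
  const margin = { ...defaults.margin, ...(overrides.margin || {}) };
  return { ...defaults, ...overrides, margin };
}

// Usage inside the script above:
//   await page.pdf(buildPdfOptions("./result_from_url.pdf", { landscape: true }));
```

Keeping the options in one place makes it easier to produce consistent PDFs from several scripts.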
Convert HTML to PDF
The following Node.js code converts an HTML file to PDF using Puppeteer.
const puppeteer = require("puppeteer");
const fs = require("fs");

(async () => {
  // Create browser instance
  const browser = await puppeteer.launch();

  // Create a new page
  const page = await browser.newPage();

  // Get HTML content
  const html = fs.readFileSync('./sample.html', 'utf-8');

  // Set HTML as page content
  await page.setContent(html, { waitUntil: 'domcontentloaded' });

  // Save PDF File
  await page.pdf({ path: './result_from_html.pdf', format: 'a4' });

  // Close browser instance
  await browser.close();
})();
Output is as follows.
Here, sample.html contains the HTML we want to export to PDF. First, we read the file’s content into the html variable. Then we pass that variable to page.setContent, which loads the markup into the page, much like opening a local HTML file in a browser.
// Get HTML content
const html = fs.readFileSync('./sample.html', 'utf-8');

// Set HTML as page content
await page.setContent(html, { waitUntil: 'domcontentloaded' });
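One caveat with page.setContent: the page has no file location of its own, so relative links to stylesheets or images inside sample.html may not load. A possible workaround, sketched below, is to navigate to the file through a file:// URL instead. The helper names localHtmlUrl and openLocalHtml are hypothetical, and the page argument is assumed to be a Puppeteer page created as in the script above.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: turns a relative file path into an absolute file:// URL.
function localHtmlUrl(htmlPath) {
  return pathToFileURL(path.resolve(htmlPath)).href;
}

// Sketch: navigate to the file instead of injecting its content, so relative
// links inside the HTML resolve against the file's own folder.
// `page` is assumed to be a Puppeteer page (see browser.newPage() above).
async function openLocalHtml(page, htmlPath) {
  await page.goto(localHtmlUrl(htmlPath), { waitUntil: "networkidle2" });
}

// Usage inside the script above, replacing the readFileSync/setContent pair:
//   await openLocalHtml(page, "./sample.html");
```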
In the end, we’re exporting the browser page to PDF using the page.pdf method.
// Save PDF File
await page.pdf({ path: './result_from_html.pdf', format: 'a4' });
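The scripts above leave the browser open if any step throws, which can keep the Node.js process from exiting. As a minimal sketch, the hypothetical function htmlStringToPdf below wraps the same calls in try/finally and takes the HTML as a string rather than a file. The Puppeteer module is passed in as the puppeteerLib parameter, an assumption made here so the function is easy to exercise in isolation.

```javascript
// Sketch: convert an HTML string to PDF and always close the browser.
// `puppeteerLib` is passed in by the caller (normally require("puppeteer")).
async function htmlStringToPdf(puppeteerLib, html, outputPath) {
  const browser = await puppeteerLib.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: "domcontentloaded" });
    await page.pdf({ path: outputPath, format: "a4" });
  } finally {
    // Runs even if setContent or pdf throws, so no Chromium process is left behind.
    await browser.close();
  }
}

// Usage:
//   const puppeteer = require("puppeteer");
//   htmlStringToPdf(puppeteer, "<h1>Hello PDF</h1>", "./hello.pdf")
//     .catch((err) => console.error("PDF generation failed:", err));
```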
More about the Puppeteer-based approach
Puppeteer is a great library for browser automation and website scraping. It’s feature-rich and easy to set up locally.
However, working with Puppeteer in a local development environment and deploying it to a production server are at quite different levels of difficulty.
It can be tricky to set up on a production server because the Node.js package depends on a headless Chromium browser (which is usually downloaded along with the package). In most cases, it requires a dedicated server with a good amount of CPU resources.
Additionally, Puppeteer is a Node.js library, so it cannot be used directly in projects written in other languages or running in other environments.
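One common way to reduce the resource cost on a server is to launch the browser once and reuse it for many PDF jobs, rather than starting Chromium for every request. The sketch below illustrates that idea under stated assumptions: createSharedBrowser is a hypothetical helper, and launchBrowser is a caller-supplied function that returns a Puppeteer browser.

```javascript
// Sketch: share one browser across many PDF jobs instead of launching per request.
// `launchBrowser` is supplied by the caller, e.g. () => require("puppeteer").launch()
function createSharedBrowser(launchBrowser) {
  let browserPromise = null;

  return {
    // Returns the running browser, launching it on first use.
    get() {
      if (!browserPromise) {
        browserPromise = Promise.resolve()
          .then(launchBrowser)
          .catch((err) => {
            browserPromise = null; // allow a retry after a failed launch
            throw err;
          });
      }
      return browserPromise;
    },
    // Closes the browser if one was launched.
    async close() {
      if (!browserPromise) return;
      const pending = browserPromise;
      browserPromise = null;
      const browser = await pending;
      await browser.close();
    },
  };
}

// Usage (assumes puppeteer is installed):
//   const shared = createSharedBrowser(() => require("puppeteer").launch());
//   const browser = await shared.get();
//   const page = await browser.newPage();
//   ... generate the PDF, then: await page.close();
```

Each job opens and closes its own page, while the browser process stays alive until shared.close() is called, for example at server shutdown.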
Alternate PDF.co based approach
PDF.co provides REST endpoints for various PDF-related tasks. One of its features is HTML to PDF: we can convert either a URL or raw HTML to PDF.
As PDF.co is a REST-based service, it can be consumed from any language or environment that can send HTTP requests. Also, because it is a hosted service, we don’t need to provision and pay for our own rendering servers.
These are the PDF.co endpoints for HTML to PDF generation.
- URL to PDF
- HTML to PDF
- PDF from HTML Template (PDF Report Generation based on HTML)
Let’s take a look at a cURL example that converts a URL to PDF using PDF.co.
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/from/url' \
--header 'x-api-key: ' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "https://wikipedia.org/wiki/Wikipedia:Contact_us",
    "async": false
}'
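The same request can be sent from Node.js. The sketch below mirrors the cURL example: the endpoint, headers, and body fields come from it, while the function names buildUrlToPdfRequest and convertUrlToPdf are hypothetical. It assumes Node.js 18 or later for the global fetch, and it assumes the service replies with JSON; the response fields are not described in this article, so the parsed body is returned as-is.

```javascript
// Builds the same request as the cURL example (endpoint, headers, and body fields).
function buildUrlToPdfRequest(apiKey, targetUrl) {
  return {
    endpoint: "https://api.pdf.co/v1/pdf/convert/from/url",
    options: {
      method: "POST",
      headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ url: targetUrl, async: false }),
    },
  };
}

// Sends the request with the global fetch available in Node.js 18+.
// Assumption: the service answers with JSON; its fields are not covered here.
async function convertUrlToPdf(apiKey, targetUrl) {
  const { endpoint, options } = buildUrlToPdfRequest(apiKey, targetUrl);
  const response = await fetch(endpoint, options);
  if (!response.ok) {
    throw new Error(`PDF.co request failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (the API key is read from an environment variable of your choosing):
//   convertUrlToPdf(process.env.PDFCO_API_KEY, "https://wikipedia.org/wiki/Wikipedia:Contact_us")
//     .then((result) => console.log(result))
//     .catch((err) => console.error(err));
```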
Summary
In this article, we’ve discussed two ways to generate a PDF from HTML or a URL: Puppeteer running locally and the hosted PDF.co API. Please try these samples on your machine to get hands-on experience.
Thank you!