PDF to HTML API Benefits
Retains the exact format and original layout
Using the PDF.co API platform, you can convert a PDF file to HTML while retaining the exact layout and formatting of the original PDF.
Support for Multiple Programming Languages
Our Web API can be called from many programming languages, including JavaScript, Python, Java, C#, PHP, .NET and ASP.NET, and Visual Basic. The PDF to HTML sample source code is ready to use; all you need is your API key.
Our PDF to HTML API is Secure
Storage and data transfers are encrypted and secure. See the PDF.co Security page for details.
Business Automation Platform Integrations
If you are not a developer, you can still automate your PDF operations through popular business automation platforms such as Zapier, Make, Airtable, Salesforce, Google Apps Script, and 300+ more.
PDF to HTML API Sample & Demo
In the demonstration below, you will learn how to convert a PDF to HTML using the PDF.co Web API. The demo uses this sample PDF file.
The code snippets below convert the sample PDF file above into HTML.
The final result will look like this.
Before writing the conversion code, let's review the parameters of the /v1/pdf/convert/to/html endpoint and what each one does.
Endpoint
URL: | https://api.pdf.co/v1/pdf/convert/to/html |
Method: | POST |
Parameter | Description |
url | required. Link to the source file. |
lang | optional. Sets the OCR (image-to-text extraction) language used when a scanned PDF is detected or the input is a PNG or JPG image. Supported values: eng (default), spa, deu, fra, jpn, chi_sim, chi_tra, kor. You can also combine two languages on the same page, for example eng+deu or jpn+kor. |
inline | optional. Set to true to return the output inline in the response, or false (default) to return a link to the output file. |
unwrap | optional. When lineGrouping is enabled, unwraps the lines within a table cell into a single line. Must be true or false. |
pages | optional. Comma-separated list of page indices or ranges to process. IMPORTANT: the first page has index 0 (zero). Use a hyphen (-) to set a range, for example: 0, 2-5, 7-. An open-ended range such as 7- runs through the last page. |
rect | optional. Defines the coordinates of the area to extract, e.g. 51.8, 114.8, 235.5, 204.0. Must be a string. |
encrypt | optional. Set to true to encrypt the output file. Must be true or false. |
async | optional. Set to true to process the file asynchronously; the response then returns a jobId to use with job/check. Must be true or false. |
name | optional. Output file name. |
profiles | optional. Must be a string. Sets a custom configuration; see the profiles examples here. |
lineGrouping | optional. Groups lines within table cells. Set to 1 to enable grouping. Must be a string. |
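The pages syntax is easy to get wrong because indices start at 0. The Python sketch below expands a pages string into the zero-based indices it selects, which can help you sanity-check a value before sending it. expand_pages is a hypothetical client-side helper, not part of the PDF.co API, and it assumes the rules described in the table: comma-separated indices or ranges, with an open-ended range such as 7- running through the last page.

```python
import re
from typing import List

# Hypothetical client-side helper; not part of the PDF.co API.
_TOKEN = re.compile(r"(\d+)(?:\s*-\s*(\d*))?")


def expand_pages(spec: str, page_count: int) -> List[int]:
    """Expand a pages string like "0, 2-5, 7-" into zero-based page indices.

    Assumes the syntax from the parameter table: comma-separated indices or
    ranges, the first page is 0, and an open-ended range ("7-") runs to the
    last page. Duplicates are kept in the order given.
    """
    indices: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "0,,2"
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid page token: {token!r}")
        start = int(match.group(1))
        end_text = match.group(2)
        if end_text is None:
            end = start  # a single page index
        elif end_text == "":
            end = page_count - 1  # open-ended range: through the last page
        else:
            end = int(end_text)
        if start > end or end >= page_count:
            raise ValueError(f"page token {token!r} is out of range for {page_count} pages")
        indices.extend(range(start, end + 1))
    return indices
```

For a 10-page document, expand_pages("0, 2-5, 7-", 10) returns [0, 2, 3, 4, 5, 7, 8, 9], that is, the 1st, 3rd to 6th, and 8th to 10th pages.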
Now we are ready to write some code.
cURL Code Snippet
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/html' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-html/sample.pdf",
    "inline": false
  }'
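For comparison, here is a minimal Python sketch of the same request using only the standard library. The endpoint, headers, and body fields come from the cURL call above; build_request, convert, and the PDFCO_API_KEY environment variable name are illustrative assumptions, not part of an official PDF.co SDK. build_request only constructs the request, so it can be inspected without network access; convert actually sends it.

```python
import json
import os
import urllib.request
from typing import Any, Dict

API_URL = "https://api.pdf.co/v1/pdf/convert/to/html"


def build_request(api_key: str, source_url: str, inline: bool = False,
                  **options: Any) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call.

    Extra keyword arguments such as pages="0-1" are copied into the JSON
    body unchanged; they are not validated here.
    """
    body = {"url": source_url, "inline": inline, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def convert(source_url: str, **options: Any) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response (network I/O)."""
    api_key = os.environ["PDFCO_API_KEY"]  # assumed variable name; use your own key storage
    request = build_request(api_key, source_url, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling convert with the sample file URL from the cURL example returns the decoded JSON response; with inline left at false, that response should contain a link to the generated HTML file rather than the HTML itself.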
This sample code and other cURL source code samples are available here.
Now let’s see this program in action.
The sample code to convert PDF to HTML in JavaScript is located here.
The sample code for PDF to HTML in Python is here.
The sample code for PDF to HTML conversion in PHP is located here.
The sample code to convert PDF to HTML in Java is located here.
The sample code for PDF to HTML in C# is located here.
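When async is set to true, the request returns a jobId instead of waiting for the conversion to finish, and you poll job/check until the job completes. The Python sketch below shows only the polling loop. It assumes, without confirming the exact response schema, that job/check reports a status field that reads "working" while the job runs and "success" when it finishes; check_job is a hypothetical function you supply that calls job/check and returns the decoded JSON.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    job_id: str,
    check_job: Callable[[str], Dict[str, Any]],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll check_job(job_id) until the job is no longer "working".

    check_job is a caller-supplied (hypothetical) function that calls
    job/check and returns its decoded JSON. The "working" and "success"
    status values are assumptions about that response.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check_job(job_id)
        status = result.get("status")
        if status == "success":
            return result
        if status != "working":
            # Any other status is treated as a failure in this sketch.
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still working after {timeout} seconds")
        sleep(poll_interval)  # wait before asking again
```

Passing check_job and sleep in as parameters keeps the loop independent of any HTTP library and lets you exercise it without network access.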
NOTE: Use the PDF.co Document Classifier to identify the source of a document. You can create and maintain classification rules with the desktop-based Classifier Testing Tool (see the details here).