PDF to CSV API Benefits
Structured CSV as output
Our AI-based engine analyzes input PDF documents and re-creates the original structure of tables and text as structured CSV data.
As a result, post-processing the output and importing it into a database is faster and easier than with regular PDF to CSV tools.
Supports damaged documents and scanned text
Input PDF files may contain text, scanned images, damaged text, and text in multiple languages. The PDF.co engine recognizes all of these with its built-in OCR (Optical Character Recognition), which is backed by additional AI and ML.
Web API Supports Multiple Languages
The Web API can convert PDF to CSV files from programming languages such as PHP, JavaScript, .NET and ASP.NET, C#, Java, Visual Basic, and many others. Find source code samples in our API documentation.
Business Automation Platform Integrations
If you are not a developer, you can also easily automate your PDF operations via popular business automation platforms: Zapier, Make, Airtable, Salesforce, Google Apps Script, and 300+ more.
Sample & Demo to Convert PDF Table to CSV
For this demo, we'll use a Sample PDF File.
The code snippets below, written in several programming languages, convert the Sample PDF File above into comma-separated values (CSV). Follow the steps to convert a PDF table to CSV format.
The final structured CSV result will look like this:
"Your Company Name","","","",
"Your Address","","","",
"City, State Zip","","","",
"","","","Invoice No. 123456",
"","","","Invoice Date 01/01/2016",
"Client Name","","","",
"Address","","","",
"City, State Zip","","","",
"Notes","","","",
"Item","Quantity","Price","Total",
"Item 1","1","40.00","40.00",
"Item 2","2","30.00","60.00",
"Item 3","3","20.00","60.00",
"Item 4","4","10.00","40.00",
"","","TOTAL","200.00",
Output CSV
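Once you have the CSV, pulling the line items out of it takes only a few lines. The sketch below is one possible approach, not part of the PDF.co samples: it uses Python's standard csv module on a few rows copied from the output above. The helper name extract_line_items is hypothetical, and the sketch assumes the item table starts at the row whose first cell is "Item" and ends at the first row with an empty first cell.

```python
import csv
import io

# A few rows copied from the sample output above (the line-items part).
SAMPLE_CSV = '''"Notes","","","",
"Item","Quantity","Price","Total",
"Item 1","1","40.00","40.00",
"Item 2","2","30.00","60.00",
"Item 3","3","20.00","60.00",
"Item 4","4","10.00","40.00",
"","","TOTAL","200.00",
'''


def extract_line_items(csv_text, header_first_cell="Item", width=4):
    """Return the rows below the header row as a list of dicts.

    Hypothetical helper: keeps only the first `width` cells of each row,
    so a trailing comma at the end of a row does no harm.
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row[:width] for row in reader if row]  # skip blank lines

    header = None
    items = []
    for row in rows:
        if header is None:
            # Everything before the header row is skipped.
            if row[0] == header_first_cell:
                header = row
            continue
        if not row[0]:
            # Summary rows such as TOTAL have an empty first cell.
            break
        items.append(dict(zip(header, row)))
    return items


items = extract_line_items(SAMPLE_CSV)
for item in items:
    print(item["Item"], item["Quantity"], item["Price"], item["Total"])
```

All values come back as strings, so convert Quantity, Price, and Total to numbers before inserting them into a database.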
Before we proceed with the code, let us first check the /v1/pdf/convert/to/csv parameters and their uses.
Endpoint for PDF to CSV Conversion
URL: | https://api.pdf.co/v1/pdf/convert/to/csv |
Method: | POST |
Parameter | Description |
url | required. Link to the source file. |
lang | optional. English by default. Sets the OCR (image-to-text extraction) language used when a scanned PDF is detected or the input is a PNG or JPG image. Supported values include: eng, spa, deu, fra, jpn, chi_sim, chi_tra, kor. You can also specify two languages to be used on the same page, for example: eng+deu, jpn+kor, or other combinations. |
inline | optional. Must be one of: true to return data as inline or false to return link to the output file (default). |
unwrap | optional. Unwraps lines to a single line within table cells when lineGrouping is enabled. Must be one of true or false. |
pages | optional. Comma-separated list of page indices (or ranges) to process. IMPORTANT: the very first page starts with 0 (zero). To set a range use a dash (-), for example: 0, 2-5, 7-. |
rect | optional. Defines coordinates for extraction, e.g. 51.8, 114.8, 235.5, 204.0. Must be a string. |
encrypt | optional. Enables encryption for the output file. Must be one of true or false. |
async | optional. Runs processing asynchronously and returns a jobId to use with job/check. Must be one of true or false. |
name | optional. Output file name. |
profiles | optional. Sets a custom configuration. Must be a string. See profiles examples here. |
lineGrouping | optional. Line grouping within table cells. Set to 1 to enable the grouping. Must be a string. |
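Two parameters in the table are easy to get wrong: pages is zero-based, and rect must be passed as a single string. The following Python sketch shows one way to assemble a request body with those rules in mind. The helper names to_zero_based_pages and build_payload are hypothetical, the example.com URL is a placeholder, and sending inline as the string "true" or "false" simply mirrors the cURL sample in this article.

```python
import json


def to_zero_based_pages(page_numbers):
    """Turn 1-based page numbers (as a person counts them) into the
    comma-separated, zero-based string the `pages` parameter expects."""
    if any(n < 1 for n in page_numbers):
        raise ValueError("page numbers must start at 1")
    return ",".join(str(n - 1) for n in page_numbers)


def build_payload(url, lang="eng", page_numbers=None, rect=None,
                  inline=False, name=None):
    """Assemble the JSON body for /v1/pdf/convert/to/csv (hypothetical helper)."""
    payload = {
        "url": url,    # required: link to the source file
        "lang": lang,  # OCR language, e.g. "eng" or "eng+deu"
        "inline": "true" if inline else "false",
    }
    if page_numbers:
        payload["pages"] = to_zero_based_pages(page_numbers)
    if rect is not None:
        # `rect` must be one string, e.g. "51.8, 114.8, 235.5, 204.0"
        payload["rect"] = ", ".join(str(v) for v in rect)
    if name:
        payload["name"] = name
    return payload


payload = build_payload(
    "https://example.com/invoice.pdf",  # placeholder URL, not a real file
    lang="eng+deu",
    page_numbers=[1, 3],                # first and third page
    rect=(51.8, 114.8, 235.5, 204.0),
    name="result.csv",
)
print(json.dumps(payload, indent=2))
```

Optional parameters that are not set are left out of the body entirely, so the API falls back to its defaults.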
Now we are ready to write some code.
cURL Code Snippet
curl --location --request POST 'https://api.pdf.co/v1/pdf/convert/to/csv' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY' \
  --data-raw '{
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-csv/sample.pdf",
    "lang": "eng",
    "inline": "true",
    "unwrap": "",
    "pages": "0-",
    "rect": "",
    "async": "false",
    "encrypt": "false",
    "name": "result.csv",
    "password": "",
    "lineGrouping": "",
    "profiles": ""
  }'
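For comparison, here is a rough Python equivalent of the cURL call, using only the standard library. Treat it as a sketch rather than the official sample: the function names build_request and convert are hypothetical, and because this article does not list the response fields, convert simply returns the decoded JSON as is. The actual network call is left commented out so the snippet runs without an API key.

```python
import json
import urllib.request

API_URL = "https://api.pdf.co/v1/pdf/convert/to/csv"


def build_request(api_key, payload):
    """Build the same POST request as the cURL sample (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
    )


def convert(api_key, payload, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as the cURL sample, minus the parameters it leaves empty.
payload = {
    "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/pdf-to-csv/sample.pdf",
    "lang": "eng",
    "inline": "true",
    "pages": "0-",
    "async": "false",
    "encrypt": "false",
    "name": "result.csv",
}

request = build_request("YOUR_API_KEY", payload)
print(request.get_method(), request.full_url)

# Uncomment with a real API key to perform the conversion:
# result = convert("YOUR_API_KEY", payload)
# print(result)
```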
This sample code and other cURL source code samples are available here.
Now let’s see this program in action.
JavaScript source code samples for PDF to CSV API are available in our repository here.
PHP source code samples for PDF to CSV API are available in our repository here.
The sample code for PDF to CSV in Python is here.
Java source code samples for PDF to CSV API are available in our repository here.
C# source code samples for PDF to CSV API are available in our repository here.
NOTE: Use the PDF.co Document Classifier to identify the source of a document. You can easily create and maintain classification rules with the desktop-based Classifier Testing Tool (see the details here).