Imagine a scenario where we have thousands of PDFs containing invoices of different companies in different formats. Now if we have to prepare reports around these files, we’ll need to know fields such as invoice number, invoice amount, invoice date, and so on. If we try to extract this information manually from x number of invoices, it’ll be a very time-consuming process and error-prone for sure.
Luckily there are few services out there that can help us in these kinds of situations. In this article, we’ll try to extract invoice information from PDF using PDF.co API endpoint named document parser. We’ll create sample solutions in Google Spreadsheet and App Scripts. After we’re finished with this project, we can parse any invoice PDF of any format and we’ll get invoice information right into Spreadsheet, isn’t it cool!
Review Final Spreadsheet
Below is the spreadsheet we’ve designed and the final output that we’re going to achieve.
We have the following highlighted controls here.
PDF.co | This is a custom-generated menu using Google App Script. Upon clicking the submenu “Read Invoice Data”, it’ll process the input PDF URL and write all results. |
PDF.co API Key | It’s the PDF.co API Key, which is required to authenticate PDF.co requests. You will get this API key upon signing up with PDF.co. |
Invoice Input PDF | URL of input PDF Invoice. |
Output
In the below animation, we’re demonstrating fetching basic invoice fields from standard invoice PDF URL.
PDF.co API endpoint for document parser is pretty generic and it tries to identify invoice fields using AI. Hence if we pass input PDF documents in different formats it will have all fields and their position within the document will be different.
PDF.co does a fantastic job in extracting and identifying data from divergent invoice formats. For example, take a look at the following animation. Here, we are using a new invoice sample and running the same program to identify invoice fields.
I believe you’ll be excited to jump right into source code. Before doing that, let’s review the PDF.co endpoint which does the heavy lifting of parsing documents.
PDF.co Endpoint For Document Parser
PDF.co Document parser is an AI-powered API endpoint that can automatically parse PDF, PNG, JPG documents and extract fields, tables, or values. It has an inbuilt template for parsing invoices which we observe in this article. However, we can create our own template based on our requirements and use it to extract values.
API Endpoint
https://api.pdf.co/v1/pdf/documentparser
Sample Input Request
{ "url": "https://bytescout-com.s3-us-west-2.amazonaws.com/files/demo-files/cloud-api/document-parser/sample-invoice.pdf", "outputFormat": "JSON", "templateId": "1", "async": false, "encrypt": "false", "inline": "true", "password": "", "profiles": "", "storeResult": false }
Sample Output
{ "body": { "objects": [ { "name": "companyName", "objectType": "field", "value": "ACME Inc.", "rectangle": [ 0.0, 0.0, 0.0, 0.0 ] } ...
Following animation shows executing a request from a postman. You can get a postman collection of PDF.co APIs from here.
Please use the links below to know more about template generation or Document Parser API itself.
With a review of PDF Document Parser API, I guess we’re ready to dive into Google Apps source code.
Source Code
/* Initial Declarations */ let Cell_PDFcoApiKey, PDFCoApiKey, Cell_InputPDFUrl, InputPDFUrl, Cell_CompanyName, Cell_BillTo, Cell_InvoiceNo, Cell_InvoiceDate, Cell_DueDate, Cell_SubTotal, Cell_Tax, Cell_TotalAmount, Cell_Error; /* * Handle PDF.co Menu on Google Sheet Open Event */ function onOpen() { var spreadsheet = SpreadsheetApp.getActive(); var menuItems = [ {name: 'Read Invoice Data', functionName: 'ParseInvoiceData'} ]; spreadsheet.addMenu('PDF.co', menuItems); } /** * Parse Invoice Data and put it into a spreadsheet */ function ParseInvoiceData() { // Initial Cell reference and clean AssignInitialCellRefAndClean(); if(!ValidateInputValues()){ return; } // Prepare Payload var data = { "url": InputPDFUrl, "outputFormat": "JSON", "templateId": "1", "async": false, "encrypt": "false", "inline": "true", }; // Prepare Request Options var options = { 'method' : 'post', 'contentType': 'application/json', 'headers': { "x-api-key": PDFCoApiKey }, // Convert the JavaScript object to a JSON string. 'payload' : JSON.stringify(data) }; // Get Response // https://developers.google.com/apps-script/reference/url-fetch var pdfCoResponse = UrlFetchApp.fetch('https://api.pdf.co/v1/pdf/documentparser', options); var pdfCoRespContent = pdfCoResponse.getContentText(); var pdfCoRespJson = JSON.parse(pdfCoRespContent); // Display Result if(!pdfCoRespJson.error){ Cell_CompanyName.setValue(ExtractFieldFromResponse(pdfCoRespJson, "companyName")); Cell_BillTo.setValue(ExtractFieldFromResponse(pdfCoRespJson, "companyName2")); Cell_InvoiceNo.setValue(ExtractFieldFromResponse(pdfCoRespJson, "invoiceId")); Cell_InvoiceDate.setValue(ExtractFieldFromResponse(pdfCoRespJson, "dateIssued")); Cell_DueDate.setValue(ExtractFieldFromResponse(pdfCoRespJson, "dateDue")); Cell_SubTotal.setValue(ExtractFieldFromResponse(pdfCoRespJson, "subTotal")); Cell_Tax.setValue(ExtractFieldFromResponse(pdfCoRespJson, "tax")); Cell_TotalAmount.setValue(ExtractFieldFromResponse(pdfCoRespJson, "total")); } else{ Cell_Error.setValue(pdfCoRespJson.message); } } /** * Validate Input Values before request */ function ValidateInputValues(){ if(InputPDFUrl === ""){ Cell_Error.setValue("Please Provice Input Invoice PDF URL"); return false; } if(PDFCoApiKey === ""){ Cell_Error.setValue("Please Provice PDF.co API Key"); return false; } Cell_Error.setValue(""); return true; } /** * Extract field value from JSON response */ function ExtractFieldFromResponse(respJson, fieldName){ let retVal = ""; if(respJson && respJson.body && respJson.body.objects && respJson.body.objects.length > 0){ const objectField = respJson.body.objects.filter(x => x.name === fieldName && x.objectType === "field"); if(objectField && objectField.length > 0){ retVal = objectField[0].value; } } return retVal; } /** * Assign Initial References */ function AssignInitialCellRefAndClean(){ var spreadsheet = SpreadsheetApp.getActive(); // Assign Cell References Cell_PDFcoApiKey = spreadsheet.getRange("B1"); PDFCoApiKey = Cell_PDFcoApiKey.getValue(); Cell_InputPDFUrl = spreadsheet.getRange("B2"); InputPDFUrl = Cell_InputPDFUrl.getValue(); Cell_CompanyName = spreadsheet.getRange("A5"); Cell_BillTo = spreadsheet.getRange("B5"); Cell_InvoiceNo = spreadsheet.getRange("C5"); Cell_InvoiceDate = spreadsheet.getRange("D5"); Cell_DueDate = spreadsheet.getRange("E5"); Cell_SubTotal = spreadsheet.getRange("F5"); Cell_Tax = spreadsheet.getRange("G5"); Cell_TotalAmount = spreadsheet.getRange("H5"); Cell_Error = spreadsheet.getRange("A3:H3"); Cell_CompanyName.setValue(""); Cell_BillTo.setValue(""); Cell_InvoiceNo.setValue(""); Cell_InvoiceDate.setValue(""); Cell_DueDate.setValue(""); Cell_SubTotal.setValue(""); Cell_Tax.setValue(""); Cell_TotalAmount.setValue(""); Cell_Error.setValue(""); }
I believe source code is very easy to understand, let’s review it briefly.
Review Source Code
This sample demonstrates how we can use PDF.co API endpoint for extracting PDF right inside our Google Spreadsheet using AppScript.
We took some predefined cells and assumed that they would contain a specific value. For example, we are using cell “B1” to get PDF.co API value. We used a similar approach to get input values and display output. I am sure we can achieve this better in different ways, but it works for our example.
With that said, we are declaring variables initially for these input/output cells.
/* Initial Declarations */ let Cell_PDFcoApiKey, PDFCoApiKey, Cell_InputPDFUrl, InputPDFUrl, Cell_CompanyName, Cell_BillTo, Cell_InvoiceNo, Cell_InvoiceDate, Cell_DueDate, Cell_SubTotal, Cell_Tax, Cell_TotalAmount, Cell_Error;
These variables are allocated in the method named “AssignInitialCellRefAndClean”. It also clears the value of output cells.
When the spreadsheet is loaded/opened by the user an event “onOpen“ is executed. Hence whatever code is written inside this method is getting called initially. We’re building a menu named “PDF.co” and it’s submenu “Read Invoice Data” then.
var menuItems = [ {name: 'Read Invoice Data', functionName: 'ParseInvoiceData'} ];
Please note that this menu is referring to a function named “ParseInvoiceData”. Hence whenever a sub-menu named “Read Invoice Data” is clicked, it’ll execute the function “ParseInvoiceData”. Nice wiring!
Function “ParseInvoiceData” is the heart of this program! We’re preparing the PDF.co execution payload and handling it’s response here. Also, API request specification is done here. Note that we’re passing the PDF.co API key in the request header. This is very important, as the API key is responsible for the authentication of your request.
// Prepare Request Options var options = { 'method' : 'post', 'contentType': 'application/json', 'headers': { "x-api-key": PDFCoApiKey }, // Convert the JavaScript object to a JSON string. 'payload' : JSON.stringify(data) };
Once we get a response, we parse it using the function “ExtractFieldFromResponse” and fill output cells with it.
With that, we have a job done here! Great!
Summary
It’s a great demo we just witnessed in this article. It’s simply awesome that someone just puts a PDF URL into a spreadsheet cell, clicks a menu, and gets all the information extracted and nicely placed at their respective places!
Now, the following are useful links from this article.
PDF.co | https://pdf.co |
PDF.co Documentation | https://apidocs.pdf.co |
PDF.co Document Parser | https://pdf.co/document-parser |
Consuming API from Google Script | https://developers.google.com/apps-script/reference/url-fetch |
I hope this article will be educational and helpful to you! Thank you!
Similar Pages:
- How to Password-Protect PDF in Google Drive with Google Apps Script and PDF.co
- How to Split PDF in Google Drive Folder with Google Apps Script and PDF.co
- How to Merge PDF URLs with Google Apps Script and PDF.co
- How to Add Text to PDF using Google Apps Script and PDF.co
- Merge Google Drive PDF Files and Save Them Back using Google Apps Script and PDF.co
- How to Add Image to PDF using Google Apps Script and PDF.co
- Google Invoice Parser to Read PDF Invoices and Orders with Google Script and PDF.co
- Convert PDF Invoice to Google Sheet – PDFco & Google Apps Script
- How to Merge all Rows to PDF with Google Apps Script and PDF.co