PDF.co is an API platform for extracting data and text from PDFs, images, and many other sources.
pdf2json is a popular open-source Node.js package on npm that converts PDF to JSON for use from JavaScript.
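To give a feel for working with the JSON that pdf2json produces, here is a minimal sketch that collects the text runs per page. It assumes the result object has the shape `Pages` → `Texts` → `R` → `T` and that `T` may be URI-encoded; both details can differ between package versions, so check the pdf2json README for the version you install. The `extractPageTexts` and `safeDecode` helpers and the `sample` object are hypothetical and not part of the package.

```javascript
// Sketch: collect text runs from a pdf2json-style result object.
// ASSUMPTION: the result looks like
//   { Pages: [ { Texts: [ { x, y, R: [ { T } ] } ] } ] }
// and T may be URI-encoded, depending on the package version.

// Decode a possibly URI-encoded string; fall back to the raw value.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    // Not valid URI-encoded text, so return it unchanged.
    return value;
  }
}

// Return one array of strings per page, one string per text item.
function extractPageTexts(pdfData) {
  const pages = (pdfData && pdfData.Pages) || [];
  return pages.map((page) =>
    (page.Texts || []).map((text) =>
      (text.R || []).map((run) => safeDecode(run.T || "")).join("")
    )
  );
}

// Hypothetical sample shaped like the assumed output (not real pdf2json output).
const sample = {
  Pages: [
    {
      Texts: [
        { x: 1, y: 2, R: [{ T: "Form%20982" }] },
        { x: 1, y: 3, R: [{ T: "100%" }] },
      ],
    },
  ],
};

console.log(extractPageTexts(sample));
```

Note that this yields loose text fragments with coordinates left behind; rebuilding rows, columns, or the original layout from them is up to the caller.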
Features Comparison
Feature | PDF.co API | pdf2json |
---|---|---|
PDF to JSON conversion | yes | yes |
preserve the original visual layout | yes | no |
PDF to plain text | yes | yes |
PDF to plain text with the original layout | yes | no |
PDF to CSV and PDF to XML | yes | no |
font name and styles information | yes | no |
virtual grid with columns and rows | yes | no |
scanned PDF, JPG, PNG support as input | yes | no |
damaged PDF support | yes | no |
OCR with one or more languages | yes | no |
on-premises version | yes | no |
cloud version | yes | no |
If you only need a basic PDF to JSON conversion, the pdf2json npm package is enough.
If you need PDF to JSON conversion with better-structured output, consider the PDF.co API instead.
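For the PDF.co route, the following is a minimal sketch of how a call to the `pdf/convert/to/json` endpoint might be prepared from Node.js. The base URL, the `x-api-key` header, and the `url` and `inline` body fields are assumptions here and should be checked against the PDF.co API reference; `buildConvertRequest` and `convert` are hypothetical helper names, and the API key and file URL are placeholders.

```javascript
// Sketch: describe a request to a PDF.co conversion endpoint.
// ASSUMPTIONS (verify against the PDF.co API reference): the base URL,
// the x-api-key header, and the "url" / "inline" body fields.

const PDFCO_BASE_URL = "https://api.pdf.co/v1"; // assumed base URL

// Build the URL and fetch options without sending anything.
function buildConvertRequest(endpoint, apiKey, sourceUrl) {
  if (!apiKey) throw new Error("An API key is required");
  return {
    url: `${PDFCO_BASE_URL}/${endpoint}`,
    options: {
      method: "POST",
      headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ url: sourceUrl, inline: true }),
    },
  };
}

// Send the request using the global fetch available in Node.js 18+.
async function convert(endpoint, apiKey, sourceUrl) {
  const { url, options } = buildConvertRequest(endpoint, apiKey, sourceUrl);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

// Placeholder values; nothing is sent until convert() is called.
const request = buildConvertRequest(
  "pdf/convert/to/json",
  "YOUR_API_KEY",
  "https://example.com/f982.pdf"
);
console.log(request.url);
```

Passing `"pdf/convert/to/text"` as the endpoint would prepare the plain-text conversion in the same way, under the same assumptions.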
Visual comparison of results generated by pdf2json and the PDF.co API:
JSON generated by the pdf2json npm package from the IRS Form 982 PDF:
JSON generated by PDF.co pdf/convert/to/json from the IRS Form 982 PDF:
Plain text generated by the pdf2json npm package from the IRS Form 982 PDF:
Plain text generated by PDF.co pdf/convert/to/text from the IRS Form 982 PDF: