Extracting text, images, and vector line drawings from PDFs can be time-consuming and challenging, especially when you are dealing with large volumes of documents.

PDF.co is a powerful PDF processing platform that allows users to extract text, images, and other information from PDFs quickly and easily. The platform supports a wide range of input formats, including scanned PDFs and even images, and can output data in a variety of formats, such as CSV, JSON, and XML.

In this tutorial, we will show how to extract text, images, and vector line drawings from a PDF document using PDF.co and Zapier.

  1. Create a Zap
  2. Add Google Drive App
  3. Connect Google Drive Account
  4. Set Up Trigger
  5. Test Trigger
  6. Test Trigger Result
  7. Add PDF.co App
  8. Connect PDF.co Account
  9. Set Up Action
  10. Test Action
  11. Test Result
  12. Extracted JSON Output

We will use a sample PDF document to demonstrate the process of extracting text, images, and vector line drawings in JSON format using PDF.co.

Sample PDF Document

Step 1: Create a Zap

  • Start by logging in to your Zapier account and clicking the Create Zap button.

Step 2: Add Google Drive App

  • Next, select the Google Drive app and choose the New File in Folder trigger event. This starts the Zap whenever a new file is added to the specified folder.

Add Google Drive App

Step 3: Connect Google Drive Account

  • To proceed, connect your Google Drive account to Zapier and grant access to authorize the connection between the two services.

Connect Google Drive Account

Step 4: Set Up Trigger

Let’s set up the trigger.

  • First, select My Google Drive as the drive to use.
  • Next, specify the folder name where the source file is located.

Setup Trigger
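Zapier handles the New File in Folder trigger for you, but a simplified model can make its behavior clearer. The Python sketch below is not Zapier's implementation: it assumes a hypothetical `list_files` callable that returns the folder's current files as dictionaries with an `id` key, and it reports only files whose IDs it has not seen in an earlier poll.

```python
from typing import Callable, Dict, Iterable, List, Set

# A file record is assumed to be a dict with at least an "id" key.
FileRecord = Dict[str, str]


def poll_new_files(
    list_files: Callable[[], Iterable[FileRecord]],
    seen_ids: Set[str],
) -> List[FileRecord]:
    """Return files not reported by earlier polls and remember their IDs.

    `list_files` is a hypothetical stand-in for listing the watched folder.
    """
    new_files: List[FileRecord] = []
    for record in list_files():
        file_id = record["id"]
        if file_id not in seen_ids:
            seen_ids.add(file_id)  # deduplicate on the file ID
            new_files.append(record)
    return new_files


if __name__ == "__main__":
    seen: Set[str] = set()
    folder = [{"id": "a1", "name": "sample.pdf"}]
    print([f["name"] for f in poll_new_files(lambda: folder, seen)])  # → ['sample.pdf']
    print(poll_new_files(lambda: folder, seen))  # → [] (nothing new)
    folder.append({"id": "b2", "name": "invoice.pdf"})
    print([f["name"] for f in poll_new_files(lambda: folder, seen)])  # → ['invoice.pdf']
```

A real trigger would also need to persist `seen_ids` between runs so that the same file is not reported twice after a restart.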

Step 5: Test Trigger

  • Now, let’s test the trigger to ensure that it was set up correctly.

Test Trigger

Step 6: Test Trigger Result

  • Awesome! The trigger test successfully retrieved the file from Google Drive. Now, let’s add another app to extract text, images, and vector line drawings from the PDF document.

Test Trigger Result
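Each trigger result describes one file, and a later step passes its Web Content Link to PDF.co. If the watched folder can also receive non-PDF files, you may want to skip them. The sketch below assumes the trigger output uses the Google Drive API field names `mimeType` and `webContentLink`; the fields Zapier actually exposes may be labeled differently.

```python
from typing import Any, Dict, Optional

PDF_MIME_TYPE = "application/pdf"


def pdf_input_url(file_record: Dict[str, Any]) -> Optional[str]:
    """Return the Web Content Link of a PDF file, or None for any other file.

    Assumes Google Drive API field names ("mimeType", "webContentLink");
    the trigger output in Zapier may label these fields differently.
    """
    if file_record.get("mimeType") != PDF_MIME_TYPE:
        return None  # skip images, spreadsheets, and other non-PDF files
    return file_record.get("webContentLink")


if __name__ == "__main__":
    record = {
        "name": "sample.pdf",
        "mimeType": "application/pdf",
        "webContentLink": "https://example.com/sample.pdf",  # placeholder link
    }
    print(pdf_input_url(record))
```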

Step 7: Add PDF.co App

  • In this step, we will add the PDF.co app and choose the Custom API Call option.

Add PDF.co App

Step 8: Connect PDF.co Account

  • Now, connect your PDF.co account to Zapier by adding your API key. You can find the API key in your PDF.co dashboard, or get one by signing up for a PDF.co account.

Connect PDF.co Account
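Outside Zapier, the same API key authenticates direct calls to PDF.co. This minimal sketch assumes the key is sent in an `x-api-key` request header and stored in an environment variable named `PDFCO_API_KEY` (a name chosen for this example), which keeps the key out of source code.

```python
import os
from typing import Dict, Mapping, Optional


def pdfco_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build PDF.co request headers from the PDFCO_API_KEY variable.

    PDFCO_API_KEY is a name chosen for this sketch; `env` defaults to the
    process environment and can be overridden for testing.
    """
    source = os.environ if env is None else env
    api_key = source.get("PDFCO_API_KEY")
    if not api_key:
        raise RuntimeError("Set PDFCO_API_KEY to your PDF.co API key")
    return {
        "x-api-key": api_key,  # assumed header that PDF.co reads the key from
        "Content-Type": "application/json",
    }
```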

Step 9: Set Up Action

Let’s set up the action.

  • First, enter the PDF to JSON2 endpoint URL, which extracts text, images, and vector line drawings from a PDF document.
  • Next, select the Web Content Link from Google Drive as the input file.
  • Finally, enter the following JSON. Its profiles setting sets the SaveImages option to Embed so that the extracted images are embedded in the output.


{
  "profiles": "{ 'SaveImages': 'Embed' }"
}

Setup Action
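The action's input can be pictured as a single request body. This sketch assumes the Web Content Link is sent as the `url` field alongside the `profiles` string from the snippet above, and that the URL shown is the PDF to JSON2 endpoint; check both against PDF.co's API documentation before relying on them.

```python
import json
from typing import Any, Dict

# Assumed URL of the PDF to JSON2 endpoint; confirm it in PDF.co's API docs.
PDF_TO_JSON2_URL = "https://api.pdf.co/v1/pdf/convert/to/json2"


def build_json2_body(file_url: str) -> Dict[str, Any]:
    """Build a request body matching the Setup Action step.

    "url" carries the Web Content Link of the source PDF (assumed field name).
    """
    return {
        "url": file_url,
        # The profiles value is a string holding single-quoted options,
        # as in the tutorial's JSON snippet, not a nested JSON object.
        "profiles": "{ 'SaveImages': 'Embed' }",
    }


if __name__ == "__main__":
    body = build_json2_body("https://example.com/sample.pdf")  # placeholder link
    print(PDF_TO_JSON2_URL)
    print(json.dumps(body, indent=2))
```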

Step 10: Test Action

  • Now, let’s test the action to ensure that the PDF.co Custom API Call is set up correctly. This sends a request to PDF.co to extract text, images, and vector line drawings from the PDF document.

Test Action
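Testing the action sends an HTTP POST request to PDF.co. A rough standard-library equivalent is sketched below. It assumes the endpoint URL `https://api.pdf.co/v1/pdf/convert/to/json2`, JSON request and response bodies, and authentication through an `x-api-key` header; the request body is passed in as a dictionary so that `make_request` stays independent of the options you choose.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed PDF to JSON2 endpoint URL.
PDF_TO_JSON2_URL = "https://api.pdf.co/v1/pdf/convert/to/json2"


def make_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare, but do not send, a JSON POST request to PDF.co."""
    return urllib.request.Request(
        PDF_TO_JSON2_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_request(make_request(key, body))` would return PDF.co's parsed response; the test step in Zapier performs the equivalent call for you.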

Step 11: Test Result

  • Congratulations! The test was successful, and PDF.co returned a temporary URL to the extracted text, images, and vector line drawings. To view the output, copy the URL and paste it into your browser.

Test Result
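Before opening the temporary URL, a script would normally check that the request succeeded. This sketch assumes PDF.co's response includes an `error` flag, a `message` describing any failure, and the output location in a `url` field; verify these field names against a real response.

```python
import json
import urllib.request
from typing import Any, Dict


def result_url(response: Dict[str, Any]) -> str:
    """Return the temporary output URL from a parsed PDF.co response.

    Assumes an "error" flag, a "message" on failure, and the output in "url".
    """
    if response.get("error"):
        raise RuntimeError(f"PDF.co error: {response.get('message', 'unknown error')}")
    url = response.get("url")
    if not url:
        raise RuntimeError("Response did not include an output URL")
    return url


def download_json(url: str, timeout: float = 60.0) -> Any:
    """Fetch and parse the extracted JSON from the temporary URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the output URL is temporary, download the result soon after the action runs.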

Step 12: Extracted JSON Output

  • Below is the JSON output containing the extracted text, images, and vector line drawings from the PDF document.
Extracted JSON Output
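The full schema of the JSON2 output isn't covered in this tutorial, so rather than assume specific field names, the sketch below lists the key paths in any parsed JSON document, exploring only the first element of each list. The sample input in the demo is made up and does not reflect PDF.co's actual output structure.

```python
import json
from typing import Any, List


def key_paths(data: Any, prefix: str = "", max_depth: int = 3) -> List[str]:
    """List dotted key paths in parsed JSON, up to max_depth nested objects.

    Lists are marked with "[]" and explored through their first element only,
    which keeps the overview short for long arrays of similar items.
    """
    paths: List[str] = []
    if max_depth <= 0:
        return paths
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.append(path)
            paths.extend(key_paths(value, path, max_depth - 1))
    elif isinstance(data, list) and data:
        paths.extend(key_paths(data[0], f"{prefix}[]", max_depth))
    return paths


if __name__ == "__main__":
    # Made-up sample; the real JSON2 output uses its own field names.
    sample = json.loads('{"document": {"page": [{"text": "Hello", "images": []}]}}')
    print("\n".join(key_paths(sample)))
```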

In this tutorial, you learned how to extract text, images, and vector line drawings from a PDF document using PDF.co and Zapier. You also learned how to call the PDF.co PDF to JSON2 endpoint from Zapier with a Custom API Call.