Google Apps Script: Extract Specific Data From a PDF and insert it into a Google Sheet (Updated Feb 2022)

Google Apps Script: DriveApp, Advanced Drive Service, SpreadsheetApp,  DocumentApp Javascript: spread operator, map, regex

Ah, bureaucracy at its finest.  Have you ever asked for a list from a client, another department or agency and instead of saving into a file that might even be considered marginally useful, they give it to you as a pdf. A… p…d…f.

Gee, thanks.

After some prostrations to the great Google Apps Script gods, I had a though.

“Hey, can’t we convert a PDF to a Google Doc with just a click of the button? Surely the great Google Apps Script devs have made it so we can do it programmatically too.”

And you know what? They bloody well did. The big legends.

The Scenario

I’ve just received a bunch of PDFs. The PFDs are all labelled by the class number. Take a look at the files in my Google Drive:

PDFs of sections in Google Drive

Each PDF file contains a list of student IDs that I need to extract and put into a Google Sheet.

PDF file with student IDs

The aim is to have a list of student IDs in column A and their corresponding sections in column B.

As you can see, we have some pretty standard text in the PDF that should be easy for Google to recognise so that we can extract the IDs.

The list of names in the demo sheets were randomly generated by AI!

NOTE! As always, I have tried to create this tutorial for varying levels. Feel free to follow along, or just grab what you need and get stuck into your own project.

If you are playing along, you can find a copy of the PDF files below. Simply add them to your own Drive before you get started:

PDF files by Section.

Continue reading “Google Apps Script: Extract Specific Data From a PDF and insert it into a Google Sheet (Updated Feb 2022)”