Ah, bureaucracy at its finest. Have you ever asked a client, another department or an agency for a list and, instead of saving it in a file that might even be considered marginally useful, they give it to you as a PDF? A… p…d…f.
After some prostrations to the great Google Apps Script gods, I had a thought.
“Hey, can’t we convert a PDF to a Google Doc with the click of a button? Surely the great Google Apps Script devs have made it so we can do it programmatically too.”
And you know what? They bloody well did. The big legends.
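For the curious, here's a minimal sketch of how that conversion can look. It assumes you've switched on the Drive advanced service (v3) under Services in the Apps Script editor. `pdfToText()` is a hypothetical name, and binning the temporary Doc once the text has been read is my own choice, not a requirement. Setting the target `mimeType` to the Google Docs type is what tells Drive to convert (and OCR) the PDF as it is uploaded.

```javascript
/**
 * Hypothetical helper: converts a PDF in Drive to a temporary Google Doc
 * and returns the Doc's plain text.
 * Assumes the Drive advanced service (v3) is enabled for the project.
 * @param {string} pdfFileId - Drive ID of the source PDF.
 * @returns {string} The text Google recognised in the PDF.
 */
function pdfToText(pdfFileId) {
  const pdfFile = DriveApp.getFileById(pdfFileId);
  const blob = pdfFile.getBlob();

  // Asking for the Google Docs MIME type makes Drive convert the upload.
  const resource = {
    name: pdfFile.getName().replace(/\.pdf$/i, ""),
    mimeType: "application/vnd.google-apps.document",
  };
  const doc = Drive.Files.create(resource, blob);

  // Read the converted text, then send the temporary Doc to the bin.
  const text = DocumentApp.openById(doc.id).getBody().getText();
  DriveApp.getFileById(doc.id).setTrashed(true);
  return text;
}
```

If your project is still on v2 of the Drive advanced service, the equivalent call is `Drive.Files.insert()`, with `title` in place of `name`.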
I’ve just received a bunch of PDFs, each labelled with its class number. Take a look at the files in my Google Drive:
Each PDF file contains a list of student IDs that I need to extract and put into a Google Sheet.
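Once the PDF text is in hand, pulling out the IDs can be as simple as a regular expression. I don't know the exact ID format you'll be dealing with, so this sketch assumes each ID is a standalone run of 6 to 10 digits. `ID_PATTERN` and `extractStudentIds()` are made-up names; tweak the pattern to suit your own data.

```javascript
// Assumption: a student ID is a standalone run of 6 to 10 digits.
const ID_PATTERN = /\b\d{6,10}\b/g;

/**
 * Hypothetical helper: pulls unique student IDs out of a block of text,
 * keeping the order in which they first appear.
 * @param {string} text - Text taken from the converted PDF.
 * @returns {string[]} The matched IDs as strings.
 */
function extractStudentIds(text) {
  // match() with a global regex returns every match, or null if none.
  const matches = text.match(ID_PATTERN) || [];
  // A Set drops duplicates while preserving first-seen order.
  return [...new Set(matches)];
}
```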
The aim is to have a list of student IDs in column A and their corresponding sections in column B.
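Column A for the ID and column B for the section maps neatly onto a 2D array, which Apps Script can write to the sheet in a single `setValues()` call. Because each PDF is labelled with its class number, the section can come straight from the file name. This is a rough sketch, assuming the file name is just the class number plus `.pdf`; all three function names are hypothetical.

```javascript
/**
 * Hypothetical helper: turns a PDF file name into a section label.
 * Assumes the name is just the class number plus a ".pdf" extension.
 */
function sectionFromFileName(fileName) {
  return fileName.replace(/\.pdf$/i, "").trim();
}

/**
 * Hypothetical helper: builds one [ID, section] row per student.
 * @param {string[]} ids - Student IDs taken from one PDF.
 * @param {string} fileName - That PDF's file name.
 * @returns {string[][]}
 */
function buildRows(ids, fileName) {
  const section = sectionFromFileName(fileName);
  return ids.map((id) => [id, section]);
}

/**
 * Appends rows to columns A:B, below the last row that has content.
 * @param {Sheet} sheet - The target sheet.
 * @param {string[][]} rows - Rows from buildRows().
 */
function writeStudentRows(sheet, rows) {
  if (rows.length === 0) return; // nothing to write, and getRange() needs at least one row
  const startRow = sheet.getLastRow() + 1;
  sheet.getRange(startRow, 1, rows.length, 2).setValues(rows);
}
```

Starting from `getLastRow() + 1` means you can call `writeStudentRows()` once per PDF and each class's rows stack up under the previous one.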
As you can see, we have some pretty standard text in the PDF that should be easy for Google to recognise so that we can extract the IDs.
The list of names in the demo sheets was randomly generated by AI!
NOTE! As always, I have tried to create this tutorial for varying levels. Feel free to follow along, or just grab what you need and get stuck into your own project.
If you are playing along, you can find a copy of the PDF files below. Simply add them to your own Drive before you get started:
One of my recent projects in Google Apps Script required me to search for a file by name and get its ID. This can be problematic in Google Drive because you can have multiple files of the same name in multiple locations. My solution was to check the file’s parent folder name as well.
I created a function getFileByName() to handle this. The function takes the following parameters:
fileName – The name of the file you are looking for. (required)
fileInFolder – The parent folder of the file you are searching for. (optional)
The file path would look a little something like this:
getFileByName() returns an object containing:
the ID of the file if the file exists or false if it does not.
the ERROR if there is an error or false if there is not.
The returned object would look like the following:
Unfortunately, I could not simply rename the last folder from, say, Unit 4 Report to Q4 Unit 4 Report 2018 to make it easily searchable and unique. The other problem is that there are other Unit 4 Reports in other years and quarters, so I did not want to accidentally call one of them instead of the exact one I wanted.
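Here is a rough sketch of how a function like getFileByName() might work. It's not the original code: it assumes the returned object uses the keys `id` and `error`, and it only compares the name of each file's immediate parent folder, so it can only tell same-named files apart when their parent folders have different names.

```javascript
/**
 * Sketch of a getFileByName()-style lookup (not the original code).
 * Assumes the result object uses the keys `id` and `error`.
 * @param {string} fileName - Name of the file to find (required).
 * @param {string} [fileInFolder] - Name of its parent folder (optional).
 * @returns {{id: (string|boolean), error: (string|boolean)}}
 */
function getFileByName(fileName, fileInFolder) {
  // Collect every file in Drive with this exact name.
  const matches = [];
  const files = DriveApp.getFilesByName(fileName);
  while (files.hasNext()) {
    matches.push(files.next());
  }

  if (matches.length === 0) {
    return { id: false, error: `No file named "${fileName}" was found.` };
  }
  if (matches.length === 1 && fileInFolder === undefined) {
    return { id: matches[0].getId(), error: false };
  }
  if (fileInFolder === undefined) {
    return { id: false, error: `More than one file is named "${fileName}". Add its folder name.` };
  }

  // Keep only files that have a parent folder with the requested name.
  const inFolder = matches.filter((file) => {
    const parents = file.getParents();
    while (parents.hasNext()) {
      if (parents.next().getName() === fileInFolder) return true;
    }
    return false;
  });

  if (inFolder.length === 1) {
    return { id: inFolder[0].getId(), error: false };
  }
  const problem = inFolder.length === 0 ? "none of them is in" : "more than one is in";
  return { id: false, error: `Files named "${fileName}" exist, but ${problem} folder "${fileInFolder}".` };
}
```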