Retrieving the body text of a Google Doc is quite easy with the help of Google Apps Script.
Well, until it isn’t. Let me explain.
The Starter Doc
You can grab a copy of the starter doc here.
Your Google Doc should look a little like this:

(1) Get the Google Doc Body with Document App – Super Easy
My journey started off easy enough with a method I had run 100 times.
/**
 * Get the body text of a
 * Google Doc
 *
 * @link yagisanatode.com
 */
function runsies() {


  const body = DocumentApp
    .getActiveDocument()
    .getBody()
    .getText()

  // Send body text to your
  // desired process
  console.log(body)

}
Lines 10-11: First, we call the Document App Service and retrieve the active document.
Lines 12-13: Next, all we need to do is call the getBody() method to retrieve the body data and, from that, get the text of the body with getText().
That’s it! Super simple.
You can give it a spin with the starter doc.

Life was so much simpler when Google’s Document App team first created their DocumentApp Service. There were no ‘Smart’ Chips to deal with for them. Just some text to extract from paragraph or table elements.
If you have been playing along, you would have probably noticed that any data with a Smart Chip is not displayed with the simple approach above.
Why? Well for good or ill Google’s Smart Chips were designed with the permanently-online workforce in mind. They provide a layout that can be hovered over to provide more information and, sometimes, imagery about a topic. They come in a number of flavours that users can generally access by using the ‘@’ symbol or going to Insert > Smart Chips in the menu.
I see the utility, but be warned: if you find yourself moving documents between Google Docs, LibreOffice Writer or Microsoft Word, then you will be in for a headache. I digress.
So how do we get that chip data?
(2) Get Google Docs Body Data and Chip Data with Document App
As complete and enticing as this chapter title is, as of writing this the Google Docs development team has only created chip access support for a very, and I mean ‘very’, limited number of Smart Chips.
So what can we extract?

Chips that Document App can extract
- Date chips: These are dates that you can add statically (which is a weird use for a chip, just add the date in as normal text, ya sausage!) or dynamically, like today’s date.
- Person chips: This lists the selected person’s email and name. When you hover over this bad boy, it will come up with their racy little avatar and some more actions too.
- Rich link chips: Just like regular links in a Google Doc, a rich link will link you to another Google Workspace document. It won’t link you to external links like a website and it obviously can’t be read by the getText() method. Further, it cannot even be read by calling the old trusty getLinkUrl() method (more on this later). Plus, it pretty much displays the same data when you hover over it as a normal link. So what’s the benefit? Well, you get a little document icon at the start and Google Docs will aggressively prompt you to convert any URL to a link. So I guess you are saving time writing a label for the link. Do yourself a favour and stick with the old-school links.
The rest of the chips you can’t extract. Some for good reason, like the very interactive chips, but there are also chips that should be retrievable for which Google has not implemented any methods yet.
What’s the message? Try to avoid Smart Chips in Google Workspace if you are scripting or want to download the file and use it with other software. Avoid the enticing walled garden where you can.
What about extracting URLs from Normal Links?
While we are extracting a limited subset of the smart chips (Yeah, still disgruntled), why don’t we also grab the links from the standard text and add them to our returned text?
Time to look at the code.
The Code
Note that I have set the code up in a way that you should be able to easily remove any of the JavaScript switch cases that you don’t need.
I’ve also left a bunch of the console.log() calls in the script to help you see what is going on. You can always comment them out or delete them as needed.
Finally, the function used here has a returning statement so you can implement this in your project, though you may wish to abstract the document id selection to a parameter if you are not working within a Google Doc-bound project.
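If you do want to go down that path, here is a minimal sketch of the idea. The name getDocBody and its docId parameter are my own placeholders for illustration and are not part of the tutorial script.

```javascript
/**
 * Returns the Body of a Google Doc.
 * Hypothetical helper: falls back to the active document when no ID is given.
 *
 * @param {string} [docId] - ID of the Google Doc to open.
 * @returns {DocumentApp.Body}
 */
function getDocBody(docId) {
  const doc = docId
    ? DocumentApp.openById(docId)       // Standalone script: open by ID.
    : DocumentApp.getActiveDocument()   // Doc-bound script: use the open Doc.
  return doc.getBody()
}
```

You could then hand the returned body to elementIterator() instead of building it inside getBodyWithChips().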
/**
 * Get body text and all chips
 * that are currently supported.
 *
 * @see [Get a Google Docs Body Text with Apps]
 * {@link https://yagisanaode.com/get-a-google-docs-body-text-with-apps/}
 * @author Scott <scott@yagisanatode.com>
 * @license MIT License
 */
function getBodyWithChips() {

  let text = ""

  /**
   * Iterates over the child elements.
   * @param {DocumentApp.Body} parent - actually a container class for
   * body, paragraph, list items and tables. Body autocompletes correctly
   * though.
   *
   */
  function elementIterator(parent) {
    const numChildren = parent.getNumChildren()

    for (let i = 0; i < numChildren; i++) {
      const element = parent.getChild(i)

      // console.log(element.getType())
      const type = element.getType()

      switch (type) {
        case DocumentApp.ElementType.PARAGRAPH:
          console.log(`Child ${i} is a paragraph`)
          elementIterator(element.asParagraph())
          break;

        case DocumentApp.ElementType.LIST_ITEM:
          console.log(`Child ${i} is a list item`)
          elementIterator(element.asListItem())
          break;

        case DocumentApp.ElementType.TABLE:
          console.log(`Child ${i} is a table`)
          elementIterator(element.asTable())
          break;

        case DocumentApp.ElementType.TABLE_ROW:
          console.log(`Child ${i} is a table row`)
          elementIterator(element.asTableRow())
          break;

        case DocumentApp.ElementType.TABLE_CELL:
          console.log(`Child ${i} is a table cell`)
          elementIterator(element.asTableCell())
          break;

        case DocumentApp.ElementType.TEXT:
          console.log(`Child ${i} is text`)

          // A text element can hold several runs, some of them linked.
          /** @type {DocumentApp.Text} */
          const textEl = element.asText()
          let textToAdd = textEl.getText()
          console.log(textToAdd, textEl.getTextAttributeIndices())
          const indices = textEl.getTextAttributeIndices()
          // Add URL if a link
          let offset = 0 // Required when text expands with new link.
          for (let j = 0; j < indices.length; j++) {
            const startPos = indices[j]
            const link = textEl.getLinkUrl(startPos)
            // Check for link
            if (link) {
              console.log(link)
              const stringLength = textToAdd.length
              const isAtEnd = j + 1 === indices.length
              const endPos = isAtEnd ?
                textToAdd.length :
                indices[j + 1] + offset

              let textWithLink =
                textToAdd.substring(0, startPos + offset)
                + "["
                + textToAdd.substring(startPos + offset, endPos)
                + `](${link})`

              textToAdd = (isAtEnd) ?
                textWithLink :
                textWithLink + textToAdd.substring(endPos)
              console.log(textToAdd.length, stringLength, textToAdd)
              offset += textToAdd.length - stringLength
            }
          }

          text += textToAdd

          // Return a new line if we are the last in the para
          if (i + 1 === numChildren) {
            text += "\n"
          }
          break;

        case DocumentApp.ElementType.DATE:
          console.log(`Child ${i} is a Date`)

          /** @type {DocumentApp.Date} */
          const dateEl = element.asDate()
          const displayDate = dateEl.getDisplayText()
          const timestamp = dateEl.getTimestamp()
          const locale = dateEl.getLocale()
          text += ` DATE[Display: ${displayDate}, Timestamp: ${timestamp}, Locale: ${locale}]`
          break;

        case DocumentApp.ElementType.RICH_LINK:
          console.log(`Child ${i} is a rich link`)

          /** @type {DocumentApp.RichLink} */
          const richLinkEl = element.asRichLink()
          const title = richLinkEl.getTitle()
          const uri = richLinkEl.getUrl()
          const mimeType = richLinkEl.getMimeType() // Renamed so it no longer shadows the outer `type`.
          text += ` [${title}](${uri}) type: ${mimeType}`
          break;

        case DocumentApp.ElementType.PERSON:
          console.log(`Child ${i} is a person`)

          /** @type {DocumentApp.Person} */
          const personEl = element.asPerson()
          const email = personEl.getEmail()
          const name = personEl.getName()

          text += ` [${name}]{${email}}`
          break;

        case DocumentApp.ElementType.TABLE_OF_CONTENTS:
          console.log(`Child ${i} is a table of contents`)

          /** @type {DocumentApp.TableOfContents} */
          const tocEl = element.asTableOfContents()

          text += `TOC:\n${tocEl.getText()}\n`
          break;

        case DocumentApp.ElementType.UNSUPPORTED:
          console.log(`Child ${i} is unsupported`)

          text += "!!UNSUPPORTED REGION!!"
          break;
      }
    }

    // text += "\n"
    console.log("End Element Callback: ", text)
    return text
  }

  const body = DocumentApp
    .getActiveDocument()
    .getBody()

  // Walk the element tree, starting at the body.
  elementIterator(body)

  console.log("TEXT:", text)
  return text
}
If you have found the tutorial helpful, why not shout me a coffee ☕? I'd really appreciate it.
Code Breakdown
Function Setup
The function getBodyWithChips() returns a text string containing the retrieved data from the file.
We will need to iterate over a number of different elements to access their text data.
Line 12: Sets the text variable. This will be appended to as we find the data in the Google Doc.
Next, I have added an internal callback function to help iterate over the different element types in the Google Doc file. More on this later. For now, let’s jump to the bottom of the function.
Lines 156-158: Here we retrieve the DocumentApp body object in the same way we did in our simple example at the beginning of this tutorial.
Line 161: Next we call the elementIterator() function. This function takes our Document App body class object as its initial argument.
Line 164: Finally, the text is returned to be used in your calling function.
The elementIterator() function
This function is our driving callback function that will scan each element type and try to retrieve the text from each element.
The function takes the DocumentApp.Body class object, which is based on a standard element interface for Google Doc elements. There is no ‘interface’ type for text completion, so I have just added the ‘Body’ class object here, but in essence, the class object type will change depending on the element being passed in during the callback.
The function will return a text string back to the containing function once all callbacks are complete.
Line 22: Upon each call, we need to retrieve the number of child elements in the containing (parent) element using the getNumChildren() method.
Lines 25-28: We then iterate over the children, first collecting the child element object with getChild() and then extracting the element type with getType().
Line 30: Using the type as our condition, we use a JavaScript switch statement to determine which case to run for the selected element.
Elements can be accurately identified using the ElementType enumerator for each case, as you can see in the example.
Paragraph Element
If an element is a paragraph then it will not contain any text directly but its children will. This means that we will send the paragraph element back to elementIterator(), passing in the paragraph as its argument.
While we could call the getText() method at this point, we would not be able to extract any URLs from the text.
List Item Element
Similar to the paragraph element, a list item will contain child elements with text in them. Here we pass the list item element back into our callback function.
Table Element
The direct child of a table element is a table row. We will need to feed this table element back to our callback to iterate over the table’s rows.
Table Row Element
Each table row contains a list of table cell child elements. Again, we must feed the elementIterator() callback the table row to extract the table cell.
Table Cell Element
A table cell element can contain paragraphs, list elements, other table elements and any other number of element types. As such we need to send this back to our elementIterator() callback to dig deeper into the element tree to extract our text.
Text Element
Finally! This is what we are trying to extract.
If you have no intention of extracting any potential links from the document then you may replace the contents of this case with a much simpler script:
console.log(`Child ${i} is text`)

/** @type {DocumentApp.Text} */
const textEl = element.asText()
text += textEl.getText() + "\n"
break;
Extracting both the text and its associated URL while correctly maintaining its associated position is a little tricky.
Lines 60-63: Here we first retrieve the element data by defining the element type as text (asText()) and then get the text from the text element using the getText() method.
Text Element Attributes and Indices
A string of text in Google Docs is separated by its attributes. An attribute might be its font weight, colour, type or whether it contains a link. DocumentApp exposes these as an array of character indices, where each index marks the start of a run of text with its own set of attributes.
This means that we need to iterate over each index position and check if there is a link attribute in it. If there is, we will extract it and add it to our text.
For our example, we will store link data like markdown links:
[name](url)
Line 66: First we need to set an offset for the text string we are building, because the end result will be longer than the original text string once the URLs are added.
Lines 67-68: Next, we iterate over the indices and store the start position value as a convenient variable.
Line 69: Then we call the getLinkUrl() method. This method takes an index position as an argument. It will return either a URL, if one is found, or null.
Line 71: We can now check if a link was found and, if so, store it with its associated text in our string.
Lines 73-74: Before we add the URL to our text we need to keep a record of the previous length of our string so we can later work out the offset. We also need to check if the current index is the last index and record this with a boolean marker.
Lines 75-77: To get the link text we need to determine where the text ends (we already have its starting position). The end of the range will be the next value in the indices array or, if we are at the last index, the total length of the string.
Because we are adding to the text with our URL, we also need to add our offset value to it.
Lines 79-83: Next, we add our markdown wrapping characters and the URL to the current string. We do this with the help of the JavaScript substring() method. This will return a result that looks a little like this:
[my website](https://yagisanatode.com)
Lines 85-89: Now that we have our text with our link we can add it to the remainder of the extracted text. If we are at the end of the text, we just need to add the new link and text, but if we are somewhere in the middle, we also need to append the remaining text, which we will check for links during the next iteration. We then update the offset with the length of the new text string minus the original string length.
Line 93: With the text and any links added, we can now add the text to our main text string variable.
Lines 96-98: Finally, if we are at the end of a paragraph, we also add a new line.
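To see the offset bookkeeping on its own, here is a small sketch that runs the same idea against a plain string. The function name addMarkdownLinks and the getLinkAt callback are my own stand-ins for the Text element and its getLinkUrl() method, so you can run it anywhere without a Google Doc.

```javascript
/**
 * Wraps linked runs of `text` in markdown link syntax.
 *
 * @param {string} text - plain text of the element.
 * @param {number[]} indices - start index of each attribute run (ascending).
 * @param {function(number): (string|null)} getLinkAt - URL at an index, or null.
 * @returns {string}
 */
function addMarkdownLinks(text, indices, getLinkAt) {
  let result = text
  let offset = 0 // How far the string has grown so far.

  indices.forEach((startPos, j) => {
    const link = getLinkAt(startPos)
    if (!link) return

    const lengthBefore = result.length
    const isAtEnd = j + 1 === indices.length
    const start = startPos + offset
    const end = isAtEnd ? result.length : indices[j + 1] + offset

    result = result.substring(0, start)
      + "[" + result.substring(start, end) + `](${link})`
      + result.substring(end)

    offset += result.length - lengthBefore
  })
  return result
}

// The run "my website" starts at index 6 and the next run starts at index 16.
const sample = addMarkdownLinks(
  "Visit my website today",
  [0, 6, 16],
  (i) => (i === 6 ? "https://yagisanatode.com" : null)
)
console.log(sample) // Visit [my website](https://yagisanatode.com) today
```

Without the offset, the second link in a string would be inserted at its original position, which is now several characters too early.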
Date Element

This is a fairly new method added by the Google Document App dev team and the first of our Smart Chip elements.
Line 105: As we discovered, we can’t extract Date chips using the asText() method. However, when we do find a date we can call the asDate() method to get a Date class object.
Lines 106-108: This class can retrieve a number of helpful date details:
- getDisplayText(): The currently displayed or rendered date text.
- getTimestamp(): The actual date-time stamp containing the timezone.
- getLocale(): The format type that is based on the selected locale.
For the purpose of this tutorial, I have added in all the available options here. Feel free to remove what you won’t need for your own project.
Line 109: Finally we append each of these returned date results to our text variable. Again, change this string to display what you need, but the tutorial version will return something like this:
“DATE[Display: 27 Dec 2024, Timestamp: Fri Dec 27 2024 19:00:00 GMT+0700 (Indochina Time), Locale: en-GB]”
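If that string is too noisy for your project, one option is a shorter label with a machine-readable timestamp. This is only a sketch: formatDateChip is a made-up helper, and its two arguments stand in for the values returned by getDisplayText() and getTimestamp().

```javascript
/**
 * Builds a compact label for a date chip.
 * Hypothetical helper; `displayText` and `timestamp` stand in for the values
 * returned by getDisplayText() and getTimestamp().
 *
 * @param {string} displayText
 * @param {Date} timestamp
 * @returns {string}
 */
function formatDateChip(displayText, timestamp) {
  // toISOString() always reports UTC, so the result does not depend on
  // the time zone the script runs in.
  return ` DATE[${displayText} | ${timestamp.toISOString()}]`
}

console.log(formatDateChip("27 Dec 2024", new Date(Date.UTC(2024, 11, 27, 12, 0, 0))))
// " DATE[27 Dec 2024 | 2024-12-27T12:00:00.000Z]"
```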
Rich Link Element

The rich link element represents a link to a Google Drive file or a YouTube video, or some other non-specific Google resource that the developers were vague about (Bad! Bad documentation writers. Big smacks!).
Lines 116-119: Similar to the date smart chip, the rich link element can retrieve a few useful bits of text once we call the asRichLink() method:
- getTitle(): The title of the file or video.
- getUrl(): The URL link to the resource.
- getMimeType(): The file type of the resource.
Line 120: I have added all three text options here for this tutorial. Feel free to remove what you don’t need and display them within the string how you please.
Here is the resulting text based on the example in the image at the start of this chapter:
“(https://docs.google.com/spreadsheets/u/0/d/1lfb1u–sheetlink–8iLg2TEQ/edit) type: application/vnd.google-apps.spreadsheet”
Person Element

Our final available chip is the Person chip. As you can see from the image above, this chip displays information about the selected person including their account name, email and avatar image, along with some actions like sending emails, starting a Google Chat or Google Meet, or scheduling something on their calendar.
Lines 123-132: The asPerson() method gives us access to the following text data:
- getEmail(): The person’s email address.
- getName(): The person’s account name.
You can see in our example, I’ve added the result as a markdown link format:
[${name}]{${email}}
Resulting in this for our current example:
[tester account]{tester@yagisanatode.com}
Table of Contents Element
We can also extract the table of contents data from the Google Doc with the asTableOfContents() method.
Lines 134-141: Here, we are just going to extract the text with the getText() method. While we could retrieve the internal page links too, this would not be useful for our body text display.
Unsupported Element
Lines 143-147: Finally, we may run into a chip that we can’t access, like a drop-down, variable, voting chip, or some other chip that we should probably be able to access but that currently has no API class. We can flag this by identifying the element as unsupported with the DocumentApp.ElementType.UNSUPPORTED enumerator.
I had hoped that the UnsupportedElement class had a type ID or something at least to explain what the element was, but alas it didn’t, so there was no point calling the class methods. Instead, we just add a warning text explaining that the current element range is not supported.
Why no Equation Element?
I guess, similar to the unsupported element, we could have added a warning text for equations too, or just left that section out completely as I have done.
For equations with special characters, there is no way to extract the character data, unfortunately. As such, this was left out. I did, however, keep the formula in there to illustrate that it could not be read.
A little hope for the future (Code Snippets and Variables)
I noticed that the Google Docs developers have already included the enumerators for CODE_SNIPPET and VARIABLE among the element types that getType() can return. So perhaps, in the near future, I can update this page.
The Text Results
This time our text string results look like this:

As you can see we now have the smart chip data displayed for those chip classes available to us in Document App. Both regular paragraphs and table paragraphs are displayed. Further, the standard links are now showing the URLs in their correct locations. The table of contents is also displayed at the top of the page. We can even see emojis displaying correctly.
This is probably the most complete approach currently available, but what if we want to extract the text from other chips?
(3) The OCR approach to text extraction
Frankly, this approach just feels weird in the context of the Google Workspace ecosystem. In this approach we:
- Convert the document data to a blob
- This is then temporarily converted to a PDF using OCR (Optical Character Recognition)
- This then is fed to a fresh Google Doc file.
- The body text is then read again using getBody().getText().
- Once read, the Google Doc file is deleted.
A very roundabout way of doing things.
The benefit of this approach is that we can now read ‘most’ of the text in the Chip that is currently on display.
The Code
As you can see below, the code is much simpler. Indeed, I borrowed some of this code from a previous tutorial of mine that you can explore here:
Before running the code you will need to add Google Drive Advanced Service Version 3. Click Services > select Drive API > check if Version 3 is selected > Click Add.
/**
 * Extracts Google Text From Blob data using
 * The Drive API v3 advanced service
 *
 * @see [Get a Google Docs Body Text with Apps]
 * {@link https://yagisanaode.com/get-a-google-docs-body-text-with-apps/}
 * @author Scott <scott@yagisanatode.com>
 * @license MIT License
 */
function extractTextFromBlob() {
  // Retrieve the current Google Doc File
  const blob = DocumentApp.getActiveDocument().getBlob()

  const resource = {
    mimeType: MimeType.GOOGLE_DOCS
  };

  const options = {
    ocrLanguage: "en",
  };

  const file = Drive.Files.create(resource, blob, options)

  console.log(file)

  // Extract Text from the new Google Doc file
  const doc = DocumentApp.openById(file.id);
  const text = doc.getBody().getText();

  console.log(text)

  Drive.Files.remove(doc.getId())
}
Code Breakdown
Line 12: First, we extract the blob data from the active document. A blob is simply a chunk of binary data, in this case the contents of the document file.
Lines 14-24: Next, we need to create a new Google Drive file using the Drive API advanced service. We will do this with the Drive.Files.create method, which takes a resource object (our MIME type of Google Doc), the blob, and an options object (the OCR language set to English).
What’s occurring here is that by setting the type to a Google Doc and declaring the OCR language, Google Drive API will look at the blob data as an image (typically a PDF) and try to read the text visually before putting the read data into the file. This means that the code for the smart chips will not be read; rather, the text of the chips will be read, well, in most cases.
Lines 27-28: Here we grab the newly created Google Doc and read the body text. You could add a return statement with the text variable if you need to draw this into another function.
Line 32: Once we have used and abused our new Google Doc, we trash it like yesterday’s newspaper using the Drive.Files.remove method.
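If you do add that return statement, it is worth making sure the temporary file is cleaned up even when reading it fails. Here is one way that might look. The name getTextViaOcr is my own placeholder, and the sketch assumes the Drive API v3 advanced service has been added as described above.

```javascript
/**
 * Variation on extractTextFromBlob() that returns the text and always
 * cleans up the temporary file. `getTextViaOcr` is a hypothetical name.
 * Requires the Drive API v3 advanced service.
 *
 * @returns {string} the OCR-read body text.
 */
function getTextViaOcr() {
  const blob = DocumentApp.getActiveDocument().getBlob()
  const file = Drive.Files.create(
    { mimeType: MimeType.GOOGLE_DOCS },
    blob,
    { ocrLanguage: "en" }
  )

  try {
    return DocumentApp.openById(file.id).getBody().getText()
  } finally {
    // Runs whether or not reading the text succeeded.
    Drive.Files.remove(file.id)
  }
}
```

The finally block is the design choice here: without it, an error while opening or reading the new Doc would leave a stray copy in your Drive on every failed run.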
The text results

The OCR approach does a pretty solid job of extracting what is being displayed on the page. It will even add ordered and unordered list symbols. It won’t however, display emojis as indicated by this fun little symbol (�) and it won’t capture many of the formula symbols but will capture a few different ones to the basic approach.
So what is our next option? Yeap, time to drown in Google JSON files.
Shhh… just let go…
(4) Retrieve Google Doc Body Text With Doc API Advanced Service
As my last resort, I braced myself to iterate over the bottomless pit of JSON response data spewed forth by the Docs REST API 🤢🤮.
Historically, the advanced service APIs have been updated more regularly than the Apps Script-only services. I guess this is because they can be used from other programming languages and, according to the snobs, Apps Script isn’t cool, “it’s just a low-code scripting language”.
Here we are limited to accessing the following smart chips:
- Person
- Rich link
However, we can also retrieve code snippets here as they are considered paragraphs in the Google API. So that’s a win.
The bummer is that there is no property to indicate that another chip type is present like we could with the Document App Unsupported Element so our displayed retrieved text will exclude the clues that something might be missing.
Let’s take a look!
Note that in the video tutorial of this portion of our investigation (when it comes out), we examine the response object in more detail and go over how I found the properties I needed to extract the text. It’s worth a watch.
The Code
The Docs API uses ‘tabs’ to refer to elements here which took a little getting used to.
First, ensure that you have added the Docs API advanced service to your Apps Script project:
- Click Services from the sidebar.
- Select Google Docs API.
- Select Add.
/**
 * A testing function
 */
function test_getBodyTextAsString() {
  const docId = "----ADD-YOUR-FILE-ID-HERE--"

  getBodyTextAsString_chips(docId)
}


/**
 * Gets the body text as a string including some limited chip
 * types like:
 * - Person
 * - Rich Links (Youtube or Google Drive File links)
 *
 * @param {String} docId
 * @returns {String} the body text.
 */
function getBodyTextAsString_chips(docId) {

  // Retrieve the JSON of selected fields from the API.

  /**
   * Creates an iterative list of table and paragraph fields
   *
   */
  function matryoshkaDoll_tablesAndParas() {
    let text = ""
    let endBraces = ""

    for (let i = 0; i < 6; i++) {
      text += "table.tableRows(tableCells.content(paragraph.elements(textRun.content,textRun.textStyle.link,person,richLink),"
      endBraces += "))"
    }

    text += "table" + endBraces
    console.log(text)
    return text
  }

  const document = Docs.Documents.get(
    docId,
    {
      "fields": `title,tabs(documentTab.body.content(paragraph.elements(textRun.content,textRun.textStyle.link,person,richLink),${matryoshkaDoll_tablesAndParas()}))`,
      "includeTabsContent": true
    }
  );

  // Retrieve the body content only from the main JSON
  // (the first tab that holds a documentTab).
  const body = document.tabs
    .find(t => t.hasOwnProperty("documentTab"))
    .documentTab.body.content
  console.log(body)

  // Collects the extracted body text.
  let text = ""

  /**
   * A callback function used to iterate over the body object and
   * extract text from a textRun, person or richLink
   * @param {Object} tab - a paragraph or table property
   */
  function paragraphOrTable(tab) {

    if (tab.paragraph) {
      // console.log("PARA:",tab.paragraph)
      const paraEls = tab.paragraph.elements
      console.log("PARA:", paraEls)

      paraEls.forEach(el => {

        if (el.hasOwnProperty("person")) {
          const props = el.person.personProperties
          const name = props.name
          const email = props.email
          const mdLink = `[${name}](mailto:${email})`
          console.log("Person Chip:", mdLink)
          text += mdLink
        } else if (el.hasOwnProperty("richLink")) {
          const props = el.richLink.richLinkProperties
          const title = props.title
          const uri = props.uri
          const mimeType = props.mimeType
          const mdLink = `[${title}](${uri}) mimeType: ${mimeType}`
          console.log("RichLink Chip:", mdLink)
          text += mdLink
        } else if (el.hasOwnProperty("textRun")) {
          console.log(el.textRun.content)
          const link = el.textRun.textStyle?.link?.url // Check if has link
          console.log("!!LINK:", link, el.textRun.textStyle)
          const content = el.textRun.content
          if (link !== undefined) {
            text += `[${content}](${link})`
          } else {
            text += content
          }
        }
      })
    } else if (tab.table) {
      const tableRows = tab.table.tableRows
      console.log("TABLE:", tableRows)
      tableRows.forEach(row => {
        console.log("ROWS:", row.tableCells)
        const tableCells = row.tableCells
        tableCells.forEach(cell => {
          console.log("CELL:", cell.content)
          cell.content.forEach(t => {
            paragraphOrTable(t)
          })
        })
      })
    }
  }

  // Walk every top-level item in the body.
  body.forEach(tab => {
    paragraphOrTable(tab)
  })

  console.log(text)
  return text
}
Lines 1-8: We will use the test function to run our script in this example.
The getBodyTextAsString_chips() function
Line 20: The function takes a Google Doc ID as an argument and returns the found body text as a string.
I’ve added internal private functions into this function for modularity, but you could also abstract them into their own functions and save them to their own file in your script.
Retrieve the JSON from the Google Docs REST API
Lines 28-48
To retrieve the body object data from the Docs API, we will use the Docs.Documents.get method. Left unchecked, this will generate an unwieldy maze of JSON paths with a matching large file size. We don’t need that much bloat.
Fortunately, there are only a few properties that we need and we can use field masks to request only those. This is done with the fields property. Each available property can be found in the document resource. From there you click on the object link to navigate to the nested properties of each item. We then put each of our required fields in a string.
For us, we need the Google Doc title, and then in the body of the Doc we want to get any text content or links found in paragraphs and tables along with the text and links from the person and rich link chip data.
So this object:
{
  tabs:[
    {
      documentTab:{
        body:{
          content:[
            {
              paragraph:{
                elements:[
                  {
                    textRun:{
                      content,
                      textStyle.link
                    },
                    person,
                    richLink
                  }
                ]
              },
              table:{
                tableRows:[
                  {
                    tableCells:{
                      content:[
                        {
                          paragraph: "!!SEE PARAGRAPH!!"
                        },
                        {
                          table: "!!SEE TABLE!!"
                        },
                        person,
                        richLink
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
Can be represented like this:
tabs(documentTab.body.content(paragraph.elements(textRun.content,textRun.textStyle.link,person,richLink),table.tableRows(tableCells.content(paragraph.elements(textRun.content,textRun.textStyle.link,person,richLink), ... more tables and paragraphs ...))
As you can see, parentheses wrap the sub-fields we want from a property (here, from each item in an array) and dots represent child properties.
Tables can be nested within tables. So we need a helper function to add more nested fields or we will get more bloat in our response object. My guess is that people won’t nest more than 6 tables within one table, so let’s generate our table/paragraph field 6 times using our matryoshkaDoll_tablesAndParas() function.
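If you would rather not hard-code the 6, the same string can be built from the inside out with the depth as a parameter. This is just a sketch: buildBodyFields and PARAGRAPH_FIELDS are names I made up for the example, and the field names are the ones used in the tutorial code.

```javascript
const PARAGRAPH_FIELDS =
  "paragraph.elements(textRun.content,textRun.textStyle.link,person,richLink)"

/**
 * Builds the body field mask with tables nested `depth` levels deep.
 * Hypothetical helper mirroring matryoshkaDoll_tablesAndParas().
 *
 * @param {number} depth - how many levels of nested tables to request.
 * @returns {string}
 */
function buildBodyFields(depth) {
  let fields = "table" // Innermost level: take the whole table.
  for (let i = 0; i < depth; i++) {
    // Wrap the previous level inside one more table layer.
    fields = `table.tableRows(tableCells.content(${PARAGRAPH_FIELDS},${fields}))`
  }
  return `${PARAGRAPH_FIELDS},${fields}`
}

console.log(buildBodyFields(1))
```

Because each layer is wrapped around the previous one, the parentheses always come out balanced, whatever depth you pass in.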
I think there are now two ways that we can look at the body data. One is the old way by iterating over the document elements and the other is via Tabs. It looks like Google wants to convert Google Docs into a kind of website-looking beast where you can navigate document tabs in the same way you have tabs in your Google Sheets. Kinda cool, I guess.
Anyway, we will need to take this approach to save ourselves a rewrite from any future deprecation. Here we will add the includeTabsContent property to our request payload.
Outside of the world of the tutorial, it is probably a good idea to wrap this in a try-catch statement to handle any request failures.
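A bare-bones version of that wrapper might look like the sketch below. The function name fetchDocument and the choice to return null on failure are my own assumptions; you may prefer to rethrow or show the user a message instead.

```javascript
/**
 * Fetches a document through the Docs API advanced service and returns
 * null instead of throwing when the request fails.
 * `fetchDocument` is a hypothetical name.
 *
 * @param {string} docId
 * @param {string} fields - the field mask to request.
 * @returns {Object|null}
 */
function fetchDocument(docId, fields) {
  try {
    return Docs.Documents.get(docId, { fields, includeTabsContent: true })
  } catch (err) {
    console.log(`Could not retrieve document ${docId}: ${err.message}`)
    return null
  }
}
```

The calling function can then check for null before it tries to read the tabs.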
Get the body property data only
Line 54: Our first means to simplify our navigation of the JSON response data is to extract the body content.
First, traverse the property tree to the 'tabs' array. Then we can use the JavaScript find method to find the first tab that has a "documentTab" property. From that, we step down to the body and its content array.
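Here is that lookup on its own, run against a tiny made-up response so you can see the shape. The sampleDocument object and the getBodyContent name are invented for this example; a real response will carry far more data.

```javascript
// A trimmed-down, made-up response in the shape requested by the field mask.
const sampleDocument = {
  title: "Starter Doc",
  tabs: [
    {
      documentTab: {
        body: {
          content: [
            { paragraph: { elements: [{ textRun: { content: "Hello\n" } }] } }
          ]
        }
      }
    }
  ]
}

/**
 * Returns the body content array of the first tab holding a documentTab,
 * or an empty array when no such tab exists.
 */
function getBodyContent(doc) {
  const tab = doc.tabs.find(t => t.hasOwnProperty("documentTab"))
  return tab ? tab.documentTab.body.content : []
}

console.log(getBodyContent(sampleDocument).length) // 1
```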
The Paragraph or body callback method
Line 58: First we set a text variable to collect our body text.
Lines 118-120: Jumping to the bottom of our function, we now iterate over the body content array. Each item in it (named tab in the code) represents one piece of the document, like a table or a paragraph.
We now call the paragraphOrTable() callback function, feeding it the current tab.
Is the tab a Paragraph or a Table?
On each iteration, we need to check if the current tab is a paragraph (Line 67) or table (Line 101).
Paragraphs
Each paragraph can contain one of three elements that have text we can retrieve:
- person (lines 74-80): The person property contains both the name of the person (props.name) and their email (props.email). We will add this to a markdown format to display in our body text.
- richLink (lines 81-88): Similarly, we can extract the title (props.title) and uri (props.uri) from the rich link properties along with the MIME file type (props.mimeType). Again, we will add this in a markdown format.
- textRun (lines 89-99): Text runs are defined as segments of text containing their own unique set of attributes like formatting or links. Here we will check if the current text has a link. If it does we will add it using the link markdown format. Otherwise, we will display the text.
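The three branches can also be read as one small function that turns a single paragraph element into text. The sketch below runs on hand-written sample elements. The elementToText name and the sample data are mine, while the property names are the ones used in the tutorial code.

```javascript
/**
 * Converts one paragraph element from the Docs API response to text.
 * Hypothetical helper; property names follow the tutorial code.
 */
function elementToText(el) {
  if (el.person) {
    const { name, email } = el.person.personProperties
    return `[${name}](mailto:${email})`
  }
  if (el.richLink) {
    const { title, uri } = el.richLink.richLinkProperties
    return `[${title}](${uri})`
  }
  if (el.textRun) {
    const { content, textStyle } = el.textRun
    const url = textStyle?.link?.url
    return url ? `[${content}](${url})` : content
  }
  return "" // Anything else is skipped.
}

// Made-up elements in the same shape as the API response.
const elements = [
  { textRun: { content: "Ask " } },
  { person: { personProperties: { name: "tester account", email: "tester@yagisanatode.com" } } },
  { textRun: { content: " or see " } },
  { textRun: { content: "my website", textStyle: { link: { url: "https://yagisanatode.com" } } } },
]
console.log(elements.map(elementToText).join(""))
// Ask [tester account](mailto:tester@yagisanatode.com) or see [my website](https://yagisanatode.com)
```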
Table
Lines 103-118
Tables can contain paragraphs or other tables, which may, indeed contain tables. Tables are the veritable world turtle of Google Docs.
This means that if we find one we need to:
- Iterate over each row array.
- Iterate over each cell array.
- Send the cell tab property back to the paragraphOrTable() function to check if it is either a paragraph tab or another table, “What the hey?” (Terry Pratchett).
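The recursion is easier to trust once you have watched it run on a small example. In this sketch, collectParagraphs is a made-up helper that follows the same row, cell, content path, and the label property on each paragraph is just a marker I added so the output is readable. It is not a Docs API field.

```javascript
/**
 * Collects every paragraph in a body content array, descending into tables.
 * Hypothetical helper operating on the same JSON shape as paragraphOrTable().
 */
function collectParagraphs(content, found = []) {
  content.forEach(item => {
    if (item.paragraph) {
      found.push(item.paragraph)
    } else if (item.table) {
      item.table.tableRows.forEach(row =>
        row.tableCells.forEach(cell => collectParagraphs(cell.content, found))
      )
    }
  })
  return found
}

// A paragraph, then a table whose only cell holds a paragraph and another table.
const nestedContent = [
  { paragraph: { label: "top" } },
  { table: { tableRows: [{ tableCells: [{ content: [
    { paragraph: { label: "cell" } },
    { table: { tableRows: [{ tableCells: [{ content: [
      { paragraph: { label: "nested" } }
    ] }] }] } }
  ] }] }] } }
]
console.log(collectParagraphs(nestedContent).map(p => p.label).join(", "))
// top, cell, nested
```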
The Text Results

As you can see, we have managed to successfully extract our text along with any related links. We have also extracted any people chips and rich link chip data. We have no indication from the body object where the other chips are so we cannot even mark them as missing. On the bright side, the code block is printed out and is conveniently marked by these icons (❎).
Conclusion
As you can see, there is no real single perfect approach to extracting the text from a Google Doc, only a ‘least bad’ option that may better fit your needs.

I would really love to hear which approach you decided to use in your own project and how you are using it. I am sure other readers would gain some inspiration from this too. Go ahead and add a comment below.
Oh, and let me know if there have been any updates to the APIs that I have missed. I keep these posts up to date.
Need help with Google Workspace development?
Got something bigger to solve than ChatGPT can handle?
I can help you with all of your Google Workspace development needs, from custom app development to integrations and security. I have a proven track record of success in helping businesses of all sizes get the most out of Google Workspace.
Schedule a free consultation today to discuss your needs and get started or learn more about our services here.
~ Yagi 🐐