How to get the most frequently appearing words in a column in Google Sheets

I love it when a Google Sheets blog post generates more curiosity and further questions. Apart from the almost endless source of new material to write about, it is great to see peoples curiosity for a topic deepen.

I wish I could reward you all for this curiosity every time, but sadly I seem to run out of hours of the day.

Fortunately, I have found a little time today to put more details into a question that came from Sue in a recent post Get the Occurrence of a Value in a Range as a Percentage in Google Sheets. She asked:

How would I modify this [the solution in the post] to get the value with the most occurrences?

I hastily wrote a solution but sadly ran out of time to provide any context to the solution. Further, on review of my solution, I discovered two much better approaches. This post answers Sue’s question more fully.

Sue, thanks for the inspiration for this article.

via GIPHY

… in a complete platonic Goat-human interactive way, of course. 🐐

In this tutorial, I will provide two solutions. Both are interesting ways to solve the problem and provide some great insight into how to use certain functions in Google Sheets. You can choose which one makes the most sense for you in your project.

You can use the table of contents below to jump straight to the solution or follow along to learn some cool stuff about the QUERY, FILTER, INDEX, COUNTIF, MAX, ARRAYFORMULA and UNIQUE functions in Google Sheets.

If you are playing along, you can get the sheet with just the data here for you to work on.

Data Sheet. 

Just go to File > Make a copy. So you have your own copy of the sheet to edit.

The problem

Our task is to display the most frequently occurring word or words in a list of words.

Take this list of mystery words, for example.

List of letters to determine most frequently occurring in Google Sheets v2

 

If we were to do a manual count we would discover that the value “puzzling” was the most frequently occurring as it occurs 7 times.

Incidentally, if it was just a list of numbers we could get the most commonly occurring number simply with the MAX function.

So how do we solve this with a Google Sheets formula?

The Solution

As we mentioned earlier, we have two interesting solutions this problem. One we solve primarily using FILTER around a bunch of other functions and one around using the QUERY function on a lovingly hand-crafted array.

The FILTER solution

most frequently occurring word in a column in Google Sheets FILTER solution
Click to Expand!

Here you replace A2:A with the column that you want to find the most frequently occurring cell item in. This can be a single letter or word or phrase.

Simply copy this formula into your Google Sheet and change the range.

To quickly change the range you can select the cell with your new formula in it and hit ctrl + h. The find and replace menu will appear. Let’s say you want to change your range to H4:H200. Your Find and replace menu should look like this:

find and replace a range in Google Sheets
Click to Expand!

You are all done.

But I am sure you are just itching to learn how it all works so you can tweak and change it to other projects. So read on!

Formula Breakdown.

Get a frequency count for each value

Our first task is to get a count of all the times each word appears in each cell. Essentially we want the total count of each word to appear next to the work.

To do this we use the COUNTIF function. This function takes a range as the first argument and a criterion to check as the second argument. If the criterion is true, it will count the value in the cell.

Now, if you have used COUNTIF before you may not have seen another range being added as a criterion. We can apply a criterion as a range because the FILTER function (more on FILTER later) allows for array conditions or ranges of cells to be calculated.

Essentially inside FILTER, this COUNTIF will loop through each cell and conduct a count if the current value in column A matches any other values in the entire range.

Most commonly occurring text in a cell with Google Sheets_Get count of each cell
Click to Expand!

We can now clearly see that “puzzling” is the most frequently occurring word, followed by “cryptic”.

Note! If you are playing along, you might have noticed the inclusion of ARRAYFORMULA in this formula. I’ve used this here to help simulate how the COUNTIF function actually works in the FILTER function when it iterates through the cells in the range. 

If you want to try it out yourself, your formula should look like this: 

Find the max frequency as a value


Next, we need to find the max value of our newly created range of count numbers for each word.

We can do this with the MAX function which conveniently takes a range of values or numbers and return the highest number in that range.

Here we can apply our COUNTIF function we created above so MAX can find the highest value. On ya! Max!

Again, if you are playing along, then you will need to apply the ARRAYFORMUAL function to COUNTIF for it to work on its own in order to simulate it in the FILTER function.

Your result should look like this:

Most commonly occurring text in a cell with Google Sheets_Find the max frequency as a value
Click to Expand!

Filter only those most frequent words in the list

Now we can use these formulas in a FILTER function to only show the word or words that appear the most in your Google Sheets range.

The FILTER function takes a minimum of two arguments:

  1. The range to be displayed.
  2. The conditions that are used to filter the data. The condition starts with a range and then a criterion.

In our FILTER formula we want to display A2:A. So that goes in the first argument.

Then in the next argument, we want to compare the count of each cell against the max value.

Our FILTER will then look like this:

Which would result in displaying all the times that the most frequently occurring cell, well… occurs.

Click to Expand!
What if there is more than one Frequently occurring cell value?

I’m glad you asked.

The great part of this formula is that it will take this into account and include any cell value that is also the most common in the range.

If you have a good memory, you might have noticed that the word, “cryptic” occurs 6 times. Let’s add another “cryptic” to cell A20 and see what happens.

Most commonly occurring text in a cell with Google Sheets_Filter only those most frequent words in the list more than one word
Click to expand!

Cool, hey?

Only show each most frequent word once

We don’t want to see repetitions of the words. We get it. They are the most common. Let’s do something about that with the UNIQUE function.

UNIQUE takes a range and displays any value that occurs in the range only once.

It is pretty cool in that it doesn’t discriminate if it is a number, date or text. It is also case sensitive.

Our final formula would now look like this (Keeping in mind that we have added “cryptic” now as a most frequent value).

Most commonly occurring text in a cell with Google Sheets_Only grabbing the values once
Click to expand!

Bonus! Sorting your results

Having come this far I have now realised that my example is woefully ill-prepared to present the sorting of our results alphabetically because it is, well… already that way.

So for the sake of this short example, let’s go a little wild and sort from Z to A.

We can do this by wrapping our formula in the SORT function.

Sort takes at least 3 functions:

  1. The range to sort – This will be the result of our formula.
  2. The column to sort by – There is only one column for us. So this will be ‘1’.
  3. Whether ascending or descending – Let’s go Z to A here.

Our formula will now look like this:

Which will look like this spectacularly topsy-turvy hellscape:

Most commonly occurring text in a cell with Google Sheets_Sort
Click to Expand!

Quickly now! Change the sort value to TRUE and we will speak no more of it.

The QUERY solution

Most commonly occurring text in a cell with Google Sheets with QUERY
Click to Expand!

Here you replace the A2:A with the range you want to run the formula. You will need to do this each of the three times this range occurs in the formula.

For a shortcut to do this check out the example in The FILTER solution chapter.

This approach has the benefit of being able to quickly display headers or even the count for each max value with simple changes to the query. More on this in the bonus material at the end.

Now I am sure that you are barely containing yourself to find out how this formula works so let’s get cracking.

Formula Breakdown.

This formula uses the QUERY Google Sheets function to show the most commonly occurring cell item in a range. To do this we build our own range based on the original A1:A range and a range generated from the total count of each cell value in the A1:A range as our second column.

We then dynamically insert the largest number of occurrences from this second column to only show those values that are most frequently used, before using the UNIQUE function to shot those values only once.

Let’s break it down into steps…

Get a frequency count for each value.

Our first task is to get the number of times each cell repeats in our list.

We did the same thing here as we did in the FILTER versions.

You can click on the link for the previous explanation.

Make an array containing both the mystery word list and the total count for each word

Next, we need to create a new array containing our original list of mystery words and their corresponding count. In Google Sheets, we can create an array or range of data with curly braces ({}). We create two columns by putting our data between a comma (,).

Here is what our array is going to look like:

Which will produce this result in your Google Sheet:

most frequently occurring word in a column in Google Sheets QUERY solution_get the array
Click to Expand!

Find the max frequency as a value

Just like in our FILTER version we need to get the highest value from our list frequency count for each cell value in the list.

Check out the explanation in the FILTER portion of this tutorial.

QUERY the newly built array where column two is equal to 7

We can finally get into the query. The QUERY function takes two mandatory arguments:

  1. The range to query. This is the two-column array we created.
  2. The query. This will be the query we will write below.

QUERY uses the Google Visualisation API Query Language. You can find some good use-case examples in the main docs.

Being a new language to learn it can take a little while to get the hang of, but one of the beauties of using the query language is that is pretty close to plain English to read.

So, let’s dive in.

Your first step is to ask what you want from the data. For us, we want to display the words in our first column where the total frequency count of the word is the highest.

In query language, our statement would look like this:

Here we select all cells in column 1  where column 2 (our frequency sum col) is equal to 7.

Why 7? This is just a placeholder for our dynamic max count value we created earlier. We know that our current max count is 7 for our most frequently occurring word. We will update this in the next step. For now, it is just easier to read this way.

If you have tried out QUERY before you might be scratching your head over the ‘Col’ word instead of say, a column letter like A, B, C etc. We use the ‘Col#’ syntax when we are not referencing a ‘physical’ range on the sheet but a generated range like our array.

Let’s put our QUERY together now:

Which will result in:

most frequently occurring word in a column in Google Sheets QUERY solution_the basic query
Click to Expand!

Adding the max frequency formula inside the query

One of the cool things about Google Sheets is that we can add dynamic values to strings of text in a cell. To insert a value or formula result in a string of text we use the ampersand character “&” .

Let’s go ahead and add our dynamic max frequency formula to our query.

most frequently occurring word in a column in Google Sheets QUERY solution_the basic query with dynamic input
Click to Expand!

Complete the formula by only selecting UNIQUE values

Finally, we can use the UNIQUE function to only display the words once.

most frequently occurring word in a column in Google Sheets QUERY solution_full formula result
Click to Expand!

BONUS! Expanding QUERY views for kicks

Another benefit of using QUERY is that we can quickly change how the data is displayed by making a few changes to the formula.

First off, for all of these examples, we will need to change our ranges to A1:A instead of A2:A. Go ahead and change this now (See if you can do it with find and replace).

All done? Let’s make some mods to our QUERY.

Add a header

The QUERY function has an optional third parameter that allows you to have a header. We can simply add one to our formula and we now have a header – keeping in mind that we have adjusted all of our ranges to start at A1, not A2.

And the result is:

most frequently occurring word in a column in Google Sheets QUERY solution_bonus header
Click to Expand!
Add the header and frequency count to the formula

Now that we have our headers in place, let’s add the frequency column so that we know how many times these max values have appeared.

All that we need to do here is to add Col2 to our SELECT stage. Don’t forget to separate the columns by a comma.

Which will now look like this:

most frequently occurring word in a column in Google Sheets QUERY solution_bonus header and col2
Click to Expand!

Those headers don’t look great. Especially the frequency column. Let’s do something about that.

Add the header and frequency count to the formula with header LABELS

We can use the query language LABEL clause to create custom labels for any column.

Let’s change column one’s header to read ‘Top Mystery Word/s’ and column two to read ‘Frequency’. To do this we add the LABEL clause followed by the column and then in single quotation ('   ') we add our new label. We separate each label group with a comma.

Our formula will look like this now.

And our end result is this:

most frequently occurring word in a column in Google Sheets QUERY solution_bonus header and col2 with labels
Click to Expand!

Isn’t it pretty?

Conclusion

If you would like the complete Google Sheet with all the examples you can get it here:

Get the most common words in a column – complete examples.googleSheet

If you have found the tutorial helpful, why not shout me a coffee ☕? I'd really appreciate it.

I really love hearing how you use these projects or build on them in the comments below.

Did you enjoy the tutorial? Want to upskill and get a solid step-by-step course to become a pro at Google Sheets? Check out my course, Google Sheets: Learn the Essentials with Three Detailed Projects. Sign up today.

 
Google Sheets Shortcuts Mousepad
Google Sheets Shortcuts Mousepad
Want a fast track to boost your Spreadsheet efficiency? Grab one of these handy Google Sheets Shortcuts mousepads that I created from my store.
Google Sheets Shortcuts Mousepad Gel
Google Sheets Shortcuts Mousepad Ge

 

~Yagi

Get the Occurrence of a Value in a Range as a Percentage in Google Sheets (Updated Feb 2022)

Finding out how frequently something occurs compared to the rest of the data in a range is a pretty common task in Google Sheets. You will use it to find the frequency of grades in a grade book, the number of sales of an item compared to your total sales or the frequency of choices of a particular item in a survey.

In this tutorial, we will cover how to find the occurrence of a value in a range as a percentage in Google Sheets. I then show you how to do this quickly for all your values in a range as a set of formulas and as a pivot table.

Let’s get started…

The Example

In our example, we have a survey item asking quite humbly and without any leading statements 🐐😇:

How awesome are goats?

  • A: Great
  • B: Good
  • C: Okay
  • D: Meh
  • E: Jealousy is a curse

Then in column A, we have a list of respondents and, in column B, we have a list of responses or choices.

Example data for list of occurrences as a percentage in Google Sheets V2
Click to expand!

Here is a link to the raw Google Sheet data I am using in the example so that you can play along.

Get Occurrences as a Percentage – Worksheet

Just go to File > Make a copy. Then you have your own version to edit.

Get the Occurrence as a Percentage of a Single Value

To get the number of times a value occurs as a percentage of all the values in a list we will need to:

  1. Get the count of the times the value appears in the range.
  2. Get the total count of all the values in the range.
  3. Divide the value count by the total count.
  4. Change the result to be displayed as a percentage.

In our example, we want to humbly look for all the A’sbecause it is pretty clear that the survey is a shallow ruse to fish for compliments.

Our range of choices is from B3:B20.

Let’s break our formula down into manageable chunks.

Get the total count of all A’s

To get the total count of all the A’s in our range we can use the COUNTIF function. This function allows us to get the count of something based on a value.

It doesn’t have to be just A though. You could use this function to get the count of, say, the number of students who got higher than a grade in a particular test.

It’s a really useful function.

COUNTIF takes two arguments:

  1. The range you want to find your values in.
  2. The criterion you want to find to count.

=COUNTIF(Range to search, Criterion)

In our example, the range is column B and the criterion is A.

=COUNTIF(B3:B20,“A”)

For our example, this will return a value of 7.

Get the total count of all values in the range

Next, we need to get the total count of all the values in a range. We can’t use the traditional COUNT function here because COUNT only deals with numbers. Instead, we need to use COUNTA that counts any value that isn’t blank.

COUNTA can take a single range or multiple ranges. However, we only want to count our range of choices. Here is what it would look like:

=COUNTA(B3:B20)

Which will return a value of 18.

Dividing the value count by the range count

Next, we need to divide the value count by the range count.

=COUNTIF(B3:B20,“A”)/COUNTA(B3:B20)

This will return a decimal value: 0.3888888889.

We can change this in our submenu by clicking the Percentage format symbol:

Percentage Menu Item Google Sheets

This will result in 38.89%. Much lower than what is conscionably right and true. 🐐

Check out the short video demonstration:

Get the Percentage of Occurrences of Each Value in the Range with Formulas

Get the percentage of one value occurring in our range is fine, but often we need to get the percentage of all the values in a range.

I try and keep my data as dynamic as possible. This means that I want to create formulas that can update easily if changes are made to the data. So that the next time I have to do a similar task, I already have a working template ready to copy and paste into.

In this example, I will show you how to create a summary chart of all the values as a percentage.  Here is what our end result will look like:

percentage of occurrence of all values in a range in Google Sheets
Click to Expand!

If you are playing along, go to the next Google Sheets tab in the sample sheet named Range (formulas). 

Titles and subtitles

First, in column F3, paste in the header:  Occurrence of Choices as a percentage of all data. Format to taste.

Then in F4 and G4 add the sub-headers: Choice and Choice as % respectively.

Get a dynamic list of values

We could just list our values by hand but that doesn’t make a very good template. We can get a unique list of all of our choices by using the UNIQUE function in Google Sheets. UNIQUE takes a range of values.

=UNIQUE(B3:B20)

This won’t sort your choices alphabetically though. So let’s put our UNIQUE function inside a SORT function. The SORT function first takes the range you want to sort and then can take a number of optional arguments that we won’t use here.

Here is what our example will look like:

=SORT(UNIQUE(B3:B20))

SORT UNIQUE Google Sheets
Click to Expand!

Now we have a dynamic list of choices that will change automatically if our data changes. Sweet!

Getting the percentage of occurrences for each value

Now we can use our original formula to get the frequency of occurrence of each value in our Google Sheet. This time around I have made a few minor changes to the formula that you might miss at first glance:

=COUNTIF($B$3:$B$20,F5)/COUNTA($B$3:$B$20)

First of all, I have referenced the corresponding value cell on the left. In our first row, this is F5 or the value  “A”. Next, I have added dollar signs to my range to lock it in or make it absolute. Otherwise, the cell range would have moved down as I dragged my formula down each cell in the column causing and error in our count.

You can find out more about absolute and relative references in my post here:

How do I lock certain cells in a formula in Google Sheets?

Once you have dragged down your formula to accommodate all your values you should get something that looks a little like this:

Get the occurrences of each value in a range as a percentage in Google Sheets
Click to Expand!

Finishing up

Finally, we can add the total to the bottom, just in case someone doesn’t know that everything should equal 100%. Oh, and to check that we didn’t mess up.

You can simply do this with the SUM function. That will take the range or all the calculated percentages. If you are lucky, when you put in the equals sign, Google Sheets will suggest the calculation for you.

=SUM(G5:G9)

Add in the Total as a side header and get it to stand out to make it look pretty.

Sum of all occurrences of values as percentage in Google Sheets
Click to Expand!

Next, we will look at another way of achieving this that you might find easier or quicker.

Get the Percentage of Occurrences of Each Value in the Range with a Pivot Table

In this example, we are going to grab our occurrence percentages for each value using a pivot table. This can be a much quicker approach. Plus it has the added benefit of adjusting to the remaining values in a filter or a slicer.

Let’s get cracking!

If you are playing along, go to the next sheet tab named, Range (Pivot Table).

Pivot tables in Google Sheets allow you to analyse and transform data into interesting charts that allow you to quickly summarise your data.

Creating the pivot table

First, we will select our column of Choices. This time around, we will include the header:

B2:B20

Next, access the Pivot table by going to Insert > Pivot Tables. A popup window will appear containing:

  1. The data range you selected.
  2. A choice of where to insert your data.
    1. Select > Existing Sheet.
    2. A line with an example will appear.
    3. Select the grid symbol on the far right.
Create a pivot table first popup window for frequency of values as a percentage in Google Sheets
Click to Expand!
    1. Another pop-up window will appear. Here you can choose what cell you want to start your Pivot Table.
    2. Select cell E3  and then hit, Okay. ( You can  drag the pop-up to a different location if it is in the way)
Selcte a location for a Pivot Table for a list of percentage of occurrences in Google Sheets
Click to Expand!
  1. Back in the original popup and click Okay. 

Creating your pivot table data

Once you hit okay, a Pivot Table editor in the sidebar would have appeared on the right. And a small template would have appeared starting in your selected cell.

Pivot Table Editor for frequency by percentage in Google Sheets
Click to Expand!

Take a look at the Pivot Table editor. Look at the Rows item, select Add and then select Choice.

You will get a little box with your Choice data.

Next, move down to Values and select Add. Again, select Choice. If you are lucky, Google Sheets would have added a count of all the values in the range. If not select the Summarized by dropdown and choose COUNTA.

At this stage, things should look like this:

Pivot Table of count of choices in Google Sheets
Click to Expand!

We’re almost there.

Next, under Values, select the Show as dropdown and select % of column. 

Bang! We did it! Go ahead and close the Pivot Table editor.

Now, to tidy up let’s change that verbose subheader with the COUNTA of blah…blah…

Click on cell F3, and enter “Choice as %”.

This is what it should look like now?

Pivot Table of Complete percentages of choices in Google Sheets
Click to Expand!

That’s it you are all done. The great news is that the pivot table will change as you change your data too.

Conclusion

We’ve covered how to get the frequency of a value occurring in a range by creating our custom formula. Then we have looked at how to make a dynamic summary of all the values as a percentage using formulas first and then as a pivot table.

So which one did you like the most the pivot table or the formulas to build the summary? Let me know in the comments below. I’d be interested to hear your thoughts.

Did you enjoy the tutorial? Want to upskill and get a solid step-by-step course to become a pro at Google Sheets? Check out my course, Google Sheets: Learn the Essentials with Three Detailed Projects. Sign up today.

 
Google Sheets Shortcuts Mousepad
Google Sheets Shortcuts Mousepad
Want a fast track to boost your Spreadsheet efficiency? Grab one of these handy Google Sheets Shortcuts mousepads that I created from my store.
Google Sheets Shortcuts Mousepad Gel
Google Sheets Shortcuts Mousepad Ge

~Yagi

Google Sheets: Counting and Filtering Rows where Cells Contain Particular Values.

Google Sheets: COUNTIF, ARRAYFOMULA, FILTER, REGEXMATCH, REGULAR EXPRESSIONS (RE2)

Okay, wait! Stop!…

…I know the byline contained regular expressions, but I promise you I won’t just leave you with the formula for you to figure out your own jumbled mass of jibberish characters or inundate you with a list of commands and leave to attempt to piece it all together.

I promise to give you some clear examples with an explanation of each to you can apply it to your project.

Take a deep breath, mate, and let’s get cracking.

We are going to look at two related scenarios:

Scenario 1:

Imagine that you have a huge list of items. You have a hunch that some of the cells contain certain values of interest for you. You want to build a new list with only those values in them.

Imagine that you have a list of full names, and you want to use Google Sheets to create a new list of full names that only contain John.

Scenario 2:

You have that same huge list again, but this time you only want the total count of all the rows that contain certain values in each cell.

Imagining that list of full names again, you are now going to get a total count of all full names that contain John in it.

 

We’ll first go through how to create these formulas and then provide a number of clear examples on some common criteria for searching cell for values using REGEXMATCH and regular expressions.

Continue reading “Google Sheets: Counting and Filtering Rows where Cells Contain Particular Values.”

Google Sheets – Remove The Lowest Grade for Each Student on a Course

Google Sheets – MIN, FILTER, INDEX, MATCH, SUM, COUNTIF

In the region of the world that I work in, it is a pretty common occurrence for university courses to run weekly assessment. At the end of the course, all the weekly assessment is then added together minus the lowest piece of assessment.

For lecturers with small course sizes, this is a pretty simple task that you could simply eyeball if you have a small enough group, but what if your course runs into the thousands with half a dozen tests to choose from. Eyeballing is just not going to do it.

Recently I was asked to do the same thing for the program that I manage. Over an 8 week term, we run 7 assessment at the end of each week for our students.  My job was to find the lowest grade out of the 7 assessment and drop it, taking note of the assessment unit that I dropped for each student. 

I use Google Sheets for this purpose for its ease of use and sharability.

This is an example dataset of the 7 assessment (in this case, weekly tests) in Google Sheets. We need to remove the lowest grade from each student. As you can see not all students have their lowest grade in the same Unit test. 

List of 7 grades for each student - Google Sheets

Continue reading “Google Sheets – Remove The Lowest Grade for Each Student on a Course”