Google Sheets: COUNTIF, ARRAYFOMULA, FILTER, REGEXMATCH, REGULAR EXPRESSIONS (RE2)
Okay, wait! Stop!…
…I know the byline contained regular expressions, but I promise you I won’t just leave you with the formula for you to figure out your own jumbled mass of jibberish characters or inundate you with a list of commands and leave to attempt to piece it all together.
I promise to give you some clear examples with an explanation of each to you can apply it to your project.
Take a deep breath, mate, and let’s get cracking.
We are going to look at two related scenarios:
Scenario 1:
Imagine that you have a huge list of items. You have a hunch that some of the cells contain certain values of interest for you. You want to build a new list with only those values in them.
Imagine that you have a list of full names, and you want to use Google Sheets to create a new list of full names that only contain John.
Scenario 2:
You have that same huge list again, but this time you only want the total count of all the rows that contain certain values in each cell.
Imagining that list of full names again, you are now going to get a total count of all full names that contain John in it.
We’ll first go through how to create these formulas and then provide a number of clear examples on some common criteria for searching cell for values using REGEXMATCH and regular expressions.
Contents
- The Formulas
- The Regular Expressions – Examples using a list of songs
- Contains “Love” on its own or part of a word
- Contains “You” as its own word only
- Contains both “You” AND “Love” as their own words
- Contains any songs or singers and groups who have numbers in them
- Contains any day of the Week
- Text starts with the letter “I”
- Text starts with the word “I”
- Singer or Group name that contains a number in its name
- Singer or Group with only one name
- Conclusion
The Formulas
Before we get started, here is a basic example. I have a list of snack. I want to extract all snacks that contain the word ‘Chips’.
Getting a list of items that contain certain values in each cell
Here we are going to use Google Sheet’s FILTER and REGEXMATCH functions.
FILTER
FILTER, as the name suggests, filters out any extraneous data you don’t want that is based on certain rules. With FILTER, you can grab a single column or a range of many columns and stack rules or conditions for one or more columns to filter down your results.
=FILTER(selected range, condition 1, condition 2, condition #, ...)
In our example, we just want to filter a single column based on a match of our regular expression.
REGEXMATCH
REGEXMATCH allows us to search through a piece of text and see if a particular condition exists using the ancient mystical secret codex of the regular expression…woooh.
The REGEXMATCH function takes two parameters. First, it takes a text or cell link to a text. The second parameter is then, the regular expression you will use to search for in your text.
=REGEXMATCH(text or cell ref to text, regular expression)
putting it together
We’ll place our REGEXMATCH function in the first condition of our filter. The FILTER function allows us to look through each row of the range we have selected. This means that when we apply our REGEXMATCH inside FILTER, we can give REGEXMATCH a range to search for instead of a single cell. It will iterate through each cell in the range completing the match for each.
Combined, it will look a little like this:
=FILTER(selected range, REGEXMATCH(text cell range, regular expression))
In our snack example, we want to get a list of cells that only contain the word “Chips”, so our code would look a little like this:
=FILTER(A2:A6,REGEXMATCH(A2:A6,"\bChips\b") = TRUE)
For now, don’t worry too much about what the \b
thingy means. I’ll go into that later.
Your end result in Google Sheets should end up like this:
Counting a list of items that contain certain values in each cell
To achieve a count of how many cells in our column contain a certain value we will be using COUNTIF, ARRAYFORMULA and REGEXMATCH.
COUNTIF
COUNTIF takes two parameters. A range of data you want to count and the condition or rule you want to set to count.
=COUNTIF(range to count, condition)
For our example, our COUNTIF range would be the snack column and the condition would be our regular expression.
arrayformula
We can’t just plonk in the REGEXMATCH function for our COUNTIF range. If you recall, when we learnt how to create a list earlier in this tutorial, the REGEXMATCHs first parameter is a single cell or string of text. We could only change this to a range of cells when we put it inside FILTER.
This is why we use ARRAYFORMULA. The ARRAYFORMULA iterates its containing formulas over a selected range. So now, we can safely set the REGEXMATCH formula to a range and it will apply the function to each cell in the range.
It will look a little like this:
=ARRAYFORMULA(REGEXMATCH(text cell range, regular expression))
This will result in a list of TRUE or FALSE responses for each item in the range. The TRUE items being those that passed the REGEXMATCH condition.
pUTTING IT TOGETHER
Our COUNTIF function will take the REGEXMATCH function contained the ARRAYFORMULA as its first range parameter. COUNTIFs second condition parameter will be simply, TRUE.
=COUNTIF(ARRAYFORMULA(REGEXMATCH(text cell range, regular expression)),TRUE)
In our snack example, it will look a little like this on Google Sheets:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A2:A6,"\bChips\b")),TRUE)
Now that we can count and filter cells that contain certain values in Google Sheets, let’s take a look a 9 cool* regular expressions you can use to search for cell items. We’ll use our two newly constructed formulas as our examples.
*I thought if I said cool here it would be more engaging.
The Regular Expressions
Regular expressions are tricky. They are the Ood of the tech world – impossible to remember, but incredibly fashionable (That one is for the Doctor who fans).
There are a number of variants to regular expressions, but Google Sheets uses the Google RE2 syntax.
For our example, we will use a list of 1,000 songs. Our goal is to list and count any song on the list that contains certain parameters.
Let’s get cracking with the examples.
Contains “Love” on its own or part of a word
- Regular Expression:
"Love"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"Love") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"Love")),TRUE)
We can search for “Love” in our list by simply adding the term, “Love” to the criteria of our REGEXMATCH function. Any text you put into your regular expression is case-sensitive unless you deem it otherwise.
By simply adding the text you want to search for as your regular expression it will search for the word “Love” anywhere in the text on its own as a single word or as part of a larger word like “Lovely”.
Contains “You” as its own word only
- Regular Expression:
"\bYou\b"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"\bYou\b") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"\bYou\b")),TRUE)
Here, we only want to search our list of songs and find those that contain the word “You”. We don’t want “You” to be part of another word so we use the \b
character on either side of the word. \b
is an ASCII word boundary regular expression. It identifies the start or end of a word.
Let’s take a look a the results:
Contains both “You” AND “Love” as their own words.
- Regular Expression:
"\bYou\s.*?\bLove\b|\bLove\s.*?You\b"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"\bYou\s.*?\bLove\b|\bLove\s.*?You\b") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"\bYou\s.*?\bLove\b|\bLove\s.*?You\b")),TRUE)
In this example, we want to find the word “You” and “Love” in the text. The words can be anywhere in the text, but they must both be present.
This one looks infinitely more confusing than our last two examples but is simple once we break it down.
We need to tell the regular expression that “You” can be anywhere in front of “Love” in the text. Likewise, “Love” can be anywhere in front of “You” in the text.
First, we have our familiar \b
character that determines a word boundary. Then we have the word “You” followed by the whitespace regular expression,\s
.
\bYou\s
Now between “You” and our first “Love” (Ha!), we see this combo of regular expressions: .*?
.
= the period represents any character.*
= the star represents zero or more repetitions.?
= the question mark represents zero or one repetitions with a preference of one.
Combined, this regular expression set is saying that the text can contain any set of characters that is zero or greater with a preference of one more characters that can occur after “You ” and before “Love” appears.
“Love” has the boundary \b
expressions on either side to ensure we are looking for a word.
So now we have half of the regular expression set:
\bYou\s.*?\bLove\b
Next is the or expression, |
. This means that we are looking for text that contains “You” and “Love” somewhere in the text OR (|
) “Love” and “You”.
The last “Love” and “You” portion after the or regular expression is the same as the first half, with the two words swapped around. So now it will search for both instances and if one of those instances exists, mark it as true.
\bYou\s.*?\bLove\b|\bLove\s.*?You\b
Contains any songs or singers and groups who have numbers in them
- Regular Expression:
"\d"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"\d") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"\d")),TRUE)
In this example, we want to find any song, singer or group that has a number value. Basically, we are looking for the existence of a number anywhere in the text. To do this we use the \d
regular expressions. Just think “d” for digit.
Contains any day of the Week
- Regular Expression:
"Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday")),TRUE)
In this example, we are going to use the OR (|
) regular expression so that we will accept any text that contains at least one of the days of the week.
If you had a long list of words you wanted to choose from you could possibly add a JOIN function and reference a row or column of data you wanted to choose from.
Imagine if you had a column of days of the week – one on each row on cell A1:A7. You could use join to make them one string:
=JOIN("|",A1:A7)
Then insert that into your main formula:
=FILTER(A:A,REGEXMATCH(A:A,JOIN("|",A1:A7)) = true)
Text starts with the letter “I”
- Regular Expression:
"^I"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"^I") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"^I")),TRUE)
Here, we want to find all the songs that start with the letter “I”. It can be “I” on its own or with other words.
To get the first letter or even a set of characters at the start of a sentence, we use the ^
regular expression.
Text starts with the word “I”
- Regular Expression:
"^I\s"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"^I\s") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"^I\s")),TRUE)
This time around, we only want the word, “I”, if it is at the start of the text. To do this, we add the space \s
regular expression. You could also use the word boundary \b
regular expression here too.
Singer or Group name that contains a number in its name
- Regular Expression:
"\s–\s.*?\d"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"\s–\s.*?\d") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"\s–\s.*?\d")),TRUE)
In this example, we only want cells that contain a digit in the singer or group’s name.
We know that our list has a special dash that separates the song name from the artist. We also know that there is a space either side of the dash. We want our regular expression to look for a digit somewhere after that dash.
First, we will add the dash and two spaces either side of the character.
\s–\s
Next, we will add our .*?
combo we used earlier. Remember this combination says, that it’s okay to have any characters of any length that is zero or greater, but preferably greater than one.
\s–\s.*?
Finally, we add our digit regular expression to look for a number anywhere after the dash.
\s–\s.*?\d
Singer or Group with only one name
- Regular Expression:
"\s–\s\S+$"
- List formula:
=FILTER(A:A,REGEXMATCH(A:A,"\s–\s\S+$") = true)
- Count formula:
=COUNTIF(ARRAYFORMULA(REGEXMATCH(A:A,"\s–\s\S+$")),TRUE)
In this last example, we want to find all the song that contains a singer or group with a single word name.
Again, we start off with our dash and spaces on each side.
\s–\s\
Next, we want to see a character that is not a space in the text. We use the capital “S” regular expression for this.
\s–\s\S
We then want to say that we need one or more of these non-space characters to follow. The plus sign +
is used for this regular expression.
\s–\s\S+
Finally, we want any number of non-space characters to be the last thing in the cell or line. To do this we use the dollar sign $
.
\s–\s\S+$
Conclusion
Hopefully, by now you have a pretty good grasp of how to get a list of rows based on cells that contain certain values along with their total count.
We also went over nine examples that you will most likely find useful in your day-to-day. The only way to get a grasp of these regular expressions is to apply them to your own projects.
Don’t worry if you can’t remember them, just bookmark this post and come back. Here is a summary you can use for quick reference:
Or you could make a copy of the file for an interactive version:
Counting and Filtering Data Where Data Contains Certain Values
Can’t get enough of Regular Expression? I get it. You can get some more here:
Google Sheets: Conditional Formatting with Custom Formula
~Yagi
Thanks a lot for this resource. It helped me to figure out a FILTER I needed.
Although, I had trouble combining your post on OR and REGEXMATCH. I’m not sure why I couldn’t get ‘+’ to work in that. Instead I found another resource that used “|” right within the REGEXMATCH regular expression portion. So it looked like “Chips|Corn”
This is really helpful. How would I combine this with AND or an OR statement (or another way) to count records only if they match conditions on multiple columns?
Hi Claire,
The FILTER function allows for multiple conditions, so you could account for different cell inclusions under an ‘and’ by adding extra conditions for different columns or even the same column. For ‘or’, it is a little more tricky. I recommend you have a look a this tutorial:
Google Sheets: How to use OR inside a FILTER
Hope this helps.
~Yagi
Thank you, one more question – What does the TRUE statement do in the COUNTIF() formula here? I can’t find documentation on Google on how a TRUE/FALSE statement could be incorporated in a COUNTIF() or ARRAYFORMULA() formula.
Hi Claire,
The REGEXMATCH in the ARRAYFORMULA returns an array of TRUE or FALSE statements depending upon if there was a match for each cell. So when we run our COUNTIF it will look at this array and check if any of these values are true and count them. A great way of looking at this is separating out the ARRAYFORMULA and running on a empty part of your Google Sheet to see how it works.
=ARRAYFORMULA(REGEXMATCH(text cell range, regular expression))
. Then on a cell next to the results of this formula enter, COUNTIF(the result range of the array formula, true) and see what happens.Essentially, count if will take a range to search as the first argument and a criterion as the second argument.
Hopefully, this helps. Let me know if you have any other questions.
Cheers,
~Yagi.