We ❤️ Open Source

A community education resource


Explore regular expressions by playing a word game

Learn how to use regular expressions by enhancing your chances at Wordle.

I sometimes need to take a break after I’ve been working for a long time. And one way that I like to relax is to play a puzzle game. I might play a simple “guess the number” game or a “guess the word” game. One “guess the word” game I go back to all the time is Wordle at the New York Times.

If you haven’t played this game, you have to guess the secret five-letter word within six attempts. For each guess, Wordle highlights the letters in your word: gray letters don’t appear in the secret word; yellow letters are in the secret word, but not in that location; green letters are correctly placed in the secret word.

This word puzzle game is an excellent opportunity to use regular expressions. I don’t see it as “cheating” so much as using a command line tool (grep) to help narrow down the options for each guess.

Find all five-letter words

Before we make our first guess, we need to start with a list of all valid five-letter words. Wordle only uses normal words, not proper nouns or acronyms. Fortunately, Linux and other Unix-like systems provide a list of words in a file, usually /usr/share/dict/words.

The grep program is an incredibly useful command line tool to find text in a file. Rather than simply looking for an exact match of a “search word,” grep operates with regular expressions. A regular expression is a pattern that describes letters or other characters that might appear in the source text. For finding words to play Wordle, we can start with just a few patterns:

  • [a-z] matches any single character from a to z, in lowercase
  • ^ indicates the start of a line
  • $ matches the end of a line

So to match only all-lowercase five-letter words from the /usr/share/dict/words file, we use this grep command:

$ grep '^[a-z][a-z][a-z][a-z][a-z]$' /usr/share/dict/words > mywords
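The five repeated [a-z] expressions can also be written with an interval expression, where {5} means “exactly five of the preceding item.” Here is a minimal sketch, assuming a grep that supports -E (extended regular expressions), as GNU and BSD grep do, and using a small hypothetical word list in place of /usr/share/dict/words. Setting LC_ALL=C is a precaution: in some locales, a range like [a-z] can match more than the 26 ASCII letters.

```shell
# Hypothetical sample list standing in for /usr/share/dict/words
words='apple
Apple
tries
abc
bananas
alums'

# -E enables extended regular expressions, where {5} means "exactly five
# of the preceding item". LC_ALL=C makes [a-z] mean only ASCII a through z,
# so "Apple" is rejected along with the too-short and too-long words.
five=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[a-z]{5}$')
printf '%s\n' "$five"
```

Against the real dictionary, the equivalent command would be LC_ALL=C grep -E '^[a-z]{5}$' /usr/share/dict/words > mywords. If your locale treats [a-z] differently, the word count may not match the one from the original command.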

On my system, that gives me a list of about 15,000 words that are all-lowercase and exactly five letters long. The wc -l command prints the number of lines in my new file:

$ wc -l mywords
15034 mywords

Make the first guess

I like my first guess to use five unique letters. While the secret word might be a word like guess (a valid five-letter word), I wouldn’t make that my first guess because of the repeated letter. Instead, I can get the most information if my word has five unique letters. The letters E, S, T, and R are among the most common letters in English words, so I’ll pick a word like tries, which has five unique letters and includes all four of them:
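If you’d rather check letter frequencies against your own word list than rely on a rule of thumb, a short pipeline can count them. This is a minimal sketch using a small hypothetical sample list: grep -o . prints every single character on its own line, sort groups identical letters so uniq -c can count them, and sort -rn lists the most common letters first.

```shell
# Hypothetical sample list; on a real system you would read mywords instead
words='tries
alums
amass
stare'

# grep -o . prints each character on its own line; uniq -c only counts
# adjacent duplicates, hence the first sort; sort -rn puts the largest counts first.
freq=$(printf '%s\n' "$words" | grep -o . | sort | uniq -c | sort -rn)
printf '%s\n' "$freq"
```

To run it on the real list, read the file directly: grep -o . mywords | sort | uniq -c | sort -rn | head -n 5. The ranking you get depends on the word list installed on your system.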

Make the first guess: TRIES. In the screenshot, the S block is green and all other letter blocks are gray. Image credits: Jim Hall, CC-BY-SA

Narrow the options for the second guess

Despite what it looks like, this is actually a pretty good guess. We know the secret word has an S at the end, but does not contain the letters T, R, I, or E. We can use this information to narrow down the options for our next guess.

The -v option to grep will “invert” a search. This means it will return entries that do not match a pattern. This is easier to see if we have a smaller sample, such as the numbers 5 to 15:

$ seq 5 15
5
6
7
8
9
10
11
12
13
14
15

If we wanted to see all the numbers from this list that contained either 1 or 7, we could use this grep command:

$ seq 5 15 | grep '[17]'
7
10
11
12
13
14
15

The [17] means “a single character that is either 1 or 7.” Inverting this search with -v returns only those numbers from 5 to 15 that do not contain the digit 1 or 7:

$ seq 5 15 | grep -v '[17]'
5
6
8
9

We can use this method to narrow down the list of possible words by removing any words that contain the letters T, R, I, or E:

$ grep -v '[trie]' mywords > guess2
$ wc -l guess2
2893 guess2

That has reduced the list of possible next guesses from about 15,000 words to about 2,900 words. But we can take this a step further: we also know that the secret word has an S at the end, so we can use another grep command to find only the entries in that list of 2,900 words that end in S:

$ grep 's$' guess2 > guess2a
$ wc -l guess2a
817 guess2a
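The two filters can also be folded into a single pattern. This minimal sketch uses a negated bracket expression, [^trie], which matches any single character except t, r, i, or e. Assuming every input line is a five-letter lowercase word, as in mywords, “four such letters followed by a final s” selects the same words as the two-step pipeline. The sample list is hypothetical.

```shell
# Hypothetical sample list standing in for mywords
words='alums
tries
books
rates
hello
amass'

# Two steps: drop words containing t, r, i, or e, then keep words ending in s
two_step=$(printf '%s\n' "$words" | grep -v '[trie]' | grep 's$')

# One pattern: four letters that are not t, r, i, or e, then a final s
one_step=$(printf '%s\n' "$words" | grep -E '^[^trie]{4}s$')

printf '%s\n' "$two_step"
printf '%s\n' "$one_step"
```

The two-step version is easier to build up one clue at a time; the single pattern is handy once you know all the clues for a round.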

With just two grep commands, we have reduced the possible words for our next guess to about 800. Looking through that list, I found alums, which seems appropriate for graduation season, so I’ll guess that.

Make the second guess: ALUMS. In the screenshot, the A and S blocks are green, the M block is yellow, and the L and U blocks are gray. Image credits: Jim Hall, CC-BY-SA

Further narrow the options

We now know the secret word starts with an A and ends with an S, and there’s an M in there somewhere, but not as the second-to-last letter. The secret word also does not have the letters L or U. Since the guess2a list only contains words that end in S, we can use another grep command to narrow the list to words that also start with A:

$ grep '^a' guess2a > guess3
$ wc -l guess3
54 guess3

This has already narrowed the options to just 54 words. Now we need to weed out the words with L or U, using the -v option:

$ grep -v '[lu]' guess3 > guess3a
$ wc -l guess3a
30 guess3a

That’s only 30 words! But some of these words don’t have the letter M, like abyss, and other words have an M but as the second-to-last letter, like adams. To narrow the list to words that include M but not as the next-to-last letter, we can use a pair of grep commands:

$ grep m guess3a | grep -v 'm.$' > guess3b
$ wc -l guess3b
8 guess3b

In a regular expression, the . (period) means “any single character,” so the regular expression m.$ matches “M as the second-to-last letter.” Inverting that search with -v finds the words that don’t have M as the second-to-last letter.
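Here is a small sketch of that pair of commands, run on abyss and adams from the text plus amass and ambos. The four words are only a sample: grep m drops abyss, and grep -v 'm.$' drops adams because its m is followed by exactly one more character at the end of the line.

```shell
# Sample words: abyss and adams are mentioned in the text; amass and ambos
# are candidates that should survive both filters
words='abyss
adams
amass
ambos'

# Keep words containing m, then drop those where m is the second-to-last letter
kept=$(printf '%s\n' "$words" | grep m | grep -v 'm.$')
printf '%s\n' "$kept"
```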

Finish the game

The list is now very short, with only eight possible words:

$ cat guess3b
agmas
amaas
amahs
amass
ambas
ambos
ammos
amoks

Finishing the game is an exercise in making a guess from this limited list and removing further options. Since Wordle usually uses “common” words that most people will find familiar, I’ll guess amass, which means “to collect for oneself,” as in “to amass a fortune.” And that happens to be the correct word:

Make the third guess: AMASS. In the screenshot, after TRIES and ALUMS, all five blocks of AMASS are green. Image credits: Jim Hall, CC-BY-SA
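Each round in this game follows the same recipe: exclude gray letters, pin green letters in place, require yellow letters, and rule out the position where a yellow letter was seen. Below is a minimal sketch of a reusable shell function, assuming a POSIX-compatible shell. wordle_filter is a hypothetical name, not an existing command, and this version handles only one yellow letter.

```shell
# Hypothetical helper (not a standard command): filter a word list on stdin
# using Wordle clues.
#   $1  gray letters that are not in the word (must be non-empty), e.g. trielu
#   $2  green pattern, one character per position, . for unknown, e.g. a...s
#   $3  a yellow letter that must appear somewhere, e.g. m
#   $4  pattern marking where that yellow letter is NOT, e.g. ...m.
wordle_filter() {
    grep -v "[$1]" | grep "^$2\$" | grep "$3" | grep -v "^$4\$"
}

# Hypothetical sample list standing in for mywords
words='amass
adams
abyss
alums
atoms
ambos
booms'

# Clues after TRIES and ALUMS
result=$(printf '%s\n' "$words" | wordle_filter trielu a...s m ...m.)
printf '%s\n' "$result"
```

With the real list, the call would be wordle_filter trielu a...s m ...m. < mywords. Because every entry in mywords is exactly five letters, it should select the same words as the step-by-step commands. The gray-letter argument must not be empty, since [] is not a complete bracket expression.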

Regular expressions to match patterns

Using grep with a few simple regular expressions is a fun way to narrow the options in Wordle, and it’s also good practice for using regular expressions in general. Regular expressions make it easy to find patterns of any kind. Other common patterns in regular expressions include:

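  • ? matches zero or one of the preceding character or pattern
  • * matches zero or more of the preceding character or pattern
  • + matches one or more of the preceding character or pattern

The ? and + operators are part of extended regular expressions, so use grep -E when you need them; without -E, grep treats an unescaped ? or + as an ordinary character.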
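To compare the three operators side by side, this minimal sketch runs each one with grep -E against a few hypothetical spellings of the same word:

```shell
# Hypothetical sample spellings
words='color
colour
colouur'

# With -E, ?, *, and + apply to the preceding item (here, the letter u)
opt=$(printf '%s\n' "$words" | grep -E '^colou?r$')    # zero or one u
star=$(printf '%s\n' "$words" | grep -E '^colou*r$')   # zero or more u
plus=$(printf '%s\n' "$words" | grep -E '^colou+r$')   # one or more u

printf '%s\n' "$opt" "$star" "$plus"
```

Only * matches all three spellings; ? rejects colouur, and + rejects color.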

You can learn more about regular expressions by reading the regex manual page from section 7:

$ man 7 regex

Have fun!

About the Author

Jim Hall is an open source software advocate and developer, best known for usability testing in GNOME and as the founder and project coordinator of FreeDOS. At work, Jim is CEO of Hallmentum, an IT executive consulting company that provides hands-on IT Leadership training, workshops, and coaching.


The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.
