
We ❤️ Open Source

A community education resource


The AWK trick I use to count real words in Markdown

Skip the code, count the content: How to get accurate word counts in Markdown files.

I like to use Markdown to write the first draft of anything. I find that when I write with Markdown, I can focus on what I’m writing and not what it will look like when it’s done.

How much I write depends on the topic. When I write an article to introduce an open source program, I might write 500 to 800 words. To write about how to use that program, I might write 800 to 1000 words. Explaining how to write your first program usually requires going deeper, typically over 1000 words.

I know how much I write because I can use the wc command to count words in a file. By default, wc prints three values: the number of lines, words, and characters (strictly speaking, bytes) in the file. You can print just one value by using an option, like -w to print the number of words. For example, I wrote over 1300 words (that’s 240 lines and over 7800 characters) in a recent article about programming:

$ wc program.md
 240 1346 7874 program.md

$ wc -w program.md
1346 program.md
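As a minimal sketch of the individual wc options, the commands below count a two-line sample that is small enough to check by hand. The temporary file and the variable names are assumptions made for this illustration, not part of my article workflow.

```shell
# Build a small two-line sample so the counts are easy to verify by hand.
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

# Redirecting the file into wc keeps the file name out of the output;
# tr removes the padding that some versions of wc add.
lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```

For this sample, the counts are 2 lines, 5 words, and 24 bytes (each newline counts as one byte).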

However, that article included several source code listings. To count only the words outside the code listings, we need to split the file into separate blocks of text and code.


Scripting with AWK

The AWK scripting language makes it easy to examine a text file. One reason AWK is so easy to use is that it automatically splits each line into fields (by default, words separated by whitespace) so you can act on each field separately. The awk program is the standard Unix program that runs AWK scripts; most Linux systems provide the GNU version of awk, called gawk.
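To illustrate the field splitting, here is a small sketch that prints how many fields AWK found on a line, along with the first and last field. The input text and the variable name are made up for this example, and it uses plain awk on the assumption that any standard implementation behaves the same way here.

```shell
# NF holds the number of fields on the current line;
# $1 is the first field and $NF is the last one.
fields=$(echo 'AWK splits lines into fields' | awk '{print NF ": " $1 " ... " $NF}')
echo "$fields"
```

This prints 5: AWK ... fields, because the line contains five whitespace-separated words.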

AWK scripts are a series of pattern-action pairs. A pattern can be a regular expression, such as /^hello/ to match lines that start with hello. But a pattern can also be a comparison or test, such as a == 1, which matches whenever the AWK variable a has the value 1.

When AWK matches a pattern, it executes the action that goes with it: one or more statements inside curly braces. One such action is print, which will print the current line. An action might also assign a value to a variable.
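Both kinds of pattern can work together in one script. In this sketch, a regular expression pattern sets a variable and a comparison pattern acts on it. The input lines, the greeting: label, and the variable names are assumptions chosen only for illustration.

```shell
# First rule: a regular expression pattern whose action assigns a variable.
# Second rule: a comparison pattern whose action prints the line and
# resets the variable for the next line.
result=$(printf 'hello world\ngoodbye\nhello again\n' |
  awk '/^hello/ {a = 1} a == 1 {print "greeting: " $0; a = 0}')
echo "$result"
```

Only the two lines that start with hello are printed, each with the greeting: label in front.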

Separating code and text

We can use these pattern-action rules to separate the text blocks from the code blocks in a Markdown file. This works because Markdown uses three backticks as a “code fence” to mark a block of source code in an article. You can also indicate sample code by adding four spaces at the start of each line, but I find it easier to use the code fence method. It also makes code blocks easy to tell apart from text blocks; code and text are always separated by a fence, at least in my style of writing Markdown text.

For example, I might write about a sample C program in this way:

To print a message to the screen, use the `puts` function,
as in this sample program:

```
#include <stdio.h>

int main()
{
  puts("Hi there");
  return 0;
}
```

This prints the text **Hi there** on the terminal.

In this way, the three backticks (the code fence) become a separator between the text and the code.

I can use an AWK pattern-action statement on the code fence to increment a variable called show. Every time the script sees the fence, it adds one to the variable.

/^```/ {show++}

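I can then use another pattern-action statement to evaluate the variable: if the show variable is an even number, it prints the line. AWK treats an unset variable as zero, so show is even before the first fence, odd inside a code block, and even again after the closing fence.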

show%2 == 0 {print}
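One way to watch the counter change is to print its value in front of every line. This is only a debugging sketch: the five-line sample and the temporary file are assumptions for the illustration, and it uses plain awk rather than gawk.

```shell
# A tiny stand-in for a Markdown file: text, a fenced block, then more text.
md=$(mktemp)
printf '%s\n' 'Intro text.' '```' 'code line' '```' 'Closing text.' > "$md"

# Increment the counter on each fence, then print the counter and the line.
# The %d format shows an unset variable as 0.
trace=$(awk '/^```/ {show++} {printf "%d: %s\n", show, $0}' "$md")
echo "$trace"
rm -f "$md"
```

The text lines carry the even values 0 and 2, the code line carries the odd value 1, and each fence shows the value the counter has just after being incremented.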

Using gawk, we can combine both of these pattern-action statements on a single command line to print only the body text from the Markdown article:

$ gawk '/^```/ {show++} show%2 == 0 {print}' puts.md 
To print a message to the screen, use the `puts` function,
as in this sample program:

```

This prints the text **Hi there** on the terminal.

The output includes the closing code fence, because the counter is already even again by the time that line is tested. The fence will count as a “word” when I send the output to the wc program, but I don’t mind miscounting by a few words here and there. I’m not looking for an exact count, just an approximate one. For example, this sample has about 25 words:

$ gawk '/^```/ {show++} show%2 == 0 {print}' puts.md | wc -w
26
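If the extra fence matters, one possible variation (not the command I normally use) is to add next to the first rule, so that AWK skips fence lines entirely before the second rule can print them. The sketch below recreates the sample text in a temporary file, which is an assumption made so the example is self-contained, and compares both counts using plain awk.

```shell
# Recreate the sample article text in a temporary file (the path is arbitrary).
md=$(mktemp)
printf '%s\n' \
  'To print a message to the screen, use the `puts` function,' \
  'as in this sample program:' \
  '' \
  '```' \
  '#include <stdio.h>' \
  '' \
  'int main()' \
  '{' \
  '  puts("Hi there");' \
  '  return 0;' \
  '}' \
  '```' \
  '' \
  'This prints the text **Hi there** on the terminal.' > "$md"

# Original rules: the closing fence is printed and counted as a word.
approx=$(awk '/^```/ {show++} show%2 == 0 {print}' "$md" | wc -w | tr -d ' ')

# Variation: "next" skips every fence line, so only body text is counted.
exact=$(awk '/^```/ {show++; next} show%2 == 0 {print}' "$md" | wc -w | tr -d ' ')

echo "approx=$approx exact=$exact"
rm -f "$md"
```

On this sample the first count is 26 and the second is 25, which matches the one-word difference caused by the closing fence.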


Counting words, not code

Let’s apply this neat trick to a recent article I wrote about DOS programming and see how many words I wrote. I wrote the original draft in Markdown, in a file called dos.md.

Using the wc command to count the words in the entire file shows that I wrote over 1300 “words.”

$ wc -w dos.md
1346 dos.md

But that includes the sample code in addition to the body text. To count only the body text, excluding the blocks of code, I need to use the gawk command to extract the body text first so I can count that.

$ gawk '/^```/ {show++} show%2 == 0 {print}' dos.md | wc -w
1007
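As a further sketch, AWK can also do the counting itself by adding up NF, the number of fields on each line, instead of piping the output to wc. This is a different technique from the pipeline above: it skips the fence lines and reports text words and code words separately. The short sample, the temporary file, and the variable names text and code are all assumptions for the illustration.

```shell
# A short stand-in for an article: five words of text, four "words" of code.
md=$(mktemp)
printf '%s\n' 'Some intro text here.' '```' 'int x = 1;' '```' 'Done.' > "$md"

# Fence lines only flip the counter; every other line adds its field
# count to the text total (even counter) or the code total (odd counter).
split=$(awk '
  /^```/      {show++; next}
  show%2 == 0 {text += NF}
  show%2 == 1 {code += NF}
  END         {printf "text=%d code=%d\n", text, code}
' "$md")
echo "$split"
rm -f "$md"
```

For this sample the script prints text=5 code=4.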

Since I use Markdown to write the first draft of all my articles, I can use the same method to count the words (excluding code) from other things I’ve written. If you use Markdown, try this simple command line trick to count words only from the text body, not the code samples.


About the Author

Jim Hall is an open source software advocate and developer, best known for usability testing in GNOME and as the founder + project coordinator of FreeDOS. At work, Jim is CEO of Hallmentum, an IT executive consulting company that provides hands-on IT Leadership training, workshops, and coaching.


The opinions expressed on this website are those of each author, not of the author's employer or All Things Open/We Love Open Source.
