Grep Performance - Testing how fast grep can parse through data

Jun 7, 2024

trunc_team

Grep is the go-to command for searching content on Linux and Unix systems. Most system administrators and network/security professionals use it daily when troubleshooting issues or looking for signs of compromise.

Grep is also fast. It can search through multiple files recursively and often returns results almost immediately. In fact, there is an interesting post from the original author of GNU grep where he explains why it is so fast:


            #1 trick: GNU grep is fast because it AVOIDS LOOKING AT
            EVERY INPUT BYTE.

            #2 trick: GNU grep is fast because it EXECUTES VERY FEW
            INSTRUCTIONS FOR EACH BYTE that it *does* look at.
            ..
            ..
            The key to making programs fast is to make them do practically nothing. ;-)

The post ends with a beautiful line: "The key to making programs fast is to make them do practically nothing." Grep is a simple tool that searches files for keywords (or regexes), and that simplicity is what allows it to be so fast.
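To make the first trick concrete, here is a minimal sketch of a skip-based search in the Boyer-Moore-Horspool style. It is an illustration only, not GNU grep's actual code; the function name `horspool_search` and the comparison counter are hypothetical and exist just to show how many bytes get inspected.

```python
def horspool_search(haystack: bytes, needle: bytes):
    """Return (index of first match or -1, number of byte comparisons made)."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0, 0

    # For every byte in the needle except the last one, record how far the
    # search window may jump when that byte is seen at the window's end.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}

    comparisons = 0
    pos = 0
    while pos + m <= n:
        # Compare the window against the needle from right to left.
        j = m - 1
        while j >= 0:
            comparisons += 1
            if haystack[pos + j] != needle[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Jump ahead based on the last byte of the current window.
        # Bytes that never occur in the needle allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1, comparisons


if __name__ == "__main__":
    data = b"a" * 1000 + b"test123"
    index, comparisons = horspool_search(data, b"test123")
    print(f"match at {index}, {comparisons} comparisons for {len(data)} bytes")
```

When the keyword is rare, most windows are rejected after a single comparison and the search jumps ahead by the full keyword length, so only a fraction of the input bytes is ever examined.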

But how fast is fast? We care a lot about logs at Trunc, and sometimes we deal with gigabytes, or even hundreds of gigabytes, of logs. Even though grep is fast, it can sometimes take a while to get a response. How quickly do you think grep can parse through data?

Grep performance for gigabytes of data

We deal with a lot of text files (logs) at Trunc, so we decided to check how long it takes to parse through files of different sizes. We set aside a few real log files with different data, one with 1.1G of logs, one with 4.0G and another with 17.0G, to see how quick (or slow) grep would be:


            1.1G Jun 3 23:59 ./testing/1.1G.log
            $ time grep test123 ./testing/1.1G.log | wc -l
            0

            real 0m0.755s
            user 0m0.405s
            sys 0m0.350s

            17G Jun 3 23:59 ./testing/17G.log
            $ time grep test123 ./testing/17G.log | wc -l
            6

            real 0m10.484s
            user 0m5.922s
            sys 0m4.553s

            4.0G Jun 4 05:32 ./testing/4G.log
            $ time grep test123 ./testing/4G.log | wc -l
            4

            real 0m3.096s
            user 0m1.554s
            sys 0m1.532s

Based on these tests, on a high-performance server with an SSD, it took 0.75 seconds to parse 1.1G of data, a rate of about 1.4 GB/s. The 4G log file took 3.1 seconds, a rate of 1.3 GB/s. The 17G log file took 10.5 seconds, a rate of 1.6 GB/s. We repeated the same test on different servers and with different log files, and we always got a rate in the range of 1.2 GB/s to 1.7 GB/s.
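The rates can be reproduced from the `time` output with a few lines of Python. This is a small sketch, assuming the sizes reported by the file listing are treated as plain GB; the helper names `parse_real_time` and `throughput` are ours, for illustration.

```python
def parse_real_time(real: str) -> float:
    """Convert a `time` value such as '0m10.484s' into seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def throughput(size_gb: float, real: str) -> float:
    """Data scanned per second of wall-clock time, in GB/s."""
    return size_gb / parse_real_time(real)


# (file size in GB, "real" time reported by `time`) for each test run
runs = {
    "1.1G.log": (1.1, "0m0.755s"),
    "4G.log": (4.0, "0m3.096s"),
    "17G.log": (17.0, "0m10.484s"),
}

if __name__ == "__main__":
    for name, (size_gb, real) in runs.items():
        print(f"{name}: {throughput(size_gb, real):.2f} GB/s")
```

Because the listing rounds file sizes, the computed rates are approximate and can differ from the quoted ones in the last digit.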

That gives us a good indication of how fast grep can load data from the SSD (non-SSD drives would take a lot longer), parse it, and print the results. As Mike Haertel (the original author of GNU grep) said, code runs a lot faster when it does not do much. Most of the time is spent reading the data from disk: even more complex searches (like -E for extended regex) barely increase the execution time.

Grep for large datasets

Even though grep is fast, it can take a long time on large datasets. If you are analysing web server logs, for example, which can easily grow to hundreds of GB, it can take minutes to get a simple response. I remember restoring hacked servers where our investigations took forever, mostly because of the time it took to parse through the data.
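A quick back-of-the-envelope estimate shows where "minutes" comes from. The sketch below assumes a full scan at the 1.2 to 1.7 GB/s range measured earlier; the dataset sizes of 100 GB and 500 GB are hypothetical examples, not measurements.

```python
def scan_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Time for one full pass over the data at a constant scan rate."""
    return size_gb / rate_gb_per_s


# Slowest and fastest scan rates observed in the tests, in GB/s
SLOW_RATE, FAST_RATE = 1.2, 1.7

if __name__ == "__main__":
    for size_gb in (100, 500):  # hypothetical dataset sizes
        best = scan_seconds(size_gb, FAST_RATE) / 60
        worst = scan_seconds(size_gb, SLOW_RATE) / 60
        print(f"{size_gb} GB: {best:.1f} to {worst:.1f} minutes per search")
```

Every new search pays that cost again, since grep has to reread the whole dataset each time.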

That is, in fact, why Trunc was born. For example, for that 17G log file that takes 10 seconds with grep, the results come back almost immediately in the Trunc terminal:


            > search test123
            ... results ...
            found 6 logs in 0.68 seconds.

In fact, even when we go through large data sets, it returns the results right away:

            > search test123
            ... results ...
            found 180 logs in 0.89 seconds.

In this case, it searched over 800GB in 0.89 seconds. That would have taken minutes in the terminal with grep. The main difference is that grep has to go through all the data every time, while we store the logs in a specialized database where all the keywords are indexed (think Google, but for logs), which lets us skip re-parsing all the data on every search request.
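The general idea of keyword indexing can be sketched with a tiny inverted index. This is a toy illustration of the technique, not Trunc's implementation; the class `TinyLogIndex`, its tokenizer, and the sample log lines are all hypothetical.

```python
import re
from collections import defaultdict


class TinyLogIndex:
    """Toy inverted index: maps each keyword to the log lines containing it."""

    def __init__(self):
        self.lines = []                   # stored log lines, by line id
        self.postings = defaultdict(set)  # keyword -> set of line ids

    def add(self, line: str) -> None:
        """Index a line once, at ingestion time."""
        line_id = len(self.lines)
        self.lines.append(line)
        for token in re.findall(r"\w+", line.lower()):
            self.postings[token].add(line_id)

    def search(self, keyword: str) -> list:
        """Look the keyword up directly instead of rescanning every line."""
        line_ids = self.postings.get(keyword.lower(), ())
        return [self.lines[i] for i in sorted(line_ids)]


if __name__ == "__main__":
    index = TinyLogIndex()
    # Hypothetical log lines, made up for this example
    index.add("GET /login.php?user=test123 200")
    index.add("GET /index.html 200")
    index.add("POST /api/v1/items token=test123 403")
    for hit in index.search("test123"):
        print(hit)
```

The parsing cost is paid once when a log line is stored. A search then becomes a dictionary lookup whose cost depends on the number of matches rather than on the total volume of logs. The trade-off is extra storage for the index and matching on whole keywords rather than arbitrary substrings.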

In summary, grep is a great tool: it is fast and can easily search through roughly 1-2GB of data per second on a fast disk. If you have hundreds of GB of logs, you might need specialized log storage to optimize your queries instead of waiting.

