grep: How to match lines using any of multiple patterns

Recently, we needed to filter the results of ps x using two different patterns.
The first pattern was ./ where we needed to match that exact character sequence.
The . period character is treated as a special character in regular expressions (it matches a single character of any value, except for the end of line), so we decided to use the -F parameter to remove this special handling.
Doing this change prevented us from writing a regular expression that uses the OR | operator.

-F (or --fixed-strings) is a matching control option that instructs grep to interpret the patterns as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
We tried assigning the different patterns as different lines to a variable and then using them on the pipe, like in the following example:

ps x | grep -F $patterns;

..but it failed.


grep supports a matching control option -e that allows us to define multiple patterns using different strings.

-e PATTERN (or --regexp=PATTERN) uses the value PATTERN as the pattern. If this option is used  multiple times or it is combined with the -f (--file) option, grep will search for all patterns given.

In the end, our command was transformed to the following, which worked just fine!

ps x | grep -F -e "./" -e "banana";

How to search for specific filenames in .tar archives

The following commands will search in the .tar archives found in the specified folder and print on screen all files that their paths or filenames match our search token. We provide multiple solutions, each one for a different type of .tar archive depending on the compression used.

For .tar archives

find /media/repository/packages/ -type f -iname "*.tar" -exec tar -t -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.bz2 archives

find /media/repository/packages/ -type f -iname "*.tar.bz2" -exec tar -t -j -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.xz archives

find /media/repository/packages/ -type f -iname "*.tar.xz" -exec tar -t -J -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.gz and .tgz archives

Please note that this commands uses the -o (which is the logical or) parameter on find to search for multiple filename extensions.

find /media/repository/packages/ -type f \( -iname "*.tar.gz" -o -iname "*.tgz" \) -exec tar -t -z -f '{}' \; | grep "configurations/arm-cortexa9";

find Parameters Legend

  • -type f filters out any result which is not a regular file
  • -exec command '{}' \; runs the specified command on the results of find. The string '{}' is replaced by the current file name being processed.
  • -o is the logical Or operator. The second expression  is not evaluated if the first expression is true.

tar Parameters Legend

  • -z or --gzip instructs tar to filter the archive through gzip
  • -j or --bzip2 filters the archive through bzip2
  • -J or --xz filters the archive through xz
  • -t or --list lists the contents of an archive
  • -f or --file=INPUT uses the archive file or device named INPUT

How to suppress binary files from matching results

When you try to find all files that contain a certain string value, it can be very costly to check binary files that you might not want to check.
To automatically prevent your search from testing if the binary files contain the needle you can add the parameter -I (capital i) to prevent grep from testing them.
Using grep, -I will process a binary file as if it did not contain matching data, this is equivalent to the --binary-files=without-match option.


find . -type f -exec grep 'string' '{}' -s -l -I \;

The above command breaks down as follows:

  • find . -type f Find all files in current directory.
  • -exec For each match execute the following.
  • grep 'string' '{}' Search the matched file '{}' if it contains the value ‘string’.
  • -s Suppress error messages about nonexistent or unreadable files.
  • -l (lambda lower case) or --files-with-matches Suppress normal output, instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.
  • -I (i capital) or --binary-files=without-match Process a binary file as if it did not contain matching data.

Bash: Extract data from files both filtering filename, the path and doing internal processing

The following code will find all files that match the pattern 2016_*_*.log (all the log files for the year 2016).

To avoid finding log files from other services than the Web API service, we filter only the files that their path contains the folder webapi. Specifically, we used "/ServerLogs/*/webapi/*" with the following command to match all files that are under the folder /ServerLogs/ and somewhere in the path there is another folder named webapi, we do that to match files that are like /ServerLogs/Production/01/webapi/* only. The way we coded our regular expression, it will not match if there is a folder called webapi directly under the /ServerLogs/ (e.g. /ServerLogs/webapi/*).

For each result, we execute an awk script that will split the lines using the comma (FS=",";) character, then check if the line contains exactly 4 tokens (if (NF == 4) {). Later, we get the 4th token and check if it contains the substring "MASTER=" (if (match($4,"MASTER=")) {), if it does contain it we split it using the space character and assign the result to the variable named tokens. From tokens, we get the first token and use substr to remove the first character. Finally, we use the formatted result to create an array where the keys are the values we just created and it is used as a hashmap to keep record of all unique strings. In the end clause, we print all the elements of our hash map.

Finally, we sort all the results from all the awk executions and remove duplicates using sort --unique.

find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec awk '
        BEGIN {
            if (NF == 4) {
                if (match($4,"MASTER=")) {
                    split($4, tokens, " ");
                    instances[substr(tokens[1], 2)];
        END {
            for (element in instances) {
                print element;
    ' \
    '{}' \; | sort --unique;

Following is the same code in one line.

 find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec awk 'BEGIN {FS=",";} {if (NF == 4) {if (match($4,"MASTER=")){split($4, tokens, " "); instances[substr(tokens[1], 2)];}}} END {for (element in instances) {print element;}}' '{}' \; | sort --unique 

Another way

Another way to do similar functionality would be the following

find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec sh -c '
        grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique
    ' \
    '{}' \; | sort --unique;

What we changed is the -exec part. Instead of calling a awk script, we create a new sub-shell using sh -c, then we define the source to be executed inside the single codes and we pass as the first parameter of the shell the filename that matched.

Inside the shell, we find all lines that contain the string MASTER= using the grep command. Later we filter out all lines that do not have four columns when we tokenize using the comma character using awk. Then, we get the 4th column using cut and delimiter the comma. We remove the first two characters of the input string using cut -c 3- and later we get only the first column by reusing cut and changing the delimiter to be the space character. With those results we perform a sort that eliminates duplicates and we pass the results to the parent process to perform other operations.

Following is the same code in one line

find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec sh -c 'grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique' '{}' \; | sort --unique;

Git: Delete all local branches

The following command will:

  • print all branches that were merged to master
  • then filter out the branch named master and the branch you are currently switched to
  • and finally, it will delete the rest (one branch at a time).
git branch --merged master | grep -v -e "\*" -e "master" | xargs git branch -D


To cleanup any remote-tracking references that no longer exist on the remote use the following:

git fetch --prune

Grep lines that do not begin with ‘#’ or ‘;’

Recently, we wanted to modify  the squid configuration file, which is really really big!

wc -l /etc/squid/squid.conf
7898 /etc/squid/squid.conf

We wanted to find all active rules that are enabled to modify our proxy server. Out of those ~8K lines less than 20 are actually active configuration, the rest is documentation.

To find all active configuration lines we needed to find all lines that:

  • are not empty
  • do not start with #
  • do not start with ;

To do this we used the following grep command

grep "^[^#;]" /etc/squid/squid.conf

The first ^ refers to the beginning of the line, this way if in a line there is some configuration and after that there is a comment it will not be excluded by mistake. The rest, [^#;] matches any character which is not # or ;.

This is what was actually in my configuration file (out of ~8K lines)

acl SSL_ports port 443
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443        # https
acl Safe_ports port 70        # gopher
acl Safe_ports port 210        # wais
acl Safe_ports port 1025-65535    # unregistered ports
acl Safe_ports port 280        # http-mgmt
acl Safe_ports port 488        # gss-http
acl Safe_ports port 591        # filemaker
acl Safe_ports port 777        # multiling http
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager
http_access allow localhost
http_access deny all
http_port 3128
coredump_dir /var/spool/squid
refresh_pattern ^ftp:        1440    20%    10080
refresh_pattern ^gopher:    1440    0%    1440
refresh_pattern -i (/cgi-bin/|\?) 0    0%    0
refresh_pattern (Release|Packages(.gz)*)$      0       20%     2880
refresh_pattern .        0    20%    4320

Grep: Print only the words of the line that matched the regular expression, one per line

grep -oh "\w*$YOUR_PATTERN\w*" *

We used the following parameters on our command:

-h, –no-filename : Suppress the prefixing of file names on output. This is the default when there is only  one  file  (or only standard input) to search.
-o, –only-matching : Print  only  the matched (non-empty) parts of a matching line,  with each such part on a separate output line.

Also, we wrapped out pattern with the \w* that matches  all word-constituent characters on either side. The * character states that it should find 0 or more of those characters in the pattern to match.

How to: Extract all usernames that are logged in from who

who | cut -d ' ' -f 1 | sort -u

who: will show who is logged on

cut  -d ‘ ‘ -f 1: will remove all sections from each line except for column 1. It will use the space character as the delimiter for the columns

sort -u: it will sort the usernames and remove duplicate lines. So if a user is logged in multiple times you will get that username only once.
In case you want to filter out root user from this list you can do it as follows:

who | cut -d ' ' -f 1 | sort -u | grep -v 'root'

Bash: grep: A couple of usefull applications

(How) to delete all empty lines from a file:

cat someFile | grep -v '^$'

(How) to get all numbers that have no sign in front of them (all positive numbers, from a file where on each line there is one number only):

cat someFile | grep ^[0-9]

(How) to extract all negative numbers (from a file where on each line there is one number only):
You can of course replace the minus ‘-‘ character with any other you want to use as a starting character for a line.

cat someFile | grep ^-