grep


How to find lines that contain only lowercase characters

To print all lines that contain only lower case characters, we used the following regular expression in grep:

egrep '^[[:lower:]]+$' <file>;
# If you do not have egrep, use grep with the -E (extended regular expressions) flag
grep -E '^[[:lower:]]+$' <file>;

Breakdown of the above regular expression:

  • ^ anchors the match to the beginning of the line
  • [[:lower:]] is a POSIX character class that matches only lowercase characters
  • + the plus sign causes the preceding token to be matched one or more times
  • $ anchors the match to the end of the line
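
For example (a quick illustration with made-up input), only the line that consists solely of lowercase characters survives the filter:

printf 'Hello\nworld\nfoo bar\n' | grep -E '^[[:lower:]]+$';
# prints: world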

Ignore all edits to a file that is committed in git

Recently, we were working on a project that had a configuration file committed in its source code. That configuration file had the production system values hard-coded, so we had to change them to the development system values before using it.

To avoid committing the configuration file with the development parameters by accident, we instructed git to ignore any changes that were made to it using the following command.

git update-index --assume-unchanged <file>;

By doing so, git assumed that the file was always unchanged: it no longer showed up in the git status results, nor was it staged when git add . was used, and so on.

After we were done with development (and whenever we needed to pull the branch for changes or checkout another branch) we removed the file from the list of ignored files using the following command.

git update-index --no-assume-unchanged <file>;

Using this command, git would again start monitoring changes to the file and would merge, update, or push it when needed, as it normally does for any file not listed in the .gitignore file. The best part of this trick is that you do not have to modify the .gitignore file to achieve the task of ignoring a file.
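
Putting it all together, the whole workflow looks roughly like this (the file name is just a placeholder):

git update-index --assume-unchanged config/app.properties;    # stop tracking local edits
# ...edit config/app.properties with the development values and keep working...
git status;                                                    # the file no longer shows up as modified
git update-index --no-assume-unchanged config/app.properties; # track changes again before pulling or switching branches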

More information

git update-index modifies the index or directory cache. Each file mentioned is updated into the index and any unmerged or needs updating state is cleared.

--[no-]assume-unchanged When these flags are specified, the object names recorded for the paths are not updated. Instead, these options set and unset the “assume unchanged” bit for the paths. When the “assume unchanged” bit is on, Git stops checking the working tree files for possible modifications, so you need to manually unset the bit to tell Git when you change the working tree file. This is sometimes helpful when working with a big project on a filesystem that has very slow lstat(2) system call (e.g. cifs).

This option can be also used as a coarse file-level mechanism to ignore uncommitted changes in tracked files (akin to what .gitignore does for untracked files). Git will fail (gracefully) in case it needs to modify this file in the index e.g. when merging in a commit; thus, in case the assumed-untracked file is changed upstream, you will need to handle the situation manually.

From: man git-update-index

Bonus

In case you are wondering how to see which files are currently ignored in your local repository copy because of the git update-index --assume-unchanged <file>; command, you can use the following code:

git ls-files -v | grep -e '^[[:lower:]]';

git ls-files -v will print out all objects that git knows about, and the -v parameter will print the flag associated with each of them. The files that are ignored because of the git update-index --assume-unchanged <file>; command are each printed on a separate line that starts with a lowercase character. So, to get all of those files, we need to grep the results of git ls-files -v for lines that start with a lowercase character.
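
As a hypothetical example (the file names are made up), a file marked with --assume-unchanged gets a lowercase tag, so only it survives the grep:

git ls-files -v;
# H src/main.c               <- regular tracked file (uppercase tag)
# h config/app.properties    <- marked with --assume-unchanged (lowercase tag)
git ls-files -v | grep -e '^[[:lower:]]';
# h config/app.properties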

git-ls-files shows information about files in the index and the working tree.
It merges the file listing in the directory cache index with the actual working directory list, and shows different combinations of the two.

-v Similar to -t (below), but use lowercase letters for files that are marked as assume unchanged (see git-update-index(1)).

-t This feature is semi-deprecated. For scripting purpose, git-status(1) --porcelain and git-diff-files(1) --name-status are almost always superior alternatives, and users should look at git-status(1) --short or git-diff(1) --name-status for more user-friendly alternatives.
This option identifies the file status with a tag (followed by a space) at the start of each line.

An additional interesting parameter for git ls-files is
-i, --ignored Shows only ignored files in the output. When showing files in the index, it prints only those matched by an exclude pattern. When showing “other” files, it shows only those matched by an exclude pattern.

--exclude-standard Add the standard Git exclusions: .git/info/exclude, .gitignore in each directory, and the user’s global exclusion file.
From: man git-ls-files

Examples for git ls-files -i/--ignored and --exclude-standard

# Show files in the index that are ignored because of patterns in .gitignore
git ls-files --ignored --exclude-from=.gitignore;
# Show other (i.e. untracked) files that are ignored because of patterns in .gitignore
git ls-files --ignored --other --exclude-from=.gitignore;
# Show files in the index that are ignored because of patterns in any of the standard git exclusions.
git ls-files --ignored --exclude-standard;
# Show other (i.e. untracked) files that are ignored because of patterns in any of the standard git exclusions.
git ls-files --ignored --exclude-standard --other;


grep: How to match lines using any of multiple patterns

Recently, we needed to filter the results of ps x using two different patterns.
The first pattern was ./ where we needed to match that exact character sequence.
The . period character is treated as a special character in regular expressions (it matches a single character of any value, except for the end of line), so we decided to use the -F parameter to remove this special handling.
That change, however, prevented us from writing a regular expression that uses the OR (|) operator.

-F (or --fixed-strings) is a matching control option that instructs grep to interpret the patterns as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
We tried assigning the different patterns as different lines to a variable and then using them on the pipe, like in the following example:

patterns="./
banana";
ps x | grep -F $patterns;

…but it failed: since the variable expansion was not quoted, the shell split it on whitespace and grep treated banana as a file name instead of a second pattern.
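
For reference, quoting the expansion preserves the embedded newline, in which case -F treats each line of the variable as a separate fixed string; a quick sketch with made-up input:

patterns="./
banana";
printf 'proc ./server\nbanana daemon\nother\n' | grep -F "$patterns";
# prints: proc ./server
# prints: banana daemon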

Solution

grep supports a matching control option -e that allows us to define multiple patterns using different strings.

-e PATTERN (or --regexp=PATTERN) uses the value PATTERN as the pattern. If this option is used  multiple times or it is combined with the -f (--file) option, grep will search for all patterns given.

In the end, our command was transformed to the following, which worked just fine!

ps x | grep -F -e "./" -e "banana";

How to search for specific filenames in .tar archives

The following commands will search in the .tar archives found in the specified folder and print on screen all files whose paths or filenames match our search token. We provide multiple solutions, each one for a different type of .tar archive, depending on the compression used.

For .tar archives

find /media/repository/packages/ -type f -iname "*.tar" -exec tar -t -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.bz2 archives

find /media/repository/packages/ -type f -iname "*.tar.bz2" -exec tar -t -j -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.xz archives

find /media/repository/packages/ -type f -iname "*.tar.xz" -exec tar -t -J -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.gz and .tgz archives

Please note that this command uses the -o parameter (which is the logical OR) on find to search for multiple filename extensions.

find /media/repository/packages/ -type f \( -iname "*.tar.gz" -o -iname "*.tgz" \) -exec tar -t -z -f '{}' \; | grep "configurations/arm-cortexa9";

find Parameters Legend

  • -type f filters out any result which is not a regular file
  • -exec command '{}' \; runs the specified command on the results of find. The string '{}' is replaced by the current file name being processed.
  • -o is the logical OR operator. The second expression is not evaluated if the first expression is true.

tar Parameters Legend

  • -z or --gzip instructs tar to filter the archive through gzip
  • -j or --bzip2 filters the archive through bzip2
  • -J or --xz filters the archive through xz
  • -t or --list lists the contents of an archive
  • -f or --file=INPUT uses the archive file or device named INPUT
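
Putting the pieces together, the following sketch (assuming GNU find and GNU tar, and reusing the example path and search token from above) covers all of the archive types in one command by dispatching on the file extension:

find /media/repository/packages/ -type f \
    \( -iname "*.tar" -o -iname "*.tar.gz" -o -iname "*.tgz" -o -iname "*.tar.bz2" -o -iname "*.tar.xz" \) \
    -exec sh -c '
        case "$0" in
            *.tar)          tar -t    -f "$0" ;; # plain archive
            *.tar.gz|*.tgz) tar -t -z -f "$0" ;; # gzip
            *.tar.bz2)      tar -t -j -f "$0" ;; # bzip2
            *.tar.xz)       tar -t -J -f "$0" ;; # xz
        esac
    ' '{}' \; | grep "configurations/arm-cortexa9";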

How to suppress binary files from matching results

When you try to find all files that contain a certain string value, it can be very costly to scan binary files that you do not actually care about.
To automatically skip them, you can add the parameter -I (capital i) to prevent grep from testing whether the binary files contain the needle.
With grep, -I will process a binary file as if it did not contain matching data; it is equivalent to the --binary-files=without-match option.

Example

find . -type f -exec grep 'string' '{}' -s -l -I \;

The above command breaks down as follows:

  • find . -type f Find all files in current directory.
  • -exec For each match execute the following.
  • grep 'string' '{}' Search the matched file '{}' for the value ‘string’.
  • -s Suppress error messages about nonexistent or unreadable files.
  • -l (lowercase L) or --files-with-matches Suppress normal output and instead print the name of each input file from which output would normally have been printed. Scanning stops on the first match.
  • -I (capital i) or --binary-files=without-match Process a binary file as if it did not contain matching data.
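
For reference, with GNU grep the same result can usually be obtained with a single recursive invocation (a sketch; option support may differ on non-GNU versions of grep):

grep -r -I -l -s 'string' .;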

Bash: Extract data from files while filtering by filename and path and doing internal processing

The following code will find all files that match the pattern 2016_*_*.log (all the log files for the year 2016).

To avoid finding log files from services other than the Web API service, we only keep files whose path contains a folder named webapi. Specifically, we used "/ServerLogs/*/webapi/*" with the following command to match all files that are under the folder /ServerLogs/ and have another folder named webapi somewhere in their path; we do that to match only files like /ServerLogs/Production/01/webapi/*. The way we wrote our pattern, it will not match if the webapi folder is directly under /ServerLogs/ (e.g. /ServerLogs/webapi/*).

For each result, we execute an awk script that splits each line using the comma character (FS=",";), then checks whether the line contains exactly 4 tokens (if (NF == 4) {). If it does, we take the 4th token and check whether it contains the substring "MASTER=" (if (match($4,"MASTER=")) {). When it does, we split it using the space character and assign the result to the array named tokens. From tokens, we take the first element and use substr to remove its first character. Finally, we use the resulting value as a key of the array instances, which acts as a hash map that keeps a record of all unique strings. In the END clause, we print all the keys of the hash map.

Finally, we sort all the results from all the awk executions and remove duplicates using sort --unique.

find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec awk '
        BEGIN {
            FS=",";
        }
        {
            if (NF == 4) {
                if (match($4,"MASTER=")) {
                    split($4, tokens, " ");
                    instances[substr(tokens[1], 2)];
                }
            }
        }
        END {
            for (element in instances) {
                print element;
            }
        }
    ' \
    '{}' \; | sort --unique;

Following is the same code in one line.

 find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec awk 'BEGIN {FS=",";} {if (NF == 4) {if (match($4,"MASTER=")){split($4, tokens, " "); instances[substr(tokens[1], 2)];}}} END {for (element in instances) {print element;}}' '{}' \; | sort --unique 

Another way

Another way to achieve similar functionality would be the following:

find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec sh -c '
        grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique
    ' \
    '{}' \; | sort --unique;

What we changed is the -exec part. Instead of calling an awk script, we create a new sub-shell using sh -c, define the code to be executed inside the single quotes, and pass the filename that matched as the first argument of the shell (available as $0 inside the snippet).

Inside the shell, we find all lines that contain the string MASTER= using grep. Then we use awk to filter out all lines that do not have four columns when tokenized on the comma character. Next, we get the 4th column using cut with the comma as the delimiter, remove the first two characters of it using cut -c 3-, and keep only the first column by reusing cut with the space character as the delimiter. Finally, we sort those results, eliminating duplicates, and pass them to the parent process for further processing.

Following is the same code in one line.

find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec sh -c 'grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique' '{}' \; | sort --unique;


Git: Delete all local branches

The following command will:

  • print all branches that were merged to master
  • then filter out the branch named master and the branch you are currently switched to
  • and finally, it will delete the rest (one branch at a time).

git branch --merged master | grep -v -e "\*" -e "master" | xargs git branch -D

Tip:

To cleanup any remote-tracking references that no longer exist on the remote use the following:

git fetch --prune

Grep lines that do not begin with ‘#’ or ‘;’

Recently, we wanted to modify the squid configuration file, which is really, really big!

wc -l /etc/squid/squid.conf
7898 /etc/squid/squid.conf

We wanted to find all the active rules that configure our proxy server. Out of those ~8K lines, only about 25 are actual configuration; the rest is documentation.

To find all active configuration lines we needed to find all lines that:

  • are not empty
  • do not start with #
  • do not start with ;

To do this we used the following grep command

grep "^[^#;]" /etc/squid/squid.conf

The first ^ refers to the beginning of the line; this way, if a line has some configuration followed by a comment, it will not be excluded by mistake. The rest, [^#;], matches any single character that is not # or ;.
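
Note that lines starting with whitespace (including indented comments and whitespace-only lines) would still slip through; if that matters for your file, a slightly stricter sketch using an extended regular expression is:

grep -E '^[[:blank:]]*[^#;[:blank:]]' /etc/squid/squid.conf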

This is what was actually in my configuration file (out of ~8K lines)

acl SSL_ports port 443
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443        # https
acl Safe_ports port 70        # gopher
acl Safe_ports port 210        # wais
acl Safe_ports port 1025-65535    # unregistered ports
acl Safe_ports port 280        # http-mgmt
acl Safe_ports port 488        # gss-http
acl Safe_ports port 591        # filemaker
acl Safe_ports port 777        # multiling http
acl CONNECT method CONNECT
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access deny manager
http_access allow localhost
http_access deny all
http_port 3128
coredump_dir /var/spool/squid
refresh_pattern ^ftp:        1440    20%    10080
refresh_pattern ^gopher:    1440    0%    1440
refresh_pattern -i (/cgi-bin/|\?) 0    0%    0
refresh_pattern (Release|Packages(.gz)*)$      0       20%     2880
refresh_pattern .        0    20%    4320

Grep: Print only the words of the line that matched the regular expression, one per line

grep -oh "\w*$YOUR_PATTERN\w*" *

We used the following parameters on our command:

-h, --no-filename : Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
-o, --only-matching : Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

Also, we wrapped our pattern with \w*, which matches word-constituent characters on either side of it. The * quantifier matches zero or more of those characters, so the whole word that contains the pattern is printed.
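
As a quick illustration (the sample text and the pattern error are made up), only the surrounding word is printed instead of the whole matching line:

printf 'a fatal_error occurred\nall good here\n' | grep -oh "\w*error\w*";
# prints: fatal_error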


How to: Extract all usernames that are logged in from who

who | cut -d ' ' -f 1 | sort -u

who: will show who is logged on

cut -d ' ' -f 1: will keep only the first column of each line, using the space character as the delimiter for the columns.

sort -u: it will sort the usernames and remove duplicate lines, so if a user is logged in multiple times, you will get that username only once.

In case you want to filter out the root user from this list, you can do it as follows:

who | cut -d ' ' -f 1 | sort -u | grep -v 'root'