awk


Bash: Swap the characters in odd-numbered positions with the characters in even-numbered positions

The following awk script allowed us to swap every character in an odd-numbered position with its neighboring character in the even-numbered position that follows it.
In detail, what it does is run a for loop that advances two characters at a time and print each pair in reverse order (it will print the second character first, then the first one, then the fourth, then the third, and so on).


echo "123456789" | awk -vFS= '{for (i = 1; i <= NF; i+=2) {printf $(i+1)$i""} printf "\n"}';

# Will produce 214365879

echo "1234567890" | awk -vFS= '{for (i = 1; i <= NF; i+=2) {printf $(i+1)$i""} printf "\n"}';

# Will produce 2143658709

Please note that we set the built-in variable FS (the input field separator, which is a space by default) to the empty string so that each character is treated as a separate field and NF (the number of fields in the current input record) counts characters. Also note that we pass the characters to printf as arguments instead of embedding them in the format string, so that input containing a % character cannot break the output.
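To make the effect of FS= visible, here is a minimal demonstration (the input string is just an example):

echo "abc" | awk -vFS= '{print NF, $1, $2, $3}';

# Will produce 3 a b c

As a side note, the same swap can be performed with sed by matching pairs of characters and printing each pair in reverse order:

echo "123456789" | sed -E 's/(.)(.)/\2\1/g';

# Will produce 214365879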

 


Bash: Print time stamp in front of every line in a pipe

Recently, we received a binary that collected data from a web service and printed it on screen.
The binary did not print a time stamp in front of each line, so we had to improvise a way to add the time stamp to the logs without modifying the binary.

The solution we came to was to use awk to prepend the time stamp to every line using a pipe.
Specifically, our solution was the following:


server_application 2>&1 | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'

What we did there was to start our binary server_application and redirect stderr to stdout (using 2>&1) so that we have only one stream; awk then reads the lines one by one and prints the time stamp right before each line ($0) using strftime.
The strftime() function formats the current time according to the given format specification.
fflush() forces a write of all buffered data for the given output stream. We call it for each line to make sure that buffering does not add any delay in presenting the data.

Example


$ echo -e "hi\nHI" 2>&1 | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
2017-06-21 20:33:41 hi
2017-06-21 20:33:41 HI
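Please note that strftime() is not part of POSIX awk; it is an extension provided by GNU awk (gawk) and some other implementations. On a system without gawk, a plain bash loop can produce the same output (a minimal sketch; it spawns one date process per line, so it is slower):

server_application 2>&1 | while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line";
done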


Bash: Extract data from files, filtering both the filename and the path, and doing internal processing

The following code will find all files that match the pattern 2016_*_*.log (all the log files for the year 2016).

To avoid finding log files that belong to services other than the Web API service, we only keep the files whose path contains the folder webapi. Specifically, we used "/ServerLogs/*/webapi/*" with the following command to match all files that are under the folder /ServerLogs/ and have another folder named webapi somewhere in their path, so that we only match files like /ServerLogs/Production/01/webapi/*. The way we wrote our pattern, it will not match if the folder webapi is directly under /ServerLogs/ (e.g. /ServerLogs/webapi/*).

For each result, we execute an awk script that splits each line on the comma character (FS=",";) and checks whether the line contains exactly 4 tokens (if (NF == 4) {). If it does, we take the 4th token and check whether it contains the substring "MASTER=" (if (match($4,"MASTER=")) {). If it does contain it, we split it on the space character and assign the result to the array named tokens. From tokens, we take the first element and use substr to remove its first character. Finally, we use the formatted result as a key of the array instances, which serves as a hash map keeping record of all unique strings. In the END clause, we print all the keys of our hash map.

Finally, we sort all the results from all the awk executions and remove duplicates using sort --unique.


find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec awk '
        BEGIN {
            FS=",";
        }
        {
            if (NF == 4) {
                if (match($4,"MASTER=")) {
                    split($4, tokens, " ");
                    instances[substr(tokens[1], 2)];
                }
            }
        }
        END {
            for (element in instances) {
                print element;
            }
        }
    ' \
    '{}' \; | sort --unique;

Following is the same code in one line.

 find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec awk 'BEGIN {FS=",";} {if (NF == 4) {if (match($4,"MASTER=")){split($4, tokens, " "); instances[substr(tokens[1], 2)];}}} END {for (element in instances) {print element;}}' '{}' \; | sort --unique 
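As a side note, if your version of find supports -exec ... {} + (it is required by POSIX), you can batch many log files into a single awk invocation instead of spawning one process per file. The END block then runs once over all matched files, and the trailing sort --unique still removes any duplicates:

find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec awk 'BEGIN {FS=",";} {if (NF == 4) {if (match($4,"MASTER=")){split($4, tokens, " "); instances[substr(tokens[1], 2)];}}} END {for (element in instances) {print element;}}' '{}' + | sort --unique;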

Another way

Another way to achieve similar functionality is the following:


find /ServerLogs/ \
    -iname "2016_*_*.log" \
    -ipath "/ServerLogs/*/webapi/*" \
    -exec sh -c '
        grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique
    ' \
    '{}' \; | sort --unique;

What we changed is the -exec part. Instead of calling an awk script, we create a new sub-shell using sh -c, we define the code to be executed inside the single quotes, and we pass the matched filename to the shell as its first parameter (which is available as $0 inside the script).

Inside the shell, we find all lines that contain the string MASTER= using the grep command. Then we use awk to filter out all lines that do not have exactly four columns when tokenized on the comma character. Then, we get the 4th column using cut with the comma as the delimiter. We remove the first two characters of that string using cut -c 3- and keep only its first column by reusing cut with the space character as the delimiter. With those results we perform a sort that eliminates duplicates, and we pass the results to the parent process to perform further operations.

Following is the same code in one line.


find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec sh -c 'grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique' '{}' \; | sort --unique;


Use awk to print the last N columns of a file or a pipe

In this post we will describe a way to print the last N columns in awk.

We will use the following code as an example, where we will print only the last 2 columns:


awk '{n = 2; for (--n; n >= 0; n--){ printf "%s\t",$(NF-n)} print ""}';

In the awk script we use the variable n to control how many columns we want to print. In the above example we initialized it to the value 2, as that is the number of columns we want printed.

Then, we use a for loop to iterate over the fields (in this case the last two fields) and print them to the screen using printf "%s\t",$(NF-n), which avoids printing the newline character and separates the fields with a tab character.

NF is a special variable in awk that holds the total number of fields available on that line. If you do not change the delimiter, then it will hold the number of words on the line.

$(NF-n) is the way we ask awk to give us the value of the field that is n places before the last.

Outside the loop, we print "" to emit the newline character between input rows.

Examples:

If we want to print the last two columns of the ls -l command we can do it as follows:


ls -l | awk '{i = 2; for (--i; i >= 0; i--){ printf "%s\t",$(NF-i)} print ""}';

If we want to print the last two columns of the /etc/passwd file we can do it as follows:


awk -F ':' '{i = 2; for (--i; i >= 0; i--){ printf "%s\t",$(NF-i)} print ""}' /etc/passwd;

Note that we changed the delimiter using the command line argument -F ':'.
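To avoid editing the script every time the number of columns changes, you can pass n from the command line using the -v option of awk (a small variation of the same loop):

ls -l | awk -v n=2 '{for (i = n - 1; i >= 0; i--){ printf "%s\t",$(NF-i)} print ""}';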


Bash: Close a range of open sockets by killing the PIDs that are holding them open

Sometimes you want to use a specific port number but some other process(es) are using it. To take control of the port you need to terminate all those processes and free it.
To find out which process(es) you need to kill, use lsof -i :port. It will return a list of each command and PID that is using the specific port. After that, kill those PIDs using kill -s 9.

The following script accepts a range of ports to free, and for each port it will try to kill all processes that are holding it open.

low=12345;
high=12350;
for i in `seq $low $high`; do
  lsof -i :$i | tail -n +2 | awk '{system("kill -s 9 " $2)}';
done

Using tail -n +2 we skip the first line of the input, which is the header information.
The system() function invokes a new sh shell and executes the command in it.
Using kill -s 9 we send the SIGKILL signal, telling the processes that they have to terminate immediately.
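As a side note, lsof can print bare PIDs using its -t (terse) flag, which removes the need for tail and awk altogether (this sketch assumes GNU xargs, whose -r flag prevents running kill when no PID was found):

low=12345;
high=12350;
for i in `seq $low $high`; do
  lsof -t -i ":$i" | xargs -r kill -s 9;
done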


Kill all processes of a user (Or kill almost all using an exception list) in linux

Following is a command that root can use to stop all active processes of a user, with an exception list (you can replace someApplication and someCommand with the specific commands you wish to keep alive).

ps -U useraccount | egrep -v "someApplication|someCommand" | awk '{print $1}' | tail -n +2 | xargs -t kill

The next example is very similar to the first one, but it is used to kill all of the processes of your own account.

ps x | egrep -v "ssh|screen|ps|bash|awk|tail" | awk '{print $1}' | tail -n +2 | xargs -t kill

NOTE: USE WITH CAUTION!
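As a side note, if you do not need an exception list, the pkill utility from the procps package can signal all processes of a user directly:

# Sends SIGTERM to every process whose real user is useraccount
pkill -U useraccount;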


Resolve IPs for Servers listed in a file using /etc/hosts

cat $NODEFILE | xargs -L 1 -I xx grep xx /etc/hosts | awk '{print $1}'

NOTES: $NODEFILE contains a list of hostnames whose IPs you want resolved.
xargs takes each hostname and uses it on its own as a filter for the grep command that parses the /etc/hosts file. In other words, for each hostname the command grep hostname /etc/hosts is issued, and awk processes the combined output. It is also important to explain what xx is: xx is the replacement string for xargs -I; every occurrence of xx in the grep command is substituted with the hostname that was read from $NODEFILE.
awk removes all columns but the first, which in /etc/hosts is where the IP is listed.
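As a side note, if resolving through every name service source (and not strictly through /etc/hosts) is acceptable, getent can perform the lookup without grep (a sketch, assuming getent is available on the system):

while read -r host; do
  getent hosts "$host";
done < "$NODEFILE" | awk '{print $1}'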