new line


Add a new line whenever the first column changes 1

Recently we were processing some results from an SQL query on the command line, we were grouping the results based on the first column of the query results and we wanted to add an empty line between each group of data.

Instead of messing with SQL specifics, we decided to use awk to finish the task which resulted in the following code:

awk -F '|' -v column=1 'NR>1 && $column != previous { print "" } { previous = $column } { print $0 }'

Explanation:

  • -F fs or --field-separator fs: This option allowed us to choose the input field separator, in other words the character that should be used to split our text into columns. We used the value | because that character is the default column delimiter for sqlite.
  • -v var=val or --assign var=val: We used the -v option to define a variable to be used later on in the script. The value 1 was assigned to the variable column before execution of the program began and it was available event to the BEGIN rule of the AWK program. We did this to make the code a bit more modular, we could have just hardcoded the number in.
  • NR>1 && $column != previous { print "" } : Here we defined an if statement that checks two options: First we make sure that we are not on the first line of the input by using the NR>1 (and thus avoid creating an empty line which will be the first line of the output).
    Second, we check that the last value we had for the column of interest did change since last time. (We still did not define the value of previous, it is on the next step). When both statements are true (we are not on the first row and the value of the column in the current row is different than the value of the column in the previous row) it will print out an empty line.
  • { previous = $column }: This part is executed on ALL lines (even the first one) no matter what the values are. What this line does is to translate the value of the column variable from being a number (the index of the column that we are interested in) into the actual value that the column has at that specific line. That value is then copied to the previous variable to allow us to perform the check in the previous point once we move to the next line.
  • { print $0 }: Finally, this part is also executed on all lines and it instructs awk to print the input row whole and as is. This whole part could be replaced by a true value like the value 1. In awk as you see in this example, you define a series of operations. Each operation is constructed by a pattern to be matched and an action. Each pattern is evaluated for each input line, and in the cases where the pattern matches, the action is executed. The user can choose to omit either the pattern or the action for any operation. When a pattern is omitted, the action is executed on every line. When the action is omitted, then awk will execute { print $0 }. So, by adding a true value on its own it will be translated as on each line execute { print $0 } which prints the whole row as is.

Example

1|1|0.0564904019731175
1|2|0.103176086258974
1|3|0.12910406904073
1|4|0.188592489201024
1|5|0.169676224898487
1|6|0.164690820027741
1|7|0.128458728519047
1|8|0.18549773544014
1|9|0.155677575617836
1|10|0.153941343314285
2|1|0.217221158956016
2|2|0.23390973064067
2|3|0.180231657220626
2|4|0.257673927303071
2|5|0.261393785194329
2|6|0.273441488895552
2|7|0.242815632929545
2|8|0.262269697286057
2|9|0.256054399760891
2|10|0.262613705138411
3|1|0.378589461360716
3|2|0.33008177312116
3|3|0.380973166776554
3|4|0.340431190160728
3|5|0.38189416214207
3|6|0.364842933594872
3|7|0.372958396398964
3|8|0.350010176652464
3|9|0.355815612501188
3|10|0.380553180349294

Will become

1|1|0.0564904019731175
1|2|0.103176086258974
1|3|0.12910406904073
1|4|0.188592489201024
1|5|0.169676224898487
1|6|0.164690820027741
1|7|0.128458728519047
1|8|0.18549773544014
1|9|0.155677575617836
1|10|0.153941343314285

2|1|0.217221158956016
2|2|0.23390973064067
2|3|0.180231657220626
2|4|0.257673927303071
2|5|0.261393785194329
2|6|0.273441488895552
2|7|0.242815632929545
2|8|0.262269697286057
2|9|0.256054399760891
2|10|0.262613705138411

3|1|0.378589461360716
3|2|0.33008177312116
3|3|0.380973166776554
3|4|0.340431190160728
3|5|0.38189416214207
3|6|0.364842933594872
3|7|0.372958396398964
3|8|0.350010176652464
3|9|0.355815612501188
3|10|0.380553180349294
Advertisements

Use awk to print the last N columns of a file or a pipe

In this post we will describe a way to print the last N number of columns in awk.

We will use this code as example, where we will print the last 2 columns only:

awk '{n = 2; for (--n; n >= 0; n--){ printf "%s\t",$(NF-n)} print ""}';
'

In the awk script we use the variable n to control how many columns we want to print. In the above example we initialized it  to the value 2 as that is the number of columns we want printed.

After, we use a for loop to iterate over the fields (in this case the last two fields) and we print them to the screen using printf "%s\t",$(NF-n) to avoid printing the new line character and to separate them with a tab character.

NF is a special variable in awk that holds the total number of fields available on that line. If you do not change the delimiter, then it will hold the number of words on the line.

$(NF-n) is the way we ask awk to gives us the variable value that is n places before the last.

Outside the loop we print "" to print the new line character between input rows.

Examples:

If we want to print the last two columns of the ls -l command we can do it as follows:

ls -l | awk '{i = 2; for (--i; i >= 0; i--){ printf "%s\t",$(NF-i)} print ""}';

If we want to print the last two columns of the /etc/passwd file we can do it as follows:

awk -F ':' '{i = 2; for (--i; i >= 0; i--){ printf "%s\t",$(NF-i)} print ""}' /etc/passwd;

Note that we change the delimiter with the command line argument -F ":"


Creating an MD5 hash of a string in bash

Make sure that you are not including the new line character (\n) as well in your string, there are cases where it might be included without you explicitly writing it.
For example if you use the output of an echo command in bash, it will automatically add a new line character to the end and will cause your hash to be not the one you would like to have.
For example, the MD5 hash of ‘bytefreaks.net’ (without the quotes) should be 16c00d9cfaef1688d4f2ddfb11b60f46 but if you execute the following you will see you will get a different result.
echo 'bytefreaks.net' | md5sum
01c46835dcb79be359e0b464ae6c6156 -

To avoid this error, use the -n option for echo that will direct the command not to output a trailing new line character.
echo -n 'bytefreaks.net' | md5sum
16c00d9cfaef1688d4f2ddfb11b60f46 -