Add a new line whenever the first column changes 3


Recently we were processing some results from an SQL query on the command line, we were grouping the results based on the first column of the query results and we wanted to add an empty line between each group of data.

Instead of messing with SQL specifics, we decided to use awk to finish the task which resulted in the following code:

awk -F '|' -v column=1 'NR>1 && $column != previous { print "" } { previous = $column } { print $0 }'

Explanation:

  • -F fs or --field-separator fs: This option allowed us to choose the input field separator, in other words the character that should be used to split our text into columns. We used the value | because that character is the default column delimiter for sqlite.
  • -v var=val or --assign var=val: We used the -v option to define a variable to be used later on in the script. The value 1 was assigned to the variable column before execution of the program began and it was available event to the BEGIN rule of the AWK program. We did this to make the code a bit more modular, we could have just hardcoded the number in.
  • NR>1 && $column != previous { print "" } : Here we defined an if statement that checks two options: First we make sure that we are not on the first line of the input by using the NR>1 (and thus avoid creating an empty line which will be the first line of the output).
    Second, we check that the last value we had for the column of interest did change since last time. (We still did not define the value of previous, it is on the next step). When both statements are true (we are not on the first row and the value of the column in the current row is different than the value of the column in the previous row) it will print out an empty line.
  • { previous = $column }: This part is executed on ALL lines (even the first one) no matter what the values are. What this line does is to translate the value of the column variable from being a number (the index of the column that we are interested in) into the actual value that the column has at that specific line. That value is then copied to the previous variable to allow us to perform the check in the previous point once we move to the next line.
  • { print $0 }: Finally, this part is also executed on all lines and it instructs awk to print the input row whole and as is. This whole part could be replaced by a true value like the value 1. In awk as you see in this example, you define a series of operations. Each operation is constructed by a pattern to be matched and an action. Each pattern is evaluated for each input line, and in the cases where the pattern matches, the action is executed. The user can choose to omit either the pattern or the action for any operation. When a pattern is omitted, the action is executed on every line. When the action is omitted, then awk will execute { print $0 }. So, by adding a true value on its own it will be translated as on each line execute { print $0 } which prints the whole row as is.

Example

1|1|0.0564904019731175
1|2|0.103176086258974
1|3|0.12910406904073
1|4|0.188592489201024
1|5|0.169676224898487
1|6|0.164690820027741
1|7|0.128458728519047
1|8|0.18549773544014
1|9|0.155677575617836
1|10|0.153941343314285
2|1|0.217221158956016
2|2|0.23390973064067
2|3|0.180231657220626
2|4|0.257673927303071
2|5|0.261393785194329
2|6|0.273441488895552
2|7|0.242815632929545
2|8|0.262269697286057
2|9|0.256054399760891
2|10|0.262613705138411
3|1|0.378589461360716
3|2|0.33008177312116
3|3|0.380973166776554
3|4|0.340431190160728
3|5|0.38189416214207
3|6|0.364842933594872
3|7|0.372958396398964
3|8|0.350010176652464
3|9|0.355815612501188
3|10|0.380553180349294

Will become

1|1|0.0564904019731175
1|2|0.103176086258974
1|3|0.12910406904073
1|4|0.188592489201024
1|5|0.169676224898487
1|6|0.164690820027741
1|7|0.128458728519047
1|8|0.18549773544014
1|9|0.155677575617836
1|10|0.153941343314285

2|1|0.217221158956016
2|2|0.23390973064067
2|3|0.180231657220626
2|4|0.257673927303071
2|5|0.261393785194329
2|6|0.273441488895552
2|7|0.242815632929545
2|8|0.262269697286057
2|9|0.256054399760891
2|10|0.262613705138411

3|1|0.378589461360716
3|2|0.33008177312116
3|3|0.380973166776554
3|4|0.340431190160728
3|5|0.38189416214207
3|6|0.364842933594872
3|7|0.372958396398964
3|8|0.350010176652464
3|9|0.355815612501188
3|10|0.380553180349294

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 thoughts on “Add a new line whenever the first column changes

  • Len Richards

    why are answers always based on non windows systems where ” is required instead of ‘ this answer with all the ” and ‘ is impossible to figure out to make on windows based gnu system

    • George

      A terminal-based application has no user interface, so the user interacts with the application by entering a command, which returns text output directly. Compared to GUIs, these applications are lightweight, require fewer system resources, and launch faster.

      Terminal-based applications can be launched locally or remotely. Therefore, you can log onto a system remotely, using a terminal emulator, and then launch the application by entering commands in the terminal. The application is very easy to control remotely as long as a terminal emulator is installed on the system.

      These types of applications can easily get integrated into multiple other applications without the need for extensive customization.

      From a UX perspective, terminal consoles have a few key advantages over GUI’s. These advantages are rarely relevant to end-users, which is why CLI’s are used almost exclusively by technical users and “power users” these days.

      *Things done in a terminal are easily repeatable.*

      Imagine needing to configure 5 computers for some purpose. For this small number, it’s probably not worthwhile to create an automated solution to configure them all. You’ll just tweak the settings of each.

      On a terminal, it’s simple to copy and paste the commands. On a GUI, you might get familiar with where the configurations are stored, but may still be left with a fair deal of clicking. Mistakes are more likely.

      This also is relevant to documentation and technical help. Providing four lines to copy and paste into a terminal console might replace a couple paragraphs of explanation of how to click into various admin tools (especially since the GUI directions might only be perfect for one of several interfaces, see “GUI interfaces are countless” below).

      *Tracking/auditing is easier*

      A command-line interface (CLI) retains a record of what you’ve done, which you can easily extract and keep. When something goes wrong your boss asks, “Are you sure you did it exactly right?” you can have evidence to prove you did. Even if you did make a mistake, you can more easily see it and learn from it.

      *Scripting/automating is nearly as easy as entering commands*

      Everyone knows how to use a mouse to open a file in a GUI. Far fewer people know how to make that happen automatically. With a CLI, automating the task can be almost as easy as typing it in by hand.

      *Technological Constraints can still make CLI’s compelling*

      Logging into a GUI box remotely requires a fair deal of bandwidth, and latency (delay in the connection) can make the experience frustrating. A CLI remote login has a much lower bandwidth requirement, since it is just transmitting a little text back and forth. And latency is much easier to deal with when you are typing, than when you are trying to hover a mouse over an icon.

      *A CLI can be used for Input/Output*

      Some have called this The Age of the API, where systems are built on top of systems. A CLI allows a similar interoperability. In the question’s example about Git, it would be possible to hook another “enterprise system” – such as a defect tracking system.

      Compared to a SOAP or even Rest API, CLI commands can be easy to develop and test.

      *There are just a few standard shells; GUI interfaces are countless*

      If you log into the GUI admin interface of a few different web hosting services, you’ll probably see that they are all different. It’s not hard to poke around and figure out where everything is, but it takes time. Next year, the host may have upgraded their solution, and now everything is in a slightly different place — one more thing to learn.

      That hosting service probably still uses the same CLI. The new version is backward compatible with the old. Users familiar with Bash, PowerShell, or whatever CLI is on the system can eliminate the ramp-up time spent familiarizing (or re-familiarizing) themselves with that particular GUI.