The following code will find all files that match the pattern 2016_*_*.log
(all the log files for the year 2016).
To avoid finding log files from other services than the Web API service, we filter only the files that their path contains the folder webapi
. Specifically, we used "/ServerLogs/*/webapi/*"
with the following command to match all files that are under the folder /ServerLogs/
and somewhere in the path there is another folder named webapi
, we do that to match files that are like /ServerLogs/Production/01/webapi/*
only. The way we coded our regular expression, it will not match if there is a folder called webapi
directly under the /ServerLogs/ (e.g. /ServerLogs/webapi/*
).
For each result, we execute an awk
script that will split the lines using the comma
(FS=",";
) character, then check if the line contains exactly 4 tokens (if (NF == 4) {
). Later, we get the 4th token and check if it contains the substring "MASTER="
(if (match($4,"MASTER=")) {
), if it does contain it we split
it using the space
character and assign the result to the variable named tokens
. From tokens
, we get the first token and use substr
to remove the first character. Finally, we use the formatted result to create an array where the keys are the values we just created and it is used as a hashmap to keep record of all unique strings. In the end clause, we print all the elements of our hash map.
Finally, we sort all the results from all the awk executions and remove duplicates using sort --unique
.
find /ServerLogs/ \
-iname "2016_*_*.log" \
-ipath "/ServerLogs/*/webapi/*" \
-exec awk '
BEGIN {
FS=",";
}
{
if (NF == 4) {
if (match($4,"MASTER=")) {
split($4, tokens, " ");
instances[substr(tokens[1], 2)];
}
}
}
END {
for (element in instances) {
print element;
}
}
' \
'{}' \; | sort --unique;
Following is the same code in one line.
find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec awk 'BEGIN {FS=",";} {if (NF == 4) {if (match($4,"MASTER=")){split($4, tokens, " "); instances[substr(tokens[1], 2)];}}} END {for (element in instances) {print element;}}' '{}' \; | sort --unique
Another way
Another way to do similar functionality would be the following
find /ServerLogs/ \
-iname "2016_*_*.log" \
-ipath "/ServerLogs/*/webapi/*" \
-exec sh -c '
grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique
' \
'{}' \; | sort --unique;
What we changed is the -exec
part. Instead of calling a awk
script, we create a new sub-shell using sh -c
, then we define the source to be executed inside the single codes and we pass as the first parameter of the shell the filename that matched.
Inside the shell, we find all lines that contain the string MASTER=
using the grep
command. Later we filter out all lines that do not have four columns when we tokenize using the comma
character using awk
. Then, we get the 4th column using cut
and delimiter the comma
. We remove the first two characters of the input string using cut -c 3-
and later we get only the first column by reusing cut
and changing the delimiter to be the space character. With those results we perform a sort that eliminates duplicates and we pass the results to the parent process to perform other operations.
Following is the same code in one line
find /ServerLogs/ -iname "2016_*_*.log" -ipath "/ServerLogs/*/webapi/*" -exec sh -c 'grep "MASTER=" -s "$0" | awk "BEGIN {FS=\",\";} NF==4" | cut -d "," -f4 | cut -c 3- | cut -d " " -f1 | sort --unique' '{}' \; | sort --unique;