Text processing using tr, uniq, sort, sed and awk

Name: Example for text processing Shell commands - tr, uniq, sort, sed and awk
Author: chefgs

Introduction
Objective
Commands intro
Let us create the command
Let us break down the commands
Conclusion
References

Introduction

Text processing in shell scripts is straightforward and effective with the standard command-line tools.
In this blog, we will pick a sample sentence and use the commands tr, uniq, sort, sed and awk to process it.
These commands are effective and have user-friendly options for processing and printing strings in whatever format is needed.
In real-world usage, this kind of text processing is useful when we need to process large log files and print only the required values.

Objective

For the given string, we need to count the number of occurrences of each alphanumeric word and print the result to the console, in descending order of count, in the format given below:

Example:

string1->4
string2->3
string3->3
string4->2

Commands intro

tr - Translates or deletes characters from standard input according to the given character sets
uniq - Filters out adjacent repeated lines (with -c, it prefixes each line with its number of occurrences)
sed - Stream editor, a powerful command for filtering and transforming text
sort - Sorts lines of text from standard input or from files
awk - Pattern-scanning and text-processing language; another powerful command, like sed
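
To make the one-line descriptions above concrete, here is a minimal sketch that runs each command on a tiny input. The inputs and the variable names (tr_out, uniq_unsorted and so on) are made up for illustration only.

```shell
# tr: translate every lower case letter to its upper case counterpart
tr_out=$(printf 'hello\n' | tr '[:lower:]' '[:upper:]')

# uniq: only ADJACENT duplicates are collapsed, so unsorted input keeps repeats
uniq_unsorted=$(printf 'a\nb\na\n' | uniq)
uniq_sorted=$(printf 'a\nb\na\n' | sort | uniq)

# sed: delete empty lines
sed_out=$(printf 'x\n\ny\n' | sed '/^$/d')

# sort -n: compare lines as numbers rather than as text
sort_out=$(printf '10\n9\n2\n' | sort -n)

# awk: print the second field, an arrow, then the first field
awk_out=$(printf '3 apple\n' | awk '{ print $2 "->" $1 }')

printf '%s\n' "$tr_out" "$uniq_unsorted" "$uniq_sorted" "$sed_out" "$sort_out" "$awk_out"
```

The uniq pair is the one worth noticing: without a preceding sort, the second "a" survives because it is not next to the first one.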

Let us create the command

Store the given string in a variable

$ string="This is the Sample sentence, that contains repeated sample string exists more than once in the sample sentence. Repeat once more added in the sample string"

Here is the command to achieve our objective

$ echo "$string" | tr -c '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{ print $2"->"$1 }'

Executing the command prints the output to the console in the expected format

$ echo "$string" | tr -c '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{ print $2"->"$1 }'
sample->4
the->3
string->2
sentence->2
once->2
more->2
in->2
this->1
that->1
than->1
repeated->1
repeat->1
is->1
exists->1
contains->1
added->1
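
If the same pipeline is needed more than once, it could be wrapped in a shell function that reads from standard input. The sketch below is one possible way to do that, not part of the original command: the function name word_freq is hypothetical, and the second sort uses explicit keys with LC_ALL=C (an assumption made here) so that words with equal counts come out in alphabetical order instead of depending on how sort breaks ties.

```shell
# word_freq (hypothetical helper name): read text on stdin and
# print one "word->count" line per distinct word, highest count first
word_freq() {
  tr -c '[:alnum:]' '\n' |             # one word per line
    tr '[:upper:]' '[:lower:]' |       # fold case
    sed '/^$/d' |                      # drop empty lines
    LC_ALL=C sort |                    # group identical words together
    uniq -c |                          # count each group
    LC_ALL=C sort -k1,1nr -k2,2 |      # count descending, then word ascending
    awk '{ print $2 "->" $1 }'         # reformat as word->count
}

string="This is the Sample sentence, that contains repeated sample string exists more than once in the sample sentence. Repeat once more added in the sample string"

# Show only the three most frequent words
top3=$(printf '%s\n' "$string" | word_freq | head -n 3)
printf '%s\n' "$top3"
```

Because the function reads standard input, it can also be fed from a file, for example word_freq < some_file.txt, where some_file.txt is a placeholder name.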

Let us break down the commands

tr -c '[:alnum:]' '\n' -> Replace every non-alphanumeric character (-c takes the complement of the set) with a newline, which puts each word on its own line
tr '[:upper:]' '[:lower:]' -> Convert upper case letters to lower case, so that "Sample" and "sample" are counted together
sed '/^$/d' -> Delete the empty lines left behind wherever two non-alphanumeric characters were adjacent
sort -> Sort the lines alphabetically so that identical words become adjacent, which uniq requires
uniq -c -> Collapse adjacent identical lines and prefix each one with its number of occurrences
sort -nr -> Sort numerically by the leading count (-n) and reverse the result (-r), giving descending order
awk '{ print $2"->"$1 }' -> Print each line as word->count, which is the expected format
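
The effect of each stage is easier to see on a short input. The following sketch stores the result of every stage in its own variable; the sample text and the step1 to step7 variable names are invented for this illustration.

```shell
text="Red blue, red."

# 1. Every non-alphanumeric character becomes a newline
step1=$(printf '%s\n' "$text" | tr -c '[:alnum:]' '\n')
# 2. Fold everything to lower case
step2=$(printf '%s\n' "$step1" | tr '[:upper:]' '[:lower:]')
# 3. Drop the empty line produced by the ", " pair
step3=$(printf '%s\n' "$step2" | sed '/^$/d')
# 4. Bring identical words next to each other
step4=$(printf '%s\n' "$step3" | sort)
# 5. Count adjacent duplicates (the width of the count padding varies by implementation)
step5=$(printf '%s\n' "$step4" | uniq -c)
# 6. Highest count first
step6=$(printf '%s\n' "$step5" | sort -nr)
# 7. Reformat as word->count
step7=$(printf '%s\n' "$step6" | awk '{ print $2 "->" $1 }')

printf '%s\n' "$step7"
```

Printing any of the intermediate variables, for example printf '%s\n' "$step3", shows the data exactly as the next command receives it.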

Conclusion

In this blog, we discussed some of the text processing shell commands and printed a sample output in the expected format.
As a reader, you can explore the options of each command in its man page, learn more about it, and apply it when needed.

Thanks for reading!

References

Man Pages

Example for text processing Shell commands - tr, uniq, sort, sed and awk