Example for text processing Shell commands - tr, uniq, sort, sed and awk

Saravanan Gnanaguru - May 10 '22 - Dev Community

Text processing using tr, uniq, sort, sed and awk

Introduction

  • Text processing in shell scripts is easy and effective with the standard command-line tools.
  • In this blog, we will pick a sample sentence and use the commands tr, uniq, sort, sed and awk to process it.
  • These commands are effective and have user-friendly options for processing and printing strings in the format we need.
  • In real-world usage, this kind of text processing is needed when we have to process large log files and print only the required values.

Objective

For the given string, we need to count the number of occurrences of each alphanumeric word and print the counts to the console in descending order, in the format given below:

Example:

string1->4
string2->3
string3->3
string4->2

Commands intro

  • tr - Translates (or deletes) characters read from standard input
  • uniq - Omits repeated adjacent lines; with -c, it prefixes each line with its number of occurrences
  • sed - Stream editor, one of the most powerful shell commands for filtering and transforming text
  • sort - Sorts the lines of its input or of the given files
  • awk - Pattern-matching and text-processing language; another powerful command, like sed
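To make the descriptions above concrete, here is a minimal sketch that runs each command on its own. The tiny inputs and the variable names (`tr_out`, `sort_out` and so on) are assumptions made up for illustration; they are not part of the original example.

```shell
# Each command in isolation, using small throwaway inputs (illustrative only).

# tr: map every lowercase letter to its uppercase counterpart
tr_out=$(printf 'hello\n' | tr '[:lower:]' '[:upper:]')

# sort: order the input lines alphabetically
sort_out=$(printf 'b\na\nb\n' | sort)

# uniq: collapse adjacent duplicate lines, which is why we sort first
uniq_out=$(printf 'b\na\nb\n' | sort | uniq)

# sed: replace the first match of a pattern on each line
sed_out=$(printf 'hello world\n' | sed 's/world/shell/')

# awk: split each line into fields and print them in a different order
awk_out=$(printf 'one two\n' | awk '{ print $2, $1 }')

printf '%s\n' "$tr_out" "$sort_out" "$uniq_out" "$sed_out" "$awk_out"
```

Note that uniq only compares neighbouring lines, so it is usually paired with sort, as in the pipeline built in the next section.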

Let us create the command

  • Store the given string in a variable
$ string="This is the Sample sentence, that contains repeated sample string exists more than once in the sample sentence. Repeat once more added in the sample string"
  • Here is the command to achieve our objective
$ echo "$string" | tr -c '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{ print $2"->"$1 }'
  • Executing the command prints the output to the console in the expected format
$ echo "$string" | tr -c '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{ print $2"->"$1 }'
sample->4
the->3
string->2
sentence->2
once->2
more->2
in->2
this->1
that->1
than->1
repeated->1
repeat->1
is->1
exists->1
contains->1
added->1
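If the same counting is needed more than once, the pipeline can be wrapped in a shell function. The sketch below is one possible way to do that; the function name `word_freq` is hypothetical and not part of the original article. It reads text from standard input, and the variable is passed in quoted so that the shell does not split or expand its contents.

```shell
# word_freq is a hypothetical helper name chosen for this sketch.
# It reads text on standard input and prints one "word->count" line per word.
word_freq() {
  tr -c '[:alnum:]' '\n' |        # every non-alphanumeric character becomes a newline
    tr '[:upper:]' '[:lower:]' |  # fold case so "Sample" and "sample" match
    sed '/^$/d' |                 # drop the empty lines
    sort |                        # bring identical words together
    uniq -c |                     # count each distinct word
    sort -nr |                    # highest count first
    awk '{ print $2"->"$1 }'      # reformat "count word" as "word->count"
}

string="This is the Sample sentence, that contains repeated sample string exists more than once in the sample sentence. Repeat once more added in the sample string"
printf '%s\n' "$string" | word_freq
```

Because the function reads standard input, it should also work on a file through redirection, for example a log file. The relative order of words that share the same count may differ between sort implementations and locales.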

Let us break down the commands

  • tr -c '[:alnum:]' '\n' -> Replace every non-alphanumeric character with a newline, so that each word ends up on its own line
  • tr '[:upper:]' '[:lower:]' -> Convert upper case letters to lower case letters
  • sed '/^$/d' -> Delete the empty lines left behind by consecutive separators
  • sort -> Sort the lines alphabetically, so that identical words become adjacent (uniq only compares adjacent lines)
  • uniq -c -> Collapse repeated lines and prefix each line with its number of occurrences
  • sort -nr -> Sort numerically by the leading count, in reverse (descending) order
  • awk '{ print $2"->"$1 }' -> Print each line as word->count, which is the expected format
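One way to follow the breakdown above is to cut the pipeline short and look at what each stage produces. The sketch below does that on a much shorter input; the sample text and the variable names `stage1`, `stage2` and `stage3` are assumptions made up for illustration.

```shell
# Short throwaway input (illustrative only, not the article's sentence).
text="Red, red blue."

# Stage 1: punctuation and spaces become newlines, leaving some empty lines
stage1=$(printf '%s\n' "$text" | tr -c '[:alnum:]' '\n')

# Stage 2: fold to lower case, then delete the empty lines
stage2=$(printf '%s\n' "$stage1" | tr '[:upper:]' '[:lower:]' | sed '/^$/d')

# Stage 3: sort so duplicates are adjacent, then count them
stage3=$(printf '%s\n' "$stage2" | sort | uniq -c)

# Print the three stages, each followed by a separator line
printf '%s\n---\n' "$stage1" "$stage2" "$stage3"
```

Printing the stages side by side shows why sed and sort are needed: without sed, the empty lines would be counted as a word, and without sort, uniq would miss duplicates that are not next to each other.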

Conclusion

  • In this blog, we discussed some of the text processing shell commands and printed a sample output in the expected format
  • As a reader, you can explore the options of each command in its man page, learn more about it and apply it when needed

Thanks for reading!

References

Man Pages

