AWK Tutorial

AWK - original from AT&T
NAWK - newer version from AT&T ( BSD and OSX )
GAWK - GNU version of AWK ( Linux )

Install

AWK is almost certainly already installed on your system.

Install AWK on Ubuntu or Debian based systems:


sudo apt-get update
sudo apt-get install gawk

Install on RedHat based systems:


sudo yum install gawk

Build AWK from source:


wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.1.tar.xz
tar xvf gawk-4.1.1.tar.xz
cd gawk-4.1.1
./configure
make
make check
sudo make install
which awk
/usr/bin/awk

Core Functionality

This is the basic lifecycle of an AWK program:

Run commands in BEGIN block
Loop until end of file
- Read line
- Execute commands on line
Run commands in END block

This is what the basic format looks like:


BEGIN {awk-commands}
/pattern/ {awk-commands}
END {awk-commands}
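As a minimal sketch of this lifecycle, the following runs all three kinds of blocks over a made-up three-line input. The sample data and the variable names `result` and `n` are assumptions for illustration only.

```shell
# Hypothetical sample input: three lines, two of which contain "tty".
result=$(printf 'root tty1\nalice pts/0\nbob tty2\n' | awk '
BEGIN   { print "start" }            # runs once, before any input is read
/tty/   { n++; print "match: " $1 }  # runs for every line matching the pattern
END     { print "matched " n }       # runs once, after the last line
')
echo "$result"
```

The BEGIN and END blocks each run exactly once, while the pattern block runs once per matching line.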

Basic Common Commands

One of the most common use cases will be to print selected columns. These can be separated by spaces or whatever string you choose. Input can come from a file but very often will just be piped in from STDIN.

Pipe input from STDIN. Print selected columns separated by spaces:


ps -ef | awk '{print $3, $5, $7}'

Read from a file. Print selected columns separated by spaces:


awk '{print $3, $5, $7}'  test1.txt

Separate with a space or with a custom string:


ps -ef | awk '{print $3" "$5" -- "$7}'

Print a header. Print columns 2, 3, and 7. Print a footer.


awk 'BEGIN{printf "Col1\tCol2\tCol3\n"} {print $2"\t"$3"\t"$7} END{print "Done"}' test1.txt

Field Separator and Variables

Two different ways to change the field separator. Both of these will change the field separator to a colon “:”.


ps -ef | awk -F: '{print $1}'    
ps -ef | awk -v FS=: '{print $1}'

Change it to a slash or a comma:


ps -ef |awk -F/ '{print $3, $5}'
ps -ef |awk -F, '{print $3, $5}'

Use -v to assign variables before the program executes


awk -v var1=xyz 'BEGIN{printf "var1 = %s\n", var1}'
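The two options can be combined. The sketch below assumes a made-up colon-separated record and a hypothetical variable named `label`; it is only meant to show `-F` and `-v` working together.

```shell
# Hypothetical passwd-style record; the field values are made up.
line='alice:x:1001:1001:Alice:/home/alice:/bin/sh'

# -F sets the input field separator; -v assigns a variable before BEGIN runs.
result=$(printf '%s\n' "$line" | awk -F: -v label=user '{ print label "=" $1, "shell=" $7 }')
echo "$result"
```

The comma in the print statement inserts the output field separator, which is a single space by default.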

Matching / Filtering / Counting / Replace

Match a pattern ( ex: “tty” ) and print each matching line. Both of these commands do the same thing.


ps -ef | awk ' /tty/ {print}'
ps -ef | awk '/tty/'

Match a pattern and select columns:


ps -ef | awk ' /tty/ {print $3, $5}'

Match and print everything:


ps -ef | awk '//'

Count lines that match a pattern and print the total:


ps -ef | awk '/test/{++c} END {print "Total matched: ", c}'

Only print lines with under 100 characters:


ps -ef | awk 'length($0) < 100'

Replace or substitute one string for another like this:


ps -ef | awk '{sub(/user1/,"user2");print }'
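Note that sub only replaces the first occurrence on each line. A small sketch comparing it with gsub, using a made-up input line ( the variable names `first` and `all` are assumptions ):

```shell
# sub() replaces only the first match on the line.
first=$(echo 'user1 user1 user1' | awk '{ sub(/user1/, "user2"); print }')

# gsub() replaces every match and returns the number of replacements.
all=$(echo 'user1 user1 user1' | awk '{ n = gsub(/user1/, "user2"); print n ": " $0 }')

echo "$first"
echo "$all"
```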

More Useful Commands

Print the entire line with either of these commands. The variable $0 refers to the entire line ( all fields ). It is also what print uses by default if you leave the argument out.


ps -ef | awk '{print}'
ps -ef | awk '{print $0}'

Do nothing:


awk '' file1.txt

Dump all global variables to the file awkvars.out ( gawk only ):


awk -d ''
cat awkvars.out

Get help:


awk --help

Running an AWK Script Loaded From a File

AWK commands can be read from a script like this while taking input from a pipe:


ps -ef | awk -f test1.awk

Running commands from an AWK script while reading input from a file:


awk -f test1.awk  output.txt

An awk script might look like this. Note that the pattern to be matched needs to be on the same line as the opening curly bracket.


BEGIN{printf "Col1\tCol2\tCol3\n"} 
/tty/{

print $2"\t"$3"\t"$7

}
END{print "Done"}
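To try this end to end, the sketch below writes a small script to a temporary file and runs it with -f. The script contents, the sample input, and the use of mktemp are assumptions made for the example.

```shell
# Write a hypothetical AWK script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print field 1 of every line containing "tty".
BEGIN { print "User" }
/tty/ {
    print $1
}
END { print "Done" }
EOF

# Run the script against made-up input piped in from STDIN.
result=$(printf 'root tty1\nalice pts/0\n' | awk -f "$script")
rm -f "$script"
echo "$result"
```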

Flow Control

Here is a basic example of an if statement. If the condition is true it will print out a message.


awk 'BEGIN {x = 5; if (x * 2 == 10) printf "%d matched.\n", x }'

You can also check for string equality:


awk 'BEGIN {x = "xyz"; if (x == "xyz") printf "%s matched.\n", x }'

This is a basic if / else statement. If the expression is true the first statement is run. Otherwise it runs the second statement after the ‘else’ keyword.


awk 'BEGIN {
   x = 7; 
   if (x > 5 ) print "over";
   else print "under or equal";
}'

You can check for multiple conditions using “if”, “else if”, and “else”:


awk 'BEGIN {
   x = 789;
   
   if (x==123)
       print "It is 123";
   else if (x == 456)
       print "it is 456";
   else if (x == 789)
       print "it is 789";
   else
      print "something else"
}'

Arrays

AWK arrays are associative: keys can be strings or numbers.

Create an array, assign values, and print them like this:


awk 'BEGIN {
   array1["tree"] = "green";
   array1["rock"] = "brown";
   array1[23] = "red";
   print array1["tree"] " " array1[23];
}'

Delete values like this:


awk 'BEGIN {
   array1["abc"] = "xyz";
   delete array1["abc"];
}'

Arrays can be indexed with a key. If the key is an integer it will be converted to a string. These three commands do the same thing.


awk 'BEGIN {array1["123"] = 10; print array1["123"];}'
awk 'BEGIN {array1["123"] = 10; print array1[123];}'  
awk 'BEGIN {array1[123] = 10;   print array1[123];}'

Multidimensional arrays don’t really exist in AWK but they can be simulated with a comma-separated list of subscripts. The subscripts are joined with the SUBSEP character into a single string key.


awk 'BEGIN {array1[1,2] = 10; print array1[1,2];}'
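Because the key is really one string, the individual subscripts can be recovered by splitting on SUBSEP. A minimal sketch, with the array name `grid` chosen only for illustration:

```shell
result=$(awk 'BEGIN {
    grid[1,2] = 10
    for (key in grid) {
        split(key, parts, SUBSEP)        # recover the two subscripts
        print parts[1], parts[2], grid[key]
    }
    if ((1,2) in grid) print "found"     # membership test with a subscript list
}')
echo "$result"
```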

Loops

For Loop:


awk 'BEGIN { for (c = 0; c <= 10; c++) print c }'

While Loop:


awk 'BEGIN {c = 0; while (c <= 10) { print c; c++ } }'

Do-While Loop - statement is executed at least once.


awk 'BEGIN {c = 0; do { print c; c++ } while (c <= 10) }'

Break - stop executing the loop.


awk 'BEGIN {
    for (c = 0; c < 100; c++) { 
        if (c > 50) break; else print c 
    } 
}'

Continue - continue to next iteration.


awk 'BEGIN {
   for (c = 0; c < 10; c++) {
      if (c == 5) continue;
      print c;
   } 
}'

Exit - Stop executing the script. This doesn’t need to be in a loop.


awk 'BEGIN {
   for (c = 0; c < 10; c++) {
      if (c == 5) exit(10);
      print c;
   } 
}'

User Defined Functions

A function defined and called on a single line:


awk 'function test(n){return n*2} BEGIN {  print test(5) }'

Here is a slightly longer example showing one function that calls another function. It uses return values and multiple parameters.



awk '
function test1(x, y){
   z = x * y;
   return z;
}

function main(a, b){
   c = test1(35, 25);
   print c;
}

# script actually starts running here
BEGIN {
   main(10, 20)
}'
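Variables assigned inside a function are global unless they appear in the parameter list. A common convention is to list locals as extra parameters, separated by extra spaces. The sketch below assumes a hypothetical function named `multiply` to show the effect.

```shell
result=$(awk '
# z is listed as an extra parameter, so it is local to multiply().
function multiply(x, y,    z) {
    z = x * y
    return z
}
BEGIN {
    z = "untouched"
    print multiply(35, 25)
    print z              # the global z was not changed by the call
}')
echo "$result"
```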

Regular Expressions

Match the string “root”:


ps -ef | awk '/root/'

Exclude ( negate the expression ):


ps -ef | awk '!/root/'

Match any character:


ps -ef | awk '/google.chrome/'

Match beginning of line:


ps -ef | awk '/^root/'

Match end of line:


ps -ef | awk '/chrome$/'

Match any character in a character class:


ps -ef | awk '/user[12]/'
ps -ef | awk '/user[0-9]/'
ps -ef | awk '/use[a-z][0-9]/'

Exclude character class:


ps -ef | awk '/user[^12]/'

Logical OR:


ps -ef | awk '/user1|root/'

Zero or one of a character:


ps -ef | awk '/userx?/'

Zero or more:


ps -ef | awk '/userx*/'

One or more:


ps -ef  | awk '/userx+/'

Match any character, zero or more times:


ps -ef  | awk '/user.*Dec/'

Greedy vs Stingy - AWK regular expressions are greedy and there is no stingy ( non-greedy ) quantifier, but there are workarounds for this.
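One possible workaround is a negated character class that stops at the first closing delimiter. The sketch below uses a made-up line with two quoted strings; the variable names `greedy` and `short` are assumptions.

```shell
# Hypothetical input with two quoted strings.
line='say "one" and "two"'

# Greedy: .* runs to the last quote on the line.
greedy=$(echo "$line" | awk '{ match($0, /".*"/); print substr($0, RSTART, RLENGTH) }')

# Workaround: [^"]* cannot cross a quote, so the match ends at the first closing quote.
short=$(echo "$line" | awk '{ match($0, /"[^"]*"/); print substr($0, RSTART, RLENGTH) }')

echo "$greedy"
echo "$short"
```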

Grouping with logical OR:


ps -ef | awk  '/Dec(26|27)/'

Standard Variables

$0 - entire input record
$1 - first field
$2 - second field
RS - record separator, defaults to newline
RSTART - first matched position in string, used with match function
SUBSEP - array subscript separator character
ARGC - number of command line arguments


awk 'BEGIN {print "Total args: ", ARGC}' a b c d

ARGV - the actual command line arguments, zero is the command name, 1 is the first arg


awk 'BEGIN {print "Command name: ", ARGV[0]}' a b c d
awk 'BEGIN {print "First arg: ", ARGV[1]}' a b c d
awk 'BEGIN {print "Second arg: ", ARGV[2]}' a b c d

Print all args:


awk 'BEGIN { for (c = 0; c < ARGC; ++c) { print ARGV[c] } }' a b c d

CONVFMT - conversion format


awk 'BEGIN { print "Conversion Format: ", CONVFMT }'

ENVIRON - hash of environment variables


awk 'BEGIN { print ENVIRON["USER"] }'

FILENAME - the current file name. For STDIN it shows “-” or an empty string, depending on the implementation.


awk 'END {print FILENAME}' test1.txt
ps -ef | awk 'END {print FILENAME}'

FS - field separator


awk 'BEGIN {print "FS = " FS}' 
awk -v FS=: 'BEGIN {print FS}'

NF - number of fields

Print all lines with over 5 fields:


ps -ef | awk 'NF > 5'

Print the number of fields on each line:


ps -ef | awk '{print NF}'

NR - record number ( line number )


ps -ef | awk '{print NR}'

FNR - Same as NR but only for the current file. Useful when working with multiple files.
OFMT - output format for numbers
OFS - output field separator
ORS - output record separator
RLENGTH - length of matched string, used with match function


awk 'BEGIN { if (match("abc xyz 123", "bc")) { print RLENGTH } }'
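The difference between NR and FNR, and the effect of OFS, only shows up with real input. A minimal sketch using two temporary files with made-up contents ( the names `f1` and `f2` are assumptions ):

```shell
# Create two small hypothetical input files.
f1=$(mktemp); f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\n' > "$f2"

# NR keeps counting across files; FNR restarts at 1 for each file.
# OFS is inserted wherever print arguments are separated by commas.
result=$(awk 'BEGIN { OFS = "-" } { print NR, FNR, $1 }' "$f1" "$f2")
rm -f "$f1" "$f2"
echo "$result"
```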

GAWK Variables

ARGIND - argv index of current file being processed
BINMODE - binary mode for non-POSIX systems
ERRNO - error with getline or close calls
FIELDWIDTHS - set field width instead of using a separator
IGNORECASE - ignore case
LINT - control the lint option
PROCINFO - hash with process info
TEXTDOMAIN - for localization info


awk 'BEGIN{IGNORECASE = 1} /amit/' marks.txt

Operators

Assignment

The assignment operator is a single “=”:


awk 'BEGIN { x = "abc"; print x }'

Arithmetic:

You can use the normal arithmetic operators for addition, subtraction, multiplication, division, and modulus:


awk 'BEGIN { 
    a = 25; 
    b = 35; 
    print a + b 
    print a - b 
    print a * b 
    print a / b 
    print a % b 
    }'

Either of these operators works for exponents:



awk 'BEGIN { 
    a = 25; 
    print a^2 
    print a**2 
    }'

Pre-Increment - increment, then assign


awk 'BEGIN { a = 25; b = ++a; print  a, b }'

Pre-Decrement - decrement, then assign


awk 'BEGIN { a = 25; b = --a; print a, b }'

Post-Increment - assign, then increment


awk 'BEGIN { a = 25; b = a++; print a, b }'

Post-Decrement - assign, then decrement


awk 'BEGIN { a = 25; b = a--; print a, b }'

Shorthand addition, subtraction, multiplication, division, modulus, and exponent:


awk 'BEGIN { 
    x = 5; 
    x += 2; print x
    x -= 2; print x
    x *= 2; print x
    x /= 2; print x
    x %= 2; print x
    x ^= 2; print x
    x **= 2; print x 
    }'

Unary plus and minus - multiply by 1 or -1:


awk 'BEGIN { 
    x = 25; 
    x = +x; print x
    x = -x; print x
    }'

Comparison:


awk 'BEGIN { 
    a = 25; 
    b = 25; 
    if (a == b)  print "true"; else print "false"
    if (a != b)  print "true"; else print "false" 
    if (a < b)   print "true"; else print "false"
    if (a <= b)  print "true"; else print "false"
    if (b > a )  print "true"; else print "false"
    if (b >= a ) print "true"; else print "false"
}'

Logical AND, OR, NOT

And:


awk 'BEGIN {
   x = 5; if (x > 3 && x < 8) print "got it"
   y = "abc"; if (y != "xyz" && y == "abc") print "got it"
}'

Or:


awk 'BEGIN {
      x = 5; if (x > 3 || x < 8) print "got it"
      y = "abc"; if (y == "xyz" || y == "abc") print "got it"
}'

Not:


awk 'BEGIN { a = "abc"; if (a !=  "xyz") print "got it" }'
awk 'BEGIN { b = ""; if (! length(b)) print "empty" }'

Ternary Operator

condition ? expression1 : expression2


awk 'BEGIN { a = 25; b = 35; c = (a > b) ? "larger" : "smaller"; print c }'

String concatenation:


awk 'BEGIN { a = "Hello "; b = "World"; c = a b; print c }'

Array operator “in”. In a for loop it iterates over the array keys ( the order is not guaranteed ):


awk 'BEGIN { 
   a[0] = 1; 
   a[1] = 2; 
   a[2] = 3; 
   for (i in a) print i, a[i]
}'
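Outside of a for loop, “in” tests whether a key exists. A short sketch with a made-up array follows; it also illustrates that the test itself does not create the element.

```shell
result=$(awk 'BEGIN {
    a["tree"] = "green"
    if ("tree" in a) print "tree: yes"
    if (!("rock" in a)) print "rock: no"
    # Testing with "in" does not create the element, so the count stays 1.
    n = 0
    for (k in a) n++
    print n
}')
echo "$result"
```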

Match ( ~ ) and not match ( !~ ) operators. The first command prints every line that contains the pattern and the second prints every line that does not.


ps -ef | awk '$0 ~ 1' 
ps -ef | awk '$0 !~ 1'

Built-in Functions

Math Functions

atan2(y, x)	arctangent in radians
cos(expr)	cosine
exp(expr)	exponential
int(expr)	truncate to int value
log(expr)	natural logarithm
rand()	random number between 0 and 1
sin(x)	sine in radians
sqrt(expr)	square root
srand(x)	set the seed for rand(), returns the previous seed

awk 'BEGIN { print(rand() " " rand()) }'
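A common use is scaling the result to an integer range. The sketch below assumes a die roll ( 1 to 6 ) and an arbitrary seed of 42; the exact number produced depends on the AWK implementation.

```shell
result=$(awk 'BEGIN {
    srand(42)                    # arbitrary fixed seed, chosen for illustration
    n = int(rand() * 6) + 1      # scale to an integer from 1 to 6
    print n
}')
echo "$result"
```

Calling srand() with no argument seeds from the current time instead.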

String Functions

asort(arr)	sort array
asorti(arr)	sort array by index
gsub("abc", "xyz", str)	global substitution
index(x, sub)	check for substring
length(x)	get string length
match(x, regex)	find leftmost longest match
split(str, arr, regex)	split string into array using regex
printf(fmt, list)	print based on variable list
strtonum(str)	convert string to number
sub(regex, sub, string)	replace first occurrence, $0 is used if string is omitted
tolower(str)	to lower
toupper(str)	to upper
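A few of these can be combined in one short program. This sketch uses a made-up comma-separated string and sticks to functions available in any POSIX AWK.

```shell
result=$(awk 'BEGIN {
    s = "alpha,beta,gamma"
    n = split(s, parts, ",")     # returns the number of pieces
    print n, parts[2]
    print index(s, "beta")       # 1-based position, 0 if not found
    print length(parts[3])
    print toupper(parts[1])
}')
echo "$result"
```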

Time and Date Functions

systime()	seconds since epoch
mktime("2018 11 19 32 15 05")	seconds since epoch based on datespec string
strftime("Time = %m/%d/%Y %H:%M:%S", systime())	convert seconds since epoch to timestamp

These can be used for constructing date format strings:

%a	weekday - short
%A	weekday - full
%b	month - short
%B	month full
%c	date / time by locale
%C	century
%d	day of month
%D	%m/%d/%y.
%e	day of month ( padded )
%F	%Y-%m-%d ISO 8601 standard date format
%g	year of the ISO 8601 week number, modulo 100 (00–99)
%G	full year of the ISO 8601 week number
%h	same as %b.
%H	hour (00–23)
%I	hour (01–12)
%j	day of the year
%m	month of the year
%M	minute
%n	newline char
%p	AM/PM
%r	12-hour clock time for locale
%R	%H:%M.
%S	second
%t	just a tab char
%T	%H:%M:%S.
%u	weekday (1–7). Monday is 1
%U	week number (00–53) starting on Sunday
%V	week number (01–53) starting on Monday
%w	weekday starting on Sunday (0–6)
%W	week number (00–53) starting on Monday
%x	date formatted for locale
%X	time formatted for locale
%y	year modulo 100 (00–99)
%Y	year - full
%z	time-zone offset in +HHMM format
%Z	time zone name

Bit Manipulation Functions

and(5,1)	bitwise and
compl(5)	bitwise complement
lshift(5,1)	bitwise left shift
rshift(5,1)	bitwise right shift
or(5,1)	bitwise or
xor(5,1)	bitwise xor

Other Functions

close(cmd)	close file or pipe
delete a[5]	delete array element
exit 5	exit script with optional return value
fflush()	flush buffers
getline	read the next line
next	stop and skip to next line
nextfile	stop and skip to next file
return 0	return from function
system()	run command and return status
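As an illustration of next, the following sketch skips comment lines and blank lines before doing any work. The input lines and the variable name `total` are made up for the example.

```shell
result=$(printf '# comment\nalpha 1\n\nbeta 2\n' | awk '
/^#/    { next }                      # skip comment lines
NF == 0 { next }                      # skip blank lines
        { total += $2; print $1 }     # everything else is processed
END     { print "total " total }
')
echo "$result"
```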

Output Redirection

Redirect and overwrite:


awk 'BEGIN { print "testing..." > "test1.txt" }'

Redirect and append:


awk 'BEGIN { print "testing..." >> "test1.txt" }'

Pipe:


awk 'BEGIN { print "testing..." | "tr [a-z] [A-Z]" }'
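Output sent to a pipe is usually not produced until the pipe is closed or the program ends. A minimal sketch, assuming the standard sort command is available, that closes the pipe so the sorted lines appear before the final message:

```shell
result=$(awk 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd
    print "apple" | cmd
    close(cmd)          # wait for sort to finish and emit its output
    print "done"
}')
echo "$result"
```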

Two way communication:


awk '
BEGIN {
  cmd = "tr a b "
  print "aaabbb" |& cmd
  close(cmd, "to")
   
  cmd |& getline out
  print out;
  close(cmd);
}
'

Output formatting

You can use either print or printf. Printf gives you a variety of different formatting options.

Newline:


awk 'BEGIN { printf "Hello\nWorld\n" }'

Form feed:


awk 'BEGIN { printf "test\ftest\ftest\ftest\n" }'

Tab:


awk 'BEGIN { printf "test\ttest\ttest\ttest\n" }'

Vertical tab:


awk 'BEGIN { printf "test\vtest\vtest\vtest\n" }'

Backspace:


awk 'BEGIN { printf "test1\btest2test3\b\n" }'

You can use printf to include variables in strings.


awk 'BEGIN { a = "abc"; b = 123; printf "%s -- %d", a, b }'

Formatting place holders for use with printf:

%c	print a character, first char of string or char with this int value
%d and %i	print integer, truncate decimal
%e and %E	floating point number of the form [-]d.dddddde[+-]dd.
%f	print a floating point number of the form [-]ddd.dddddd.
%g and %G	like %e or %f but suppress non-significant zeros
%o	print unsigned octal
%u	print unsigned decimal
%s	print a string
%x and %X	print unsigned hex
%%	print %

Use a number to specify the minimum field width. Shorter values are padded with spaces.
Start that number with a zero to pad with zeros instead of spaces.

%5d	pad with spaces to a width of 5
%50d	pad with spaces to a width of 50
%05d	pad with zeros to a width of 5
%050d	pad with zeros to a width of 50

%-5d	left justify
%+5d	show sign even if positive
%#o	show leading 0
%#X	show leading 0X
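The effect of these flags is easiest to see with brackets around each field. A short sketch using arbitrary sample values:

```shell
result=$(awk 'BEGIN {
    printf "[%5d]\n", 42          # right justified, padded with spaces
    printf "[%05d]\n", 42         # padded with zeros
    printf "[%-5d]\n", 42         # left justified
    printf "[%+d]\n", 42          # sign shown even if positive
    printf "[%#o] [%#X]\n", 8, 255
}')
echo "$result"
```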