AWK Tutorial
- AWK - original from AT&T
- NAWK - newer version from AT&T ( BSD and OSX )
- GAWK - GNU version of AWK ( Linux )
Install
AWK is almost certainly already installed on your system.
Install AWK on Ubuntu or Debian based systems:
sudo apt-get update
sudo apt-get install gawk
Install on RedHat based systems:
sudo yum install gawk
Build AWK from source:
wget http://ftp.gnu.org/gnu/gawk/gawk-4.1.1.tar.xz
tar xvf gawk-4.1.1.tar.xz
cd gawk-4.1.1
./configure
make
make check
sudo make install
Check where AWK is installed:
which awk
/usr/bin/awk
Core Functionality
This is the basic lifecycle of an AWK program:
- Run commands in BEGIN block
- Loop until end of file
  - Read line
  - Execute commands on line
- Run commands in END block
This is what the basic format looks like:
BEGIN {awk-commands}
/pattern/ {awk-commands}
END {awk-commands}
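As a rough illustration of that lifecycle, the sketch below feeds three made-up lines to AWK with printf. The sample data and the shell variable name report are assumptions for the example only.

```shell
# Sample input is invented for illustration.
report=$(printf 'apple 3\nbanana 5\napple 7\n' | awk '
BEGIN   { print "start" }                 # runs once, before any input is read
/apple/ { total += $2; print $0 }         # runs for each line matching /apple/
END     { print "apple total: " total }   # runs once, after the last line
')
echo "$report"
```

The BEGIN line is printed first, the two matching lines follow, and the END block prints the sum of the second column for those lines.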
Basic Common Commands
One of the most common use cases will be to print selected columns. These can be separated by spaces or whatever string you choose. Input can come from a file but very often will just be piped in from STDIN.
Pipe input from STDIN. Print selected columns separated by spaces:
ps -ef | awk '{print $3, $5, $7}'
Read from a file. Print selected columns separated by spaces:
awk '{print $3, $5, $7}' test1.txt
Separate with a space or with a custom string:
ps -ef | awk '{print $3" "$5" -- "$7}'
Print a header. Print columns 2, 3, and 7. Print a footer.
awk 'BEGIN{printf "Col1\tCol2\tCol3\n"} {print $2"\t"$3"\t"$7} END{print "Done"}' test1.txt
Field Separator and Variables
Two different ways to change the field separator. Both of these will change the field separator to a colon “:”.
ps -ef | awk -F: '{print $1}'
ps -ef | awk -v FS=: '{print $1}'
Change it to a slash or a comma:
ps -ef |awk -F/ '{print $3, $5}'
ps -ef |awk -F, '{print $3, $5}'
Use -v to assign variables before the program executes:
awk -v var1=xyz 'BEGIN{printf "var1 = %s\n", var1}'
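The two options can be combined. The sketch below assumes some invented colon-separated records ( name:shell:uid ) and a hypothetical variable named min that is passed in with -v and used as a numeric threshold.

```shell
# Made-up records in the form name:shell:uid
lines=$(printf 'alice:/bin/bash:1000\nbob:/bin/sh:1001\ncarol:/bin/bash:1002\n' |
  awk -F: -v min=1001 '$3 >= min { print $1, $2 }')
echo "$lines"
```

Only records whose third field is at least min are printed, with the first two fields separated by a space.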
Matching / Filtering / Counting / Replace
Match a pattern ( ex: “tty” ) and print each matching line. Both of these commands do the same thing.
ps -ef | awk ' /tty/ {print}'
ps -ef | awk '/tty/'
Match a pattern and select columns:
ps -ef | awk ' /tty/ {print $3, $5}'
Match and print everything:
ps -ef | awk '//'
Count lines that match a pattern and print the total:
ps -ef | awk '/test/{++c} END {print "Total matched: ", c+0}'
Only print lines with under 100 characters:
ps -ef | awk 'length($0) < 100'
Replace the first occurrence of one string on each line with another ( use gsub instead of sub to replace every occurrence ):
ps -ef | awk '{sub(/user1/,"user2");print }'
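To make the difference between sub and gsub concrete, here is a small sketch on an invented input line. The variable names first and all are just for the example.

```shell
# sub() changes only the first match on the line
first=$(echo 'aaa bbb aaa' | awk '{ sub(/aaa/, "xxx"); print }')
# gsub() changes every match and returns how many replacements it made
all=$(echo 'aaa bbb aaa' | awk '{ n = gsub(/aaa/, "xxx"); print n ": " $0 }')
echo "$first"
echo "$all"
```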
More Useful Commands
Print the entire line with either of these commands. The variable $0 refers to the entire line ( all fields ). It is also what print uses by default if you leave the argument out.
ps -ef | awk '{print}'
ps -ef | awk '{print $0}'
Do nothing:
awk '' file1.txt
Print out all global variables ( GAWK only, the list is written to awkvars.out ):
awk -d ''
cat awkvars.out
Get help:
awk --help
Running an AWK Script Loaded From a File
AWK commands can be read from a script like this while taking input from a pipe:
ps -ef | awk -f test1.awk
Running commands from an AWK script while reading input from a file:
awk -f test1.awk output.txt
An awk script might look like this. Note that the pattern to be matched needs to be on the same line as the opening curly bracket.
BEGIN{printf "Col1\tCol2\tCol3\n"}
/tty/{
print $2"\t"$3"\t"$7
}
END{print "Done"}
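A self-contained way to try this is sketched below. It writes a small script to a temporary directory and runs it against invented input, so the file name demo.awk and the sample lines are assumptions rather than part of the tutorial.

```shell
tmpdir=$(mktemp -d)
# Write the AWK program to a file; the quoted heredoc keeps $2 and $3 literal
cat > "$tmpdir/demo.awk" <<'EOF'
BEGIN { printf "PID\tTTY\n" }
/tty/ {
    print $2 "\t" $3
}
END { print "Done" }
EOF
# Run the script against three made-up lines
table=$(printf 'root 101 tty1\nroot 102 pts/0\nuser1 103 tty2\n' | awk -f "$tmpdir/demo.awk")
echo "$table"
rm -r "$tmpdir"
```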
Flow Control
Here is a basic example of an if statement. If the condition is true it will print out a message.
awk 'BEGIN {x = 5; if (x * 2 == 10) printf "%d matched.\n", x }'
You can also check for string equality:
awk 'BEGIN {x = "xyz"; if (x == "xyz") printf "%s matched.\n", x }'
This is a basic if / else statement. If the expression is true the first statement is run. Otherwise it runs the second statement after the ‘else’ keyword.
awk 'BEGIN {
x = 7;
if (x > 5 ) print "over";
else print "under or equal";
}'
You can check for multiple conditions using “if”, “else if”, and “else”:
awk 'BEGIN {
x = 789;
if (x==123)
print "It is 123";
else if (x == 456)
print "it is 456";
else if (x == 789)
print "it is 789";
else
print "something else"
}'
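The same if / else if / else chain also works per input line rather than only in BEGIN. The following sketch classifies some invented numbers; the thresholds and labels are arbitrary choices for the example.

```shell
sizes=$(printf '3\n50\n700\n' | awk '{
    if ($1 < 10)        label = "small"
    else if ($1 < 100)  label = "medium"
    else                label = "large"
    print $1, label
}')
echo "$sizes"
```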
Arrays
These are basically associative arrays.
Create an array, assign values, and print them like this:
awk 'BEGIN {
array1["tree"] = "green";
array1["rock"] = "brown";
array1[23] = "red";
print array1["tree"] " " array1[23];
}'
Delete values like this:
awk 'BEGIN {
array1["abc"] = "xyz";
delete array1["abc"];
}'
Arrays can be indexed with a key. If the key is an integer it will be converted to a string. These three commands do the same thing.
awk 'BEGIN {array1["123"] = 10; print array1["123"];}'
awk 'BEGIN {array1["123"] = 10; print array1[123];}'
awk 'BEGIN {array1[123] = 10; print array1[123];}'
Multidimensional arrays don’t really exist in AWK but they can be simulated by passing a value that looks like this for the key ( it will be converted to a string ).
awk 'BEGIN {array1[1,2] = 10; print array1[1,2];}'
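Under the hood the subscripts are joined with the SUBSEP character to form a single string key. A minimal sketch of recovering the parts with split, using a hypothetical array named grid:

```shell
cells=$(awk 'BEGIN {
    grid[1, 2] = 10
    for (key in grid) {
        n = split(key, idx, SUBSEP)      # recover the two subscripts
        print n, idx[1], idx[2], grid[key]
    }
    if ((1, 2) in grid) print "found"    # membership test with a compound key
}')
echo "$cells"
```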
Loops
For Loop:
awk 'BEGIN { for (c = 0; c <= 10; c++) print c }'
While Loop:
awk 'BEGIN {c = 0; while (c <= 10) { print c; c++ } }'
Do-While Loop - statement is executed at least once.
awk 'BEGIN {c = 0; do { print c; c++ } while (c <= 10) }'
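Loops are often used to walk over the fields of each input line. As a sketch on invented input, this sums every field on a line using NF as the upper bound:

```shell
sums=$(printf '1 2 3\n10 20\n' | awk '{
    s = 0
    for (i = 1; i <= NF; i++) s += $i   # $i is the i-th field of the current line
    print s
}')
echo "$sums"
```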
Break - stop executing the loop.
awk 'BEGIN {
for (c = 0; c < 100; c++) {
if (c > 50) break; else print c
}
}'
Continue - continue to next iteration.
awk 'BEGIN {
for (c = 0; c < 10; c++) {
if (c == 5) continue;
print c;
}
}'
Exit - Stop executing the script. This doesn’t need to be in a loop.
awk 'BEGIN {
for (c = 0; c < 10; c++) {
if (c == 5) exit(10);
print c;
}
}'
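The value passed to exit becomes the exit status that the calling shell sees. A small sketch, where the shell variable name status is just for illustration:

```shell
status=0
# "|| status=$?" captures the non-zero exit status without aborting the shell
awk 'BEGIN { exit 10 }' || status=$?
echo "exit status: $status"
```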
User Defined Functions
A function defined and called on a single line:
awk 'function test(n){return n*2} BEGIN { print test(5) }'
Here is a slightly longer example showing one function that calls another function. It uses return values and multiple parameters.
awk '
function test1(x, y){
z = x * y;
return z;
}
function main(a, b){
c = test1(a, b);
print c;
}
# script actually starts running here
BEGIN {
main(10, 20)
}'
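Variables assigned inside a function are global unless they appear in the parameter list. A common convention, sketched here with a hypothetical function named area, is to declare locals as extra parameters separated by additional spaces:

```shell
result=$(awk '
function area(w, h,    tmp) {   # tmp is local because it is an extra parameter
    tmp = w * h
    return tmp
}
BEGIN {
    tmp = "untouched"           # global variable with the same name
    print area(3, 4)
    print tmp                   # still holds its original value
}')
echo "$result"
```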
Regular Expressions
Match the string “root”:
ps -ef | awk '/root/'
Exclude ( negate the expression ):
ps -ef | awk '!/root/'
Match any character:
ps -ef | awk '/google.chrome/'
Match beginning of line:
ps -ef | awk '/^root/'
Match end of line:
ps -ef | awk '/chrome$/'
Match any character in a character class:
ps -ef | awk '/user[12]/'
ps -ef | awk '/user[0-9]/'
ps -ef | awk '/use[a-z][0-9]/'
Exclude character class:
ps -ef | awk '/user[^12]/'
Logical OR:
ps -ef | awk '/user1|root/'
Zero or one of a character:
ps -ef | awk '/userx?/'
Zero or more:
ps -ef | awk '/userx*/'
One or more:
ps -ef | awk '/userx+/'
Match any character, zero or more times:
ps -ef | awk '/user.*Dec/'
Greedy vs lazy ( stingy ) - AWK regular expressions always take the longest match and have no lazy quantifiers, but there are workarounds for this.
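One common workaround is to replace “.*” with a negated character class that cannot run past the closing delimiter. The sketch below uses an invented line of markup and the match function to compare the two patterns.

```shell
line='<b>one</b> and <b>two</b>'
# Greedy: .* runs to the last closing tag on the line
greedy=$(echo "$line" | awk '{ match($0, /<b>.*<\/b>/); print substr($0, RSTART, RLENGTH) }')
# Workaround: [^<]* stops at the first "<", so only the first element matches
short=$(echo "$line" | awk '{ match($0, /<b>[^<]*<\/b>/); print substr($0, RSTART, RLENGTH) }')
echo "$greedy"
echo "$short"
```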
Grouping with logical OR:
ps -ef | awk '/Dec(26|27)/'
Standard Variables
- $0 - entire input record
- $1 - first field
- $2 - second field
- RS - record separator, defaults to newline
- RSTART - first matched position in string, used with match function
- SUBSEP - array subscript separator character
- ARGC - number of command line arguments
awk 'BEGIN {print "Total args: ", ARGC}' a b c d
- ARGV - the actual command line arguments, zero is the command name, 1 is the first arg
awk 'BEGIN {print "Command name: ", ARGV[0]}' a b c d
awk 'BEGIN {print "First arg: ", ARGV[1]}' a b c d
awk 'BEGIN {print "Second arg: ", ARGV[2]}' a b c d
Print all args:
awk 'BEGIN { for (c = 0; c < ARGC; ++c) { print ARGV[c] } }' a b c d
- CONVFMT - conversion format used when numbers are converted to strings
awk 'BEGIN { print "Conversion Format: ", CONVFMT }'
- ENVIRON - hash of environment variables
awk 'BEGIN { print ENVIRON["USER"] }'
- FILENAME - the current file name. Shows “-“ for STDIN.
awk 'END {print FILENAME}' test1.txt
ps -ef | awk 'END {print FILENAME}'
- FS - field separator
awk 'BEGIN {print "FS = " FS}'
awk -v FS=: 'BEGIN {print FS}'
- NF - number of fields
Print all lines with more than 5 fields:
ps -ef | awk 'NF > 5'
Print the number of fields on each line:
ps -ef | awk '{print NF}'
- NR - record number ( line number )
ps -ef | awk '{print NR}'
- FNR - Same as NR but only for the current file. Useful when working with multiple files.
- OFMT - output format for numbers
- OFS - output field separator
- ORS - output record separator
- RLENGTH - length of matched string, used with match function
awk 'BEGIN { if (match("abc xyz 123", "bc")) { print RLENGTH } }'
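Several of these variables are typically used together. As a sketch on invented input, the command below sets OFS so that print joins its arguments with commas, then prints the record number, the field count, the first field, and the last field of each line:

```shell
csv=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "," } { print NR, NF, $1, $NF }')
echo "$csv"
```

Note that $NF is the last field, because NF holds the number of fields on the current line.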
GAWK Variables
- ARGIND - argv index of current file being processed
- BINMODE - binary mode for non-POSIX systems
- ERRNO - error with getline or close calls
- FIELDWIDTHS - set field width instead of using a separator
- IGNORECASE - ignore case
- LINT - control the lint option
- PROCINFO - hash with process info
- TEXTDOMAIN - for localization info
Example using IGNORECASE to match “amit” in any letter case:
awk 'BEGIN{IGNORECASE = 1} /amit/' marks.txt
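Since IGNORECASE is specific to GAWK, a portable alternative is to lower-case the line before matching. This is only a sketch; the sample records stand in for whatever a file like marks.txt might contain.

```shell
# tolower() is available in POSIX awk, so this does not depend on GAWK
hits=$(printf 'Amit 80\nRahul 90\nAMIT 75\n' | awk 'tolower($0) ~ /amit/')
echo "$hits"
```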
Operators
Assignment
The assignment operator is a single “=”:
awk 'BEGIN { x = "abc"; print x }'
Arithmetic:
You can use the normal arithmetic operators for addition, subtraction, multiplication, division, and modulus:
awk 'BEGIN {
a = 25;
b = 35;
print a + b
print a - b
print a * b
print a / b
print a % b
}'
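Either of these operators works for exponents. The ^ operator is the standard one; ** is an extension that GAWK accepts but that may not be available in every AWK implementation: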
awk 'BEGIN {
a = 25;
print a^2
print a**2
}'
Pre-Increment - increment, then assign
awk 'BEGIN { a = 25; b = ++a; print a, b }'
Pre-Decrement - decrement, then assign
awk 'BEGIN { a = 25; b = --a; print a, b }'
Post-Increment - assign, then increment
awk 'BEGIN { a = 25; b = a++; print a, b }'
Post-Decrement - assign, then decrement
awk 'BEGIN { a = 25; b = a--; print a, b }'
Shorthand addition, subtraction, multiplication, division, modulus, and exponent:
awk 'BEGIN {
x = 5;
x += 2; print x
x -= 2; print x
x *= 2; print x
x /= 2; print x
x %= 2; print x
x ^= 2; print x
x **= 2; print x
}'
Unary plus and minus - multiply by 1 or -1:
awk 'BEGIN {
x = 25;
x = +x; print x
x = -x; print x
}'
Comparison:
awk 'BEGIN {
a = 25;
b = 25;
if (a == b) print "true"; else print "false"
if (a != b) print "true"; else print "false"
if (a < b) print "true"; else print "false"
if (a <= b) print "true"; else print "false"
if (b > a ) print "true"; else print "false"
if (b >= a ) print "true"; else print "false"
}'
Logical AND, OR, NOT
And:
awk 'BEGIN {
x = 5; if (x > 3 && x < 8) print "got it"
y = "abc"; if (y != "xyz" && y == "abc") print "got it"
}'
Or:
awk 'BEGIN {
x = 5; if (x > 3 || x < 8) print "got it"
y = "abc"; if (y == "xyz" || y == "abc") print "got it"
}'
Not:
awk 'BEGIN { a = "abc"; if (a != "xyz") print "got it" }'
awk 'BEGIN { b = ""; if (! length(b)) print "empty" }'
Ternary Operator
condition ? expression1 : expression2
awk 'BEGIN { a = 25; b = 35; c = (a > b) ? "larger" : "smaller"; print c}'
String concatenation:
awk 'BEGIN { a = "Hello "; b = "World"; c = a b; print c }'
Array membership operator “in”:
awk 'BEGIN {
a[0] = 1;
a[1] = 2;
a[2] = 3;
for (i in a) print i, a[i]
}'
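Besides driving a for loop, “in” can test whether a key exists. Unlike referencing the element directly, the test does not create the element as a side effect. A minimal sketch with a hypothetical array a:

```shell
membership=$(awk 'BEGIN {
    a["x"] = 1
    if ("x" in a) print "x: yes"
    if (!("y" in a)) print "y: no"   # testing does not create a["y"]
    n = 0
    for (k in a) n++                 # count the elements actually stored
    print "elements: " n
}')
echo "$membership"
```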
Match ( ~ ) and no-match ( !~ ) operators. The first command prints every line that contains “1”, the second prints every line that does not:
ps -ef | awk '$0 ~ 1'
ps -ef | awk '$0 !~ 1'
Built-in Functions
Math Functions
atan2(y, x) | arctangent in radians |
cos(expr) | cosine |
exp(expr) | exponential |
int(expr) | truncate to int value |
log(expr) | natural logarithm |
rand() | random number, at least 0 and less than 1 |
sin(x) | sine in radians |
sqrt(expr) | square root |
srand(x) | seed the random number generator, the seed value is optional |
awk 'BEGIN { print(rand() " " rand()) }'
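Without a call to srand, many AWK implementations produce the same sequence on every run. The sketch below seeds the generator from the clock and scales rand to an integer from 1 to 6; the dice range is just an example choice.

```shell
# srand() with no argument seeds from the time of day
roll=$(awk 'BEGIN { srand(); print int(rand() * 6) + 1 }')
echo "$roll"
```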
String Functions
asort(arr) | sort array by value ( GAWK only ) |
asorti(arr) | sort array by index ( GAWK only ) |
gsub("abc", "xyz", str) | global substitution, $0 is used if str is omitted |
index(x, sub) | position of substring sub in x, 0 if not found |
length(x) | get string length |
match(x, regex) | position of the first longest match, 0 if none, sets RSTART and RLENGTH |
split(str, arr, regex) | split string into array using regex |
printf(fmt, list) | print based on variable list |
strtonum(str) | convert string to number ( GAWK only ) |
sub(regex, sub, string) | replace first occurrence, $0 is used if string is omitted |
tolower(str) | to lower |
toupper(str) | to upper |
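A short sketch combining several of these on an invented string. It also uses substr, which is a standard AWK function even though it is not listed in the table above.

```shell
parts=$(awk 'BEGIN {
    s = "alpha,beta,gamma"
    n = split(s, arr, ",")            # n is the number of pieces
    print n, toupper(arr[2]), index(s, "beta"), length(arr[3]), substr(s, 1, 5)
}')
echo "$parts"
```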
Time and Date Functions
systime() | seconds since epoch |
mktime("2018 11 19 32 15 05") | seconds since epoch based on datespec string |
strftime("Time = %m/%d/%Y %H:%M:%S", systime()) | convert seconds since epoch to timestamp |
These can be used for constructing date format strings:
%a | weekday - short |
%A | weekday - full |
%b | month - short |
%B | month full |
%c | date / time by locale |
%C | century |
%d | day of month |
%D | %m/%d/%y. |
%e | day of month ( space padded ) |
%F | %Y-%m-%d ISO 8601 standard date format |
%g | year of the ISO 8601 week number, modulo 100 (00–99) |
%G | full year of the ISO 8601 week number |
%h | same as %b. |
%H | hour (00–23) |
%I | hour (01–12) |
%j | day of the year |
%m | month of the year |
%M | minute |
%n | newline char |
%p | AM/PM |
%r | 12-hour clock time for locale |
%R | %H:%M. |
%S | second |
%t | just a tab char |
%T | %H:%M:%S. |
%u | weekday (1–7). Monday is 1 |
%U | week number (00–53) starting on Sunday |
%V | week number (01–53) starting on Monday |
%w | weekday starting on Sunday (0–6) |
%W | week number (00–53) starting on Monday |
%x | date formatted for locale |
%X | time formatted for locale |
%y | year modulo 100 (00–99) |
%Y | year - full |
%z | time-zone offset in +HHMM format |
%Z | time zone name |
AWK - Bit Manipulation Functions ( GAWK extensions )
and(5,1) | bitwise and |
compl(5) | bitwise complement |
lshift(5,1) | bitwise left shift |
rshift(5,1) | bitwise right shift |
or(5,1) | bitwise or |
xor(5,1) | bitwise xor |
Other Functions
close(cmd) | close file or pipe |
delete a[5] | delete array element |
exit 5 | exit script with optional return value |
fflush() | flush buffers |
getline | read the next line |
next | stop and skip to next line |
nextfile | stop and skip to next file |
return 0 | return from function |
system() | run command and return status |
Output Redirection
Redirect and overwrite:
awk 'BEGIN { print "testing..." > "test1.txt" }'
Redirect and append:
awk 'BEGIN { print "testing..." >> "test1.txt" }'
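Within a single run, “>” truncates the file only when it is first opened, and later prints to the same name keep writing to the open file. The sketch below shows this with a temporary file; the variable f and the three words are assumptions for the example.

```shell
out=$(mktemp)
awk -v f="$out" 'BEGIN {
    print "first"  > f     # truncates the file when it is first opened
    print "second" > f     # same open file, so this line follows the first
    close(f)
    print "third" >> f     # reopened in append mode
    close(f)
}'
contents=$(cat "$out")
echo "$contents"
rm -f "$out"
```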
Pipe:
awk 'BEGIN { print "testing..." | "tr a-z A-Z" }'
Two-way communication ( the |& operator is GAWK only ):
awk '
BEGIN {
cmd = "tr a b "
print "aaabbb" |& cmd
close(cmd, "to")
cmd |& getline out
print out;
close(cmd);
}
'
Output formatting
You can use either print or printf. Printf gives you a variety of different formatting options.
Newline:
awk 'BEGIN { printf "Hello\nWorld\n" }'
Form feed:
awk 'BEGIN { printf "test\ftest\ftest\ftest\n" }'
Tab:
awk 'BEGIN { printf "test\ttest\ttest\ttest\n" }'
Vertical tab:
awk 'BEGIN { printf "test\vtest\vtest\vtest\n" }'
Backspace:
awk 'BEGIN { printf "test1\btest2test3\b\n" }'
You can use printf to include variables in strings.
awk 'BEGIN { a = "abc"; b = 123; printf "%s -- %d\n", a, b }'
Formatting place holders for use with printf:
%c | print a character, first char of string or char with this int value |
%d and %i | print integer, truncate decimal |
%e and %E | print a floating point number of the form [-]d.dddddde[+-]dd. |
%f | print a floating point number of the form [-]ddd.dddddd. |
%g and %G | like %e or %f but suppress non-significant zeros |
%o | print unsigned octal |
%u | print unsigned decimal |
%s | print a string |
%x and %X | print unsigned hex |
%% | print % |
- Use a number to specify the minimum field width. Shorter values are padded with spaces.
- Start that number with a zero to pad with zeros instead of spaces.
%5d | pad with spaces to a width of 5 |
%50d | pad with spaces to a width of 50 |
%05d | pad with zeros to a width of 5 |
%050d | pad with zeros to a width of 50 |
More:
%-5d | left justify |
%+5d | show sign even if positive |
%#o | show leading 0 |
%#X | show leading 0X |
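To see the width and flag modifiers side by side, the sketch below prints the same values with several formats. The square brackets are only there to make the padding visible.

```shell
formatted=$(awk 'BEGIN {
    printf "[%5d]\n", 42           # right justified, padded with spaces
    printf "[%05d]\n", 42          # padded with zeros
    printf "[%-5d]\n", 42          # left justified
    printf "[%+d]\n", 42           # explicit sign
    printf "[%.2f]\n", 3.14159     # two digits after the decimal point
    printf "[%#o] [%#x]\n", 8, 255 # octal and hex with their prefixes
}')
echo "$formatted"
```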