Autosave: 2023-02-10 07:37:15

thomasabishop 2023-02-10 07:37:15 +00:00
parent f8d8405449
commit 48eebb7128
4 changed files with 88 additions and 11 deletions


@ -26,7 +26,33 @@ We can also obviously pipe to it.
## Syntactic structure
An `awk` program consists of a sequence of pattern-action statements and optional function definitions. `awk` is line-oriented.
`awk` is a line-oriented language.
An `awk` program consists of a sequence of **pattern: action** statements and optional function definitions.
For most of the examples we will use this list as the input:
```
cloud
existence
ministerial
falcon
town
sky
top
bookworm
bookcase
war
Peter 89
Lucia 95
Thomas 76
Marta 67
Joe 92
Alex 78
Sophia 90
Alfred 65
Kate 46
```
### Patterns and actions
@ -38,16 +64,18 @@ pattern {action}
A **pattern** is what you want to match against. It can be a literal string or a regex. The **action** is the process you want to execute against the input lines that match the pattern.
The following script prints each line of input that contains the word "error":
The following script prints the line that matches `Joe`:
```bash
/error/ {print} file.tx
awk '/Joe/ {print}' list.txt
```
`/error/` is the pattern and `{print}` is the action.
`/Joe/` is the pattern and `{print}` is the action.
### Lines, records, fields
![](/_img/awk-outline.png)
When `awk` receives a file it treats each line as a **record**.
Each record is broken up into a sequence of **fields**, separated by whitespace by default.
@ -56,18 +84,66 @@ The fields are accessed by special variables:
- `$1` reads the first field, `$2` reads the second field and so on.
- The variable `$0` refers to the whole recordk
- The variable `$0` refers to the whole record
### Patterns and actions
So, in the picture, `cloud existence ministerial` correspond to `$1`, `$2` and `$3`.
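As a minimal sketch of the field variables described above, the snippet below feeds a single record to `awk` and prints `$0`, `$1` and `$2`. The inline input (`Peter 89`) is borrowed from the sample data purely for illustration, and the variable name `out` is an assumption of this example.

```bash
# Feed one two-field record to awk and print the whole record and each field.
out=$(printf 'Peter 89\n' | awk '{ print "record:", $0; print "first:", $1; print "second:", $2 }')
echo "$out"
# record: Peter 89
# first: Peter
# second: 89
```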
### Variables
## Basic examples
Fields are referenced with a leading `$`. The fields of input are denoted sequentially with `$1, $2, $3...`
This script prints the first and second fields of each line of input:
**_Match a pattern_**
```bash
{ print $1, $2 }
awk '/book/ { print }' list.txt
# bookworm
# bookcase
```
**_Print all words that are longer than five characters_**
```bash
awk 'length($1) > 5 { print $0 }' list.txt
```
`awk` tests the pattern against every line in turn. If the first field (we only have one field per line) is longer than five characters, the action prints the whole record, `$0`.
We don't actually need to include the `{ print $0 }` action, as printing the whole record is the default behaviour when a pattern has no action. We could have just written `awk 'length($1) > 5' list.txt`.
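To check that omitting the action really is equivalent, here is a small sketch that runs both forms over the same input and compares the results. The four-word inline list and the variable names (`words`, `with_action`, `without_action`) are assumptions made for this example, standing in for `list.txt`.

```bash
# A short inline word list standing in for list.txt (illustrative only).
words=$(printf 'cloud\nexistence\nsky\nbookworm\n')

# Explicit action versus the default action (print the whole record).
with_action=$(printf '%s\n' "$words" | awk 'length($1) > 5 { print $0 }')
without_action=$(printf '%s\n' "$words" | awk 'length($1) > 5')

echo "$without_action"
# existence
# bookworm
```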
**_Print all words that do not have three characters_**
```bash
awk '!(length($1) == 3)' list.txt
```
Here we negate by prepending the pattern with `!` and wrapping it in parentheses.
**_Return words that are either three characters or four characters in length_**
```bash
awk '(length($1) == 3) || (length($1) == 4)' list.txt
```
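Here we use the logical OR (`||`) to match against more than one pattern. In these examples we wrap each sub-pattern in parentheses: with `!` they are required, because `!` binds more tightly than `==`, while with `||` they are optional but make the grouping explicit.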
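The negation and OR patterns can be tried side by side on a small input. This is only a sketch: the four words are taken from the sample list, and the variable names (`words`, `not_three`, `three_or_four`) are assumptions of this example.

```bash
# Four words from the sample list, with lengths 5, 3, 4 and 3.
words=$(printf 'cloud\nsky\ntown\nwar\n')

# NOT: keep every word whose length is not three.
not_three=$(printf '%s\n' "$words" | awk '!(length($1) == 3)')

# OR: keep every word whose length is three or four.
three_or_four=$(printf '%s\n' "$words" | awk '(length($1) == 3) || (length($1) == 4)')

echo "$not_three"
# cloud
# town
echo "$three_or_four"
# sky
# town
# war
```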
**_Match and string-interpolate the output_**
```bash
awk 'length($1) > 0 {print $1, "has", length($1), "chars"}' list.txt
# cloud has 5 chars
# existence has 9 chars
# ministerial has 11 chars
# ...
```
**_Match against a numerical property_**
```bash
awk '$2 >= 90 { print $0 }' scores.txt
# Lucia 95
# Joe 92
# Sophia 90
```
This returns the records whose second field is a number greater than or equal to 90.
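The choice of comparison operator matters for a score of exactly 90. The sketch below compares `>` with `>=` on a few records from the sample data; the inline input and the variable names (`scores`, `strictly_greater`, `greater_or_equal`) are assumptions of this example, standing in for `scores.txt`.

```bash
# A few name/score records standing in for scores.txt (illustrative only).
scores=$(printf 'Lucia 95\nJoe 92\nSophia 90\nKate 46\n')

# Strictly greater than 90: Sophia (exactly 90) is excluded.
strictly_greater=$(printf '%s\n' "$scores" | awk '$2 > 90')

# Greater than or equal to 90: Sophia is included.
greater_or_equal=$(printf '%s\n' "$scores" | awk '$2 >= 90')

echo "$strictly_greater"
# Lucia 95
# Joe 92
echo "$greater_or_equal"
# Lucia 95
# Joe 92
# Sophia 90
```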
https://zetcode.com/lang/awk/

1
Untitled.canvas Normal file

@ -0,0 +1 @@
{}

BIN
_img/awk-outline.png Normal file

Binary file not shown.
