---
tags:
  - shell
---

# Text manipulation

## Sorting strings: `sort`

If you have a `.txt` file containing text strings, each on its own line, you can
use the `sort` command to quickly put them in alphabetical order:

```bash
sort file.txt
```

Note that this does not change the file; `sort` only writes the sorted lines to
standard output. To save the result, redirect it to a different file
(redirecting to the input file itself would truncate it before `sort` reads it):

```bash
sort file.txt > output.txt
```

### Options

- `-r`
  - reverse sort
- `-c`
  - check whether the file is already sorted. If it is not, `sort` reports the
    first out-of-order line and exits with a non-zero status
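
The two options can be tried out with a minimal sketch like the one below. The
file name `fruits.txt` and its contents are made up for illustration, and the
file lives in a temporary directory:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'pear\napple\nmango\n' > "$tmpdir/fruits.txt"

# -r reverses the order of the sort.
reversed=$(sort -r "$tmpdir/fruits.txt")
echo "$reversed"

# -c exits with status 0 if the file is already sorted; otherwise it
# reports the first out-of-order line on stderr and exits non-zero.
if sort -c "$tmpdir/fruits.txt" 2>/dev/null; then
  check="sorted"
else
  check="unsorted"
fi
echo "$check"

rm -r "$tmpdir"
```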

## Find and replace: `sed`

The `sed` programme can be used to implement find and replace procedures. In
`sed`, find and replace is handled by the substitute command, `s`:

```bash
sed 's/word/replacement word/' file.txt
```

However, this only changes the first instance of the word on each line. To
apply the substitution to every instance, add the global flag `g` to the end of
the expression: `s/word/replacement word/g`.
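
A short sketch of the difference, using a made-up one-line input. Note that the
`g` is written at the end of the `s/.../.../` expression rather than passed as a
separate option:

```bash
line='word by word'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/word/term/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/word/term/g')

echo "$first_only"   # term by word
echo "$all_matches"  # term by term
```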

As `sed` is a stream editor, any changes you make with it only appear in
standard output; they are not saved to the file. To save them, redirect the
output to a new file (using `> output.txt`). This has the benefit of leaving the
original file untouched whilst ensuring the desired outcome is stored
permanently.

Alternatively, you can use the `-i` option, which applies the changes to the
source file itself instead of writing them to standard output.

Note that this overwrites the original version of the file, which cannot then be
recovered. If this is an issue, it is recommended to supply a backup suffix
directly after `-i`, like so:

```bash
sed -i.bak 's/word/replacement word/' file.txt
```

This will create the file `file.txt.bak` in the same directory as `file.txt`. It
holds the original contents from before the replacement was carried out.
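
A minimal sketch of the in-place edit and its backup, run against a scratch copy
in a temporary directory. The file contents and the replacement `term` are
chosen purely for illustration:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'old word\n' > "$tmpdir/file.txt"

# Edit in place, keeping the original contents as file.txt.bak.
sed -i.bak 's/word/term/' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")
backup=$(cat "$tmpdir/file.txt.bak")
echo "$edited"   # old term
echo "$backup"   # old word

rm -r "$tmpdir"
```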

## Remove duplicates: `sort -u`

The `sort -u` command can be used to remove duplicates:

```bash
sort -u file.txt
```

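`sort -u` sorts the lines first and then keeps one copy of each, so duplicates
are removed wherever they appear in the file. Sorting matters because duplicate
removal works on the basis of identical strings being adjacent; the related
`uniq` command, for instance, only collapses adjacent duplicates and so needs
sorted input.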
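
A small sketch contrasting `sort -u` with `uniq`, which only collapses adjacent
duplicate lines. The file name `dupes.txt` and its contents are made up for
illustration:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'b\na\nb\na\n' > "$tmpdir/dupes.txt"

# sort -u sorts the lines and keeps one copy of each.
with_sort_u=$(sort -u "$tmpdir/dupes.txt")

# uniq on its own removes nothing here: no two equal lines are adjacent.
with_uniq_only=$(uniq "$tmpdir/dupes.txt")

# Sorting first makes the duplicates adjacent, so uniq can collapse them.
with_sort_then_uniq=$(sort "$tmpdir/dupes.txt" | uniq)

printf '%s\n' "$with_sort_u"

rm -r "$tmpdir"
```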

## Split a large file into multiple smaller files: `split`

Suppose you have a file containing 1000 lines. You want to break the file up
into five separate files, each containing two hundred lines. You can use `split`
to accomplish this, like so:

```bash
split -l 200 big-file.txt new-file-
```

`split` names the resulting five files by appending a two-letter suffix to the
prefix you supplied:

- `new-file-aa`
- `new-file-ab`
- `new-file-ac`
- `new-file-ad`
- `new-file-ae`

If you would rather have numeric suffixes, use the option `-d`. You can also
split a file by size rather than by line count, using the option `-b` and
specifying the size of each piece in bytes.
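
The following sketch runs both kinds of split against a generated 1000-line
file in a temporary directory. It assumes `seq` is available to build the test
input, and the prefixes `new-file-` and `by-size-` are illustrative choices; the
trailing hyphen is what puts a hyphen before the `aa`, `ab`, ... suffixes:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Build a 1000-line input file (one number per line).
seq 1 1000 > "$tmpdir/big-file.txt"

# Split into pieces of 200 lines each: new-file-aa, new-file-ab, ...
split -l 200 "$tmpdir/big-file.txt" "$tmpdir/new-file-"

# Count the pieces and the lines in the first piece.
# tr strips the padding that some wc implementations add.
piece_count=$(ls "$tmpdir"/new-file-* | wc -l | tr -d ' ')
first_piece_lines=$(wc -l < "$tmpdir/new-file-aa" | tr -d ' ')
echo "$piece_count pieces, $first_piece_lines lines in the first"

# Split by size instead: each piece holds at most 1000 bytes.
split -b 1000 "$tmpdir/big-file.txt" "$tmpdir/by-size-"
first_piece_bytes=$(wc -c < "$tmpdir/by-size-aa" | tr -d ' ')
echo "$first_piece_bytes bytes in the first size-based piece"

rm -r "$tmpdir"
```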

## Merge multiple files into one with `cat`

We can use `cat` to read multiple files at once and redirect the combined output
to a new file:

```bash
cat file_a.txt file_b.txt file_c.txt > merged-file.txt
```
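
A self-contained sketch of the merge, using three one-line files created in a
temporary directory (their contents are made up for illustration). It also shows
`>>`, which appends to an existing file instead of overwriting it:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'first\n'  > "$tmpdir/file_a.txt"
printf 'second\n' > "$tmpdir/file_b.txt"
printf 'third\n'  > "$tmpdir/file_c.txt"

# Files are concatenated in the order they are listed.
cat "$tmpdir/file_a.txt" "$tmpdir/file_b.txt" "$tmpdir/file_c.txt" \
  > "$tmpdir/merged-file.txt"
merged=$(cat "$tmpdir/merged-file.txt")
echo "$merged"

# >> appends to the existing file rather than replacing its contents.
cat "$tmpdir/file_a.txt" >> "$tmpdir/merged-file.txt"
merged_line_count=$(wc -l < "$tmpdir/merged-file.txt" | tr -d ' ')
echo "$merged_line_count"

rm -r "$tmpdir"
```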

## Count lines, words, etc: `wc`

To count the lines, words and bytes in a file:

```bash
wc file.txt
```

When we run the command, three numbers are printed, in order: lines, words,
bytes.

You can use options to get just one of the numbers: `-l` for lines, `-w` for
words, `-c` for bytes.
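
A minimal sketch of the individual counts, using a made-up two-line file. Note
that the byte count is selected with `-c`. Reading the file through `<` keeps
the file name out of the output, and `tr` strips the padding that some `wc`
implementations add:

```bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/file.txt"

line_count=$(wc -l < "$tmpdir/file.txt" | tr -d ' ')
word_count=$(wc -w < "$tmpdir/file.txt" | tr -d ' ')
byte_count=$(wc -c < "$tmpdir/file.txt" | tr -d ' ')

echo "lines: $line_count, words: $word_count, bytes: $byte_count"

rm -r "$tmpdir"
```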