- Find all names in the file dist.male.first that begin with 'JEFF'.
$ grep '^JEFF' dist.male.first
- Find all names in the file dist.male.first that end in 'SON'.
$ grep 'SON ' dist.male.first
(You can't use $ after SON because the name is not the last thing on the line. I just used a space to ensure
that it was at the end of the name.)
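(Another way to anchor the match, as a minimal sketch: awk can test the pattern against the first field only, so $ can be used after all. The sample file and its numbers below are made up for illustration and only mimic the layout of dist.male.first.)

```shell
# Hypothetical sample data laid out like dist.male.first:
# name, frequency, cumulative frequency, rank (the values are invented).
tmp=$(mktemp -d)
cat > "$tmp/sample.first" <<'EOF'
JAMES          3.318  3.318      1
JASON          0.660 30.000     30
SONNY          0.020 80.000    500
JACKSON        0.010 85.000    700
EOF

# $1 is the name field, so /SON$/ matches SON only at the end of the name.
son_names=$(awk '$1 ~ /SON$/ { print $1 }' "$tmp/sample.first")
echo "$son_names"

rm -r "$tmp"
```

SONNY is skipped because its SON is not at the end of the name.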
- Find all names in the file dist.male.first that end in 'SON' and sort them alphabetically.
$ grep 'SON ' dist.male.first | sort
- Find which names are in both the male and female files.
(You will need to use cut or sed to get just the names from the files.)
$ cut -d' ' -f1 dist.*male.first |\
> sort | uniq -d
(dist.*male.first will match both dist.female.first and dist.male.first filenames.)
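(The same question can also be answered with comm, which compares two sorted files. This is only a sketch of an alternative: the two sample files and the temporary file names are hypothetical stand-ins for dist.male.first and dist.female.first.)

```shell
# Hypothetical stand-ins for the male and female files (values invented).
tmp=$(mktemp -d)
cat > "$tmp/male" <<'EOF'
JAMES          3.318  3.318      1
KELLY          0.100 50.000    200
LESLIE         0.050 60.000    300
EOF
cat > "$tmp/female" <<'EOF'
MARY           2.629  2.629      1
LESLIE         0.200 40.000    100
KELLY          0.150 45.000    150
EOF

# comm needs sorted input, so extract and sort the names from each file first.
cut -d' ' -f1 "$tmp/male"   | sort > "$tmp/male.names"
cut -d' ' -f1 "$tmp/female" | sort > "$tmp/female.names"

# -12 suppresses the lines unique to either file, leaving the names in both.
both=$(comm -12 "$tmp/male.names" "$tmp/female.names")
echo "$both"

rm -r "$tmp"
```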
- Combine the files of male and female first names and sort them by rank (column 4).
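$ sort -k4n dist.*male.first
(-k4n sorts numerically on the fourth field. Older versions of sort spell this +3n, which current versions may reject.)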
- Combine the files of male and female first names and sort alphabetically and remove duplication.
$ cut -c1-15 dist.*male.first | sort -u
OR
$ cut -c1-15 dist.*male.first | sort | uniq
You can also use the -f option with a space as the delimiter. This will only
work for the first field, though, because
there are multiple spaces separating fields, and cut treats each extra space as an empty field. For some reason:
cut -f1 -d' ' dist.male.first dist.female.first
does not work but
cat dist.male.first dist.female.first | cut -f1 -d' '
does. I can't explain it. According to the man page, cut can accept multiple files,
which it does with the -c option.
It might be a bug in cut. If you figure it out, let me know.
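(If you do want a later field with cut -f, one workaround is to squeeze each run of spaces down to a single space first with tr -s. A minimal sketch, using a small sample file that only mimics the layout of dist.male.first:)

```shell
# Illustrative sample lines in the layout of dist.male.first.
tmp=$(mktemp -d)
cat > "$tmp/sample.first" <<'EOF'
JAMES          3.318  3.318      1
JOHN           3.271  6.589      2
EOF

# tr -s ' ' collapses repeated spaces, so cut sees exactly one
# delimiter between fields and -f1,4 picks the name and the rank.
name_rank=$(tr -s ' ' < "$tmp/sample.first" | cut -d' ' -f1,4)
echo "$name_rank"

rm -r "$tmp"
```

This prints each name followed by its rank, separated by one space.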
- Display the female names that start with L.
$ grep '^L' dist.female.first
- Starting with the file 'names' create a list of the
first names in alphabetical order with the number of their rank
by occurrence before each name. (Hint: nl will add line numbers to the output.) (3 marks)
$ cut -f1 -d' ' names | sort | uniq -c |\
> sort -nr | nl | sort -k3
We end up with a file with 3 columns: rank, frequency, and name. The file is sorted alphabetically by name.
Here is what is happening:
- Get only the first names.
- Sort them alphabetically.
- uniq -c removes duplicates and prefixes each name with its number of occurrences.
- Sort by the number of occurrences, numerically, highest to lowest.
- Number the lines. This gives us a ranking number.
- Sort it alphabetically by name.
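To see the steps above on a small input, here is a sketch that runs the pipeline on a made-up 'names' file (one "FIRST LAST" pair per line, which is an assumption about the file's layout). It writes the final sort key as -k3, the current spelling of what older sort versions accept as +2, and adds a last awk step only to tidy the spacing for display.

```shell
# Hypothetical 'names' file: first name, a space, then last name.
tmp=$(mktemp -d)
cat > "$tmp/names" <<'EOF'
JOHN SMITH
MARY JONES
JOHN DOE
ANNA LEE
MARY POE
JOHN ROE
EOF

# cut: first names only; sort + uniq -c: count each name;
# sort -nr: most frequent first; nl: add the rank; sort -k3: order by name.
# The final awk is not part of the answer; it just normalizes whitespace.
ranked=$(cut -f1 -d' ' "$tmp/names" | sort | uniq -c |
    sort -nr | nl | sort -k3 | awk '{ print $1, $2, $3 }')
echo "$ranked"

rm -r "$tmp"
```

Each output line reads rank, frequency, name, so JOHN (3 occurrences) gets rank 1 even though ANNA is listed first.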