[plug] A bit of gross bash code

Thu Mar 8 14:14:44 WST 2001

On Wed, 7 Mar 2001, Brad Campbell wrote:
> > On Wed, 7 Mar 2001, The Thought Assassin wrote:
> > > ls -ln /dev|grep "^[c|b|u]" |
> > > sed 's/^\(.\)......... *\([0-9]*\) *\([0-9]*\) *\([0-9]*\) *\([0-9]*\)\, *\([0-9]*\)..............\(.*\)$/\1:\3:\4:\5:\6:\7/'
> I'm off to read up on regular expressions..

Just to get you started, I'll document how the above works.
| grep "^[c|b|u]"
The caret '^' means "beginning of line", forcing grep to recognise the
pattern only if it is at the beginning of the line. [c|b|u] was meant as
"c or b or u", but inside square brackets the pipes are literal, so it
also matches a line beginning with '|'; [cbu] says the same thing without
that side effect. The quotes are to prevent bash from trying to parse the
pipes itself. In total, it keeps the lines beginning with c, b or u and
tosses the rest out.
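
A small sketch of that filter, run on made-up sample lines rather than a
real /dev listing (the device names and numbers are only illustrative). It
uses [cbu], which matches the same three letters without also accepting a
literal '|':

```shell
# Hypothetical sample lines standing in for `ls -ln /dev` output.
sample='crw-rw-rw- 1 0 0 1, 3 Mar  8 14:14 null
brw-rw---- 1 0 6 8, 0 Mar  8 14:14 sda
drwxr-xr-x 2 0 0 60 Mar  8 14:14 pts
lrwxrwxrwx 1 0 0 4 Mar  8 14:14 cdrom -> scd0'

# Keep only the lines whose first character is c, b or u.
filtered=$(printf '%s\n' "$sample" | grep '^[cbu]')
printf '%s\n' "$filtered"
```

Only the "null" and "sda" lines should come out the other end.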

| sed s/foo/bar/
replaces the first instance of "foo" on each line with "bar". (Add a
trailing g, as in s/foo/bar/g, to replace every instance on the line.)
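
To see the substitution on its own, here is a throwaway example. The input
text is invented for illustration; the second command adds the g flag for
comparison:

```shell
# Without a flag, only the first match on each line is replaced.
first=$(printf 'foo foo\n' | sed 's/foo/bar/')
# With the g flag, every match on the line is replaced.
all=$(printf 'foo foo\n' | sed 's/foo/bar/g')

echo "$first"   # bar foo
echo "$all"     # bar bar
```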

I shall now run through what I used for "foo". (the search string)
I have deliberately wrapped the lines at _non_-spaces, to avoid confusion,
but the line numbers are followed by spaces for clarity.

0 ^

The caret means "beginning of line" again, so we won't match suffixes.

1 \(.\)......... *

\( means open bracket and \) close bracket. :)
Ignore the brackets for now, we will get to them at the end.
The '.' within the brackets matches any character. In this case: c/b/u.
The 9 '.'s each match any character, so whatever permissions are present
in the rwxrwxrwx format.
' ' matches a space, but when a pattern is followed by an asterisk '*', it
stands for "any number (including zero) of consecutive occurrences of this
token" - in this instance, any unbroken chain of spaces. sed is greedy in
this regard; although it could match the first space of a chain and ignore
the rest, sed chooses to make it match the longest possible string.
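
A quick way to watch the greedy ' *' at work, on an invented input string:
if it stopped after one space, the leftover spaces would end up inside the
brackets.

```shell
# ' *' swallows the whole run of spaces, so \1 starts at the 'b'.
many=$(printf 'a    b\n' | sed 's/a *\(.*\)/[\1]/')
# With no spaces at all, ' *' matches nothing and the pattern still fits.
none=$(printf 'ab\n' | sed 's/a *\(.*\)/[\1]/')

echo "$many"   # [b]
echo "$none"   # [b]
```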

2 \([0-9]*\) *

"[0-9]" matches any single character in the range '0' to '9'. The asterisk
modifies this to match any string of digits (possibly an empty one). This
is followed by another string of spaces, as above.
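
A short illustration of "[0-9]*" on made-up input. Note that the empty
string counts as a string of digits, so the pattern also matches where
there are none:

```shell
# The leading digits are matched as a whole and replaced.
digits=$(printf '42abc\n' | sed 's/^[0-9]*/X/')
# No leading digits: the empty match at the start is replaced instead.
empty=$(printf 'abc\n' | sed 's/^[0-9]*/X/')

echo "$digits"   # Xabc
echo "$empty"    # Xabc
```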

3 \([0-9]*\) *

Another string of digits followed by a string of spaces.

4 \([0-9]*\) *

Ditto.

5 \([0-9]*\)\, *

String of digits, followed by a comma, then a string of spaces. (The
backslash before the comma is not needed; a comma is an ordinary character.)
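
This is the piece that picks up the "major, minor" pair. A cut-down sketch
on an invented fragment, with the comma written plainly (no backslash):

```shell
# Digits, a comma, any run of spaces, then more digits.
pair=$(printf '1,   3\n' | sed 's/\([0-9]*\), *\([0-9]*\)/\1:\2/')

echo "$pair"   # 1:3
```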

6 \([0-9]*\)..............

String of digits, followed by 14 characters of any description. (this
matches the date string, and spacing either side of it.)
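
The figure of 14 can be checked by counting. The date field below is a
hypothetical one in the "Mon DD HH:MM" layout, with the single space on
either side; other locales or ls options may well produce a different
width, in which case the number of dots would have to change.

```shell
# Hypothetical date field, including the space before and after it.
date_field=' Mar  8 14:14 '
# ${#var} expands to the length of the variable's value.
width=${#date_field}

echo "$width"   # 14
```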

7 \(.*\)

Any string of characters. Thanks to greediness: the rest of the line.

8 $

Matches "end-of-line" - analogous to '^'. Maybe redundant here.

That's it for the search string. Make sense?
The replace string is much simpler.

This is what any matching line is replaced with:
\1:\3:\4:\5:\6:\7

"\1" means "the string enclosed by the first \( \) pair".
"\3" is the third such pair.

We pick and choose which bracketed segments we wish to take, and separate
them with colons in the output. If we wanted to, we could reorder them, or
use a particular bracketed section more than once in the output string.
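
Putting it together on a single made-up line (the layout is what I assume
ls -ln /dev prints; the variable names are just for this sketch). The
second substitution reorders the segments and uses \7 twice, to show that
both are allowed:

```shell
# Hypothetical line in the layout `ls -ln /dev` is assumed to produce.
line='crw-rw-rw- 1 0 0 1, 3 Mar  8 14:14 null'

# The search string from the walkthrough, kept in a variable for reuse.
# (The comma is written without the backslash; it is an ordinary character.)
pat='^\(.\)......... *\([0-9]*\) *\([0-9]*\) *\([0-9]*\) *\([0-9]*\), *\([0-9]*\)..............\(.*\)$'

# Same replace string as above: type, uid, gid, major, minor, name.
original=$(printf '%s\n' "$line" | sed "s/$pat/\1:\3:\4:\5:\6:\7/")
# Reordered, with the name (\7) used at both ends.
reordered=$(printf '%s\n' "$line" | sed "s/$pat/\7:\1:\5:\6:\7/")

echo "$original"    # c:0:0:1:3:null
echo "$reordered"   # null:c:1:3:null
```

\2 (the link count) is matched but never used, which is how it gets dropped
from the output.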

Any questions? :)

> Maybe thats a good topic for a plug talk one night.
It probably is. If I had time to even _attend_ meetings these days...

-Greg Mildenhall