[plug] Grep not following my regex

Tim Bowden tim.bowden at westnet.com.au
Tue Sep 15 11:49:20 WST 2009


On Tue, 2009-09-15 at 11:09 +0800, Gregory Orange wrote:
> Tim Bowden wrote:
> > On Tue, 2009-09-15 at 09:38 +0800, Tim Bowden wrote:
> >> On Tue, 2009-09-15 at 09:11 +0800, Tim wrote:
> >>> I have been doing some string extraction for an application I've
> >>> written. I needed to extract a number (with decimal point) from a one
> >>> line string. I had the grep working, but then having moved to another
> >>> computer, it stopped working. I did some debugging on the computer I'd
> >>> moved to, to see why my application no longer worked, and narrowed it
> >>> down to this grep.
> >>> egrep -o '[0-9.]*'
> >>>
> >>> By changing the grep to the following, I managed to get it working again.
> >>> egrep -o '[0-9.]+'
> >>>
> >>> Now I know that the + may make more sense, by forcing a match, but
> >>> I can't see why the first regex stopped working.
> >>>
> >>> The string it's matching against is:
> >>> You currently have 347.454MB remaining
> >>> and obviously it just pulls out the 347.454.
> >>>
> >>> Other than bash stealing the * from the grep (which I am sure
> >>> shouldn't be happening due to the quotes), can someone let me know why
> >>> the first regex stopped working? (Also, I was moving from an Ubuntu
> >>> system to a Fedora system)
> >>>
> >>> Thanks
> >>>
> >>> Tim
> >> The '.' has special meaning in a regex.  Escape it to make it just a
> >> '.'.
> >>
> >> Without looking at your problem in detail, here is a possible regex for
> >> a decimal point number.
> >>
> >> grep -o [0-9]+\.[0-9]+
> > 
> > Meh.  What a load of horse shit.  \ doesn't escape the '.' at all; Perl
> > it ain't.  And it should be egrep.  Happens to work anyway, so long as
> > your strings are as you say (otherwise it might not be quite what you're
> > after...).
> > 
> > <hangs head in shame>
> > Tim Bowden
> 
> I don't understand your retraction. \ does escape . in grep. I always
> use egrep but I tried straight grep, just in case:
> grep -o "[0-9][0-9]*\.[0-9][0-9]*"
> grep -o "[0-9][0-9]*.[0-9][0-9]*"
> 
> give different results, but substituting grep for egrep doesn't change
> anything - GNU grep 2.5.1 on SLES9 if it's relevant.
> 
> Also, my understanding is that the + symbol is extended regex, so .+ is 
> not as portable as ..*
> 
> Oh hang on... inside the [] . is different. Now I'm confused. Reading 
> OP's original regex tho, it looks like it would match on something like 
> "..3.4..5.." - not exactly a decimal number. Doesn't work for me tho (:
> 
> Cheers,
> Greg.

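Do grep \. testfile

and it will match every char in the file.  Typed unquoted like that, the
shell strips the '\' before grep ever sees it, so the '.' arrives
unescaped; quoted, as in grep '\.' testfile, the '\' does escape it.
Using [.] also treats it as a literal.  In any event, there is a problem
if there is more than one number in any input line: -o prints each match
on its own line.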
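
A rough sketch for testing this, assuming a POSIX shell and a grep that
supports -o (GNU grep does; it is not a POSIX option).  The second sample
line is invented for illustration.  Note that an unquoted \. loses its
backslash to the shell before grep runs, which the printf line shows:

```shell
# The OP's input line, plus a made-up line containing two numbers.
line='You currently have 347.454MB remaining'
two='Used 12.5MB of 347.454MB'

# Quoted pattern: the backslash reaches grep, so '.' is a literal dot.
quoted=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*\.[0-9][0-9]*')

# Bracket expression: '.' is literal inside [] without any backslash.
bracket=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*[.][0-9][0-9]*')

# Unquoted \. : the shell removes the backslash, leaving a bare '.'.
seen_by_grep=$(printf '%s' \.)

# Two numbers on one line: -o prints each match on its own line.
all=$(printf '%s\n' "$two" | grep -o '[0-9][0-9]*[.][0-9][0-9]*')

# Keep only the first match when a single value is wanted.
first=$(printf '%s\n' "$two" | grep -o '[0-9][0-9]*[.][0-9][0-9]*' | head -n 1)

printf '%s\n' "$quoted" "$bracket" "$seen_by_grep" "$all" "$first"
```

Requiring a digit on each side of the dot also avoids the empty matches
that '[0-9.]*' allows, and it will not pick up strings like "..3.4..5..".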

Tim Bowden



