[plug] trouble with searching for non-ascii characters in a text file

Fri May 16 14:25:32 WST 2003

In message <courier.3EC47134.0000060D at wasp.net.au>
on Fri, May 16, 2003 at 01:03:48PM +0800, David Buddrige wrote:
> I have some html files that contain odd [non-ascii] characters here and 
> there.  The web-browser displays them as "?" in the html page.
> My text editor uses another character to represent that particular
> character.

Are you lucky enough that they are well formed and indicate their
character set? "?" substitution always makes me think of mail from
Microsoft clients, which seem to characteristically send e-mail that
states it's encoded in some particular character set when in fact the
e-mail is encoded with the operating system's default.

> I want to find out how to determine what exact hexadecimal value that
> character evaluates to, and then how to grep on that hex-value -
> rather than its ascii equivalent.  Is this possible?

I would normally have vim set to "set display=uhex", which will show such
characters as "<ab>" (using a distinctive text colour). Alternatively,
use its "ga" or "g8" keystrokes to show the numerical encoding of the
character under the cursor. You can search for arbitrary byte sequences
using vim's regular "/" and "?" searches: enter each character using
CTRL-V followed by x followed by the two hex digits of the encoding (or
you can use octal or decimal, I presume).

Without invoking an editor, you can use "xxd yourfile | less" from your
shell. It will show you the byte values in hex alongside the verbatim
contents of the file, so you could then do a regular text search on the
hex digits from within less. That would probably have a higher
false-positive rate than the vim method, depending on the specific
contents of the file.
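
If you would rather script it than eyeball a hex dump, something along
these lines might do. It is only a rough sketch in Python, not anything
vim or xxd provides: the function names find_non_ascii and grep_byte and
the sample bytes are made up for illustration, and it treats the file as
raw bytes without guessing at a character set.

```python
import re

# Any single byte outside the 7-bit ASCII range.
NON_ASCII = re.compile(rb"[\x80-\xff]")


def find_non_ascii(data: bytes):
    """Yield (line_number, column, hex_value) for every byte above 0x7f.

    Line numbers and columns are 1-based; columns count bytes, not
    characters, because no character set is assumed.
    """
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for match in NON_ASCII.finditer(line):
            yield lineno, match.start() + 1, f"{match.group()[0]:02x}"


def grep_byte(data: bytes, value: int):
    """Return the 1-based numbers of the lines that contain byte `value`."""
    needle = bytes([value])
    return [
        lineno
        for lineno, line in enumerate(data.split(b"\n"), start=1)
        if needle in line
    ]


# Hypothetical sample standing in for the contents of an HTML file;
# for a real file, read it with open(path, "rb").read() instead.
sample = b"<p>caf\xe9</p>\nplain\n\x93quoted\x94\n"

for lineno, col, hexval in find_non_ascii(sample):
    print(f"{lineno}:{col}: 0x{hexval}")

print(grep_byte(sample, 0xE9))
```

The first loop answers "what hex value is that odd character?", and
grep_byte then answers "which lines contain that value?". Reading the
file in binary mode matters here, since decoding it first would fail or
alter the bytes if the declared character set is wrong.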