[plug] trouble with searching for non-ascii characters in a text file

Tony Breeds magni at plug.linux.org.au
Fri May 16 14:23:15 WST 2003


On Fri, May 16, 2003 at 01:03:48PM +0800, David Buddrige wrote:
> Hi all, 
> 
> I have some html files that contain odd [non-ascii] characters here and 
> there.  The web-browser displays them as "?" in the html page.  My text 
> editor uses another character to represent that particular character.  I 
> want to find out how to determine what exact hexadecimal value that 
> character evaluates to, and then how to grep on that hex-value - rather 
> than its ASCII equivalent.  Is this possible? 

You can use "od -x" to work out the hex values, but that output isn't
really very machine-readable.
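For something more machine-readable than "od -x", here is a minimal Perl sketch (not part of the original reply; the sub name non_ascii_bytes and the "file:line: 0xNN" output format are assumptions). It prints the file name, line number and hex value of every byte outside printable ASCII (0x20-0x7e), skipping the CR/LF line endings:

```perl
#!/usr/bin/perl
# Sketch: list every byte outside printable ASCII (0x20-0x7e) in the
# named files, with file name, line number and hex value.
use strict;
use warnings;

# Return the hex codes ("0xNN") of the non-printable bytes in one line.
# Line endings (\r, \n) are skipped so ordinary newlines aren't reported.
sub non_ascii_bytes {
    my ($line) = @_;
    my @codes;
    while ($line =~ /([^\x20-\x7e\r\n])/g) {
        push @codes, sprintf("0x%02x", ord($1));
    }
    return @codes;
}

# Files are read in :raw mode so each byte is reported as-is.
for my $file (@ARGV) {
    open(my $fh, '<:raw', $file) or die "can't open $file: $!\n";
    while (my $line = <$fh>) {
        for my $code (non_ascii_bytes($line)) {
            print "$file:$.: $code\n";
        }
    }
    close($fh);
}
```

Once you know the value, Perl's \xNN regex escape lets you grep for that byte directly (0xe9 here is just an example value):

    perl -ne 'print "$.: $_" if /\xe9/' page.html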

But I think you'd probably benefit from something like:
---
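#!/usr/bin/perl
use strict;
use warnings;

# Return printable ASCII characters (0x20-0x7e) unchanged and
# replace anything else with its hex value.
sub HEX {
	my $chr = shift;
	my $val = ord($chr);
	if ($val >= 32 and $val < 127) {
		return $chr;
	} else {
		#return sprintf("&#%d;", $val);  # HTML numeric entity
		return sprintf("0x%x;", $val);    # 'C'-style hex
	}
}

# '.' doesn't match "\n", so line endings are left alone.
while (<>) {
	s/(.)/HEX($1)/eg;
	print;
}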
---

It's ugly and untested, but it should take any non-printing characters and
replace them with their hex equivalents.  You could, of course, drop the
else clause and get rid of them altogether.
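If you just want the characters gone, Perl's tr/// operator with the /c (complement) and /d (delete) flags can do it without a per-character sub. A minimal sketch (the sub name strip_non_ascii is hypothetical; unlike HEX() above, it keeps tabs as well as line endings):

```perl
#!/usr/bin/perl
# Sketch: delete every byte outside printable ASCII, keeping tabs and
# line endings, and print the cleaned text to STDOUT.
use strict;
use warnings;

sub strip_non_ascii {
    my ($text) = @_;
    # /c complements the list, /d deletes the matched chars:
    # everything NOT in 0x20-0x7e, \t, \r or \n is removed.
    $text =~ tr/\x20-\x7e\t\r\n//cd;
    return $text;
}

for my $file (@ARGV) {
    open(my $fh, '<:raw', $file) or die "can't open $file: $!\n";
    while (my $line = <$fh>) {
        print strip_non_ascii($line);
    }
    close($fh);
}
```

Redirect the output to a new file rather than overwriting the original, so you can check what was removed.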

Yours Tony

        Linux.Conf.AU       http://lca2004.linux.org.au/
        Jan 12-17 2004      The Australian Linux Technical Conference!


