[plug] Re: trouble with searching for non-ascii characters in a text file

David Buddrige buddrige at wasp.net.au
Fri May 16 16:05:48 WST 2003


thanks everyone, I'll give these a go. 8-) 

regards 

David.
Tony Breeds writes: 

> On Fri, May 16, 2003 at 01:03:48PM +0800, David Buddrige wrote:
>> Hi all,  
>> 
>> I have some html files that contain odd [non-ascii] characters here and 
>> there.  The web-browser displays them as "?" in the html page.  My text 
>> editor uses another character to represent that particular character.  I 
>> want to find out how to determine what exact hexadecimal value that 
>> character evaluates to, and then how to grep on that hex-value - rather 
>> than its ASCII equivalent.  Is this possible? 
> 
> You can use "od -x" to work out the hex values, but that output isn't
> really very machine readable. 
> 
> But I think you'd probably benefit from something like:
> ---
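> # Return printable ASCII unchanged; show any other character as hex.
> sub HEX($) {
> 	my $chr=shift;
> 	my $val = ord($chr);
> 	# Printable ASCII runs from 32 (space) to 126 (~).
> 	if ($val >= 32 and $val < 127) {
> 		return $chr;
> 	} else {
> 		#return sprintf("&#%d;",$val);  #HTML
> 		return sprintf("0x%x;",$val);   #'C' hex
> 	}
> } 
> 
> # "." never matches the newline, so line endings pass through as-is.
> while (<>) {
> 	s/./HEX($&)/eg;
> 	print;
> }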
> --- 
> 
> It's ugly and untested but it should take any non-printing chars and
> replace them with their hex equivalents.  You could of course drop the
> else clause and then get rid of them altogether. 
> 
> Yours Tony 
> 
>         Linux.Conf.AU       http://lca2004.linux.org.au/
>         Jan 12-17 2004      The Australian Linux Technical Conference! 
> 
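
For the original question (find the hex value of the odd character, then
grep for it), a rough shell sketch follows.  It is not from the thread: the
function names and the sample file are made up for illustration, and it
assumes a grep that treats input as single bytes under LC_ALL=C (GNU grep
does).  Treat it as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# Sketch only: nonascii_lines, nonascii_hex and grep_byte are hypothetical
# helper names, and sample.html is a made-up test file.

# Print the numbers of lines holding bytes that are neither printable ASCII
# nor whitespace (so stray control characters count as well).
nonascii_lines() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1" | cut -d: -f1
}

# Print the hex value of every byte in the range 0x80-0xff, one per line.
# tr deletes all 7-bit bytes first; od then dumps what is left.
nonascii_hex() {
    LC_ALL=C tr -d '\000-\177' < "$1" | od -An -v -tx1 | tr -s ' ' '\n' | sed '/^$/d'
}

# Print the numbers of lines containing one byte, given as two hex digits.
# The hex value is converted to an octal escape so printf can emit the byte;
# -F makes grep take that byte literally rather than as a pattern.
grep_byte() {
    byte=$(printf "\\$(printf '%03o' "0x$1")")
    LC_ALL=C grep -nF "$byte" "$2" | cut -d: -f1
}

# Made-up sample: byte 0xe9 (octal 351) on the first line only.
printf 'caf\351 menu\nplain line\n' > sample.html
nonascii_lines sample.html
nonascii_hex sample.html
grep_byte e9 sample.html
```

Run nonascii_hex first to see which byte values occur, then pass one of them
to grep_byte to locate the lines that contain it.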
 


