[plug] Mass substitute mixed utf files

Thu Feb 24 11:26:55 WST 2011

On Thu, Feb 24, 2011 at 2:37 AM, Peter Hallam <plug at inatick.com> wrote:
> On Wed, 23 Feb, 21:14 +0800 Carl Gherardi wrote:
>> grep something utf16file fails as does cat utf16file | sed
>> 's/something/replace/g'
>>
>> I've got several hundred files, a mix of utf16 and utf8, that I
>> want to perform a substitution on while preserving their utfness, as they
>> are part of a parsing regression test suite.
>
> You could try using Perl and its -C switch; the S and D values enable various UTF handling features:
>
>  perl -CSD -p -i -e 's/something/replace/g' <filename(s)>
>
> For example, given your specification:
>
>  perl -CSD -p -i -e 's/something/replace/g' utf16file
>
> A note of caution: the Perl Unicode documentation [ http://perldoc.perl.org/perlunicode.html ] talks a lot about UTF-8, so you should test the above first before running it across your entire dataset.

Looks like this is failing. I've avoided perl for utf stuff for a
while now, but it was worth a shot.

FYI:

iconv -f utf-16 -t utf-8 <filename> | grep blah

That at least allows me to grep the damn things.
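
To get from grepping to the actual substitution, the same iconv trick
can probably be run in both directions: decode to UTF-8, run sed, then
encode back to whatever the file started as. A rough sketch only:
subst_keep_encoding is a made-up helper name, and it assumes every
UTF-16 file starts with a byte-order mark and that anything without one
is UTF-8 (or plain ASCII). As far as I can tell, perl's -CSD only sets
up UTF-8 layers, which would explain why it does not help with UTF-16
input. The pattern and replacement are pasted straight into sed, so
they must not contain '/'.

```shell
# Sketch: run a sed substitution on one file and keep its encoding.
# Assumption: UTF-16 files carry a BOM; files without one are UTF-8.
# subst_keep_encoding is a hypothetical name, not an existing tool.
subst_keep_encoding() (
    pattern=$1
    replacement=$2
    file=$3
    tmp=$(mktemp) || exit 1

    # Read the first two bytes as hex to look for a UTF-16 BOM.
    bom=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    case $bom in
        fffe) enc=UTF-16LE ;;   # little-endian BOM
        feff) enc=UTF-16BE ;;   # big-endian BOM
        *)    enc= ;;           # no BOM: treat as UTF-8
    esac

    if [ -n "$enc" ]; then
        {
            # Copy the original BOM through untouched, then convert the
            # rest to UTF-8, substitute, and convert back.
            head -c 2 "$file"
            tail -c +3 "$file" |
                iconv -f "$enc" -t UTF-8 |
                sed "s/$pattern/$replacement/g" |
                iconv -f UTF-8 -t "$enc"
        } > "$tmp"
    else
        sed "s/$pattern/$replacement/g" "$file" > "$tmp"
    fi && cat "$tmp" > "$file"   # overwrite only if the last step succeeded
    status=$?

    rm -f "$tmp"
    exit "$status"
)
```

Something like: for f in *; do subst_keep_encoding something replace "$f"; done
should then cover a whole directory. Only the last command of the
pipeline is checked for errors, so it is worth trying this on a copy of
the test suite and diffing the results before trusting it.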

Carl G