[plug] Mass substitute mixed utf files

Peter Hallam plug at inatick.com
Thu Feb 24 02:37:43 WST 2011


On Wed, 23 Feb, 21:14 +0800 Carl Gherardi wrote:
> grep something utf16file fails as does cat utf16file | sed
> 's/something/replace/g'
> 
> I've got several hundred files of mixed utf16 and utf8 files that i
> want to perform a substitution on and preserve their utfness, as they
> are part of a parsing regression test suite.

You could try Perl with its -C switch. The S and D values turn on UTF-8 handling for the standard streams (S) and for the files Perl opens (D):

 perl -CSD -p -i -e 's/something/replace/g' <filename(s)>

For example, given your specification:

 perl -CSD -p -i -e 's/something/replace/g' utf16file

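A note of caution: the Perl Unicode documentation [ http://perldoc.perl.org/perlunicode.html ] talks mostly about UTF-8, and the -C switch itself is described in terms of UTF-8, so the UTF-16 files in particular may not come through intact. Test the above on copies of a few files of each encoding before running it across your entire dataset.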
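
If -C turns out not to cope with the UTF-16 files, one fallback is to work out each file's encoding from its byte order mark, decode, substitute, and write the result back in the same encoding. Here is a rough sketch of that idea in Python rather than Perl. The function names are made up for illustration, and it assumes that every UTF-16 file starts with a BOM and that anything without one is UTF-8. Treat it as a starting point to test, not a finished tool.

```python
import codecs
import re


def detect_encoding(raw):
    """Guess the encoding from a leading byte order mark.

    Assumption: UTF-16 files always carry a BOM; anything else is UTF-8.
    Returns (codec name, BOM bytes to strip and later restore).
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", codecs.BOM_UTF16_LE
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", codecs.BOM_UTF16_BE
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8", codecs.BOM_UTF8
    return "utf-8", b""


def substitute_preserving_encoding(path, pattern, replacement):
    """Apply a regex substitution to one file, keeping its encoding and BOM."""
    # Binary mode so that line endings (e.g. CRLF) are left untouched.
    with open(path, "rb") as f:
        raw = f.read()
    encoding, bom = detect_encoding(raw)
    text = raw[len(bom):].decode(encoding)
    new_text = re.sub(pattern, replacement, text)
    with open(path, "wb") as f:
        f.write(bom + new_text.encode(encoding))
    return encoding
```

Calling substitute_preserving_encoding("utf16file", r"something", "replace") rewrites the file in place and returns the encoding it detected, so a loop over the test suite can log what each file was treated as. Note that it overwrites the file directly, so run it on copies first.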


Regards,
Peter.


