[plug] Mass substitute mixed utf files
Peter Hallam
plug at inatick.com
Thu Feb 24 02:37:43 WST 2011
On Wed, 23 Feb, 21:14 +0800 Carl Gherardi wrote:
> grep something utf16file fails as does cat utf16file | sed
> 's/something/replace/g'
>
> I've got several hundred files of mixed utf16 and utf8 files that i
> want to perform a substitution on and preserve their utfness, as they
> are part of a parsing regression test suite.
You could try Perl and its -C switch. The S and D values turn on UTF-8 handling for the standard streams (S) and for the files the script opens (D):
perl -CSD -p -i -e 's/something/replace/g' <filename(s)>
For example, given your specification:
perl -CSD -p -i -e 's/something/replace/g' utf16file
A note of caution: the Perl Unicode documentation [ http://perldoc.perl.org/perlunicode.html ] talks mostly about UTF-8, so test the above on a few files, the UTF-16 ones in particular, before running it across your entire dataset.
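If the -C approach turns out not to cope with the UTF-16 files, one alternative is to pick the encoding per file from its byte-order mark (BOM) and write the result back in the same encoding. The following is only a rough Python sketch of that idea, not a tested replacement for the Perl one-liner. The helper names substitute_bytes and substitute_file are made up for illustration, and it assumes that the UTF-16 files start with a BOM and that any file without a BOM is UTF-8.

```python
import codecs
import re
from pathlib import Path

# (BOM bytes, codec for the text that follows the BOM).
# UTF-32 is deliberately not handled; the thread only mentions UTF-8 and UTF-16.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def substitute_bytes(raw, pattern, replacement):
    """Return raw with pattern replaced, keeping its BOM and encoding."""
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            break
    else:
        # Assumption: a file without a BOM is plain UTF-8.
        bom, encoding = b"", "utf-8"
    text = raw[len(bom):].decode(encoding)
    return bom + re.sub(pattern, replacement, text).encode(encoding)


def substitute_file(path, pattern, replacement):
    """Rewrite one file in place (hypothetical helper; makes no backup)."""
    path = Path(path)
    path.write_bytes(substitute_bytes(path.read_bytes(), pattern, replacement))
```

Calling substitute_file("utf16file", "something", "replace") would then rewrite that one file. Working on bytes also leaves line endings untouched, which may matter for a regression suite. As with the Perl version, try it on copies first.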
Regards,
Peter.