Friday, 19 September 2008

Converting character sets on Linux

Linux and Unix systems include a tool that converts a text file from one character encoding to another, covering a wide range of encodings: UTF-8, UTF-16, ASCII, Windows-1251 and many more. It's called iconv and is run from the command line (Terminal).
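The most useful options are:

  • -c: silently omit characters that cannot be converted to the target encoding, instead of stopping with an error
  • -f: specifies the encoding of the input file ("from")
  • -t: specifies the encoding of the output file ("to")
  • -l: lists all encodings iconv knows about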
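The -l listing is long, so it helps to filter it when looking for the exact name iconv expects. A small sketch; the search term 1251 is just an example, and the listing format differs slightly between the GNU libc iconv and GNU libiconv:

```shell
#!/bin/sh
# List every encoding iconv knows and keep only the lines mentioning 1251,
# e.g. to find the name to pass to -f or -t for Windows-1251 text.
iconv -l | grep -i 1251
```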
For example, to convert a UTF-16 little-endian file to UTF-8:

iconv -c -f UTF-16LE -t UTF-8 input_file.txt > output_file.txt
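To try the command without a real UTF-16 document at hand, a tiny test file can be built with printf. This is only an illustrative sketch; the file names match the example above, and the sample text "Hello" is made up:

```shell
#!/bin/sh
# Create a small UTF-16LE sample (no byte-order mark) containing "Hello" and a newline.
# In UTF-16LE each ASCII character takes two bytes: the character, then a zero byte.
printf 'H\000e\000l\000l\000o\000\n\000' > input_file.txt

# Convert it to UTF-8, dropping anything that cannot be converted (-c).
iconv -c -f UTF-16LE -t UTF-8 input_file.txt > output_file.txt

# Show the result as plain text.
cat output_file.txt
```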

This is quite handy for tablet users, because the OS2008 tablet doesn't handle UTF-16-encoded documents: neither the built-in Notes app nor Leafpad or PyGTKEditor deals with them properly.
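To prepare several files for the tablet at once, the same command can run in a loop. A minimal sketch, assuming every .txt file in the current directory is UTF-16LE; the converted directory name is only illustrative:

```shell
#!/bin/sh
# Write UTF-8 copies of all .txt files into ./converted, leaving the originals untouched.
mkdir -p converted
for f in *.txt; do
    # If no .txt files exist, the pattern stays literal; skip it.
    [ -e "$f" ] || continue
    iconv -c -f UTF-16LE -t UTF-8 "$f" > "converted/$f"
done
```

If the files start with a byte-order mark (as UTF-16 files saved by Windows Notepad usually do), using -f UTF-16 instead of UTF-16LE lets iconv read the byte order from the mark instead of copying the mark into the output.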