Provided by: yudit_3.1.0-1_amd64

NAME

       uniconv - convert text to native formats through Unicode

SYNOPSIS

       uniconv -out output-file [ -decode input-encoding ] [ -encode output-encoding ] [ input-file ] [ -todos ]
       [ -fromdos ] [ -tomac ] [ -frommac ]

DESCRIPTION

       The uniconv program decodes scripts in one encoding and encodes them in another.  The script
       is a 16-, 8- or 7-bit byte stream.  The converted text is sent to the standard output, even for
       16-bit encoding methods, unless an output file is specified with the -out option.

       The -decode and -encode options are optional; the default converter is utf-8.  The program reads the
       Unicode map helper files (*.my) from the default directory /usr/share/data.  Simple 1-to-1 encoding
       methods can be added on the fly by adding a my-file, or by setting your yudit.datapath property in
       ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties.  By default /usr/share/yudit/data
       and ~/.yudit/data are searched.

       My-files can be created by a program called makeumap.  The files can be converted between
       dos/unix/mac line-ending variants with the -fromdos, -frommac, -todos and -tomac options.  The
       default (when none is specified) is Unix.
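
       The line-ending variants named above differ only in the bytes that terminate a line: LF for Unix,
       CR LF for DOS and a lone CR for classic Mac.  The following is a minimal Python sketch of that
       conversion, not uniconv's own code; the function name and the "unix"/"dos"/"mac" keys are
       hypothetical names chosen for illustration.

```python
def convert_line_endings(text: str, target: str = "unix") -> str:
    """Normalize DOS (CR LF) and classic Mac (CR) endings, then emit `target` endings."""
    endings = {"unix": "\n", "dos": "\r\n", "mac": "\r"}
    # Replace CR LF before lone CR so a DOS ending is not counted twice.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", endings[target])


if __name__ == "__main__":
    print(repr(convert_line_endings("one\r\ntwo\r\n", "unix")))
    print(repr(convert_line_endings("one\ntwo\n", "dos")))
```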

ENCODING

       If you received this program through the Yudit distribution, then as of today you can convert between the
       encoding methods below.

       utf-8  Yudit  recommends  this  format  for  international  information  exchange.  ASCII text  will  get
              through  intact, while other Unicode characters will get their 8th bit set and the length  of  the
              code  will depend on how far away they are in the Unicode space.  This is the only  transformation
              format that can encode both 16-bit (ucs-2) and 31-bit (ucs-4) Unicode.
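
              The variable length described above can be observed with any UTF-8 encoder.  The sketch
              below uses Python's built-in codec rather than uniconv; note that Python's codec stops at
              U+10FFFF, so it does not demonstrate the 31-bit range.  The helper name utf8_length is
              made up for this example.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the UTF-8 encoding of a single character occupies."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # ASCII, Latin-1 letter, euro sign, and a character outside the BMP.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
        encoded = ch.encode("utf-8")
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")
```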

       utf-8-s
              Hacker's utf-8 format - it does not give an error message when a surrogate pair is decoded and it
              can encode a surrogate pair 'as is'.  This is not a  recommended  encoding  format  although  this
              format is used to encode/decode clipboard data, in order to preserve input.
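
              The difference between a strict and a surrogate-tolerant UTF-8 codec can be illustrated
              with Python's "surrogatepass" error handler.  This is only an analogy for the behaviour
              described above, not yudit's implementation; the function names are hypothetical.

```python
def encode_strict(text: str) -> bytes:
    """Standard UTF-8: raises UnicodeEncodeError on surrogate code points."""
    return text.encode("utf-8")


def encode_lenient(text: str) -> bytes:
    """Surrogate-tolerant UTF-8: encodes a surrogate code point 'as is'."""
    return text.encode("utf-8", errors="surrogatepass")


if __name__ == "__main__":
    lone_surrogate = "\ud83d"  # high half of a surrogate pair
    try:
        encode_strict(lone_surrogate)
    except UnicodeEncodeError as exc:
        print("strict encoder refuses:", exc.reason)
    raw = encode_lenient(lone_surrogate)
    print("lenient encoder gives:", raw.hex())
    # Decoding with the same handler restores the original code point.
    print(raw.decode("utf-8", errors="surrogatepass") == lone_surrogate)
```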

       utf-16 Although 16 is bigger than 8, this is still a compromise required by OSes like Windows that cannot
              handle ucs-4 - this encoding produces 16-bit Unicode streams.  In addition to the BMP it can convert
              16 planes using the Unicode Surrogate Area.  This encoding cannot convert anything above U+10FFFF
              (Plane 16).  The input byte order is recognized by the leading BOM (byte-order mark)
              U+FEFF.  This format is used in Windows NT for documents like notepad .txt files.
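
              The surrogate arithmetic and the byte-order check mentioned above can be sketched as
              follows.  This is an illustrative Python sketch under the standard UTF-16 rules, not
              uniconv's source; to_surrogate_pair and detect_byte_order are hypothetical helper names.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000  # 20 significant bits
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def detect_byte_order(data: bytes) -> str:
    """Guess the byte order from a leading U+FEFF byte-order mark."""
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"
    return "unknown"


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)
    print(f"U+1D11E -> {high:04X} {low:04X}")
    print(detect_byte_order(b"\xff\xfeA\x00"))
```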

       utf-16-be
              Big endian utf-16 converter.

       utf-16-le
              Little endian utf-16 converter.

       utf-7  This is the recommended format for international information exchange when only 7-bit can be
              used.  It can only handle 16-bit (utf-16) Unicode; for ucs-4 (above U+10FFFF) you should use utf-8
              encoding.
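
              The 7-bit property can be checked with any UTF-7 codec.  The sketch below uses Python's
              built-in "utf-7" codec as a stand-in for uniconv; is_seven_bit is a hypothetical helper.

```python
def is_seven_bit(data: bytes) -> bool:
    """True when no byte has its 8th bit set."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    text = "na\u00efve caf\u00e9"
    encoded = text.encode("utf-7")
    print(encoded)
    print("7-bit clean:", is_seven_bit(encoded))
    print("round trip ok:", encoded.decode("utf-7") == text)
    # The same text in utf-8 needs bytes with the 8th bit set.
    print("utf-8 is 7-bit clean:", is_seven_bit(text.encode("utf-8")))
```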

       iso-8859-1
              This is the ISO 8859-1 character  encoding format. It is also known as "Latin-1" encoding.

       iso-8859-2
              This   is   the  ISO  8859-2  character  encoding  format.  It is also known as "Central European"
              encoding.

       iso-8859-5
              This is the ISO 8859-5 character encoding format. It is also known as "Cyrillic" encoding.

       iso-8859-7
              This is the ISO 8859-7 character encoding format. It is also known as "Greek" encoding.

       iso-8859-9
              This is the ISO 8859-9 character encoding format. It is also known as "Turkish" encoding.

       koi8-r This is the KOI8-R character encoding format. It is mainly used in Russia.

       cp-1251
              This is the CP1251 cyrillic character encoding format. It is mainly used in Microsoft Windows  and
              some web sites.

       iso-2022-jp
              This is a Japanese character encoding format. It is a 7-bit encoding format.

       iso-2022-jp-3
              This is a Japanese character encoding format.  It is a 7-bit encoding format.  It is based upon
              the JIS X 0213 standard.

       euc-jp This is a Japanese character encoding format. It is an 8-bit encoding format.  Mainly used in UNIX
              systems.

       euc-jp-3
              The official name is EUC-JISX0213 - I just could not read this.  This is a Japanese character
              encoding format.  It is an 8-bit encoding format.  It is based upon the JIS X 0213 standard.

       shift-jis
              This  is  a  Japanese  character  encoding format.  It is an 8-bit encoding format. Mainly used in
              MSDOS/Windows.

       shift-jis-3
              The official name is Shift_JISX0213 - I just could not read this.  This is  a  Japanese  character
              encoding format.  It is an 8-bit encoding format. Mainly used in MSDOS/Windows.

       iso-2022-jp
              This is a Japanese 7-bit character encoding format.  Email messages in iso-2022-jp format
              can be decoded/encoded with this converter.

       iso-2022-x11
              This  is a Japanese character encoding format.  It is also known as "COMPOUND_TEXT"  encoding  for
              the  X   Window  System.  This is a 7-bit encoding format.  It can be derived from the ISO 2022-JP
              format with some differences.

       ksc-5601-x11
              This is a Korean character encoding format used by the X Window System (COMPOUND_TEXT encoding)
              to encode Korean (KS X 1001) and US-ASCII.  This is a 7-bit encoding format compliant with the
              ISO-2022 specification for encoding of multiple character sets.  Please note that this is
              DIFFERENT from ISO-2022-KR (defined in IETF RFC 1557).

       euc-kr This is an 8-bit multibyte encoding for Korean.  It encodes US-ASCII (7-bit) in the single-byte
              range and characters in KS X 1001 (formerly KS C 5601) in the double-byte range with the MSB on
              (8-bit).  It is used in Unix and on the Internet.  Korean versions of MS-DOS, MacOS and MS-Windows
              use a compatible (in most cases, identical) variant of this encoding.
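
              The single-byte/double-byte split described above can be inspected with Python's built-in
              "euc-kr" codec, used here as a stand-in for uniconv.  byte_layout is a hypothetical helper,
              and "cp949" is Python's name for the Windows variant.

```python
def byte_layout(text: str, encoding: str = "euc-kr"):
    """Return (character, hex of its encoded bytes) pairs."""
    return [(ch, ch.encode(encoding).hex()) for ch in text]


if __name__ == "__main__":
    sample = "A\ud55c"  # an ASCII letter followed by the Hangul syllable U+D55C
    for ch, hexbytes in byte_layout(sample):
        print(f"U+{ord(ch):04X} -> {hexbytes}")
    # Both bytes of the Hangul syllable have the most significant bit set.
    print(all(b & 0x80 for b in "\ud55c".encode("euc-kr")))
    # The Windows variant encodes this character identically.
    print("\ud55c".encode("cp949") == "\ud55c".encode("euc-kr"))
```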

       johab  This is a Korean encoding specified in KS X 1001 (KS C 5601-1992), Annex 3 as a
              supplementary encoding.  Widely used in Korean MS-DOS until the mid-1990s.  It can encode all
              Hangul syllables (11,172) of modern Korean as well as all the special symbols and Hanja (Chinese
              ideograms used in Korea) defined in KS X 1001.

       uhc    A variant of EUC-KR used in Korean MS-Windows 95/98 (proprietary encoding of
              Microsoft, CP949).  Its character repertoire includes all modern syllables of Hangul, the Korean
              script, as well as all the special symbols and Hanja (Chinese ideograms used in Korea) defined in
              KS X 1001.

       gb-18030
              This is a  Chinese  character  encoding  format  based  upon  GB  18030.   It  encodes  the  whole
              U+0000..U+10FFFF range, while being compatible with gb-2312.
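
              Both properties named above, gb-2312 compatibility and full-range coverage, can be checked
              with Python's built-in "gb18030" and "gb2312" codecs.  This is a sketch using Python as a
              stand-in for uniconv; round_trips is a hypothetical helper.

```python
def round_trips(ch: str, encoding: str = "gb18030") -> bool:
    """True when encoding and decoding a character returns it unchanged."""
    return ch.encode(encoding).decode(encoding) == ch


if __name__ == "__main__":
    hanzi = "\u4e2d"
    # A GB 2312 character keeps the same two bytes under GB 18030.
    print(hanzi.encode("gb2312").hex(), hanzi.encode("gb18030").hex())
    # Characters outside the BMP use a four-byte form.
    print(len("\U0001d11e".encode("gb18030")))
    # Both ends of the Unicode range survive a round trip.
    print(round_trips("\u0000"), round_trips("\U0010ffff"))
```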

       gb-2312-x11
              This is a Chinese character encoding format based upon GB 2312.  It is a 7-bit encoding format.

       gb-2312
              This is a Chinese character encoding format based upon GB 2312.  It is an 8-bit encoding format.

       big-5  This  is  a  Chinese  character encoding format based upon BIG5 encoding.  It is an 8-bit encoding
              format.

       hz     This is a Chinese character encoding format based upon "Hanzi" encoding.  It is a  7-bit  encoding
              format.

       viscii This is a Vietnamese character encoding format.

       ucs-2-be
              This  converts 16-bit Unicode (ucs-2) streams. The format takes care of big-endian variant.  Yudit
              does not recommend this format.

       ucs-2-le
              This converts 16-bit Unicode (ucs-2) streams. The format  takes  care  of  little-endian  variant.
              Yudit does not recommend this format.

       ucs-2  This converts 16-bit Unicode (ucs-2) streams.  The input byte order is recognized by the leading
              BOM (byte-order mark) U+FEFF.  Yudit does not recommend this format.

       java   This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be escaped
              with  a  string  like  '\u0080'.  When decoding the same format is decoded but, in addition, utf-8
              format is also recognized, so it can also be used to recover  data  accidentally  saved  with  the
              wrong encoding. The U+10000..U+10FFFF area is converted to surrogates and vice versa.

       java-s This converts \uxxxx character escapes. When encoding, all characters above U+0080 will be escaped
              with  a  string  like  '\u0080'.  When decoding the same format is decoded but, in addition, utf-8
              format is also recognized, so it can also be used to recover  data  accidentally  saved  with  the
              wrong  encoding.  Surrogates are not treated specially during conversion - this is why it is not a
              recommended conversion.
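
       The \uxxxx escaping described for the java converter can be sketched in Python as follows.  This is
       an illustration only, not uniconv's code: the function names are hypothetical, the sketch assumes
       that U+0080 itself is escaped, and it does not handle literal backslashes or the additional utf-8
       recognition on decode.

```python
import re


def java_escape(text: str) -> str:
    """Escape non-ASCII characters as \\uxxxx, using surrogate pairs above U+FFFF."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:  # assumption: U+0080 and above are escaped
            out.append(ch)
        elif cp > 0xFFFF:
            offset = cp - 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)))
        else:
            out.append("\\u%04x" % cp)
    return "".join(out)


def java_unescape(text: str) -> str:
    """Decode \\uxxxx escapes and join surrogate pairs back into single characters."""
    units = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)
    # Passing through UTF-16 merges adjacent high/low surrogates.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")


if __name__ == "__main__":
    sample = "A\u00e9\U0001d11e"
    escaped = java_escape(sample)
    print(escaped)
    print(java_unescape(escaped) == sample)
```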

FILES

       ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties
              can  have  yudit.datapath  property.  This  is  where  the  map  files  are  kept.    By   default
              /usr/share/yudit/data is searched.

SEE ALSO

       makeumap

AUTHOR

       This program  was written by gaspar@yudit.org (Gaspar Sinai), Last updated: 5 February, 2023, Tokyo.

LINUX COMMANDS                                     Nov 5 1997                                         UNICONV(1)