HP 9000 User Manual page 150

Code Sets
One objective of international program design is to create an application that
is codeset independent. To create a program that is sufficiently robust to
accept any kind of codeset, you must know how data is represented in different
languages and the potential problems you can encounter.
As a UNIX user, you are probably familiar with ASCII, the 7-bit codeset used
to support American English. All codesets supporting the diverse languages of
international users are supersets of ASCII. This ensures that data in these
codesets remains usable by the operating system, utilities, and applications
that depend on ASCII.
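
Because ASCII is a 7-bit codeset, every ASCII code has its high-order bit
clear. As a minimal sketch, a program could use that property to test whether
a buffer holds only ASCII data. The function name is_7bit_ascii is
hypothetical and is not part of any HP-UX library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..len-1] is a
 * 7-bit ASCII code (0x00-0x7F), and 0 otherwise. */
int is_7bit_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] & 0x80)      /* high-order bit set: not ASCII */
            return 0;
    }
    return 1;
}
```

A buffer that fails this test is not necessarily invalid. It only means the
data uses codes beyond the ASCII range, so the program must know which codeset
applies before interpreting those bytes.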
The ISO 8859-1 and Roman8 codesets support Western European languages.
These 8-bit codesets support an additional 128 character codes beyond those
of ASCII. While this extension of the ASCII character set meets the needs of
Western European users, it is not large enough to support languages such as
Arabic and Greek that have alphabets completely different from those used in
Western Europe or the U.S. For these languages, other 8-bit codesets, such as
ARABIC8 and GREEK8, have been designed. ISO 8859-2 and ISO 8859-5 support
Eastern European languages such as Polish and Russian. (A complete list of
codesets and the languages they support is provided in Appendix E.)
8-bit codesets support international users who speak and write phonetic
languages. A single byte, however, is not sufficient to represent the symbols
of an ideographic language (for example, Traditional Chinese, which contains
over 50,000 distinct ideographs). To provide for users of these languages,
codesets that support multi-byte characters were introduced.
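
The arithmetic behind this limit is simple: one byte has 256 distinct values,
while two bytes have 65,536. The sketch below computes the smallest number of
bytes that can give each character in a repertoire a distinct code. The name
bytes_needed is hypothetical, and the calculation is an upper bound on
capacity only; real codesets reserve some values (for ASCII and control
codes, for instance), so their usable capacity is smaller.

```c
#include <assert.h>

/* Hypothetical helper: smallest number of 8-bit bytes whose bit patterns
 * can give each of `count` characters a distinct code. */
unsigned bytes_needed(unsigned long count)
{
    unsigned nbytes = 1;
    unsigned long remaining;

    assert(count > 0);
    remaining = count - 1;      /* largest code value that must fit */
    while ((remaining >>= 8) != 0)
        nbytes++;               /* each extra byte adds 8 more bits */
    return nbytes;
}
```

For a repertoire of 50,000 ideographs, bytes_needed(50000UL) returns 2, which
is why a single-byte codeset cannot represent such a language.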
With the introduction of encoding schemes with multi-byte characters, a
problem arose. Because users who read and write ideographs still need ASCII
(for communicating with the operating system and for backward compatibility),
a data stream can consist of a mixture of one- and two-byte characters. The
resulting problem is one of character interpretation: how can a program
interpret characters correctly, distinguishing between single-byte and
multi-byte characters? A number of solutions to this problem have been
designed.
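
The interpretation problem can be illustrated with a sketch that counts the
characters in a mixed stream. It assumes, purely for illustration, a made-up
rule in which any byte with its high-order bit set is the first byte of a
two-byte character. This rule is an assumption, not the definition used by
any particular codeset, and the names is_first_of_two and count_chars are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION (illustration only): a byte with its high-order bit set
 * begins a two-byte character. Real codesets define their own ranges. */
static int is_first_of_two(unsigned char b)
{
    return (b & 0x80) != 0;
}

/* Counts the characters in buf[0..len-1], where each character is one
 * or two bytes long. Returns -1 if the stream ends in the middle of a
 * two-byte character. */
long count_chars(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    long nchars = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        if (is_first_of_two(buf[i])) {
            if (i + 1 >= len)
                return -1;      /* truncated two-byte character */
            i += 2;             /* skip both bytes of the character */
        } else {
            i += 1;             /* single-byte (ASCII) character */
        }
        nchars++;
    }
    return nchars;
}
```

The key design point is that the second byte of a two-byte character is never
examined on its own. The scan always advances from a known character
boundary, so a second byte that happens to look like an ASCII code is not
mistaken for one.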
A group of 2-byte codesets was developed that adhere to a common definition,
called HP15, for interpreting a byte stream. All codesets that adhere to the
Special Topics for HP's 16-bit Interfaces A-3