Regular Expressions - HP 9000 User Manual

• Multi-byte characters. Finally, character handling also involves the correct
parsing of multi-byte character streams and the interpretation of multi-byte
characters. Multi-byte character streams may contain both single-byte and
multi-byte characters. To process this data, each byte must be identified
as either a single-byte character or as part of a multi-byte character. The
details of these and other aspects of character handling are discussed in
Appendix A.
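The byte-by-byte identification described above can be sketched as follows. This is only an illustration: it assumes UTF-8 as the multi-byte encoding (the HP-UX codesets covered in Appendix A use different lead-byte rules), and the function name split_characters is hypothetical.

```python
def split_characters(data):
    """Split a UTF-8 byte stream into one bytes object per character.

    Assumption: the stream is UTF-8. Other multi-byte codesets need
    different lead-byte tests, but the walking logic is the same.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length = 1          # single-byte character
        elif 0xC0 <= lead < 0xE0:
            length = 2          # first byte of a two-byte character
        elif 0xE0 <= lead < 0xF0:
            length = 3          # first byte of a three-byte character
        elif 0xF0 <= lead < 0xF8:
            length = 4          # first byte of a four-byte character
        else:
            raise ValueError("byte %d is not a valid lead byte" % i)
        if i + length > len(data):
            raise ValueError("truncated multi-byte character at byte %d" % i)
        chars.append(data[i:i + length])
        i += length
    return chars


stream = "aé日".encode("utf-8")
for raw in split_characters(stream):
    kind = "single-byte" if len(raw) == 1 else "multi-byte"
    print(raw, kind)
```

The point of the sketch is that the lead byte alone decides how many following bytes belong to the same character, so the stream can mix single-byte and multi-byte characters freely.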
Regular Expressions
HP-UX allows the specification of arbitrary character strings through the use of
regular expressions. For further details on their use, see the section "Regular
Expressions" in The Ultimate Guide to the vi and ex Text Editors. The syntax
of regular expressions has been extended in HP-UX to allow use with other
character sets.
Here is one example of an internationalized regular expression:
h[[=e=]]lp
This matches the word "help" spelled with any variation of the letter "e" (for
example, e, é, è, ê).
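The effect of an equivalence class such as [=e=] can be approximated outside HP-UX. Python's re module does not support the [=e=] syntax, so the sketch below emulates it: it assumes that "variations of e" means characters whose Unicode decomposition is the base letter plus combining marks, which is not necessarily how an HP-UX locale defines the class. The helper equivalence_class is hypothetical.

```python
import re
import unicodedata


def equivalence_class(base, limit=0x250):
    """Return the characters below `limit` that decompose to `base`
    followed only by combining marks (an emulation of [=base=])."""
    members = []
    for code in range(limit):
        ch = chr(code)
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed[0] == base and all(
            unicodedata.combining(c) for c in decomposed[1:]
        ):
            members.append(ch)
    return "".join(members)


# Rough equivalent of the pattern h[[=e=]]lp
pattern = re.compile("h[" + re.escape(equivalence_class("e")) + "]lp")

for word in ["help", "hélp", "hèlp", "hêlp", "halp"]:
    print(word, bool(pattern.fullmatch(word)))
```

In a real locale the members of the class come from the locale's collation definition, not from Unicode decomposition, so the two can differ for some characters.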
The existing syntax of a range expression (e.g., "[a-z]") is not changed.
However, its meaning has been extended to mean "match any collating element
which falls between the two given collating elements based on the current
locale's LC_COLLATE collation sequence."
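The difference between a range based on character codes and one based on a collation sequence can be shown with a toy example. The COLLATION string below is a made-up sequence for illustration only; a real locale's LC_COLLATE order is defined by the system, and the function name in_collating_range is hypothetical.

```python
# Hypothetical collation sequence: accented letters sort next to
# their base letter instead of after "z".
COLLATION = "aáàâbcçdeéèêfghiíìîjklmnñoóòôpqrstuúùûvwxyz"


def in_collating_range(ch, low, high, sequence=COLLATION):
    """True if `ch` falls between `low` and `high` in `sequence`."""
    position = {c: i for i, c in enumerate(sequence)}
    if ch not in position:
        return False
    return position[low] <= position[ch] <= position[high]


# By character code, "é" lies outside a-z ...
print("a" <= "é" <= "z")
# ... but by the collation sequence it lies inside the range.
print(in_collating_range("é", "a", "z"))
```

Under this reading, "[a-z]" in a locale with such a sequence would match accented lowercase letters as well as the 26 unaccented ones.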
For multi-byte languages, the support in regular expressions is not as extensive.
For example, multi-byte characters are allowed as single character elements in
expressions, and they can be used in character ranges. However, the inverse of
a range ("[^a-z]") is not allowed with multi-byte characters in general. This
is due to restrictions in the way the codesets are implemented. Moreover, some
new features are not allowed with multi-byte codesets simply because they have
no application to Asian languages.