In multibyte representation, each character occupies one or more
bytes. Each character set has an introduction sequence, which is
normally one or two bytes long. (Exception: the ascii character
set and the eight-bit-graphic character set have a zero-length
introduction sequence.) The introduction sequence is the beginning of
the byte sequence for any character in the character set. The rest of
the character's bytes distinguish it from the other characters in the
same character set. Depending on the character set, there are either
one or two distinguishing bytes; the number of such bytes is called the
dimension of the character set.
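As an illustration of this layout, here is a minimal sketch that splits a character's bytes into the introduction sequence and the distinguishing bytes. The function name my-split-char-bytes and the byte values in the examples are hypothetical, chosen only for illustration; the sketch works on a plain list of integers and is not how Emacs itself decodes text.

```elisp
;; Hypothetical helper: BYTES is a list of integers holding one
;; character's multibyte representation, and DIMENSION is the
;; dimension (1 or 2) of its character set.
(defun my-split-char-bytes (bytes dimension)
  "Return a list (INTRO DISTINGUISHING) for BYTES.
The last DIMENSION bytes are the distinguishing bytes; whatever
precedes them is the introduction sequence."
  (list (butlast bytes dimension)
        (last bytes dimension)))

;; Made-up byte values: a one-byte introduction sequence, dimension 2.
(my-split-char-bytes '(146 176 161) 2)
;; => ((146) (176 161))

;; A zero-length introduction sequence, dimension 1.
(my-split-char-bytes '(65) 1)
;; => (nil (65))
```

The length of the first element of the result is the length of the introduction sequence, which is the total byte count minus the dimension.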
 -- Function: charset-dimension charset
This function returns the dimension of charset; at present, the dimension is always 1 or 2.
 -- Function: charset-bytes charset
This function returns the number of bytes used to represent a character in character set charset.
This is the simplest way to determine the byte length of a character set's introduction sequence:
     (- (charset-bytes charset)
        (charset-dimension charset))
