The functions in this section convert between characters and the byte values used to represent them. For most purposes, there is no need to be concerned with the sequence of bytes used to represent a character, because Emacs translates automatically when necessary.
The function split-char takes one argument, character, and returns a list containing the name of the character set of character, followed by one or two byte values (integers) which identify character within that character set. The number of byte values is the character set's dimension.
If character is invalid as a character code, split-char returns a list consisting of the symbol unknown and character.
(split-char 2248)
⇒ (latin-iso8859-1 72)
(split-char 65)
⇒ (ascii 65)
(split-char 128)
⇒ (eight-bit-control 128)
The function make-char returns the character in character set charset whose position codes are code1 and code2. This is roughly the inverse of split-char. Normally, you should specify either one or both of code1 and code2 according to the dimension of charset. For example,
(make-char 'latin-iso8859-1 72)
⇒ 2248
Actually, the eighth bit of both code1 and code2 is zeroed before they are used to index charset. Thus you may use, for instance, an ISO 8859 character code rather than subtracting 128, as is necessary to index the corresponding Emacs charset.
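Because make-char is roughly the inverse of split-char, a round trip through both functions can be sketched as follows. This is only an illustration, assuming an Emacs version that provides both functions as described here; the variable names parts and char are hypothetical, and the round trip is not guaranteed for every character code.

```elisp
;; Split a character into its charset name and position codes,
;; then rebuild the character from those parts.
(let* ((parts (split-char 65))            ; (ascii 65), per the split-char example
       (char (apply #'make-char parts)))  ; same as (make-char 'ascii 65)
  ;; Non-nil when the round trip gives back the original character.
  (= char 65))
```

Using apply lets the same form handle charsets of dimension one or two, since the list returned by split-char already has the charset name followed by the right number of position codes.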
If you call make-char with no byte-values, the result is
a generic character which stands for charset. A generic
character is an integer, but it is not valid for insertion in the
buffer as a character. It can be used in char-table-range to
refer to the whole character set (see Char-Tables).
char-valid-p returns nil for generic characters.
For example:
(make-char 'latin-iso8859-1)
⇒ 2176
(char-valid-p 2176)
⇒ nil
(char-valid-p 2176 t)
⇒ t
(split-char 2176)
⇒ (latin-iso8859-1 0)
The character sets ascii, eight-bit-control, and
eight-bit-graphic don't have corresponding generic characters. If
charset is one of them and you don't supply code1,
make-char returns the character code corresponding to the
smallest code in charset.
