Emacs and Unicode Tips

Xah Lee, 2006-07

This page gives some tips about using emacs and unicode. If you work in two languages, or type a lot of math symbols, you'll find this page useful.

[Screenshot: an emacs window showing unicode chars.]

You can download the text shown in the screenshot here: unicode.txt.

The following tips are based on Emacs version 22 (released in 2007).

Input Unicode Characters

How to type this character é ?

To type characters such as é à ü î ñ, type:

character       key press
------------------------------
 é              Ctrl+x 8 ' e
 à              Ctrl+x 8 ` a
 ü              Ctrl+x 8 " u
 î              Ctrl+x 8 ^ i
 ñ              Ctrl+x 8 ~ n

To see all the non-ascii characters you can type with “Ctrl+x 8” prefix, type “Ctrl+x 8 Ctrl+h”.

Here is a sample of characters you can type by the “Ctrl+x 8” scheme: ¡ç £¢¥ ©®™ •º†‡ “”‘’«» ±π∞.

(If you are on a Mac, these characters can be typed by holding down the Option key. For details, see Mac OS X Keybinding and Unicode Tips.)

How to use abbrev to type unicode chars?

Suppose you type a lot of math and need math symbols such as: alpha α, beta β, gamma γ, theta θ, Infinity ∞ etc. You can set up abbrev-mode so that, when you type “alpha”, it automatically becomes “α”. Here's what to do. Put the following in your emacs init file:

(define-abbrev-table 'global-abbrev-table '(
    ("alpha" "α" nil 0)
    ("beta" "β" nil 0)
    ("gamma" "γ" nil 0)
    ("theta" "θ" nil 0)
    ("Infinity" "∞" nil 0)

    ("ar1" "→" nil 0)
    ("ar2" "⇒" nil 0)
    ))

Then, turn on abbrev-mode by “Alt+x abbrev-mode”. Now, when you type “alpha” followed by a space or punctuation, it will become “α”.
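If you would rather not type “Alt+x abbrev-mode” in every buffer, a minimal sketch of one way to turn it on by default is to set the variable's default value in your init file. This assumes you want abbrevs active everywhere, which may not suit every mode:

```elisp
;; Turn on abbrev-mode in every buffer by default.
;; Assumption: you want abbrev expansion in all modes, not just text modes.
(setq-default abbrev-mode t)
```

You can still toggle it off in a single buffer with “Alt+x abbrev-mode”.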

Reference: (info "(emacs)Abbrevs").

How to set a keystroke to type a unicode char?

If you have some characters that you use often, you can make emacs insert them with a single keypress. For example, put the following code in your ~/.emacs. Then, each time you press the 6 key on the number pad, an arrow will be inserted.

(global-set-key (kbd "<kp-6>") "→")

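You can also bind a key sequence. In the following, typing “Alt+i a” will insert α. (By default “Alt+i” runs the command tab-to-tab-stop, so it must be unbound before it can be used as a prefix key.)

; free Alt+i (tab-to-tab-stop by default) so it can start a key sequence
(global-unset-key (kbd "M-i"))
(global-set-key (kbd "M-i a") "α")
(global-set-key (kbd "M-i b") "β")
(global-set-key (kbd "M-i t") "θ")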
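Binding a key directly to a string makes emacs treat the string as a keyboard macro. If that does not insert the character in your setup, a sketch of an alternative is to bind the key to a small command that calls insert explicitly. It uses the same lambda style as the other examples on this page:

```elisp
;; Insert a right arrow with the 6 key on the number pad,
;; using an explicit command instead of a string binding.
(global-set-key (kbd "<kp-6>")
  (lambda () (interactive) (insert "→")))
```

The same pattern works for any other key and character.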

How to type a unicode character by its hex value?

Type “Alt+x set-input-method” and give the value “ucs”. Once you are in the ucs input method, you can type “u” followed by a hex value, and emacs will insert the unicode char with that hex value. For example, try typing the Greek lower case alpha “α” by its hex value 03B1: type “u03B1”.

To return to your normal input method, type “Ctrl+\” (or “Alt+x toggle-input-method”).

If you have the decimal value of a unicode char, you can first find its hex value using the built-in calculator. Suppose your character in decimal is 945. Type “Alt+x calc” to start calc. Then type “945”, then Enter. Now type “d6”, which puts calc in hex mode. You can read off the screen that the hex value is 3B1 (calc displays it as 16#3B1). To put calc back to decimal mode, type “d0”. To quit calc, type “q”.
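The same conversion can also be done with a line of emacs lisp. As a rough sketch, you could type the following expressions into the *scratch* buffer and evaluate each one by placing the cursor after its closing parenthesis and typing “Ctrl+x Ctrl+e”:

```elisp
;; Decimal to hex: 945 is 3B1 in hex.
(format "%X" 945)   ; returns "3B1"

;; Hex to decimal: #x3B1 is how emacs lisp writes a hex number.
(format "%d" #x3B1) ; returns "945"

;; Insert the unicode char with code point hex 3B1 (α) at the cursor.
(insert (decode-char 'ucs #x3B1))
```

The first two show their result in the echo area. The last one inserts the character into the current buffer.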

How to open a unicode character template?

You can put frequently used unicode chars into a file and save it, and define a keystroke to open this file, so that you can copy and paste the chars you want. Here's how you can define a keystroke to open a file. Put the following in your ~/.emacs file.

; open my unicode template with F8 key
(global-set-key (kbd "<f8>")
  (lambda () (interactive) (find-file "~/my_unicode_template.txt")))

How to type Chinese?

Regardless of what text editor you are using, you need to do two things: (1) Set your editor's File Encoding system to one that supports your language. (2) Set your Input Method to a particular system suitable for your language.

File Encoding tells your computer how to map symbols/glyphs/characters into binary code. Input Method allows you to type languages that are not based on an alphabet. (For example, in Chinese, you cannot just type a character by pressing a key. Instead, you must use an input method to type Chinese.) For languages based on the Latin alphabet, you don't need to worry about input method.

To set your file encoding in emacs, use the menu “Options‣Mule (Multilingual Environment)‣Set Language Environment”.

To set your input method, use the menu “Options‣Mule (Multilingual Environment)‣Select Input Method...”.

After you've made your choices in the menus, be sure to also use the menu command “Options‣Save Options” so that emacs remembers your settings.

As for me, I type Chinese often. There are several encoding systems for Chinese, for example GB 18030, Big5, UTF-8. I use the UTF-8 encoding system. Among the Chinese input methods, I use the Pinyin method. Here's how to set them in emacs without using the menu: “Alt+x set-language-environment RET UTF-8” and “Alt+x set-input-method RET chinese-py”.
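If you prefer to keep these settings in your init file rather than using the menu or typing the commands each session, here is a minimal sketch. It assumes you want UTF-8 and the chinese-py input method; substitute the names for your own language:

```elisp
;; Use UTF-8 as the language environment (affects default file encoding).
(set-language-environment "UTF-8")

;; Make chinese-py the input method that Ctrl+\ switches to.
(setq default-input-method "chinese-py")
```

With this in place, “Ctrl+\” toggles between your normal input method and chinese-py without asking for a name.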

Here's an example of actually typing the Chinese char 美 (meaning beautiful). Type “Alt+x set-input-method RET chinese-py”, then type “mei”. Emacs will show you a list of characters with the pronunciation of mei. Type “2” to pick the right character. Then, emacs will insert the character. To return to your normal input method, type “Ctrl+\”.

An in-depth tutorial of using Mac with Chinese is at: http://www.yale.edu/chinesemac/. It includes comprehensive info and resources on Chinese fonts, complete tutorials on several Chinese input methods, etc.

Finding Info About a Character

I have this character α on the screen. How to find out its unicode's hex value or name?

You can find out a character's decimal, octal, or hex values by placing your cursor on the character and typing “Alt+x what-cursor-position” (Ctrl+x =). You can get more info if you place your cursor on the character, then type “Ctrl+u Ctrl+x =”.

However, if you want the complete unicode info of a character, you need to download a unicode data file and let emacs know where it is. The unicode data file can be downloaded at: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt. After you have downloaded it, place the following code in your “~/.emacs” to let emacs know where it is:
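If all you want is the unicode code point in hex, a small command can report just that. The following is only a sketch, and the command name my-show-char-code is made up for this example:

```elisp
;; Show the unicode code point (in hex) of the character under the cursor.
;; The name my-show-char-code is hypothetical, chosen for illustration.
(defun my-show-char-code ()
  "Display the unicode code point of the character at the cursor."
  (interactive)
  (let* ((ch (char-after))
         ;; encode-char returns nil if the char has no unicode code point
         (code (and ch (encode-char ch 'ucs))))
    (if code
        (message "U+%04X" code)
      (message "No unicode character under the cursor"))))
```

Place the cursor on “α” and type “Alt+x my-show-char-code”. The echo area should then show U+03B1.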

; set unicode data file location. (used by what-cursor-position)
(let ((x "~/Documents/emacs/UnicodeData.txt"))
  (when (file-exists-p x)
    (setq describe-char-unicodedata-file x)))

Then restart emacs. Once you've done this, place your cursor on a unicode char and type “Ctrl+u Ctrl+x =”. Emacs will then give you all the unicode info about that char, including the code point in decimal, octal, and hex notations, as well as the unicode character name, category, the font emacs is using, and others.

For example, here's the output on the character “α”:

      character: α (332721, #o1211661, #x513b1, U+03B1)
        charset: mule-unicode-0100-24ff
                 (Unicode characters of the range U+0100..U+24FF.)
     code point: #x27 #x31
         syntax: w 	which means: word
       category: g:Greek
    buffer code: #x9C #xF4 #xA7 #xB1
      file code: #xCE #xB1 (encoded by coding system mule-utf-8-unix)
        display: by this font (glyph code)
     -apple-symbol-medium-r-normal--14-140-72-72-m-140-mac-symbol (#x61)
   Unicode data:  
           Name: GREEK SMALL LETTER ALPHA
       Category: lowercase letter
Combining class: Spacing
  Bidi category: Left-to-Right
      Uppercase: Α
      Titlecase: Α

There are text properties here:
  fontified            t

Other

Is there a way to declare a file with a particular character encoding?

Yes. In the first line of your file, put “-*- coding: utf-8 -*-”. That way, each time emacs opens the file, it will assume that the file is encoded in utf-8. The line can start with the comment character of your language, such as “#”, “//”, or “;”.
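For example, the top of an emacs lisp file that contains unicode chars might look like the following. The variable name my-greek-sample is made up for this illustration:

```elisp
;; -*- coding: utf-8 -*-
;; The line above is the first line of the file and starts with ";",
;; the comment character of emacs lisp.

;; my-greek-sample is a hypothetical variable holding non-ascii text.
(setq my-greek-sample "α β γ")
```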

Where can one read more about unicode in emacs?

Reference: (info "(emacs)International").

To learn more about Unicode, see Wikipedia: Unicode.


Page created: 2006-07.
© 2006 by Xah Lee.