Character Sets and Encoding in HTML

Xah Lee, 2005-12

In HTML, you can declare the character set (more precisely, the character encoding) of the file, like this:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=utf-8">

If you are not familiar with character sets and encodings, please read this essay: The Journey of a Foreign Character thru Internet.

Once you have declared the character set, you can use characters from it directly in your HTML file, provided the file is actually saved in that encoding. There is a character set standard called Unicode, which covers the characters of practically all the world's written languages, including tens of thousands of Chinese characters. Here is a sample of characters from Unicode:

éåøèü θπ αβγ δε λ ϕρκψ ≤≥≠≈≔⊂⊃⊆⊇∈ ⅇⅈⅉ∞∆° ℵℜℂℝℚℙℤ ℓ∟∠∡ ∀∃ ∫∑∏ ⊕⊗⊙⊚⊛∘∙ ± “”©—‘’ →←↑↓↔↗ ⇐⇑⇒⇓⇔⇗ ■□•‣♥★☆李杀网
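To make the link between the declared charset and what is stored in the file more concrete, here is a minimal Python sketch. Python and the helper name `describe` are not part of the original article; they are used here only for illustration. It shows each character's Unicode number and the bytes it becomes when the file is saved as UTF-8.

```python
# Illustration only: how a few of the sample characters are stored
# when the file is saved as UTF-8, the charset declared in the META tag.
samples = ["é", "π", "•", "李"]

def describe(ch):
    """Return (code point in decimal, code point in hex, UTF-8 bytes in hex)."""
    return (ord(ch), format(ord(ch), "X"), ch.encode("utf-8").hex())

for ch in samples:
    print(ch, describe(ch))
```

Each of these characters has a single Unicode number, but takes two or three bytes in UTF-8. That is why the browser needs the charset declaration to decode the file correctly.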

Another way to include a non-ASCII character in your file is with a numeric character reference. For example, the bullet symbol • is Unicode character number 8226 (in decimal). In HTML, you can write it as “&#8226;”. Here's what your browser shows: •

The number 8226 in hexadecimal is 2022. Sometimes you only know the hexadecimal form. In that case, put an “x” after the “#” and write “&#x2022;”. Here's what your browser shows: •
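The relation between the decimal and hexadecimal forms can be checked with a short Python sketch. The helper name `numeric_refs` is hypothetical; `html.unescape` is from Python's standard library and decodes references roughly the way a browser would.

```python
import html

def numeric_refs(ch):
    """Return the (decimal, hexadecimal) numeric references for one character."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"

bullet = "\u2022"  # the bullet character
dec_ref, hex_ref = numeric_refs(bullet)
print(dec_ref, hex_ref)

# Both forms decode to the same character.
print(html.unescape(dec_ref) == html.unescape(hex_ref) == bullet)
```

Because these references use only ASCII characters, they survive even when the file itself is saved in an encoding that cannot represent the character.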

For some commonly used characters, HTML provides named references (also called character entities). For example, the bullet character can be written as “&bull;”. Here's what your browser shows: •
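As a rough check that the named form refers to the same character as the numeric forms, Python's standard `html.entities` module carries a table of the HTML 4 entity names. This is only a sketch for looking names up, not something the original article uses.

```python
import html
from html.entities import name2codepoint

# Look up the Unicode number behind the name "bull".
code = name2codepoint["bull"]
print(code, hex(code))

# The named reference decodes to the character with that number.
print(html.unescape("&bull;") == chr(code))
```

Only characters with an entry in such a table have a named form. Any other character has to be written directly or as a numeric reference.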

Page created: 2005-05.
© 2005 by Xah Lee.