Text Pattern Matching in Emacs

Xah Lee, 2007-08, 2009-08-20

Emacs's regex syntax is not based on Perl's or Python's, but it is largely the same. The main exception is that in emacs regex the parenthesis characters “()” are literal. To capture a pattern, escape the parens like this: “\(myPattern\)”.
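As a small sketch of the difference between literal and escaped parens, the following uses string-match and match-string. The helper name my-split-pair is made up for illustration and is not part of emacs.

```elisp
;; `my-split-pair' is a hypothetical helper name, not part of Emacs.
(defun my-split-pair (s)
  "Return (KEY VALUE) if S looks like \"key=value\", else nil."
  ;; \\( ... \\) marks a capture group.  Each backslash is doubled
  ;; because the pattern is written inside a Lisp string.
  (when (string-match "\\([a-z]+\\)=\\([a-z]+\\)" s)
    (list (match-string 1 s) (match-string 2 s))))

;; Unescaped parens match literal parenthesis characters.
(string-match "(a)" "x(a)y")   ; returns 1, the index of the "("

(my-split-pair "key=value")    ; returns ("key" "value")
```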

Common Patterns

Here are some common patterns:

Pattern         Matches
-----------------------------------
.               any single character
\.              one period

[0-9]+          digit sequence
[A-Za-z]+       sequence of letters
[_A-Za-z0-9]+   sequence of alphanumeric char and underscore
[-A-Za-z0-9]+   sequence of alphanumeric char and hyphen
[-_A-Za-z0-9]+  sequence of alphanumeric char and hyphen & underscore
[[:blank:]]+    sequence of tabs and spaces
[[:upper:]]+    sequence of cap letters
[[:lower:]]+    sequence of lowercase letters

"\([^"]+\)"     capture text between double quotes 

+               means match previous pattern 1 or more times
*               means match previous pattern 0 or more times
?               means match previous pattern 0 or 1 time

The above patterns cover the vast majority of regex uses.
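To try a few of these patterns from lisp, here is a minimal sketch that collects every match in a string. The function name my-all-matches is an assumption made for this example. It binds case-fold-search to nil, because with case folding on, classes such as [[:upper:]] and ranges such as [A-Z] also match lowercase letters.

```elisp
;; `my-all-matches' is a hypothetical helper name, not part of Emacs.
(defun my-all-matches (regex text)
  "Return a list of every non-overlapping match of REGEX in TEXT."
  (let ((case-fold-search nil)   ; make letter classes case sensitive
        (start 0)
        (found '()))
    (while (and (< start (length text))
                (string-match regex text start))
      (push (match-string 0 text) found)
      ;; Resume after this match.  Always advance by at least one
      ;; char so that an empty match cannot loop forever.
      (setq start (max (match-end 0) (1+ (match-beginning 0)))))
    (nreverse found)))

(my-all-matches "[0-9]+" "room 42, floor 7")           ; ("42" "7")
(my-all-matches "[-_A-Za-z0-9]+" "foo-bar baz_1!")     ; ("foo-bar" "baz_1")
(my-all-matches "[[:upper:]]+" "NASA and ESA")         ; ("NASA" "ESA")
```

Note that the hyphen sits first inside the brackets. Anywhere between two other chars it would be read as a range operator.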

Differences from Perl's Regex

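If you are familiar with Perl's regex, here are some major practical differences:

Perl            Emacs
-----------------------------------
(...)           \(...\)    grouping and capture
a|b             a\|b       alternation
x{2,5}          x\{2,5\}   counted repetition
\d              [0-9] or [[:digit:]]  (emacs has no \d)
\s              [[:space:]] or \s-    (in emacs, \s starts a syntax class)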

Testing Your Regex

Emacs has an interactive regex mode that shows what a pattern matches as you type it. To enter the mode, type “Alt+x regexp-builder”.

Alternatively, you can type “Alt+x query-replace-regexp” to test your pattern. Its keyboard shortcut is “Ctrl+Alt+%”.

To test a regex in lisp code, open an empty file and place this code “(search-forward-regexp "yourRegex")” in it, then place the text you want to match below it. Move your cursor right after the closing parenthesis, then type “Ctrl+x Ctrl+e” (or “Alt+x eval-last-sexp”). If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying search-failed, your regex didn't match. If you get a lisp syntax error, you probably got the backslashes wrong.
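The same test can also be run without touching your current buffer. This is only a sketch: my-try-regex is a made-up name, and it uses a temporary buffer in place of the empty file described above.

```elisp
;; `my-try-regex' is a hypothetical helper name, not part of Emacs.
(defun my-try-regex (regex text)
  "Search TEXT for REGEX from the start.
Return the matched text, or nil if there is no match."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; The third argument t makes the search return nil on failure
    ;; instead of signaling a `search-failed' error.
    (when (search-forward-regexp regex nil t)
      (match-string 0))))

(my-try-regex "[0-9]+" "abc 123 def")   ; returns "123"
(my-try-regex "[0-9]+" "no digits")     ; returns nil
```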

Double Backslash in Lisp Code

A lisp function that takes a regex string, such as search-forward-regexp, needs double backslashes. This is because in an elisp string a backslash must itself be escaped with a backslash. The lisp reader first turns each “\\” into a single “\”, and the resulting string is then passed to emacs's regex engine.

For example, suppose you have this text:

Sin[x] + Sin[y]

and you need to capture the bracketed “[x]” or “[y]”. You can use:

(search-backward-regexp "\\(\\[[a-z]\\]\\)")

The regex engine actually receives:

\(\[[a-z]\]\)
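One way to check this yourself is to look at the string after the lisp reader has processed it. In the sketch below, the names my-text, my-regex and my-first-capture are made up for illustration. Note that the pattern above captures the brackets together with the letter. Moving the group inside the brackets captures the letter alone.

```elisp
;; All `my-' names here are hypothetical, not part of Emacs.
(defconst my-text "Sin[x] + Sin[y]")

;; The Lisp reader turns each "\\" into one backslash, so the regex
;; engine sees the 13 characters \(\[[a-z]\]\)
(defconst my-regex "\\(\\[[a-z]\\]\\)")
(length my-regex)                                  ; returns 13

(defun my-first-capture (regex text)
  "Return the text of capture group 1 for the first match of REGEX in TEXT."
  (when (string-match regex text)
    (match-string 1 text)))

(my-first-capture my-regex my-text)                ; returns "[x]"
;; Group moved inside the brackets: only the letter is captured.
(my-first-capture "\\[\\([a-z]\\)\\]" my-text)     ; returns "x"
```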

Matching Newline Char

The newline char differs among unix, Windows, and Mac. However, no matter what OS you are on, when a file is opened in a buffer, emacs always uses “\n” to represent newline. So, if you are working on Windows files, where the line ending is CR followed by LF (ascii 13 followed by 10), do not try to match it using “\r\n”. Always use a single “\n” to match newline. You also do not need a double backslash here, because “\n” already stands for a newline char inside a lisp string.
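Here is a short sketch of matching newline with a single “\n”. The function name my-join-lines is an assumption made for this example. It works on a lisp string, so it shows only the regex side. The line ending conversion itself happens when emacs reads a file into a buffer.

```elisp
;; "\n" inside a Lisp string is already one newline character.
(length "\n")                            ; returns 1

;; `my-join-lines' is a hypothetical helper name, not part of Emacs.
(defun my-join-lines (text)
  "Replace each newline in TEXT with a single space."
  (replace-regexp-in-string "\n" " " text))

(my-join-lines "one\ntwo\nthree")        ; returns "one two three"
```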

(info "(emacs) Regexps")

(info "(elisp) Regular Expressions")

2007-08
© 2007 by Xah Lee.