Text Pattern Matching in Emacs

Xah Lee, 2007-08

Emacs's regex syntax is not based on Perl's or Python's, but it is mostly the same. The main difference is that in emacs regex, the parenthesis characters “()” are literal. If you want to capture a pattern, you need to escape the parens like this: “\(myPattern\)”.
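Here is a minimal sketch of the difference between literal and escaped parens. The function name “my-paren-demo” is made up for illustration. Because the regex is written inside a lisp string, each backslash is doubled (see “Double Backslash in Lisp Code”).

```elisp
(defun my-paren-demo (text)
  "Return a list (LITERAL-POS GROUP-TEXT) for TEXT.
Hypothetical helper for illustration, not part of Emacs."
  (list
   ;; "(a)": plain parens are literal, so this looks for the
   ;; three characters open-paren, a, close-paren.
   (string-match "(a)" text)
   ;; "\\(a\\)": escaped parens form a capture group around "a".
   (and (string-match "\\(a\\)" text)
        (match-string 1 text))))

(my-paren-demo "f(a)")   ; returns (1 "a")
```

The first pattern matches starting at the literal “(”, at position 1. The second matches only the letter and stores it in group 1.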

Common Patterns

Here are some common patterns:

Pattern         Matches
-----------------------------------
.               any single character
\.              one period

[0-9]+          digit sequence
[A-Za-z]+       sequence of letters
[_A-Za-z0-9]+   sequence of alphanumeric chars and underscores
[-A-Za-z0-9]+   sequence of alphanumeric chars and hyphens
[[:blank:]]+    sequence of tabs and spaces

"\([^"]+\)"     capture text between double quotes 

+               means match previous pattern 1 or more times
*               means match previous pattern 0 or more times
?               means match previous pattern 0 or 1 time

The above patterns will cover the vast majority of regex uses.
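To try a few of these patterns from lisp, a sketch like the following may help. The function name “my-common-pattern-demo” and the sample text are made up. Inside a lisp string the backslashes are doubled and the double quotes are escaped.

```elisp
(defun my-common-pattern-demo (text)
  "Return what three of the common patterns match in TEXT.
Hypothetical helper for illustration, not part of Emacs."
  (list
   ;; [0-9]+ : first run of digits
   (and (string-match "[0-9]+" text)
        (match-string 0 text))
   ;; [-A-Za-z0-9]+ : first run of letters, digits, and hyphens
   (and (string-match "[-A-Za-z0-9]+" text)
        (match-string 0 text))
   ;; "\([^"]+\)" : text between double quotes, kept in group 1
   (and (string-match "\"\\([^\"]+\\)\"" text)
        (match-string 1 text))))

(my-common-pattern-demo "order-42 shipped to \"Oslo\" today")
;; returns ("42" "order-42" "Oslo")
```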

Differences from Perl's Regex

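If you are familiar with Perl's regex, here are some major practical differences:

Perl            Emacs
-----------------------------------
(pattern)       \(pattern\)     capture group
a|b             a\|b            alternation
x{2,5}          x\{2,5\}        repeat count
\d              [0-9]           emacs has no “\d”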

Testing Your Regex

Emacs has an interactive regex mode that shows you what a pattern matches as you type it. To go into the mode, type “Alt+x regexp-builder”.

Alternatively, you can type “Alt+x query-replace-regexp” to test your pattern. Its keyboard shortcut is “Ctrl+Alt+%”.

To test a regex in lisp code, open an empty file, put the code “(search-forward-regexp "yourRegex")” in it, and place the text you want to match below it. Then move your cursor right after the closing parenthesis and type “Ctrl+x Ctrl+e” (or “Alt+x eval-last-sexp”). If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying search-failed, your regex didn't match. If you get a lisp syntax error, you probably got the backslashes wrong.
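If you would rather not get a search-failed error, one possible approach is to run the search in a temporary buffer and ask for nil on failure. This is only a sketch, and “my-regex-test” is a made-up helper name.

```elisp
(defun my-regex-test (regex text)
  "Return the first match of REGEX in TEXT, or nil if there is none.
Hypothetical helper for illustration, not part of Emacs."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; A non-nil third argument makes the search return nil on
    ;; failure instead of signaling a search-failed error.
    (when (search-forward-regexp regex nil t)
      (match-string 0))))

(my-regex-test "[0-9]+" "abc 123 def")   ; returns "123"
(my-regex-test "[0-9]+" "no digits")     ; returns nil
```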

Double Backslash in Lisp Code

In a lisp function that takes a regex string (for example, “search-forward-regexp”), you need to use double backslashes. This is because in an elisp string a literal backslash must itself be written with a backslash in front. Lisp first reads the string, turning each “\\” into one “\”, and then passes the result to emacs's regex engine.

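For example, suppose you have the text “Sin[x] + Sin[y]”, and you need to capture the x or y. You can use “(search-backward-regexp "\\[\\([a-z]\\)\\]")”. The regex engine actually receives “\[\([a-z]\)\]”. The “\[” and “\]” match the literal brackets, and the escaped parens capture the letter between them.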
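Here is a sketch of the same idea on a string, with the capture group placed inside the brackets so that group 1 holds only the letter. The function name “my-bracket-args” is made up for illustration.

```elisp
(defun my-bracket-args (text)
  "Return a list of the single letters found inside [ ] in TEXT.
Hypothetical helper for illustration, not part of Emacs."
  (let ((start 0)
        (found '()))
    ;; The lisp string "\\[\\([a-z]\\)\\]" reaches the regex
    ;; engine as \[\([a-z]\)\]
    (while (string-match "\\[\\([a-z]\\)\\]" text start)
      (push (match-string 1 text) found)
      ;; Continue searching after the end of this match.
      (setq start (match-end 0)))
    (nreverse found)))

(my-bracket-args "Sin[x] + Sin[y]")   ; returns ("x" "y")
```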

If you want to match a line return, write “\n” with a single backslash. Inside a lisp string “\n” already stands for the newline character, so the regex engine receives the newline itself.
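A quick way to see this is to compare string lengths and then search for a newline. This is a small sketch, and “my-first-newline” is a made-up name.

```elisp
;; "\n" is a single character: the newline itself.
(length "\n")    ; returns 1
;; "\\n" is two characters: a backslash and the letter n.
(length "\\n")   ; returns 2

(defun my-first-newline (text)
  "Return the position of the first line return in TEXT, or nil.
Hypothetical helper for illustration, not part of Emacs."
  ;; A single backslash is enough: the regex engine is handed
  ;; an actual newline character to look for.
  (string-match "\n" text))

(my-first-newline "first line\nsecond line")   ; returns 10
```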

(info "(emacs)Regexps")

(info "(elisp)Regular Expressions")


© 2007 by Xah Lee.