Xah Lee, 2007-08, 2009-08-20
Emacs's regex syntax is not based on Perl's or Python's, but it is basically the same, except that in emacs regex the parenthesis characters “()” are literal. If you want to capture a pattern, you need to escape the parens like this: “\(myPattern\)”.
Here are some common patterns:
Pattern              Matches
-----------------------------------
.                    any single character
\.                   one period
[0-9]+               digit sequence
[A-Za-z]+            sequence of letters
[_A-Za-z0-9]+        sequence of alphanumeric char and underscore
[-A-Za-z0-9]+        sequence of alphanumeric char and hyphen
[-_A-Za-z0-9]+       sequence of alphanumeric char and hyphen & underscore
[[:blank:]]+         sequence of tabs and spaces
[[:upper:]]+         sequence of cap letters
[[:lower:]]+         sequence of lowercase letters
"\([^"]+\)"          capture text between double quotes

+ means match previous pattern 1 or more times
* means match previous pattern 0 or more times
? means match previous pattern 0 or 1 time

Note that a hyphen meant literally must come first (or last) inside the brackets. Written as “[_-A...]”, the “_-A” part would be read as a character range, not as an underscore and a hyphen.
The above patterns cover the vast majority of regex uses.
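A few of these patterns can be tried from lisp with string-match, which returns the index where the match starts, or nil if there is no match. This is only a sketch: the sample strings are made up, and the backslashes are doubled because the patterns sit inside lisp strings (explained further down).

```elisp
;; string-match returns the index where the match starts, or nil.
(string-match "[0-9]+" "abc 123")            ; 4

;; "\\." in a lisp string is the regex \. and matches one literal period.
(string-match "\\." "file.txt")              ; 4

;; Bind case-fold-search to nil, otherwise [[:upper:]] would also
;; match lowercase letters during a case-insensitive search.
(let ((case-fold-search nil))
  (string-match "[[:upper:]]+" "abc DEF"))   ; 4

;; Capture text between double quotes, then read group 1.
(let ((text "say \"hello\" now"))
  (when (string-match "\"\\([^\"]+\\)\"" text)
    (match-string 1 text)))                  ; "hello"
```

When match-string is used after string-match, the same string must be passed as its second argument, as done here.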
If you are familiar with Perl's regex, here are some practical major differences.
Emacs has an interactive regex mode that shows what a pattern matches as you type it. To go into the mode, type “Alt+x regexp-builder”.
Alternatively, you can type “Alt+x query-replace-regexp” to test your pattern. Its keyboard shortcut is “Ctrl+Alt+%”.
To test regex in lisp code, you can open an empty file and place this code “(search-forward-regexp "yourRegex")”, then place the text you want to match below it. Then move your cursor right after the closing parenthesis and type “Ctrl+x Ctrl+e” (or “Alt+x eval-last-sexp”). If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying search-failed, then your regex didn't match. If you get a lisp syntax error, then you probably made a mistake with the backslashes.
In a lisp function that takes a regex string, such as search-forward-regexp, you need to double each backslash. This is because in an elisp string a backslash must itself be prefixed with a backslash. Lisp first turns each “\\” into a single “\”, and this interpreted string is then passed to emacs's regex engine.
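The same kind of test can also be scripted in a temporary buffer instead of a scratch file. Below is a minimal sketch, assuming a hypothetical helper named my-regex-matches-p and made-up sample text. Passing t as the third argument (NOERROR) makes search-forward-regexp return nil instead of signaling search-failed.

```elisp
(defun my-regex-matches-p (regex text)
  "Return the text matched by REGEX in TEXT, or nil if there is no match.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Third argument t (NOERROR): return nil on failure instead of
    ;; signaling a search-failed error.
    (when (search-forward-regexp regex nil t)
      (match-string 0))))

(my-regex-matches-p "[0-9]+" "order 66 shipped")   ; "66"
(my-regex-matches-p "[0-9]+" "no digits here")     ; nil
```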
For example, suppose you have this text:
Sin[x] + Sin[y]
and you need to capture the x or y. You can use:
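(search-backward-regexp "\\[\\([a-z]\\)\\]")
Here the group “\\(...\\)” is placed around just the letter, so that the captured text is the x or y without the square brackets. The regex engine actually receives:
\[\([a-z]\)\]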
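To see the two layers separately, one can check what string lisp actually builds and then run the match. The following is a sketch only, using string-match on the sample text instead of a buffer search, with the group placed around just the letter. The function name my-bracket-args is hypothetical.

```elisp
;; Lisp turns each "\\" into one backslash, so this string holds
;; the 13 characters \[\([a-z]\)\] and that is what the engine sees.
(length "\\[\\([a-z]\\)\\]")          ; 13

(defun my-bracket-args (text)
  "Return a list of the single letters found inside [ ] in TEXT.
Hypothetical helper, for illustration only."
  (let ((regex "\\[\\([a-z]\\)\\]")
        (start 0)
        (found '()))
    ;; Search repeatedly, each time starting after the previous match.
    (while (string-match regex text start)
      (push (match-string 1 text) found)
      (setq start (match-end 0)))
    (nreverse found)))

(my-bracket-args "Sin[x] + Sin[y]")   ; ("x" "y")
```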
The newline char is different in unix, Windows, and Mac. However, no matter what OS you are on, when a file is opened in a buffer, emacs always uses “\n” to represent newline. So, if you are working on Windows files, where the line ending is CR followed by LF (ascii 13 followed by 10), you don't want to try to match it using “\r\n”. You always just use a single “\n” to match newline. And you do not need to use double backslash, because “\n” already stands for newline inside a string.
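Here is a small sketch of the point about strings. Inside a lisp string, “\n” is already one newline character, so it can go straight into the regex. The sample text is made up.

```elisp
(length "\n")     ; 1, a single newline character
(length "\\n")    ; 2, a backslash followed by the letter n

(with-temp-buffer
  (insert "first line\nsecond line")
  (goto-char (point-min))
  ;; Match the end of one line together with the start of the next.
  (when (search-forward-regexp "line\nsecond" nil t)
    (match-string 0)))               ; "line\nsecond"
```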
(info "(emacs) Regexps")
(info "(elisp) Regular Expressions")