Xah Lee, 2007-08
This page is an emacs lisp lesson that solves a real-world problem. It shows how elisp is used in a text pattern replacement, using a function that transforms the matched pattern. You should be familiar with Elisp Language Basics.
I want to do text replacement on a text pattern. However, the replaced text should be a transformed version of the matched text. For example, if the matched text is “emacs_fun”, the replacement text should be “emacs fun”.
Technically, this page shows you how to write an emacs elisp function that takes the matched text as input and returns a text, and how to tell emacs to use this function in its regex find-replace operation.
Normally, a programmer can write a perl or python script to do a find-replace operation on all files in a dir. (See Perl-Python: Find and Replace By Regex Text Patterns) However, this process is not interactive. If you want to decide each replacement on a case-by-case basis, this approach won't work. If you are going to program interactivity into your script, then it is no longer a trivial job. Emacs comes to the rescue with its built-in feature for Interactively Find and Replace String Patterns on Multiple Files.
However, suppose you want the replacement string to be a transformed version of the matched text. This means that, instead of constructing the replacement string from “\1”, “\2”, etc., you need a function that takes the matched text as input and returns the replacement text.
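Here's a minimal sketch of that difference. It uses the string function replace-regexp-in-string rather than query-replace-regexp, so it can run without prompts. The REP argument of replace-regexp-in-string can be either a template string with \1, \2 or a function. A function is called with the matched text, and its return value becomes the replacement. The function names my-underscores-by-template and my-underscores-by-function are made up for this illustration.

```elisp
;; Hypothetical helper: a fixed template can only rebuild text whose
;; shape the regex spells out, here exactly one word_word pair.
(defun my-underscores-by-template (text)
  "Replace underscores in TEXT using a fixed \\1 \\2 template."
  (replace-regexp-in-string "\\([a-z]+\\)_\\([a-z]+\\)" "\\1 \\2" text))

;; Hypothetical helper: REP is a function, so it can transform
;; a match of any length.
(defun my-underscores-by-function (text)
  "Replace underscores in TEXT by calling a function on each match."
  (replace-regexp-in-string
   "[a-z]+\\(?:_[a-z]+\\)+"
   ;; Receives the whole matched text and returns the replacement.
   (lambda (matched) (replace-regexp-in-string "_" " " matched))
   text
   t    ; FIXEDCASE: keep the case of the returned text as is
   t))  ; LITERAL: don't treat backslashes in it as \1-style references

;; (my-underscores-by-template "emacs_lisp_fun") => "emacs lisp_fun"
;; (my-underscores-by-function "emacs_lisp_fun") => "emacs lisp fun"
```

The template version handles only one underscore per match, because the regex fixes the shape of the text. The function version handles any number of underscores.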
As of today, i have a website with 3319 HTML files. Among them, there are 3276 links to articles at wikipedia.org. Because the site was written over many years, the link formats are not all consistent. For example, a link to the article on Stanislaw Szukalski might have any of these formats:
1. <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">Stanislaw Szukalski↗</a>
2. <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">Stanislaw_Szukalski↗</a>
3. <a href="http://en.wikipedia.org/wiki/Stanislaw_Szukalski">http://en.wikipedia.org/wiki/Stanislaw_Szukalski</a>
In a browser, they would be displayed like this:
1. Stanislaw Szukalski↗
2. Stanislaw_Szukalski↗
3. http://en.wikipedia.org/wiki/Stanislaw_Szukalski
I want formats 2 and 3 to be replaced with format 1, but not always. For some pages, i want to use the full URL as the link text.
To keep this article simple, let's say i just want to replace format 2 with format 1, on a case-by-case basis.
Here, a regex alone cannot do the job, because i need every underscore char “_” in the matched text to be replaced by a space, and the number of underscores varies from link to link. This means i need to write a function that takes the matched text and returns the desired text.
In emacs 22, there's a new feature that allows you to use an elisp function to compute the replacement string. You do this by writing the replacement string in the form “\,(fun-name)”, where fun-name is your elisp function.
The task here is to write the replacement function.
Let's name our function ff. It takes no arguments. Instead, it gets the matched text from the current regex match, replaces “_” by “ ”, then returns the new text.
The function skeleton would then look like this:
(defun ff ()
  "temp function. Returns a string based on current regex match."
  ;; 1. get the matched text
  ;; 2. transform the matched text
  ;; 3. return the transformed text
  )
This is conceptually simple. The hard part is knowing how elisp, in the emacs environment, actually gets the matched text, and how emacs lisp, the language, does text replacement. Here's the solution:
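(defun ff ()
  "temp function. Returns a string based on current regex match."
  (let (matchedText newText)
    ;; text of the entire match (group 0) in the current buffer
    (setq matchedText (buffer-substring (match-beginning 0) (match-end 0)))
    ;; replace every "_" with a space
    (setq newText (replace-regexp-in-string "_" " " matchedText))
    ;; the returned string becomes the replacement text
    newText))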
(match-beginning 0) and (match-end 0) give you the beginning and end positions of the entire regex match. (Use 1 for the 1st captured pattern, 2 for the 2nd captured pattern, etc.) The function “buffer-substring” takes 2 positions and returns the actual text between them. The function “replace-regexp-in-string” transforms the text, replacing every “_” with a space.
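To see these functions at work, here's a small sketch that runs a regex search in a scratch buffer and reads the match positions back. The sample text and the name my-match-demo are hypothetical. The function match-string is a standard shorthand for the same buffer-substring call.

```elisp
(defun my-match-demo ()
  "Return the whole match, group 1, and group 1 via match-string."
  (with-temp-buffer
    (insert "see <a href=\"x\">Stanislaw_Szukalski↗</a> here")
    (goto-char (point-min))
    (when (re-search-forward ">\\([_A-Za-z0-9]+\\)↗</a>" nil t)
      (list
       ;; Group 0 is the entire match.
       (buffer-substring (match-beginning 0) (match-end 0))
       ;; Group 1 is the text inside the first \( \) pair.
       (buffer-substring (match-beginning 1) (match-end 1))
       ;; Same text as the line above, written more briefly.
       (match-string 1)))))

;; (my-match-demo)
;; => (">Stanislaw_Szukalski↗</a>" "Stanislaw_Szukalski" "Stanislaw_Szukalski")
```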
(To make emacs aware of ff, select it, then run eval-region.)
So, with this function written, we can do “Alt+x query-replace-regexp”, then give this pattern:
>\([_A-Za-z0-9]+\)↗</a>
And the replacement expression would be:
\,(ff)
and we are all done.
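query-replace-regexp asks for confirmation at each match. To test the pattern and the replacement function without the prompts, here's a sketch of a non-interactive equivalent. re-search-forward finds each match, and replace-match inserts whatever ff returns. So that the block runs on its own, it repeats ff in a shorter form using match-string. It also writes the letter ranges as A-Za-z. The name my-clean-links is hypothetical. In a Lisp string, each backslash of the pattern typed in the minibuffer must be doubled.

```elisp
;; A compact equivalent of ff: (match-string 0) is the same as
;; (buffer-substring (match-beginning 0) (match-end 0)).
(defun ff ()
  "Return the current regex match with every \"_\" replaced by a space."
  (replace-regexp-in-string "_" " " (match-string 0)))

(defun my-clean-links (html)
  "Replace each format-2 link text in HTML using ff; return the new string.
Hypothetical helper: a non-interactive stand-in for query-replace-regexp."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Same pattern as typed in the minibuffer, with backslashes doubled.
    (while (re-search-forward ">\\([_A-Za-z0-9]+\\)↗</a>" nil t)
      ;; (ff) runs while the match data is still current; the two t's
      ;; insert its result with its case unchanged and backslashes literal.
      (replace-match (ff) t t))
    (buffer-string)))
```

Links in format 1 (a space in the link text) and format 3 (no “↗”) don't match the pattern, so they come back unchanged.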
This function can be of general use. Whenever you need to replace text patterns with complicated heuristics, you can base your replacement function on the above code.
Here's the actual replacement function i used for this job:
(defun wikip-link-clean ()
  "Return a canonical form of Wikipedia link from a regex match.
This function is used for query-replace-regexp, to turn the following
forms of links:

<a href=\"http://en.wikipedia.org/wiki/event\">event</a>
<a href=\"http://en.wikipedia.org/wiki/Middle_distance\">Middle_distance↗</a>
<a href=\"http://en.wikipedia.org/wiki/Middle_distance_track_event\">Middle_distance_track_event↗</a>
<a href=\"http://en.wikipedia.org/wiki/Sapir-Whorf_Hypothesis\">Sapir-Whorf_Hypothesis↗</a>

into a canonical form. Basically, the link text needs to have “_”
replaced by space, and should have a “↗” at the end. Also, it shouldn't
match links that are already in canonical form.

The regex to be used for this function is:

<a href=\"http://\\(..\\)\\.wikipedia.org/wiki/\\([^\"]+\\)\">\\(\\([-A-Za-z]+_\\)+[A-Za-z]+↗*\\)</a>

To use a function in query-replace-regexp, do “\\,(function-name)”."
  (let (langCode articlePath linkText linkText2 returnText)
    ;; groups 1, 2, 3 of the regex: language code, article path, link text
    (setq langCode (buffer-substring (match-beginning 1) (match-end 1)))
    (setq articlePath (buffer-substring (match-beginning 2) (match-end 2)))
    (setq linkText (buffer-substring (match-beginning 3) (match-end 3)))
    ;; the new link text is built from the article path, not from linkText
    (setq linkText2 (concat (replace-regexp-in-string "_" " " articlePath) "↗"))
    (setq returnText
          (concat "<a href=\"http://" langCode ".wikipedia.org/wiki/"
                  articlePath "\">" linkText2 "</a>"))
    returnText))
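To check which links the docstring's regex catches, here's a sketch that matches it against single link strings with string-match. It returns the three captured groups that wikip-link-clean reads: language code, article path, and link text. The names my-wikip-link-regex and my-wikip-link-groups are hypothetical.

```elisp
(defconst my-wikip-link-regex
  "<a href=\"http://\\(..\\)\\.wikipedia.org/wiki/\\([^\"]+\\)\">\\(\\([-A-Za-z]+_\\)+[A-Za-z]+↗*\\)</a>"
  "The regex from the docstring of wikip-link-clean, as a Lisp string.")

(defun my-wikip-link-groups (link)
  "Return (LANG-CODE ARTICLE-PATH LINK-TEXT) if LINK matches, else nil."
  (when (string-match my-wikip-link-regex link)
    ;; With a string argument, match-string reads groups from LINK.
    (list (match-string 1 link)
          (match-string 2 link)
          (match-string 3 link))))
```

A link that is already in canonical form, with spaces in its link text, returns nil, because group 3 requires at least one “_”.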
For a lesson where the replacement string depends on the file's name, see Creating Next/Previous Navigation Bars.
Emacs is beautiful.
Reference: Elisp Manual: Buffer-Contents.
Reference: Elisp Manual: Search-and-Replace.
Page created: 2007-08. © 2007 by Xah Lee.