Regular Expressions in Python


4.2.5 Match Objects

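Functions and methods such as re.match(), re.search(), and the match() and search() methods of a compiled pattern return a MatchObject when the pattern matches (and None otherwise). A MatchObject has the following methods and attributes: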

expand( template)
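Returns the string template with its back references replaced by the corresponding groups captured in this match. This is similar to the sub() function. When the pattern matches the whole of stringText, the code
matchObj=re.compile(pat).search(stringText)
result=matchObj.expand(template)
gives the same result as
result=re.sub(pat,template,stringText)
In general, though, the two differ: sub() replaces every match and keeps the text around the matches, whereas expand() returns only the filled-in template for this one match.
Back references include the numeric forms ("\1", "\2", ...) or ("\g<1>", "\g<2>", ...), as well as the named form ("\g<name>"). Note that a named group must be defined in the regex pattern with (?P<name>pattern) (see regex syntax). Write templates as raw strings (r'...') so that Python does not interpret the backslashes itself.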

Here's a complete usage example of expand():

import re
patObj=re.compile(r'([^<]+)<img src="([^"]+)">(.+)')
matchObj=patObj.search('click here <img src="some.jpg"> to see it')
print(matchObj.expand(r'\1<a href="\2">\2</a>\3'))

# prints: click here <a href="some.jpg">some.jpg</a> to see it
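Here's a small sketch contrasting expand() with sub() on a string that the pattern does not cover entirely, using both the named form \g<name> and the numeric form. The text and the group names letter and digit are made up for this illustration.

```python
import re

text = 'see a1 and a2'
# The group names 'letter' and 'digit' are hypothetical, chosen for this example.
patObj = re.compile(r'(?P<letter>[a-z])(?P<digit>\d)')
matchObj = patObj.search(text)   # first match is 'a1'

# expand() fills in the template for this one match and returns only that.
print(matchObj.expand(r'\g<digit>\g<letter>'))   # prints: 1a
print(matchObj.expand(r'\2\1'))                   # prints: 1a

# sub() rewrites every match in place and keeps the text around them.
print(patObj.sub(r'\g<digit>\g<letter>', text))   # prints: see 1a and 2a
```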

The methods groups(), group(), and groupdict() all return the captured groups, in different ways: groups() returns them all as a tuple, group(n1,n2,...) returns them in a user-specified order and combination, and groupdict() returns the named groups as a dictionary.

groups( [default])
Return a tuple containing all the subgroups of the match, from group 1 through the last group in the pattern.

Example:

import re
myText='some a1 a2 a3 list'
patObj=re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.groups())

# prints: ('a1', 'a2', 'a3')

The optional default argument is the value used for groups that did not participate in the match; it defaults to None. When every group in the pattern matched, as in the example above, default has no visible effect, which is why calls like groups(0) or groups('') appear to do nothing there. It matters only when the pattern contains a group that can be skipped, such as an optional group (b)? that finds no b in the text.
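Here's a minimal sketch of the default argument at work. The pattern (a)(b)?(c) and the text 'ac' are made up so that group 2 is skipped; only that group is affected by default.

```python
import re

# Group 2, (b)?, is optional; in 'ac' it does not participate in the match.
matchObj = re.match(r'(a)(b)?(c)', 'ac')
print(matchObj.groups())      # prints: ('a', None, 'c')
print(matchObj.groups(''))    # prints: ('a', '', 'c')
print(matchObj.groups('-'))   # prints: ('a', '-', 'c')
```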

group( [n1, n2, n3 ...])
Returns one or more captured groups. With a single argument, the result is a single string; with multiple arguments, the result is a tuple with one item per argument. Each argument is an integer group number, with 0 denoting the entire matched string. Calling group() with no argument is equivalent to group(0).

Example:

import re
myText='some a1 a2 a3 list'
patObj=re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.groups())       # prints: ('a1', 'a2', 'a3')
print(matchObj.group())        # prints: some a1 a2 a3 list
print(matchObj.group(0))       # prints: some a1 a2 a3 list
print(matchObj.group(1))       # prints: a1
print(matchObj.group(2))       # prints: a2
print(matchObj.group(1,2))     # prints: ('a1', 'a2')
print(matchObj.group(2,1,1))   # prints: ('a2', 'a1', 'a1')
print(matchObj.group(0,1))     # prints: ('some a1 a2 a3 list', 'a1')
If an argument is negative or greater than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
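Here's a sketch of these three cases. The patterns and texts are made up for illustration: (x)? is a group that does not match, and (\w\d)+ is a group that matches several times.

```python
import re

# Group 2, (x)?, is optional and takes no part in matching 'ab'.
m1 = re.match(r'(a)(x)?(b)', 'ab')
print(m1.group(2))        # prints: None
print(m1.group(1, 2, 3))  # prints: ('a', None, 'b')

# Group 1 sits inside a repetition: it matches 'a1', then 'b2', then 'c3',
# and only the last of these is kept.
m2 = re.match(r'(\w\d)+', 'a1b2c3')
print(m2.group(0))        # prints: a1b2c3
print(m2.group(1))        # prints: c3

# The pattern of m1 defines groups 0 to 3 only, so group 4 is an error.
try:
    m1.group(4)
except IndexError:
    print('IndexError')   # prints: IndexError
```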

If the regular expression uses the (?P<name>...) syntax, the arguments may also be strings identifying groups by name.

Example:

import re
myText='some a1 a2 a3 list'
patObj=re.compile(r'.+(\w\d+) (?P<second>\w\d+) (\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.group(1,'second',3))     # prints: ('a1', 'a2', 'a3')
If a string argument is not used as a group name in the pattern, an IndexError exception is raised.
groupdict( [default])
Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None. Example:
import re
myText='some a1 a2 a3 list'
patObj=re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.groupdict())

# prints: {'this': 'a1', 'second': 'a2', 'thatt': 'a3'}
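Here's a sketch of groupdict()'s default argument. The file-name pattern and the group names base and ext are hypothetical; ext is optional, so it does not participate when the text has no dot.

```python
import re

# Hypothetical pattern: 'ext' is skipped for a name without a dot.
matchObj = re.match(r'(?P<base>\w+)(?:\.(?P<ext>\w+))?$', 'readme')
print(matchObj.groupdict())      # prints: {'base': 'readme', 'ext': None}
print(matchObj.groupdict(''))    # prints: {'base': 'readme', 'ext': ''}
```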

start( [n])
end( [n])
Return the indices of the start and end of the substring matched by the nth captured group; end() is the index just past the last matched character. start() is equivalent to start(0), and similarly for end(). (Group 0 represents the string matched by the whole regex pattern.) Example:
import re
myText='some a1 a2 a3 list'
patObj=re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.start(1))    # prints 5
print(matchObj.end(1))      # prints 7

start() and end() return -1 if the group exists in the pattern but did not contribute to the match, as happens with an optional group such as (x)? when there is no x at that point in the text. For a match object m and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is

m.string[m.start(g):m.end(g)]

Note that m.start(group) will equal m.end(group) if group matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an IndexError exception.
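Here's a sketch of the -1 case and of the slicing identity. The pattern (a)(x)?(b) and the text 'ab' are made up so that group 2 exists but does not contribute.

```python
import re

# Group 2, (x)?, exists in the pattern but does not contribute to matching 'ab'.
m = re.match(r'(a)(x)?(b)', 'ab')
print(m.start(2), m.end(2))   # prints: -1 -1
print(m.group(2))             # prints: None

# Group 3 did contribute, so slicing m.string gives the same text as m.group(3).
print(m.string[m.start(3):m.end(3)])                 # prints: b
print(m.string[m.start(3):m.end(3)] == m.group(3))   # prints: True
```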

span( [n])
For a MatchObject m, return the 2-tuple (m.start(n), m.end(n)). Note that if the given group did not contribute to the match, this is (-1, -1). span() is equivalent to span(0).
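Here's a sketch of span() on the text used throughout this page. The pattern is made up for illustration; its third group, (x\d+), is optional and finds no x, so its span is (-1, -1).

```python
import re

myText = 'some a1 a2 a3 list'
# The optional third group expects an 'x'; the text has 'a3' there, so it is skipped.
matchObj = re.search(r'(\w\d+) (\w\d+)(?: (x\d+))?', myText)
print(matchObj.span())    # prints: (5, 10)
print(matchObj.span(2))   # prints: (8, 10)
print(matchObj.span(3))   # prints: (-1, -1)
```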

The following are various attributes of the MatchObject.

string
The string passed to match() or search(). Example:
import re
mm=re.compile(r'some.+').search('some text')
print(mm.string)    # prints: some text
re
The regular expression object whose match() or search() method produced this MatchObject instance.
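Here's a short sketch showing that re refers back to the compiled pattern that produced the match; the pattern itself is just an example.

```python
import re

patObj = re.compile(r'(\w\d+)')
matchObj = patObj.search('some a1 a2 a3 list')
print(matchObj.re is patObj)   # prints: True
print(matchObj.re.pattern)     # prints: (\w\d+)
```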
pos
The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.
endpos
The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
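Here's a sketch of pos and endpos, passed as the optional second and third arguments of a compiled pattern's search() method. The pattern \w\d+ is made up for illustration.

```python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'\w\d+')

# Start looking at index 6, so 'a1' (indices 5 and 6) cannot be found.
matchObj = patObj.search(myText, 6)
print(matchObj.group())   # prints: a2
print(matchObj.pos)       # prints: 6
print(matchObj.endpos)    # prints: 18  (len(myText), since endpos was not given)

# With endpos=9 the engine may not look at index 9, so 'a2' cannot complete.
print(patObj.search(myText, 6, 9))  # prints: None
```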
lastindex
The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.
lastgroup
The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.
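Here's a sketch that runs the four lastindex expressions above against 'ab', then shows lastgroup. The group names first and second are made up for illustration.

```python
import re

# The four expressions from the lastindex description, applied to 'ab'.
for pattern in [r'(a)b', r'((a)(b))', r'((ab))', r'(a)(b)']:
    print(pattern, re.match(pattern, 'ab').lastindex)
# prints:
# (a)b 1
# ((a)(b)) 1
# ((ab)) 1
# (a)(b) 2

# lastgroup gives the name of the last matched group, if it has one.
m1 = re.match(r'(?P<first>a)(?P<second>b)', 'ab')
print(m1.lastgroup)   # prints: second
m2 = re.match(r'(?P<first>a)(b)', 'ab')
print(m2.lastgroup)   # prints: None  (group 2 has no name)
print(re.match(r'ab', 'ab').lastindex)  # prints: None  (no groups at all)
```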

Page created: 2005-04, by Xah Lee.
For copyright and terms, see terms.html