Matching the shortest possible pattern
You're trying to match a text pattern with a regular expression, but it finds the longest possible match for the pattern. You want it to find the shortest possible match instead.
This problem usually arises when you need to match text enclosed between a pair of delimiters (for example, a string inside quotation marks). To illustrate, consider the following example:
>>> import re
>>> str_pat = re.compile(r'\"(.*)\"')
>>> text1 = 'Computer says "no."'
>>> str_pat.findall(text1)
['no.']
>>> text2 = 'Computer says "no." Phone says "yes."'
>>> str_pat.findall(text2)
['no." Phone says "yes.']
>>>
In this example, the pattern r'\"(.*)\"' is intended to match the text enclosed in double quotation marks. However, the * operator in a regular expression is greedy, so the match looks for the longest possible match. As a result, the match found in the second example, text2, runs from the first quotation mark to the last one, which is not what we want.
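To see where the greedy match starts and ends, the short sketch below inspects the match object returned by re.search. The variable names (m, greedy_group) are chosen here for illustration only, and the pattern is written without the optional backslashes before the quotes, which matches the same text.

```python
import re

text2 = 'Computer says "no." Phone says "yes."'

# Greedy: .* first consumes as much as it can, then backs up
# only as far as needed to find a closing quotation mark.
m = re.search(r'"(.*)"', text2)

greedy_group = m.group(1)

# The match starts at the first quotation mark in the text...
starts_at_first_quote = (m.start() == text2.index('"'))
# ...and ends after the last one, which here is the end of the string.
ends_at_last_quote = (m.end() == text2.rindex('"') + 1)

print(repr(greedy_group))
print(starts_at_first_quote, ends_at_last_quote)
```

The captured group spans both quoted strings and everything between them, because the closing quote the pattern settles on is the last one available.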
To fix this, add the ? modifier after the * operator in the pattern, like this:
>>> str_pat = re.compile(r'\"(.*?)\"')
>>> str_pat.findall(text2)
['no.', 'yes.']
>>>
This makes the match non-greedy, so it produces the shortest match, which is what we want.
This recipe addresses one of the more common problems encountered when writing regular expressions that contain the dot (.) character. In a pattern, the dot matches any character except a newline. However, if you place the dot between starting and ending text (such as quotation marks), the match looks for the longest possible match for the pattern.
This usually causes intermediate occurrences of the starting and ending text to be skipped over as delimiters and absorbed into the result of the longer match. Adding a ? right after an operator such as * or + forces the matching algorithm to look for the shortest possible match instead.
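The same behavior can be checked for the + operator. The following is a minimal sketch, assuming a small two-line sample string invented here for illustration; it also shows that the dot stops at a newline, so even the greedy match does not run across lines.

```python
import re

# Hypothetical sample text: two quoted strings on the first line,
# one on the second.
text = 'a "one" b "two"\nc "three"'

# Greedy +: on the first line the match runs from the first quotation
# mark to the last one on that line. It cannot cross the newline,
# because the dot does not match a newline.
greedy_plus = re.findall(r'"(.+)"', text)

# Non-greedy +?: each match stops at the nearest closing quotation mark.
lazy_plus = re.findall(r'"(.+?)"', text)

print(greedy_plus)
print(lazy_plus)
```

With the ? added, each quoted string is returned separately; without it, the two strings on the first line are merged into a single result.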