Actually, a web crawler works in four main steps:
Regular expressions are commonly used to search for and replace text that matches a pattern (a rule).
A regular expression is a logical formula for operating on strings. It uses predefined special characters, and combinations of those characters, to form a "rule string" that expresses filtering logic for strings.
Given a regular expression and a string, we can:
1) Check whether the string conforms to the filtering logic of the regular expression ("matching");
2) Use the regular expression to extract the specific part we want from the text ("filtering").
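A minimal sketch of both uses with Python's built-in `re` module. The phone-number rule and the sample text are hypothetical, chosen only for illustration.

```python
import re

# "Matching": does the whole string conform to the rule?
# Hypothetical rule: three digits, a hyphen, then four digits.
is_phone = re.fullmatch(r'\d{3}-\d{4}', '555-1234') is not None
print(is_phone)  # True

# "Filtering": pull out only the parts we want from a longer text.
text = 'Price: 42 yuan, stock: 7'
numbers = re.findall(r'\d+', text)
print(numbers)  # ['42', '7']
```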
### Common characters
Regular expressions use the backslash `\` to escape special characters, so if we want to write the pattern as a raw string, we just add an `r` prefix:

```python
r'chuanzhiboke\t\.\tpython'
```
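To see what the `r` prefix changes, the sketch below compares the raw-string pattern above with the same pattern written without the prefix, then checks what it matches. The test strings are made up for illustration.

```python
import re

# The r prefix keeps each backslash, so the regex engine receives \t and \.
raw = r'chuanzhiboke\t\.\tpython'
escaped = 'chuanzhiboke\\t\\.\\tpython'  # the same pattern without the r prefix
print(raw == escaped)  # True

# In the pattern, \t matches a tab and \. matches a literal dot
print(re.fullmatch(raw, 'chuanzhiboke\t.\tpython') is not None)  # True
# Any other character in place of the dot is rejected
print(re.fullmatch(raw, 'chuanzhiboke\tX\tpython') is None)  # True
```

Without the prefix every backslash has to be doubled, which quickly becomes hard to read in longer patterns.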
The general steps for using the `re` module are:
1) Use the `compile()` function to compile a regular expression, given as a string, into a `Pattern` object.
2) Use the methods the `Pattern` object provides to match against text and obtain the result, a `Match` object.
3) Finally, use the attributes and methods of the `Match` object to get information and do further processing as needed.
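The three steps can be sketched end to end as follows; the date pattern and sample sentence are hypothetical examples, not taken from the original.

```python
import re

# Step 1: compile the regular expression string into a Pattern object
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Step 2: use a Pattern method to match against text; a hit gives a Match object
m = pattern.search('Released on 2021-03-15 in Beijing')

# Step 3: read information from the Match object
if m is not None:
    print(m.group(0))  # 2021-03-15
    print(m.groups())  # ('2021', '03', '15')
    print(m.span())    # (12, 22)
```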
The `compile` function compiles a regular expression and produces a `Pattern` object. Its general usage is as follows:
```python
import re

# Compile the regular expression into a Pattern object
pattern = re.compile(r'\d+')
```
Once the regular expression has been compiled into a `Pattern` object, we can use the pattern's methods to search text for matches.
Commonly used methods of a `Pattern` object include:
- `match`: matches starting at the beginning (or a given start position); returns one match
- `search`: searches anywhere in the string; returns one match
- `findall`: finds all matches; returns a list
- `finditer`: finds all matches; returns an iterator
- `split`: splits the string; returns a list
- `sub`: replaces matches
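A quick side-by-side of all six methods on one hypothetical string, using the `\d+` pattern compiled above:

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'

print(pattern.match(text))                           # None (text does not start with a digit)
print(pattern.search(text).group())                  # 12 (first match anywhere)
print(pattern.findall(text))                         # ['12', '34']
print([m.group() for m in pattern.finditer(text)])   # ['12', '34']
print(pattern.split(text))                           # ['one', 'twothree', 'four']
print(pattern.sub('#', text))                        # one#twothree#four
```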
The `match` method looks for a match at the beginning of the string (or from a specified start position). It makes a single match: it returns as soon as one match is found, rather than finding all matches. Its general form is `match(string[, pos[, endpos]])`, where:
- `string` is the string to match;
- `pos` is the start position in the string, default 0;
- `endpos` is the end position in the string, default `len(string)` (the string length).
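The sketch below shows how `pos` and `endpos` shift the window that `match` looks at; the sample string is hypothetical.

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'

print(pattern.match(text))                # None: index 0 is 'o'
print(pattern.match(text, 2, 10))         # None: index 2 is 'e'
m = pattern.match(text, 3, 10)            # start matching at index 3, the '1'
print(m.group())                          # 12
print(pattern.match(text, 3, 4).group())  # 1: endpos=4 stops the match after '1'
```

Note that `match` only tries the single position `pos`; it never scans forward, which is the key difference from `search`.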
When the match succeeds, a `Match` object is returned (otherwise `None`). Its commonly used methods are:
- `group([group1, …])` returns the substring matched by one or more groups; to get the whole matched substring, use `group()` or `group(0)`;
- `start([group])` returns the start position of the substring within the whole string (the index of its first character); the parameter defaults to 0;
- `end([group])` returns the end position of the substring within the whole string (the index of its last character + 1); the parameter defaults to 0;
- `span([group])` returns `(start(group), end(group))`.
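Here is a small sketch of these `Match` methods on a pattern with two groups; `re.I` (case-insensitive matching) and the sample sentence are illustrative choices.

```python
import re

# re.I makes [a-z] also match uppercase letters
pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)
m = pattern.match('Hello World Wide Web')

print(m.group(0))     # Hello World  (the whole match)
print(m.group(1))     # Hello
print(m.group(1, 2))  # ('Hello', 'World')
print(m.span(0))      # (0, 11)
print(m.start(2))     # 6
print(m.end(2))       # 11
print(m.span(2))      # (6, 11)
```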
The `search` method looks for a match anywhere in the string. It also makes a single match: it returns as soon as one match is found, rather than finding all matches. Its general form is `search(string[, pos[, endpos]])`, with the same parameters as `match`.
When the match succeeds, a `Match` object is returned; if there is no match, `None` is returned.
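Unlike `match`, `search` scans forward from `pos` until it finds a hit. A sketch with a hypothetical string:

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'

m = pattern.search(text)          # scans from index 0 until the first match
print(m.group(), m.span())        # 12 (3, 5)

m2 = pattern.search(text, 10)     # start scanning from index 10
print(m2.group(), m2.span())      # 34 (13, 15)

print(pattern.search('no digits here'))  # None
```

Because a failed search returns `None`, it is worth checking the result before calling `group()` on it.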
The `split` method splits the string on the matching substrings and returns a list. Its general form is `split(string[, maxsplit])`, where:
- `maxsplit` specifies the maximum number of splits; if it is omitted, the string is split at every match.
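A sketch of `split` with and without `maxsplit`; the separator class (whitespace, commas, semicolons) and the input are hypothetical.

```python
import re

# Split on any run of whitespace, commas or semicolons
pattern = re.compile(r'[\s,;]+')
text = 'a,b;; c   d'

print(pattern.split(text))              # ['a', 'b', 'c', 'd']
print(pattern.split(text, maxsplit=2))  # ['a', 'b', 'c   d']
```

With `maxsplit=2` only the first two separators are used, so the rest of the string is kept as the last element.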
The `sub` method performs replacement. Its general form is `sub(repl, string[, count])`, where:
- `repl` can be a string or a function:
  1) If `repl` is a string, it replaces every matching substring, and the resulting string is returned.
  2) If `repl` is a function, it should take exactly one argument (a `Match` object) and return the replacement string.
- `count` specifies the maximum number of replacements; if it is omitted, all matches are replaced.
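The sketch below exercises both forms of `repl` plus `count`. The group references `\2` and `\1` in a string `repl` are standard `re` behavior not covered in the text above, and `shout` is a hypothetical helper written only for this example.

```python
import re

pattern = re.compile(r'(\w+) (\w+)')
text = 'hello 123, hello 456'

# repl as a plain string: every match is replaced
print(pattern.sub('hello world', text))  # hello world, hello world

# repl as a string with group references: \2 and \1 refer to groups 2 and 1
print(pattern.sub(r'\2 \1', text))  # 123 hello, 456 hello

# repl as a function (hypothetical helper): receives each Match, returns the replacement
def shout(m):
    return m.group(1).upper() + ' ' + m.group(2)

print(pattern.sub(shout, text))  # HELLO 123, HELLO 456

# count=1: only the first match is replaced
print(pattern.sub(r'\2 \1', text, 1))  # 123 hello, hello 456
```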