## Regular expressions in Python

hpl201314 2020-11-12 17:02:14
regular expression python

## 1. What is a regular expression?

A regular expression (regex) is an expression that concisely describes a set of strings.

## 2. What are they used for?

① Describing the characteristics of a type of text. ② Finding or replacing a whole set of strings at once. ③ Matching all or part of a string.

## 3. Common operators

| Operator | Description | Example |
| --- | --- | --- |
| `.` | Any single character | |
| `[]` | Character set: the allowed values for a single character | `[abc]` matches a, b, or c; `[a-z]` matches a single character from a to z |
| `[^]` | Negated character set: the excluded values for a single character | `[^abc]` matches any single character other than a, b, or c |
| `*` | Previous character repeated 0 or more times | `abc*` matches ab, abc, abcc, abccc, and so on |
| `+` | Previous character repeated 1 or more times | `abc+` matches abc, abcc, abccc, and so on |
| `?` | Previous character appears 0 or 1 times | `abc?` matches ab, abc |
| `\|` | Either the left or the right expression | `abc\|def` matches abc, def |
| `{m}` | Previous character repeated exactly m times | `ab{4}c` matches abbbbc |
| `{m,n}` | Previous character repeated m to n times (inclusive) | `ab{1,2}c` matches abc, abbc |
| `^` | Start of the string | `^abc` matches abc only at the start of a string |
| `$` | End of the string | `abc$` matches abc only at the end of a string |
| `()` | Group marker; `\|` inside a group separates alternatives | `(abc)` matches abc; `(abc\|def)` matches abc, def |
| `\d` | Digit, equivalent to `[0-9]` for ASCII text | |
| `\w` | Word character, equivalent to `[A-Za-z0-9_]` for ASCII text | |
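
As a quick check of the repetition operators in the table, the sketch below runs `re.findall` over a small sample string. The sample text and variable names are made up for illustration; only the patterns come from the table.

```python
import re

# Hypothetical sample text: "ab" followed by 0 to 3 "c" characters.
text = "ab abc abcc abccc"

star = re.findall(r"abc*", text)   # c repeated 0 or more times
plus = re.findall(r"abc+", text)   # c repeated 1 or more times
opt = re.findall(r"abc?", text)    # c present at most once
rng = re.findall(r"ab{1,2}c", "abc abbc abbbc")  # b repeated 1 to 2 times

print(star)  # ['ab', 'abc', 'abcc', 'abccc']
print(plus)  # ['abc', 'abcc', 'abccc']
print(opt)   # ['ab', 'abc', 'abc', 'abc']
print(rng)   # ['abc', 'abbc']
```

Note that `abc?` still finds something in "abcc" and "abccc": it matches the leading "abc" and simply leaves the extra "c" characters unmatched.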

## 4. Some syntax examples

| Regular expression | Matching strings |
| --- | --- |
| `P(Y\|YT\|YTH\|YTHO)?N` | "PN", "PYN", "PYTN", "PYTHN", "PYTHON" |
| `PYTHON+` | "PYTHON", "PYTHONN", "PYTHONNN", and so on |
| `PY[TH]ON` | "PYTON", "PYHON" |
| `PY[^TH]?ON` | "PYON", "PYAON", "PYBON", "PYCON", and so on |
| `PY{,3}N` | "PN", "PYN", "PYYN", "PYYYN" |
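
The examples above can be checked mechanically. This is a minimal sketch that uses `re.fullmatch` (a function this article does not otherwise cover) so that a pattern only counts as matching when it covers the whole string; the dictionary layout and names are illustrative.

```python
import re

# Pattern -> strings the table says it should match.
examples = {
    r"P(Y|YT|YTH|YTHO)?N": ["PN", "PYN", "PYTN", "PYTHN", "PYTHON"],
    r"PYTHON+": ["PYTHON", "PYTHONN", "PYTHONNN"],
    r"PY[TH]ON": ["PYTON", "PYHON"],
    r"PY[^TH]?ON": ["PYON", "PYAON", "PYBON", "PYCON"],
    r"PY{,3}N": ["PN", "PYN", "PYYN", "PYYYN"],
}

# True for a pattern when every listed string matches it in full.
all_match = {
    pattern: all(re.fullmatch(pattern, s) for s in strings)
    for pattern, strings in examples.items()
}
print(all_match)

# [TH] stands for exactly one character, so "PYTHON" itself does not match.
one_char_only = re.fullmatch(r"PY[TH]ON", "PYTHON")
print(one_char_only)  # None
```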

## 5. Classic regular expression examples

| Regular expression | Matches |
| --- | --- |
| `^[A-Za-z]+$` | A string made up only of the 26 letters |
| `^[A-Za-z0-9]+$` | A string made up only of the 26 letters and digits |
| `^-?\d+$` | A string in integer form |
| `^[0-9]*[1-9][0-9]*$` | A string in positive-integer form |
| `[1-9]\d{5}` | A Chinese postal code (6 digits) |
| `[\u4e00-\u9fa5]` | A Chinese character |
| `\d{3}-\d{8}\|\d{4}-\d{7}` | A Chinese domestic phone number, for example 010-12345678 |
| `[1-9]?\d` | 0-99 |
| `1\d{2}` | 100-199 |
| `2[0-4]\d` | 200-249 |
| `25[0-5]` | 250-255 |
| `(([1-9]?\d\|1\d{2}\|2[0-4]\d\|25[0-5])\.){3}([1-9]?\d\|1\d{2}\|2[0-4]\d\|25[0-5])` | An IP address (four numbers from 0 to 255 separated by literal dots) |
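
The IP address pattern is easier to read when it is assembled from the four range pieces listed above. The following is a sketch, not a complete validator; `OCTET`, `STRICT`, `LOOSE`, and `is_ipv4` are hypothetical names, and `re.fullmatch` is used so that the whole string has to be an address. It also shows why the separator has to be written as `\.`: an unescaped `.` matches any character.

```python
import re

# One number from 0 to 255, built from the four ranges in the table.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)"

STRICT = rf"(?:{OCTET}\.){{3}}{OCTET}"  # separator is a literal dot
LOOSE = rf"(?:{OCTET}.){{3}}{OCTET}"    # separator is "any character"

def is_ipv4(text):
    """Return True if the whole string looks like a dotted IPv4 address."""
    return re.fullmatch(STRICT, text) is not None

print(is_ipv4("192.168.0.1"))   # True
print(is_ipv4("256.1.1.1"))     # False: 256 is out of range
print(is_ipv4("1.2.3"))         # False: only three numbers
print(is_ipv4("192x168x0x1"))   # False with the escaped dot
print(re.fullmatch(LOOSE, "192x168x0x1") is not None)  # True with the unescaped dot
```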

## 6. Basic use of the re library

| Function | Description |
| --- | --- |
| `re.search()` | Searches a string for the first location that matches the regular expression; returns a match object |
| `re.match()` | Matches the regular expression from the beginning of the string; returns a match object |
| `re.findall()` | Searches the string and returns all matching substrings as a list |
| `re.split()` | Splits the string at each match of the regular expression; returns a list |
| `re.finditer()` | Searches the string and returns an iterator over the matches; each element is a match object |
| `re.sub()` | Replaces every substring that matches the regular expression; returns the resulting string |

① search(pattern, string, flags=0)

pattern: the regular expression, given as a string or a raw string
string: the string to search
flags: flags that control how the regular expression is applied

```
import re

match = re.search(r"[1-9]\d{5}", "haha 723300")
if match:
    print(match.group())  # 723300
```
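
The `flags` argument is not used in the example above. As a small illustration, assuming a made-up sample sentence, `re.IGNORECASE` (one of the standard flags in the `re` module) makes the search case-insensitive:

```python
import re

# Hypothetical sample text; the pattern is lowercase, the text is not.
text = "Regular expressions in PYTHON"

without_flag = re.search(r"python", text)
with_flag = re.search(r"python", text, flags=re.IGNORECASE)

print(without_flag)        # None: case differs
print(with_flag.group())   # PYTHON
```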

② match(pattern, string, flags=0)

Note that match starts matching at the beginning of the string: if the beginning does not match, it does not search any further. It returns a match object when the match succeeds and None when it does not.

```
import re

# No match: the string does not start with a six-digit number.
match = re.match(r"[1-9]\d{5}", "haha 723300")
print(type(match))  # <class 'NoneType'>

# Match: the six-digit number is at the start of the string.
match = re.match(r"[1-9]\d{5}", "723300 haha")
if match:
    print(match.group())  # 723300
```

So the difference between search and match is this:
match requires the matching substring to be at the beginning of the string, otherwise it finds nothing; search has no such requirement.
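
One detail worth adding: match only anchors the beginning, so text after the match is allowed. The sketch below contrasts it with search and with `re.fullmatch`, a related function that this article does not cover and that requires the whole string to match. Variable names are illustrative.

```python
import re

pattern = r"[1-9]\d{5}"

at_start = re.match(pattern, "723300 haha")    # number is at the start
whole = re.fullmatch(pattern, "723300 haha")   # trailing " haha" is not covered
anywhere = re.search(pattern, "haha 723300")   # search scans the whole string

print(at_start.group())   # 723300
print(whole)              # None
print(anywhere.group())   # 723300
```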

③ findall(pattern, string, flags=0)

```
import re

c = re.findall(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(c))  # <class 'list'>
print(c)        # ['723300', '612203']
```

④ split(pattern, string, maxsplit=0, flags=0)

maxsplit: the maximum number of splits; whatever remains is returned as the last element

```
import re

# Split at every six-digit number.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(a))  # <class 'list'>
print(a)        # ['haha', ' xixi', '']

# Split at most once; the rest of the string stays in the last element.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203", maxsplit=1)
print(a)        # ['haha', ' xixi612203']

# Split at every colon or comma.
str1 = "name: hpl, age: 18"
b = re.split(r"[:,]", str1)
print(b)        # ['name', ' hpl', ' age', ' 18']
```

⑤ finditer(pattern, string, flags=0)

```
import re

# Every item produced by finditer is a match object.
for m in re.finditer(r"[1-9]\d{5}", "haha723300 xixi612203"):
    print(m.group())
# 723300
# 612203
```
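
Because finditer yields match objects rather than plain strings, it is handy when the position of each match matters as well as its text. A minimal sketch (the `found` name is illustrative):

```python
import re

text = "haha723300 xixi612203"

# Collect the matched text together with its start and end index.
found = [(m.group(), m.start(), m.end())
         for m in re.finditer(r"[1-9]\d{5}", text)]

print(found)  # [('723300', 4, 10), ('612203', 15, 21)]
```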

⑥ sub(pattern, repl, string, count=0, flags=0)

repl: the string that replaces each match
count: the maximum number of replacements; 0 (the default) replaces every match

```
import re

m = re.sub(r"[1-9]\d{5}", "love", "haha723300 xixi612203")
print(m)  # hahalove xixilove
```
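
The example above leaves `count` at its default. To illustrate the parameter, here is the same call limited to a single replacement (variable names are illustrative):

```python
import re

text = "haha723300 xixi612203"

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"[1-9]\d{5}", "love", text, count=1)

print(first_only)  # hahalove xixi612203
```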

## 7. The re library's match object

Attributes:
string: the text that was searched
re: the pattern object (regular expression) used for the match
pos: the position in the text where the search started
endpos: the position in the text where the search ended

Methods:
group(): returns the matched string
start(): returns the start position of the match in the original string
end(): returns the end position of the match in the original string
span(): returns the tuple (start, end)

```
import re

match = re.search(r"[1-9]\d{5}", "haha723300 xixi612203")
print(match.string)    # haha723300 xixi612203
print(match.re)        # re.compile('[1-9]\\d{5}')
print(match.pos)       # 0
print(match.endpos)    # 21
print(match.group())   # 723300
print(match.start())   # 4
print(match.end())     # 10
print(match.span())    # (4, 10)
```
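
With the module-level `re.search`, pos and endpos are always 0 and the length of the string. They become more informative with a compiled pattern, whose `search` method accepts an optional start position. The sketch below assumes `re.compile`, which this article does not otherwise introduce; the start index 10 is chosen just to skip the first number.

```python
import re

pattern = re.compile(r"[1-9]\d{5}")
text = "haha723300 xixi612203"

# Start searching at index 10, just after the first six-digit number.
m = pattern.search(text, 10)

print(m.pos, m.endpos)   # 10 21
print(m.group())         # 612203
print(m.span())          # (15, 21)
```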

## 8. Greedy and minimal matching in the re library

① The re library uses greedy matching by default, that is, it returns the longest matching substring.

```
import re

match = re.search(r"PY.*N", "PYANBNCNDN")
print(match.group())  # PYANBNCNDN
```

② To get a minimal (non-greedy) match, add ? after the repetition operator.

| Operator | Description |
| --- | --- |
| `*?` | Previous character repeated 0 or more times, minimal match |
| `+?` | Previous character repeated 1 or more times, minimal match |
| `??` | Previous character repeated 0 or 1 times, minimal match |
| `{m,n}?` | Previous character repeated m to n times (inclusive), minimal match |

```
import re

match = re.search(r"PY.*?N", "PYANBNCNDN")
print(match.group())  # PYAN
```
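
The same contrast holds for the other operators in the table. The sketch below pairs each greedy form with its minimal counterpart; the test strings are made up so that the two forms give different results.

```python
import re

# (pattern, text) pairs: each greedy form is followed by its minimal form.
cases = [
    (r"PY.+N", "PYANBNCNDN"),
    (r"PY.+?N", "PYANBNCNDN"),
    (r"PY.?N", "PYNN"),
    (r"PY.??N", "PYNN"),
    (r"PY.{1,7}N", "PYANBNCNDN"),
    (r"PY.{1,7}?N", "PYANBNCNDN"),
]

results = [re.search(pattern, text).group() for pattern, text in cases]
print(results)
# ['PYANBNCNDN', 'PYAN', 'PYNN', 'PYN', 'PYANBNCNDN', 'PYAN']
```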