Splitting strings on any of multiple delimiters
You need to split a string into fields, but the delimiters (and the whitespace around them) are not consistent throughout the string.
The split() method of string objects is suitable only for very simple cases. It does not handle multiple delimiters or a variable amount of whitespace around them. When you need more flexibility, use re.split():
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
The re.split() function is useful because it lets you specify multiple patterns for the separator. In the example above, the separator is a comma, a semicolon, or a whitespace character, followed by any amount of extra whitespace. Wherever that pattern is found, the whole match becomes the delimiter between the fields on either side of it. The result is a list of fields, just as with str.split().
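To make the contrast with str.split() concrete, here is a small sketch that runs both approaches on the same sample string. The variable names (by_whitespace, by_comma, by_pattern) are chosen only for this illustration.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'

# str.split() with no argument splits on runs of whitespace only,
# so the punctuation stays attached to the fields.
by_whitespace = line.split()
print(by_whitespace)   # ['asdf', 'fjdk;', 'afed,', 'fjek,asdf,', 'foo']

# str.split(',') splits on one fixed separator and keeps the
# surrounding spaces (and the semicolon) inside the fields.
by_comma = line.split(',')
print(by_comma)        # ['asdf fjdk; afed', ' fjek', 'asdf', ' foo']

# re.split() treats a comma, a semicolon, or a whitespace character
# as the separator and swallows any whitespace that follows it.
by_pattern = re.split(r'[;,\s]\s*', line)
print(by_pattern)      # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```

Only the regular-expression version yields clean fields in a single call.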
When using re.split(), be careful if the regular expression contains a capture group enclosed in parentheses. If capture groups are used, the matched text is also included in the result list. For example, look at what this code returns:
>>> fields = re.split(r'(;|,|\s)\s*', line)
>>> fields
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
Getting the separator characters can be useful in certain contexts. For example, you may want to keep them so you can re-form an output string later:
>>> values = fields[::2]
>>> delimiters = fields[1::2] + ['']
>>> values
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
>>> delimiters
[' ', ';', ',', ',', ',', '']
>>> # Reform the line using the same delimiters
>>> ''.join(v+d for v,d in zip(values, delimiters))
'asdf fjdk;afed,fjek,asdf,foo'
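The same idea can be wrapped in a small helper. This is only a sketch: the function name split_with_delimiters is hypothetical, and it assumes the pattern contains exactly one capture group, so that values and separators alternate in the list returned by re.split().

```python
import re

def split_with_delimiters(text, pattern=r'(;|,|\s)\s*'):
    """Return (values, delimiters) for text.

    Assumes pattern has exactly one capture group, so re.split()
    returns values at even indexes and separators at odd indexes.
    """
    fields = re.split(pattern, text)
    values = fields[::2]
    # Pad with '' so each value pairs with the delimiter that follows it.
    delimiters = fields[1::2] + ['']
    return values, delimiters

line = 'asdf fjdk; afed, fjek,asdf, foo'
values, delimiters = split_with_delimiters(line)

# Transform the values, then re-form the line with the original separators.
rebuilt = ''.join(v.upper() + d for v, d in zip(values, delimiters))
print(rebuilt)  # ASDF FJDK;AFED,FJEK,ASDF,FOO
```

Note that the whitespace swallowed by the trailing \s* is outside the capture group, so it is not restored in the rebuilt string.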
If you don't want the separator characters in the result list but still need parentheses to group parts of the regular expression, make sure you use a non-capture group, written as (?:...). For example:
>>> re.split(r'(?:,|;|\s)\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
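A non-capture group is also handy when the pattern is built from a list of literal delimiters. The sketch below is one possible approach, not part of the recipe itself: the helper name split_on_any is hypothetical, and it assumes a non-empty list of delimiter strings. It uses re.escape() so that characters such as '|' or '.' are matched literally.

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiter strings.

    Assumes delimiters is a non-empty list of non-empty strings.
    """
    # Try longer delimiters first so '->' is preferred over '-'.
    ordered = sorted(delimiters, key=len, reverse=True)
    # Escape each delimiter so regex metacharacters are taken literally.
    alternatives = '|'.join(re.escape(d) for d in ordered)
    # The non-capture group keeps the delimiters out of the result;
    # \s* on both sides discards surrounding whitespace.
    pattern = r'\s*(?:' + alternatives + r')\s*'
    return re.split(pattern, text)

print(split_on_any('a | b.c ;d', ['|', '.', ';']))  # ['a', 'b', 'c', 'd']
print(split_on_any('a->b - c', ['-', '->']))        # ['a', 'b', 'c']
```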