regex examples
1 | import regex as re |
语法
. 匹配除了newline的任意character,如果要匹配newline,需要添加re.DOTALL flag
* 重复至少$0$次
+ 重复至少$1$次
? 重复$0$次或者$1$次
{} 重复多少次,如a{3,5}表示重复$3-5$次
[] 匹配方括号内的内容,如[1-9]表示匹配$1-9$中任意一个
^ matching the start of the string
$ matching the end os the string
+,*.? 都是贪婪匹配,如果加一个?为非贪婪匹配
+?,*?,?? 为非贪婪匹配
() 匹配括号内的正则表达式,表示一个group的开始和结束
| 或
\number
\b 匹配empty string
\B
\d 匹配数字
\D 匹配非数字
\s 匹配空白符[ \t\n\r\f\v]
\S 匹配非空白符
\w 匹配unicode
\W
\A
\Z
模块
- re.compile(patern, flags=0)
- re.match(pattern, string, flags=0)
- re.fullmatch(pattern,string,flags=0)
- re.search(pattern, string, flags=0)
- re.split(pattern, string, maxflit=0, flags=0)
- re.findall(pattern,string,flags=0)
- re.sub(pattern,repl,string,count=0,flags=0)
- re.subn(pattern,repl,string,count=0,flags=0)
flags
flags can be re.DEBUG, re.I, re.IGNORECASE, re.L, re.LOCALE, re.M, re.MULTILINE, re.S, re.DOTALL, re.U, re.UNICODE, re.X, re.VERBOSE
- re.I(re.IGNORECASE) 忽略大小写
- re.L(re.LOCALE)
- re.M(re.MULTILINE) 多行模式,设置以后.匹配newline。指定re.S时,’^'匹配string的开始和each line的开始(紧跟着each newline); '$'匹配string的结束和each line的结束($在newline之前,immediately preceding each newline)。如果不指定的话, '^‘只匹配string的开始,’$'只匹配string的结束和immediately before the newline (if any) at the end of the string,对应inline flag (?m).
- re.S(re.DOTALL)
- re.U(re.UNICODE)
- re.X(re.VERBOSE)
- re.DEBUG
1 | import regex as re |
re.M例子
1 | text = """First line. |
re.compile(patern, flags=0)
将一个正则表达式语句编译成一个正则表达式对象,可以调用正则表达式的match()和search()函数进行matching。
complie a regular expression pattern into a regular expression object,which can be used for matching using its match() and search()
1 | str = "https://abc https://dcdf https://httpfn https://hello" |
re.match(pattern, string, flags=0) or re.fullmatch(pattern,string,flags=0)
在给定的string开始位置进行查找,返回一个match object。即使设置了MULTILINE mode, re.match()也只会在string的开始而不是each line的每一行开始匹配。
re.search(pattern, string, flags=0)
在给定的string任意位置进行查找,返回一个match object。
locat a match anywhere in string
search() vs. match()
re.macth()在string的开头查找,而re.search在string的任意位置查找,他们都返回match object对象。如果不匹配,返回None。
1 | import re |
re.findall(pattern,string,flags=0)
查找字符string所有匹配pattern的字符
1 | import regex as re |
['https://abc ', 'https://dcdf ', 'https://httpfn '] # pay attention to the last ,since the end of str is \n
re.split(pattern, string, maxflit=0, flags=0)
按照patten对string进行分割
1 | import regex as re |
[‘https://abc’, ‘https://dcdf’, ‘https://httpfn’, ‘https://hello’]
re.sub(pattern,repl,string,count=0,flags=0)
re.subn(pattern,repl,string,count=0,flags=0)
…
正则表达式对象(regular express object)
class re.RegexObject
只有re.compile()函数会产生正则表达式对象,正则
only re.compile() will create a direct regular express object,
it’s a special class which design for re.compile().
正则表达式对象支持下列方法和属性
- match(string[,pos[,endpos]])
- search(string[,pos[,endpos]])
- findall(string[,pos[,endpos]])
- split(string,maxsplit=0)
- sub()
- flags
- groups
- groupindex
- pattern
match(string[,pos[,endpos]])
search(string[,pos[,endpos]])
findall(string[,pos[,endpos]])
split(string,maxsplit=0)
sub()
flags
groups
groupindex
pattern
匹配对象(match objects)
class re.MatchObject
匹配是否成功
match objects have a boolean value of True.
1 | match = re.search(pattern, string) |
MatchObject支持以下方法和属性
- group([group1,…])
- groups([default=None])
- groupdict(default=None)
- start([group])
- end([group])
- span([group])
- pos
- endpos
- lstindex
- lastgroup
- re
- string
group([group1,…])
group的话pattern需要多个()
groups([default])
返回一个元组
return a tuple containing all the subgroups of the match.
1 | re.match(r"(\d+)\.(\d+)","24.1632") |
(‘24’,‘1632’)
show default
1 | m = re.match(r"(\d+)\.?(\d+)?", "24") |
(‘24’, None)
change default to 0
1 | m.groups('0') # Now, the second group defaults to '0'. |
参考文献
1.https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match
2.https://devdocs.io/python~3.7/library/re
3.https://mail.python.org/pipermail/python-list/2014-July/674576.html