Python regular expressions: a personal progress summary

By: Roy.Liu
Last updated: 2012-08-17

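A while ago I was writing crawlers all over the place and had no choice but to learn regular expressions. I am far from an expert, but what I know is enough for everyday work.
When I write a crawler now, the basics listed below are enough to build the patterns I need; whatever a pattern cannot handle, I finish in the surrounding program code.
Even the simple usages take a lot of experimenting to master, so this is my own summary, written like a closed-book exam: I wrote down the simple, commonly used pieces from memory
and put together an example to test the common methods.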

Common Python regular expression symbols and the basic methods of the re module
The ones you must know
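? Matches 0 or 1 of the preceding item. Placed after another quantifier (*?, +?, ??, {m,n}?) it makes the match non-greedy (minimal). Python quantifiers are greedy by default.
. Matches any character except a newline; with the re.S (re.DOTALL) flag it matches newlines too.
+ Matches 1 or more of the preceding item.
* Matches 0 or more of the preceding item.
[] Matches any one of the characters listed inside.
[^ ] Matches any one character NOT listed inside. Inside [] most metacharacters lose their special meaning: '.' no longer means "any non-newline character", it is just a literal '.'. The exceptions are '\', a leading '^', a '-' between two characters, and ']'.
{} Repetition count for the preceding item: {3} means exactly 3 times, {1,3} means 1 to 3 times.
^ Matches the start of the string (with re.M, also the start of each line).
$ Matches the end of the string, or just before a trailing newline (with re.M, also the end of each line).
\d Matches any digit, i.e. 0-9 (for Python 3 str patterns, other Unicode digits as well unless re.ASCII is set).
\D Matches any non-digit, [^0-9]; the opposite of \d.
\w Matches letters, digits, and the underscore: a-z A-Z 0-9 _
\W The opposite of \w.
\s Matches whitespace: space, \n \t \r \v \f
\S The opposite of \s.
\ The escape character itself. To match a literal '.', write '\.'; a bare '.' matches any non-newline character.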
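
A few of the symbols above are easier to remember after seeing them run. The following is only a minimal sketch; the sample strings are made up for illustration and are not part of the test string used later.

```python
import re

text = "<b>one</b><b>two</b>"  # made-up sample input

# Greedy: .* takes as much as it can, so it runs to the last </b>.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: .*? stops at the first </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", text)

# '.' does not cross a newline unless re.S is given.
two_lines = "start\nend"
no_flag = re.search(r"start.*end", two_lines)
with_flag = re.search(r"start.*end", two_lines, re.S)

# Inside [] a '.' is a literal dot; outside it has to be escaped as '\.'.
dots = re.findall(r"[.]", "a.b.c")
ip = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "yyy=192.168.1.1")

print(greedy)
print(lazy)
print(no_flag)
print(with_flag.group())
print(dots)
print(ip.group())
```

The greedy pattern returns one long match while the non-greedy one returns two short ones, which is usually what a crawler wants when pulling repeated tags out of a page.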

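With these basics you can write ordinary regular expressions. More complex features, such as grouping, can be looked up when you need them. If you want to keep all of it in your head,
there is only one way: practice a lot.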
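
Since grouping is mentioned as the next thing to look up, here is a small sketch of what it looks like. The `config` string and the group names `user` and `host` are assumptions chosen for illustration, modeled on the key=value lines in the test string.

```python
import re

# Hypothetical sample input, modeled on the key=value lines in the test string.
config = "xxx=summer@yihaomen.com\nyyy=192.168.1.1\n"

# Two groups: the key before '=' and the value after it.
# re.M lets ^ and $ match at every line boundary.
pairs = re.findall(r"^(\w+)=(.+)$", config, re.M)

# Named groups are read back by name instead of by position.
m = re.search(r"(?P<user>[\w.]+)@(?P<host>[\w.]+)", config)

print(pairs)
print(m.group("user"))
print(m.group("host"))
```

When a pattern contains groups, `re.findall` returns a list of tuples, one tuple per match, rather than the whole matched text.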

Here is a simple example I threw together to test the more commonly used methods.

import re
teststring="""what is this ,
aha,i can't believe it.you must configure you definition file.

for example:

xxx=summer@yihaomen.com

yyy=192.168.1.1
then restart tomcat , check the log.what,what.
"""

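#====== sub: replace every part of the string that matches the pattern ====
def test_sub():
    # Each octet is 1 to 3 digits; the dots are escaped so they match literally.
    pattern=r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
    # The 4th positional argument of re.sub is count, not flags.
    # If a flag is needed, pass it by keyword: flags=re.S.
    content=re.sub(pattern, "IP Address", teststring)
    print(content)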

#============== search: can match starting at any position ===================
def test_search():
    pattern=r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.*?log'
    content=re.search(pattern, teststring, re.S)
    print(content.group())

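def test_findall():
    # Illustrative pattern (an assumption; any pattern works here).
    pattern=r'what'
    content=re.findall(pattern, teststring, re.S)
    print(content)

def test_finditer():
    # Illustrative pattern (an assumption): runs of digits.
    pattern=r'\d+'
    content=re.finditer(pattern, teststring, re.S)
    # finditer returns an iterator of match objects, so loop over it.
    for m in content:
        print(m.group())

def test_split():
    # Illustrative pattern (an assumption): split on commas and periods.
    pattern=r'[,.]'
    content=re.split(pattern, teststring)
    print(content)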

#===== match only tries to match at the start of the string; search can match anywhere. That is the biggest difference.
def test_match():
    content=re.match('what', teststring,re.S)
    if content:
        print(content.group())
    else:
        print('None')
    
if __name__=='__main__':
    print('==========test sub function==================')
    test_sub()
    print('==========test search function===============')
    test_search()
    print('==========test findall function==============')
    test_findall()
    print('==========test finditer function=============')
    test_finditer()
    print('==========test split ========================')
    test_split()
    print('==========test match ========================')
    test_match()
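
Two details of the functions tested above are easy to trip over, so here is a small separate sketch. It assumes Python 2.7 or later, where `re.sub` accepts the `count` and `flags` keywords; the sample strings are made up for illustration.

```python
import re

text = "a1 b2 c3 d4"  # made-up sample input

# count limits how many matches are replaced.
limited = re.sub(r"\d", "#", text, count=2)

# Flags go in by keyword. Passed positionally, a flag would land in the
# count slot: re.S has the integer value 16, so it would act as count=16.
text2 = "X=1\nx=2"
by_keyword = re.sub(r"^x", "key", text2, flags=re.I | re.M)

# match only tries at position 0; search scans the whole string.
at_start = re.match(r"b\d", text)
anywhere = re.search(r"b\d", text)

print(limited)
print(by_keyword)
print(at_start)
print(anywhere.group())
```

Passing a flag in the count position raises no error, which is why this kind of mistake tends to go unnoticed on short inputs.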

From: 一号门

Tags: python, regular expressions
