How to match Chinese characters in Python
Given the following string:
s = """{"headurl":"","nickname":"","loginstatus":"","loginstate":"","tip":"未注册服务","idUser":"","sessionId":"","upgradeUrl":"","checkCodeKey":"false"}"""
In Python 2.7, how can I use a regular expression to extract the Chinese characters “未注册服务”?
3 answers
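# -*- coding: UTF-8 -*-
import re

__author__ = u'丽江海月客栈'

s = """{"headurl":"","nickname":"","loginstatus":"","loginstate":"","tip":"未注册服务","idUser":"","sessionId":"","upgradeUrl":"","checkCodeKey":"false"}"""
# In Python 2.7 a plain string literal is a byte string. Decode it to unicode
# first (this file is saved as UTF-8, as declared on the first line).
ss = s.decode('utf-8')
# Match one or more consecutive characters in the range U+4E00 to U+9FA5.
re_words = re.compile(u"[\u4e00-\u9fa5]+")
m = re_words.search(ss, 0)
print m.group()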
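The same approach can be sketched so that it runs unchanged on Python 2.7 and Python 3. This is only an illustration under a few assumptions: the sample string is shortened, it is declared as a Unicode literal so no decode() step is needed, and the helper name find_chinese is made up for this example.

```python
# -*- coding: utf-8 -*-
import re

# Shortened copy of the sample string, declared as Unicode (assumption:
# the text is already decoded, so no .decode('utf-8') is required).
s = u'{"headurl":"","nickname":"","tip":"未注册服务","checkCodeKey":"false"}'

# One or more consecutive characters in the range U+4E00..U+9FA5.
HAN_RUN = re.compile(u"[\u4e00-\u9fa5]+")


def find_chinese(text):
    """Return every run of Chinese characters in text (hypothetical helper)."""
    return HAN_RUN.findall(text)


runs = find_chinese(s)
print(runs[0])
```

Using findall instead of search returns all runs at once, which helps when more than one field contains Chinese text.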
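Prefixing a Python string literal with 'r' marks it as a raw string, in which the backslash '\' is not treated as an escape character. For example, in a raw string \n is two characters, \ and n, rather than a newline. Regular expressions use \ heavily, and this clashes with Python's own string escapes, so it is best to write regex patterns as raw strings.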
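A small sketch of the difference, using \b (a word boundary in a regex, but a backspace character in an ordinary string literal). The sample text is made up for illustration.

```python
import re

newline = "\n"   # one character: a line feed
raw = r"\n"      # two characters: a backslash followed by n

print(len(newline))  # 1
print(len(raw))      # 2

sample = 'the "tip" field'

# Raw string: the regex engine sees \b and treats it as a word boundary.
with_raw = re.findall(r"\btip\b", sample)
# Ordinary string: Python turns "\b" into a backspace before the regex
# engine ever sees it, so the pattern no longer matches.
without_raw = re.findall("\btip\b", sample)

print(with_raw)     # ['tip']
print(without_raw)  # []
```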
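Inside []:
- Square brackets are usually used to specify a character set; any single character from the set can match: [abc] [a-z]
- Most metacharacters lose their special meaning inside a character set (the exceptions are ^ at the start, - between two characters, ] and \)
- A ^ at the start of [] means the complement: it matches any character that is not in the set
s=r'abc' matches "abc"
s=r't[io]p' matches "tip" or "top"
s=r't[a-z0-9A-Z]' matches t followed by one digit or one ASCII letter
[abc] means "a" or "b" or "c"
[0-9] means any single digit from 0 to 9, equivalent to [0123456789]
[\u4e00-\u9fa5] means any single character in the range U+4E00 to U+9FA5, which covers the commonly used Chinese characters (in Python 2.7 both the pattern and the text must be unicode strings for this to work)
[^a1<] means any single character other than "a", "1" and "<"
[^a-z] means any single character other than a lowercase letter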
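The rules above can be checked with a few short calls. This is a sketch with made-up sample strings; it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

tip_top = re.findall(r"t[io]p", "tip top tap")
print(tip_top)        # ['tip', 'top']

digits = re.findall(r"[0-9]", "a1b22")
print(digits)         # ['1', '2', '2']

# ^ at the start of the set negates it.
not_lower = re.findall(r"[^a-z]", "ab1<c")
print(not_lower)      # ['1', '<']

not_listed = re.findall(r"[^a1<]", "a1<bc")
print(not_listed)     # ['b', 'c']

# . and + are ordinary characters inside a set.
literal = re.findall(r"[.+]", "a.b+c")
print(literal)        # ['.', '+']

# The Chinese range needs a Unicode pattern and Unicode text.
han = re.findall(u"[\u4e00-\u9fa5]", u"tip: 未注册")
print(len(han))       # 3
```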
>>> import re
>>> s = """{"headurl":"","nickname":"","loginstatus":"","loginstate":"","tip":"未注册服务","idUser":"","sessionId":"","upgradeUrl":"","checkCodeKey":"false"}"""
>>> s
'{"headurl":"","nickname":"","loginstatus":"","loginstate":"","tip":"\xce\xb4\xd7\xa2\xb2\xe1\xb7\xfe\xce\xf1","idUser":"","sessionId":"","upgradeUrl":"","checkCodeKey":"false"}'
>>> matches = re.findall("未注册服务",s,re.S)
>>> print matches
['\xce\xb4\xd7\xa2\xb2\xe1\xb7\xfe\xce\xf1']
>>> matches = re.findall("未0注册服务",s,re.S)
>>> print matches
[]
>>>
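The session above searches byte strings for a literal that is already known, and the bytes shown (\xce\xb4\xd7\xa2...) are consistent with a GBK-encoded console. A sketch of a more general variant, assuming the input arrives as GBK bytes: decode to Unicode first, then capture whatever the "tip" field holds. The sample is shortened and the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
import re

text = u'{"loginstate":"","tip":"未注册服务","idUser":""}'

# Simulate what a GBK console passes to Python 2.7 as a byte string.
gbk_bytes = text.encode("gbk")

# Decode with the encoding the bytes were actually written in
# (GBK here is an assumption based on the byte values in the session).
decoded = gbk_bytes.decode("gbk")

# Capture the value of the "tip" field: everything up to the next quote.
tip = re.search(u'"tip":"([^"]*)"', decoded)
print(tip.group(1) == u"未注册服务")   # True

# The Unicode range pattern also works once the text is decoded.
han = re.search(u"[\u4e00-\u9fa5]+", decoded)
print(han.group() == tip.group(1))     # True
```

Matching on decoded text avoids depending on the console encoding, and a character range such as [\u4e00-\u9fa5] is only meaningful on Unicode strings, not on raw bytes.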