Python中利用正則表達式的 16個常見任務(wù)

作者：小白PythonAI編程 2024-09-23 20:00:00

正則表達式是一種用于匹配字符串的語言。它由一系列字符和特殊符號組成，用來描述搜索模式。在Python中，re模塊提供了支持正則表達式的功能。

安裝與導入

首先，確保你的Python環(huán)境中已經(jīng)安裝了re模塊。這是Python的標準庫之一，所以通常不需要額外安裝。

import re

字符匹配

單個字符：使用方括號[]表示一組字符中的任意一個。

# 匹配任何字母
pattern = "[a-zA-Z]"
string = "Hello World!"
match = re.search(pattern, string)
print(match.group())  # 輸出: H

多個字符：使用*表示零次或多次出現(xiàn)。

# 匹配任意數(shù)量的空格
pattern = "\s*"
string = "    Hello World!"
match = re.match(pattern, string)
print(match.group())  # 輸出: '    '

范圍匹配

使用-定義一個范圍內(nèi)的字符。

# 匹配小寫字母a到e
pattern = "[a-e]"
string = "abcdeABCDE"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['a', 'b', 'c', 'd', 'e']

排除字符

使用^排除某些字符。

# 匹配除了a到z之外的所有字符
pattern = "[^a-z]"
string = "123ABCdef"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['1', '2', '3', 'A', 'B', 'C']

字符集組合

可以將多個字符集組合起來使用。

# 匹配數(shù)字或大寫字母
pattern = "[0-9A-Z]+"
string = "Hello123World"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['123']

位置錨定

^表示行首，$表示行尾。

# 匹配以大寫字母開頭的單詞
pattern = "^[A-Z][a-zA-Z]*"
string = "Hello world"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['Hello']

分組與引用

使用圓括號()來創(chuàng)建一個捕獲組。

# 捕獲日期格式
pattern = "(\d{4})-(\d{2})-(\d{2})"
string = "Today is 2023-04-01."
match = re.match(pattern, "2023-04-01")
if match:
    year = match.group(1)
    month = match.group(2)
    day = match.group(3)
    print(f"Year: {year}, Month: {month}, Day: {day}")

非捕獲組

如果不關(guān)心某部分的內(nèi)容，可以使用(?:)。

# 不捕獲中間的冒號
pattern = r"(\d{2}):(?:\d{2}):\d{2}"
string = "09:30:15"
match = re.match(pattern, string)
if match:
    hour = match.group(1)
    print(hour)  # 輸出: 09

替換文本

使用re.sub()方法替換字符串中的匹配項。

# 將所有空格替換成下劃線
pattern = "\s+"
string = "Hello   World"
new_string = re.sub(pattern, "_", string)
print(new_string)  # 輸出: Hello_World

貪婪與非貪婪匹配

貪婪匹配：默認情況下，正則表達式會盡可能多地匹配字符。
非貪婪匹配：使用?使匹配變得“懶惰”，即盡可能少地匹配字符。

# 貪婪匹配
pattern = "<.*>"
string = "<p>Hello <span>World</span></p>"
match = re.search(pattern, string)
print(match.group())  # 輸出: <p>Hello <span>World</span></p>

# 非貪婪匹配
pattern = "<.*?>"
string = "<p>Hello <span>World</span></p>"
match = re.search(pattern, string)
print(match.group())  # 輸出: <p>

條件分支

使用(?P<name>)命名捕獲組，并通過(?P=name)引用它們。

# 匹配重復(fù)的單詞
pattern = r"\b(\w+)\b\s+\1\b"
string = "hello hello world"
match = re.sub(pattern, r"\1", string, flags=re.IGNORECASE)
print(match)  # 輸出: hello world

重復(fù)限定符

使用{n}指定精確重復(fù)次數(shù)。
使用{n,}指定至少重復(fù)n次。
使用{n,m}指定重復(fù)n到m次。

# 匹配恰好重復(fù)三次的字符
pattern = r"a{3}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # 輸出: []

# 匹配至少重復(fù)兩次的字符
pattern = r"a{2,}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['aaa']

# 匹配重復(fù)兩到四次的字符
pattern = r"a{2,4}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['aaa']

特殊字符

點號 .：匹配除換行符之外的任何字符。
反斜杠 \：轉(zhuǎn)義特殊字符。

# 匹配包含任何字符的單詞
pattern = r"\w+"
string = "hello\nworld"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['hello']

# 匹配特殊字符
pattern = r"\."
string = "hello.world"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['.']

邊界限定符

單詞邊界 \b：匹配單詞的開始和結(jié)束。
非單詞邊界 \B：匹配非單詞的位置。

# 匹配單詞邊界
pattern = r"\bhello\b"
string = "hello world"
matches = re.findall(pattern, string)
print(matches)  # 輸出: ['hello']

# 匹配非單詞邊界
pattern = r"\Bworld\B"
string = "hello world"
matches = re.findall(pattern, string)
print(matches)  # 輸出: []

標志位

忽略大小寫 re.IGNORECASE：使匹配不區(qū)分大小寫。
多行模式 re.MULTILINE：使^和$分別匹配每一行的開始和結(jié)束。
點號匹配換行符 re.DOTALL：使.匹配包括換行符在內(nèi)的任何字符。

# 忽略大小寫
pattern = r"hello"
string = "Hello world"
matches = re.findall(pattern, string, re.IGNORECASE)
print(matches)  # 輸出: ['Hello']

# 多行模式
pattern = r"^hello"
string = "hello\nworld"
matches = re.findall(pattern, string, re.MULTILINE)
print(matches)  # 輸出: ['hello']

# 點號匹配換行符
pattern = r"hello.*world"
string = "hello\nworld"
matches = re.findall(pattern, string, re.DOTALL)
print(matches)  # 輸出: ['hello\nworld']

實戰(zhàn)案例分析

假設(shè)我們需要從一段文本中提取所有的郵箱地址。這可以通過正則表達式輕松實現(xiàn)。郵箱地址的一般形式為 username@domain.com。下面是一個簡單的示例：

責任編輯：趙寧寧來源：小白PythonAI編程

正則表達式 Python

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看

Python中利用正則表達式的 16個 常見任務(wù)