Multithreading and Multiprocessing: An Eight-Part Introduction to Concurrent Programming in Python
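As computer hardware has advanced, and multi-core processors in particular have become commonplace, using system resources effectively has become an important problem in software development. Concurrent programming addresses it by letting a program switch efficiently between multiple tasks, which improves overall performance. This article introduces the basic concepts of concurrency, the concurrency mechanisms available in Python, and how to use multithreading and multiprocessing to make programs more efficient.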
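1. What Is Concurrency?
Concurrency is the ability of multiple tasks or programs to make progress over the same period, so that they appear to run at the same time. Parallelism is the stricter case in which tasks really do execute simultaneously on separate cores. On today's multi-core processors, concurrency lets a program use system resources more efficiently.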
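2. The GIL (Global Interpreter Lock) in Python
CPython, the reference implementation of Python, has a mechanism called the Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at any given moment. The lock keeps the interpreter's internals simple and costs little on a single-core processor, but on a multi-core processor it prevents CPU-bound threads from running in parallel.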
# Example: how the GIL affects thread execution
import threading
import time

def count(n):
    # CPU-bound busy loop
    while n > 0:
        n -= 1

thread1 = threading.Thread(target=count, args=(100000000,))
thread2 = threading.Thread(target=count, args=(100000000,))
start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
end_time = time.time()
print(f"Time taken: {end_time - start_time:.2f} seconds")
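Output (the exact time depends on the machine and Python version):
Time taken: 2.07 seconds
Both threads run, but because of the GIL they take turns executing bytecode instead of running in parallel, so the total time is about the same as calling count twice in a row.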
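A single timing does not show the GIL's effect on its own; it becomes visible when the threaded run is compared with the same work done sequentially. The following is a minimal sketch, assuming a standard GIL-enabled CPython build. The helper names `time_sequential` and `time_threaded` and the smaller loop count are illustrative choices, not part of the original example.

```python
import threading
import time


def count(n):
    """CPU-bound busy loop: count n down to zero."""
    while n > 0:
        n -= 1


def time_sequential(n, runs=2):
    """Run count(n) `runs` times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        count(n)
    return time.perf_counter() - start


def time_threaded(n, runs=2):
    """Run count(n) in `runs` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count, args=(n,)) for _ in range(runs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # smaller than the article's value so the demo finishes quickly
    print(f"Sequential: {time_sequential(n):.2f} seconds")
    print(f"Threaded:   {time_threaded(n):.2f} seconds")
```

On a GIL-enabled interpreter the two timings are typically close to each other, since only one thread executes bytecode at a time. The exact numbers depend on the machine and the Python version.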
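3. Multithreading Basics
Multithreading is one way to achieve concurrency. It suits I/O-bound tasks such as network requests and file access, where threads spend most of their time waiting.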
# Example: a simple multithreaded application
import threading
import time

def worker(num):
    """Task executed by each thread."""
    print(f"Thread {num}: starting")
    time.sleep(2)
    print(f"Thread {num}: finishing")

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()
Output:
Thread 0: starting
Thread 1: starting
Thread 2: starting
Thread 3: starting
Thread 4: starting
Thread 0: finishing
Thread 1: finishing
Thread 2: finishing
Thread 3: finishing
Thread 4: finishing
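The five threads start one after another and then sleep at the same time, so the whole run takes about 2 seconds instead of 10. time.sleep, like most blocking I/O calls, releases the GIL while it waits, which is why threads work well for I/O-bound tasks. The order of the "finishing" lines is not guaranteed and may differ from run to run.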
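One way to check how much the threads above overlap is to time the whole run. The sketch below is an illustration under stated assumptions: `run_workers` is a hypothetical helper, and the sleep stands in for real I/O such as a network call.

```python
import threading
import time


def worker(delay):
    """Simulate an I/O-bound task by sleeping; sleeping releases the GIL."""
    time.sleep(delay)


def run_workers(num_threads, delay):
    """Start num_threads sleeping workers and return the total elapsed seconds."""
    threads = [
        threading.Thread(target=worker, args=(delay,)) for _ in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_workers(num_threads=5, delay=2)
    # The sleeps overlap, so this should be close to 2 seconds, not 10.
    print(f"5 threads took {elapsed:.2f} seconds")
```

If the threads ran strictly one after another, the elapsed time would be roughly the number of threads multiplied by the delay; seeing a value near a single delay indicates that the waits overlapped.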
4. Simplifying Multithreading with the concurrent.futures Module
concurrent.futures provides a high-level interface for executing function calls asynchronously.
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    print(f"Task {n} is running")
    time.sleep(2)
    return f"Task {n} finished"

with ThreadPoolExecutor(max_workers=5) as executor:
    # submit() schedules the call and returns a Future immediately
    futures = [executor.submit(task, i) for i in range(5)]
    for future in futures:
        # result() blocks until that task has finished
        print(future.result())
Output:
Task 0 is running
Task 1 is running
Task 2 is running
Task 3 is running
Task 4 is running
Task 0 finished
Task 1 finished
Task 2 finished
Task 3 finished
Task 4 finished
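This example uses ThreadPoolExecutor to manage a pool of worker threads and submits tasks with the submit method. Each call returns a Future, and future.result() blocks until that task's return value is available. The "finished" lines appear in submission order because the results are read in that order.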
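Reading futures in submission order means a slow early task delays the results of faster later ones. As a hedged sketch of two alternatives from the same module, the code below uses `as_completed` to handle results as they finish and `executor.map` to keep input order. The function names and the per-task delays are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def task(n, delay):
    """Sleep for `delay` seconds, then return a label for task n."""
    time.sleep(delay)
    return f"Task {n} finished"


def run_in_completion_order(delays):
    """Submit one task per delay and collect results as each one completes."""
    results = []
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(task, i, d) for i, d in enumerate(delays)]
        for future in as_completed(futures):
            results.append(future.result())
    return results


def run_in_submission_order(delays):
    """Use executor.map, which yields results in the order of the inputs."""
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        return list(executor.map(task, range(len(delays)), delays))


if __name__ == "__main__":
    delays = [0.5, 0.05, 0.25]
    print(run_in_completion_order(delays))
    print(run_in_submission_order(delays))
```

With these delays the first list is typically ordered by completion (Task 1, Task 2, Task 0), while the second always follows the input order.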
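5. Multiprocessing Basics
Multiprocessing sidesteps the GIL: each process has its own Python interpreter and its own GIL, so CPU-bound work can run in parallel on multiple cores.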
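# Example: a simple multi-process application
from multiprocessing import Process
import time

def process_task(num):
    """Task executed by each process."""
    print(f"Process {num}: starting")
    time.sleep(2)
    print(f"Process {num}: finishing")

# The guard is required on platforms that start child processes with "spawn"
# (the default on Windows and macOS), where the child re-imports this module.
if __name__ == "__main__":
    processes = []
    for i in range(5):
        p = Process(target=process_task, args=(i,))
        processes.append(p)
        p.start()
    # Wait for all processes to finish
    for p in processes:
        p.join()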
Output:
Process 0: starting
Process 1: starting
Process 2: starting
Process 3: starting
Process 4: starting
Process 0: finishing
Process 1: finishing
Process 2: finishing
Process 3: finishing
Process 4: finishing
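The five processes start almost simultaneously and run independently of one another. Because this task only sleeps, threads would finish just as quickly; the benefit of separate processes shows up with CPU-bound work, where each process can occupy its own core.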
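To see where processes help, the same CPU-bound function can be run through a thread pool and a process pool and timed. This is a sketch under assumptions: it uses the executors from concurrent.futures instead of Process directly, and `timed_run` and the loop count are illustrative names and values.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count(n):
    """CPU-bound busy loop; returns the final counter value (always 0)."""
    while n > 0:
        n -= 1
    return n


def timed_run(executor_cls, n, workers=2):
    """Run count(n) once per worker in the given executor; return elapsed seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        list(executor.map(count, [n] * workers))
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    print(f"Threads:   {timed_run(ThreadPoolExecutor, n):.2f} seconds")
    print(f"Processes: {timed_run(ProcessPoolExecutor, n):.2f} seconds")
```

On a multi-core machine with a GIL-enabled interpreter, the process version is usually faster for large values of n. For small values, the cost of starting processes can outweigh the gain.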
6. Simplifying Multiprocessing with multiprocessing.Pool
multiprocessing.Pool offers a simple way to run tasks in parallel across a fixed number of worker processes.
from multiprocessing import Pool
import time

def pool_task(n):
    print(f"Task {n} is running")
    time.sleep(2)
    return f"Task {n} finished"

if __name__ == "__main__":
    with Pool(processes=5) as pool:
        # map() distributes the inputs across the workers
        results = pool.map(pool_task, range(5))
    for result in results:
        print(result)
Output:
Task 0 is running
Task 1 is running
Task 2 is running
Task 3 is running
Task 4 is running
Task 0 finished
Task 1 finished
Task 2 finished
Task 3 finished
Task 4 finished
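This code shows how to use Pool to run tasks in parallel and collect the results. pool.map blocks until every task has finished and returns the results in input order.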
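Pool has a few other methods that are useful alongside map. The sketch below, with hypothetical `square` and `power` functions standing in for heavier computations, shows `imap_unordered` for handling results as workers finish and `starmap` for tasks that take several arguments.

```python
from multiprocessing import Pool


def square(n):
    """Return n squared; stands in for a heavier CPU-bound computation."""
    return n * n


def power(base, exponent):
    """Return base raised to exponent; used to show multi-argument tasks."""
    return base ** exponent


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map preserves input order.
        print(pool.map(square, range(6)))
        # imap_unordered yields results as workers finish, so order may vary;
        # sorting here only makes the printed output stable.
        print(sorted(pool.imap_unordered(square, range(6))))
        # starmap unpacks each tuple into positional arguments.
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))
```

The worker functions are defined at module level because a Pool sends them to the workers by name, which requires that the workers can import them.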
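7. Inter-Process Communication
In multi-process programs, processes often need to share data or coordinate their actions. Because each process has its own memory, Python provides several mechanisms for inter-process communication, such as pipes and queues.
(1) Communicating through pipes
A pipe is a simple and effective channel for communication between two processes.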
from multiprocessing import Process, Pipe

def send_message(conn, message):
    conn.send(message)
    conn.close()

def receive_message(conn):
    # recv() blocks until a message arrives
    print(f"Received message: {conn.recv()}")

if __name__ == "__main__":
    # Pipe() returns the two ends of the connection
    parent_conn, child_conn = Pipe()
    sender = Process(target=send_message, args=(child_conn, "Hello from child!"))
    receiver = Process(target=receive_message, args=(parent_conn,))
    sender.start()
    receiver.start()
    sender.join()
    receiver.join()
Output:
Received message: Hello from child!
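In this example we create a pipe and hand one end to each process: the sender writes a message to its end, and the receiver reads it from the other. Pipe() returns a pair of connection objects and is bidirectional by default.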
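Since a pipe created with Pipe() can carry traffic in both directions, it can also support a simple request and reply exchange. The following sketch is an illustration only: `make_reply`, the upper-casing behavior, and the use of None as a stop signal are assumptions, not part of the original example.

```python
from multiprocessing import Process, Pipe


def make_reply(request):
    """Build the worker's reply for one request (here: upper-case it)."""
    return request.upper()


def worker(conn):
    """Answer requests from the pipe until a None stop signal arrives."""
    while True:
        request = conn.recv()
        if request is None:
            break
        conn.send(make_reply(request))
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # duplex by default
    p = Process(target=worker, args=(child_conn,))
    p.start()
    for text in ["hello", "pipe"]:
        parent_conn.send(text)
        print(f"Reply: {parent_conn.recv()}")
    parent_conn.send(None)  # tell the worker to stop
    p.join()
```

Each send in the parent is paired with one recv, so the two sides never both wait on each other at the same time.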
(2) Communicating through queues
A queue is a more general mechanism that supports multiple producers and multiple consumers.
from multiprocessing import Process, Queue
import time

def put_items(queue):
    items = ['item1', 'item2', 'item3']
    for item in items:
        queue.put(item)
        time.sleep(1)
    # Signal that no more items will arrive
    queue.put(None)

def get_items(queue):
    while True:
        # get() blocks until an item is available. Polling queue.empty()
        # instead is unreliable: the consumer can see an empty queue and
        # quit before the producer has finished.
        item = queue.get()
        if item is None:
            break
        print(f"Received: {item}")

if __name__ == "__main__":
    queue = Queue()
    producer = Process(target=put_items, args=(queue,))
    consumer = Process(target=get_items, args=(queue,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
Output:
Received: item1
Received: item2
Received: item3
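This example shows how to use a queue for producer-consumer communication. A consumer should block on queue.get() and stop on an agreed end-of-stream marker such as None; checking queue.empty() to decide when to stop is unreliable, because the queue can be momentarily empty while the producer is still working.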
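The same pattern extends to several producers and consumers sharing one queue. The sketch below is one possible arrangement, assuming a None marker sent once per consumer after all producers have finished. The names `SENTINEL`, `make_items`, and the second `results` queue are hypothetical and chosen for illustration.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker, sent once per consumer


def make_items(producer_id, count):
    """Return the labelled items a given producer will emit."""
    return [f"p{producer_id}-item{i}" for i in range(count)]


def producer(queue, producer_id, count):
    """Put this producer's items on the shared queue."""
    for item in make_items(producer_id, count):
        queue.put(item)


def consumer(queue, results):
    """Forward items to the results queue until the sentinel is seen."""
    while True:
        item = queue.get()  # blocks until something is available
        if item is SENTINEL:
            break
        results.put(item)


if __name__ == "__main__":
    queue, results = Queue(), Queue()
    producers = [Process(target=producer, args=(queue, i, 3)) for i in range(2)]
    consumers = [Process(target=consumer, args=(queue, results)) for _ in range(2)]
    for p in producers + consumers:
        p.start()
    for p in producers:
        p.join()
    for _ in consumers:
        queue.put(SENTINEL)  # sent only after every producer has finished
    received = [results.get() for _ in range(6)]  # drain before joining
    for c in consumers:
        c.join()
    print(sorted(received))
```

The results are drained before the consumers are joined because a process that has put data on a queue may not exit until that data has been read.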
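8. Practical Example: Downloading Images in Parallel
Suppose we need to download a large number of images from the network and save them to the local file system. We can use multithreading or multiprocessing to speed up the downloads.
(1) Defining the download function
First we define a function that downloads the image at a given URL and saves it locally.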
import requests
import os

def download_image(url, filename):
    # A timeout keeps a stalled connection from blocking the worker forever
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        with open(filename, 'wb') as file:
            file.write(response.content)
        print(f"Downloaded {filename}")
    else:
        print(f"Failed to download {url}")
(2) Downloading with multiple threads
Next we use multiple threads to download the images concurrently.
import threading

def download_images_threading(urls, folder):
    os.makedirs(folder, exist_ok=True)

    def download(url):
        # Name the local file after the last path segment of the URL
        filename = os.path.join(folder, url.split('/')[-1])
        download_image(url, filename)

    threads = []
    for url in urls:
        thread = threading.Thread(target=download, args=(url,))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()

urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg",
    "https://example.com/image4.jpg",
    "https://example.com/image5.jpg"
]
folder = "images_threading"
download_images_threading(urls, folder)
Output (assuming the URLs point to real images; the order may vary):
Downloaded images_threading/image1.jpg
Downloaded images_threading/image2.jpg
Downloaded images_threading/image3.jpg
Downloaded images_threading/image4.jpg
Downloaded images_threading/image5.jpg
This example shows how to download images concurrently with multiple threads, one thread per URL.
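Starting one thread per URL is fine for five images but may create too many threads for a long list. A thread pool caps the number of simultaneous downloads. The sketch below is a minimal illustration under assumptions: it uses the standard library's urllib instead of requests so that it is self-contained, and `fetch_bytes`, `save_one`, `download_all`, and the `max_workers` default are hypothetical names and values.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download a URL with the standard library and return the body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def save_one(url, folder, fetch):
    """Fetch one URL and write it into folder; return (url, path or None)."""
    filename = os.path.join(folder, url.split("/")[-1])
    try:
        data = fetch(url)
    except Exception as exc:  # report any failure and keep going
        print(f"Failed to download {url}: {exc}")
        return url, None
    with open(filename, "wb") as file:
        file.write(data)
    return url, filename


def download_all(urls, folder, fetch=fetch_bytes, max_workers=4):
    """Download urls with at most max_workers threads; return {url: path or None}."""
    os.makedirs(folder, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pairs = executor.map(lambda u: save_one(u, folder, fetch), urls)
        return dict(pairs)
```

Calling `download_all(urls, "images_pool")` with a list of URLs would download at most four images at a time. The `fetch` parameter exists so that the download step can be swapped out, for example with a stub during testing.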
(3) Downloading with multiple processes
Now we carry out the same task with multiple processes.
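from multiprocessing import Process

# The target is defined at module level so it can be pickled when child
# processes are started with "spawn" (the default on Windows and macOS);
# a function nested inside another function cannot be.
def download_to_folder(url, folder):
    filename = os.path.join(folder, url.split('/')[-1])
    download_image(url, filename)

def download_images_multiprocessing(urls, folder):
    os.makedirs(folder, exist_ok=True)
    processes = []
    for url in urls:
        process = Process(target=download_to_folder, args=(url, folder))
        processes.append(process)
        process.start()
    for process in processes:
        process.join()

if __name__ == "__main__":
    folder = "images_multiprocessing"
    download_images_multiprocessing(urls, folder)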
Output (assuming the URLs point to real images; the order may vary):
Downloaded images_multiprocessing/image1.jpg
Downloaded images_multiprocessing/image2.jpg
Downloaded images_multiprocessing/image3.jpg
Downloaded images_multiprocessing/image4.jpg
Downloaded images_multiprocessing/image5.jpg
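This example shows how to download images in parallel with multiple processes. Downloading is I/O-bound, so processes generally offer little advantage over threads here and cost more to start; they pay off when each image also needs CPU-heavy processing.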
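Summary
This article introduced the basic concepts of concurrency and examined Python's concurrency mechanisms, including multithreading and multiprocessing. The examples showed how the concurrent.futures and multiprocessing modules simplify concurrent programming, and the practical case showed how to download images with multiple threads and multiple processes. With these techniques, developers can make better use of modern multi-core processors and improve the efficiency of their programs.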