Finally Understanding the Confusion Matrix in Machine Learning!

Author: Programmer Xiaohan (程序員小寒), 2024-08-23 09:06:35



Hi everyone, I'm Xiaohan.

Today I'd like to share an important concept in machine learning: the confusion matrix.

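A confusion matrix is a table used to evaluate the performance of a classification model. By comparing the actual (true) labels with the predicted labels, it summarizes the prediction results of a classification problem.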

The confusion matrix is square (n × n), where n is the number of classes the model predicts.
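A minimal sketch, assuming scikit-learn is installed and using made-up labels for three hypothetical classes. It shows that a three-class problem yields a 3 × 3 matrix. In scikit-learn, rows are actual classes and columns are predicted classes, both in the order given by `labels`:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a three-class problem (illustrative data only)
y_true = ['cat', 'dog', 'bird', 'cat', 'dog', 'bird', 'cat']
y_pred = ['cat', 'dog', 'cat', 'cat', 'bird', 'bird', 'dog']

labels = ['cat', 'dog', 'bird']
# Rows = actual classes, columns = predicted classes, both in `labels` order
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(cm.shape)  # (3, 3): one row and one column per class
print(cm)
```

Every sample lands in exactly one cell, so the cells sum to the number of samples. Correct predictions sit on the diagonal.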

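For a binary classification problem, the confusion matrix consists of four main parts:

True Positive (TP): the number of samples that are actually positive and are predicted as positive.
True Negative (TN): the number of samples that are actually negative and are predicted as negative.
False Positive (FP): the number of samples that are actually negative but are predicted as positive, often called a "Type I error" or "false alarm".
False Negative (FN): the number of samples that are actually positive but are predicted as negative, often called a "Type II error" or "miss".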
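The four counts can be tallied directly from paired labels. Below is a minimal sketch. `count_outcomes` is a hypothetical helper written only for illustration. The cross-check assumes scikit-learn: with `labels=[0, 1]`, its binary matrix is laid out as [[TN, FP], [FN, TP]], so `.ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix


def count_outcomes(y_true, y_pred, positive=1):
    """Hypothetical helper: count TP, TN, FP, FN; `positive` marks the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual != positive and predicted != positive:
            tn += 1
        elif predicted == positive:  # actual is negative, predicted positive
            fp += 1
        else:  # actual is positive, predicted negative
            fn += 1
    return tp, tn, fp, fn


y_true = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]

tp, tn, fp, fn = count_outcomes(y_true, y_pred)
# scikit-learn's 2x2 layout with labels [0, 1] is [[TN, FP], [FN, TP]]
sk_tn, sk_fp, sk_fn, sk_tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(tp, tn, fp, fn)  # 4 4 1 1
```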


Why use a confusion matrix?

The confusion matrix is a fundamental tool for evaluating the performance of classification models.

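Error analysis
It helps identify the kinds of errors the model makes, in particular whether it is more prone to false positives or to false negatives. This distinction can be critical in some applications, such as medical diagnosis.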
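Model improvement
By analyzing the confusion matrix, you can focus on improving specific aspects of the model, such as reducing false positives or increasing recall.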
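Handling class imbalance
When classes are imbalanced, meaning one class occurs far more often than another, accuracy alone can be misleading.
The confusion matrix gives you a clearer picture of how the model performs on each individual class.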
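Performance metric calculation
Metrics such as accuracy, precision, recall, and F1-score are all computed from the counts in the confusion matrix.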
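To illustrate the class-imbalance point, here is a minimal sketch with made-up three-class data, assuming NumPy and scikit-learn. Dividing the diagonal of the matrix by its row sums gives per-class recall. This reveals a rare class that the overall accuracy hides:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical imbalanced data: class 0 is common, class 2 is rare
y_true = [0] * 8 + [1] * 3 + [2] * 2
y_pred = [0] * 8 + [1, 1, 0] + [0, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
# Diagonal = correctly classified samples per class; row sums = actual samples per class
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")  # Accuracy: 0.77
print("Per-class recall:", per_class_recall)
```

Overall accuracy looks acceptable, yet the recall for class 2 is 0: the model never gets the rare class right.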

Evaluation metrics in classification

1. Accuracy

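Accuracy is one of the simplest evaluation metrics for classification tasks. It measures the proportion of all predictions that the model gets right.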
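In terms of the four counts, Accuracy = (TP + TN) / (TP + TN + FP + FN). Below is a minimal sketch that assumes scikit-learn. It uses the same sample labels as this section's plotting example and computes accuracy directly from the matrix:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]

# Binary layout with labels [0, 1]: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
# Accuracy = correct predictions / all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(accuracy)  # 0.8
```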

Limitations of accuracy

Accuracy can be misleading on an imbalanced dataset, where one class vastly outnumbers the others.

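For example, consider a dataset in which 95% of the samples belong to one class. A model that predicts every instance as the majority class achieves 95% accuracy, yet it is useless for identifying the minority class.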
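Below is a minimal sketch of this failure mode. It assumes scikit-learn and uses a hypothetical 95/5 split. The always-majority "model" scores 0.95 accuracy but has a recall of 0 on the minority (positive) class:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # TP / (TP + FN) = 0 / 5

print(acc)  # 0.95
print(rec)  # 0.0
```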

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, accuracy_score

# Example true labels (ytest) and predicted labels (ypred)
ytest = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
ypred = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]

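# Calculate confusion matrix; labels=[1, 0] puts class 1 in the first row/column
# so the matrix order matches the tick labels ['1', '0'] used below
cm = confusion_matrix(ytest, ypred, labels=[1, 0])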

# Create a heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['1', '0'],
            yticklabels=['1', '0'])

# Add labels and title
plt.xlabel('Predicted Classes')
plt.ylabel('Actual Classes')
plt.title('Confusion Matrix')

# Calculate and display accuracy
accuracy = accuracy_score(ytest, ypred)
plt.text(2.3, 1.5, f'Accuracy: {accuracy:.2f}', fontsize=14, color='black', weight='bold')

plt.show()


2. Precision

Precision measures the proportion of samples predicted as positive that are actually positive.
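In terms of the counts, Precision = TP / (TP + FP). Below is a minimal sketch that assumes scikit-learn and reuses this section's spam/ham labels. It reads TP and FP straight out of the matrix and checks the result against `precision_score`:

```python
from sklearn.metrics import confusion_matrix, precision_score

y_true = ['spam', 'spam', 'ham', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'ham', 'ham', 'ham']
y_pred = ['spam', 'spam', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham']

# With labels=['spam', 'ham'], row/column 0 is the positive class 'spam'
cm = confusion_matrix(y_true, y_pred, labels=['spam', 'ham'])
tp = cm[0, 0]  # actual spam, predicted spam
fp = cm[1, 0]  # actual ham, predicted spam
precision = tp / (tp + fp)

print(precision)  # 0.875
```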

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, precision_score

# Example true labels (ytest) and predicted labels (ypred)
ytest = ['spam', 'spam', 'ham', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'ham', 'ham', 'ham']
ypred = ['spam', 'spam', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'spam', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham']

# Calculate the confusion matrix
cm = confusion_matrix(ytest, ypred, labels=['spam', 'ham'])
print("Confusion Matrix:\n", cm)

# Calculate precision
precision = precision_score(ytest, ypred, pos_label='spam')
print("Precision:", precision)

# Create a heatmap for the confusion matrix
plt.figure(figsize=(8, 6))
ax = sns.heatmap(cm, annot=True, fmt='d', cmap='viridis', cbar=False,
                 xticklabels=['Predicted Spam', 'Predicted Ham'],
                 yticklabels=['Actual Spam', 'Actual Ham'])

# Set labels and title
plt.xlabel('Predicted Classes')
plt.ylabel('Actual Classes')
plt.title(f'Confusion Matrix\nPrecision: {precision:.2f}')

# Show the plot
plt.show()


3. Recall

Recall measures the proportion of actually positive samples that the model correctly predicts as positive.
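In terms of the counts, Recall = TP / (TP + FN). Below is a minimal sketch that assumes scikit-learn and reuses this section's positive/negative labels. With `labels=['positive', 'negative']`, the first row of the matrix holds the actual positives:

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = ['positive', 'positive', 'negative', 'positive', 'negative']
y_pred = ['positive', 'negative', 'negative', 'positive', 'positive']

cm = confusion_matrix(y_true, y_pred, labels=['positive', 'negative'])
tp = cm[0, 0]  # actual positive, predicted positive
fn = cm[0, 1]  # actual positive, predicted negative
recall = tp / (tp + fn)

print(round(recall, 2))  # 0.67
```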

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, recall_score

# Example true labels (ytest) and predicted labels (ypred)
ytest = ['positive', 'positive', 'negative', 'positive', 'negative']
ypred = ['positive', 'negative', 'negative', 'positive', 'positive']

# Calculate the confusion matrix
cm = confusion_matrix(ytest, ypred, labels=['positive', 'negative'])


# Calculate recall
recall = recall_score(ytest, ypred, pos_label='positive')


# Create a heatmap for the confusion matrix
plt.figure(figsize=(6, 4))
ax = sns.heatmap(cm, annot=True, fmt='d', cmap='viridis', cbar=False,
                 xticklabels=['Predicted Positive', 'Predicted Negative'],
                 yticklabels=['Actual Positive', 'Actual Negative'])

# Set labels and title
plt.xlabel('Predicted Classes')
plt.ylabel('Actual Classes')
plt.title(f'Confusion Matrix\nRecall: {recall:.2f}')

# Show the plot
plt.show()


4. F1-score

F1-score is the harmonic mean of precision and recall. It combines the two into a single number that rewards a balance between them.
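Written out, F1 = 2 · Precision · Recall / (Precision + Recall). Below is a minimal sketch that assumes scikit-learn and uses made-up labels in which precision is high but recall is low. The harmonic mean stays close to the weaker of the two scores instead of splitting the difference:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 10 positives, 10 negatives; the model flags only one positive
y_true = [1] * 10 + [0] * 10
y_pred = [1] + [0] * 9 + [0] * 10

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 1 / 1
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 1 / 10
# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# prints: precision=1.00 recall=0.10 F1=0.18
```

An arithmetic mean would give 0.55 here. The F1 of about 0.18 makes it clear that the model misses most positives.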

Editor: Wu Xiaoyan. Source: Programmer Xuezhang (程序員學長)

