必學的 20 個 Excel 表格操作 Python 腳本

作者：小白PythonAI編程 2024-10-23 08:00:00

本文介紹了使用 Python 處理 Excel 文件的各種方法，通過這些示例，你可以掌握如何高效地管理和分析 Excel 數據。

本文將介紹使用 Python 處理 Excel 文件的多種方法，涵蓋從基本的讀寫操作到高級的數據分析與可視化。通過這些示例，你可以學習如何高效地管理和分析 Excel 數據。

1. 安裝必要的庫

在開始之前，確保安裝了 pandas 和 openpyxl 這兩個庫。這兩個庫是處理 Excel 文件的基礎。

pip install pandas openpyxl

2. 讀取 Excel 文件

首先，讓我們看看如何讀取一個 Excel 文件。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx', engine='openpyxl')

# 顯示前五行數據
print(df.head())

輸出：

   A    B     C
0  1  100  1000
1  2  200  2000
2  3  300  3000
3  4  400  4000
4  5  500  5000

3. 寫入 Excel 文件

接下來，我們將創建一個新的 DataFrame 并將其寫入新的 Excel 文件。

import pandas as pd

# 創建一個 DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [100, 200, 300, 400, 500],
    'C': [1000, 2000, 3000, 4000, 5000]
}
df = pd.DataFrame(data)

# 將 DataFrame 寫入 Excel 文件
df.to_excel('output.xlsx', index=False)

4. 選擇特定列

有時候我們只需要 Excel 文件中的某些列。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx', usecols=['A', 'C'])

# 顯示前五行數據
print(df.head())

輸出：

5. 過濾數據

過濾數據可以幫助我們找到特定條件下的記錄。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 過濾出 A 列大于 3 的行
filtered_df = df[df['A'] > 3]

# 顯示過濾后的數據
print(filtered_df)

輸出：

   A    B     C
3  4  400  4000
4  5  500  5000

6. 數據排序

排序數據可以讓我們更容易地查看數據的趨勢。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 按照 A 列降序排列
sorted_df = df.sort_values(by='A', ascending=False)

# 顯示排序后的數據
print(sorted_df)

輸出：

   A    B     C
4  5  500  5000
3  4  400  4000
2  3  300  3000
1  2  200  2000
0  1  100  1000

7. 數據分組

數據分組可以幫助我們分析不同類別的數據。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 按照 B 列分組并計算平均值
grouped_df = df.groupby('B').mean()

# 顯示分組后的數據
print(grouped_df)

輸出：

          A        C
B                    
100  1.000000  1000.0
200  2.000000  2000.0
300  3.000000  3000.0
400  4.000000  4000.0
500  5.000000  5000.0

8. 添加新列

我們可以根據現有列的數據輕松添加新列。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 添加新列 D
df['D'] = df['A'] * df['B']

# 顯示更新后的數據
print(df)

輸出：

   A    B     C     D
0  1  100  1000   100
1  2  200  2000   400
2  3  300  3000   900
3  4  400  4000  1600
4  5  500  5000  2500

9. 更新單元格

有時候我們需要更新特定單元格的值。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 更新 A 列第 2 行的值
df.at[1, 'A'] = 10

# 顯示更新后的數據
print(df)

輸出：

   A    B     C
0  1  100  1000
1 10  200  2000
2  3  300  3000
3  4  400  4000
4  5  500  5000

10. 刪除列

刪除不需要的列可以簡化數據集。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 刪除 C 列
del df['C']

# 顯示更新后的數據
print(df)

輸出：

11. 合并多個 Excel 文件

合并多個 Excel 文件可以方便地將數據集中到一起。

import pandas as pd

# 讀取多個 Excel 文件
df1 = pd.read_excel('example1.xlsx')
df2 = pd.read_excel('example2.xlsx')

# 合并兩個 DataFrame
merged_df = pd.concat([df1, df2], ignore_index=True)

# 顯示合并后的數據
print(merged_df)

輸出：

   A    B     C
0  1  100  1000
1  2  200  2000
2  3  300  3000
3  4  400  4000
4  5  500  5000

12. 數據透視表

數據透視表是一種強大的工具，可以快速匯總和分析數據。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 創建數據透視表
pivot_table = pd.pivot_table(df, values='C', index=['A'], columns=['B'], aggfunc=sum)

# 顯示數據透視表
print(pivot_table)

輸出：

B       100   200   300   400   500
A                        
1       1000  NaN   NaN   NaN   NaN
2       NaN   2000  NaN   NaN   NaN
3       NaN   NaN   3000  NaN   NaN
4       NaN   NaN   NaN   4000  NaN
5       NaN   NaN   NaN   NaN   5000

13. 數據合并

合并多個數據集可以讓你更好地分析數據之間的關系。

import pandas as pd

# 讀取兩個 Excel 文件
df1 = pd.read_excel('example1.xlsx')
df2 = pd.read_excel('example2.xlsx')

# 使用內連接合并兩個數據集
merged_df = pd.merge(df1, df2, on='A', how='inner')

# 顯示合并后的數據
print(merged_df)

輸出：

   A    B_x    C_x    B_y    C_y
0  1   100  1000    10    100
1  2   200  2000    20    200
2  3   300  3000    30    300
3  4   400  4000    40    400
4  5   500  5000    50    500

14. 數據清洗

數據清洗是數據分析的重要步驟，可以去除無效或錯誤的數據。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 去除空值
df.dropna(inplace=True)

# 去除重復行
df.drop_duplicates(inplace=True)

# 顯示清洗后的數據
print(df)

輸出：

   A    B     C
0  1  100  1000
1  2  200  2000
2  3  300  3000
3  4  400  4000
4  5  500  5000

15. 數據類型轉換

正確設置數據類型有助于節省內存并提高性能。

import pandas as pd

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 將 A 列轉換為整型
df['A'] = df['A'].astype(int)

# 將 B 列轉換為浮點型
df['B'] = df['B'].astype(float)

# 顯示轉換后的數據
print(df.dtypes)

輸出：

A         int64
B        float64
C        float64
dtype: object

16. 數據可視化

使用 matplotlib 庫可以方便地繪制圖表。

import pandas as pd
import matplotlib.pyplot as plt

# 讀取 Excel 文件
df = pd.read_excel('example.xlsx')

# 繪制柱狀圖
plt.bar(df['A'], df['B'])
plt.xlabel('A')
plt.ylabel('B')
plt.title('A vs B')
plt.show()

17. 多頁 Excel 文件操作

處理多頁 Excel 文件時，可以使用 openpyxl 庫。

from openpyxl import load_workbook

# 加載 Excel 文件
wb = load_workbook('example.xlsx')

# 獲取所有工作表名稱
sheet_names = wb.sheetnames
print(sheet_names)

# 選擇特定工作表
sheet = wb['Sheet1']

# 讀取特定單元格的值
cell_value = sheet.cell(row=1, column=1).value
print(cell_value)

輸出：

['Sheet1', 'Sheet2', 'Sheet3']
1

18. 條件格式化

條件格式化可以幫助你突出顯示特定數據。

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# 創建一個新的 DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [100, 200, 300, 400, 500],
    'C': [1000, 2000, 3000, 4000, 5000]
}
df = pd.DataFrame(data)

# 將 DataFrame 寫入 Excel 文件
wb = Workbook()
ws = wb.active
df.to_excel(ws, index=False)

# 設置條件格式化
for row in ws.iter_rows(min_row=2, max_row=6, min_col=2, max_col=2):
    for cell in row:
        if cell.value > 300:
            cell.fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")

# 保存 Excel 文件
wb.save('condition.xlsx')

19. 自定義樣式

自定義樣式可以讓你的 Excel 文件更加美觀。

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, Border, Side, Alignment

# 創建一個新的 DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [100, 200, 300, 400, 500],
    'C': [1000, 2000, 3000, 4000, 5000]
}
df = pd.DataFrame(data)

# 將 DataFrame 寫入 Excel 文件
wb = Workbook()
ws = wb.active
df.to_excel(ws, index=False)

# 設置字體樣式
for row in ws.iter_rows(min_row=1, max_row=1, min_col=1, max_col=3):
    for cell in row:
        cell.font = Font(bold=True, color="FFFFFF")

# 設置邊框樣式
thin_border = Border(left=Side(style='thin'), 
                     right=Side(style='thin'), 
                     top=Side(style='thin'), 
                     bottom=Side(style='thin'))

for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=3):
    for cell in row:
        cell.border = thin_border

# 設置居中對齊
for row in ws.iter_rows(min_row=1, max_row=6, min_col=1, max_col=3):
    for cell in row:
        cell.alignment = Alignment(horizontal='center', vertical='center')

# 保存 Excel 文件
wb.save('styled.xlsx')

20. 批量處理文件

批量處理多個 Excel 文件可以大大提高效率。

import os
import pandas as pd

# 獲取目錄中的所有 Excel 文件
files = [f for f in os.listdir('.') if f.endswith('.xlsx')]

# 循環處理每個文件
for file in files:
    # 讀取 Excel 文件
    df = pd.read_excel(file)
    
    # 進行數據處理
    df['D'] = df['A'] * df['B']
    
    # 保存處理后的文件
    df.to_excel(f'processed_{file}', index=False)

實戰案例：員工績效分析

假設你有一個包含員工績效數據的 Excel 文件，需要分析每位員工的績效。

import pandas as pd

# 讀取 Excel 文件
performance_data = pd.read_excel('employee_performance.xlsx')

# 計算每位員工的總銷售額
performance_data['Total Sales'] = performance_data['Quantity'] * performance_data['Price']

# 分析每位員工的平均銷售額
average_sales = performance_data.groupby('Employee')['Total Sales'].mean()

# 顯示平均銷售額
print(average_sales)

輸出：

Employee
Alice    5000.0
Bob      6000.0
Charlie  7000.0
Dave     8000.0
Eve      9000.0
dtype: float64

總結

本文介紹了使用 Python 處理 Excel 文件的各種方法，包括讀取、寫入、篩選、排序、分組、添加新列、更新單元格、刪除列、合并多個文件、數據透視表、數據合并、數據清洗、數據類型轉換、數據可視化、多頁文件操作、條件格式化、自定義樣式以及批量處理等。通過這些示例，你可以掌握如何高效地管理和分析 Excel 數據。

責任編輯：趙寧寧來源：小白PythonAI編程

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看