一次Group By+Order By性能優化分析

作者：周夢康 2019-03-19 14:52:00

最近通過一個日志表做排行的時候發現特別卡，問題得到了解決，梳理一些索引和MySQL執行過程的經驗，但是還是有5個謎題沒解開，希望大家幫忙解答下。

最近通過一個日志表做排行的時候發現特別卡，問題得到了解決，梳理一些索引和MySQL執行過程的經驗，但是還是有5個謎題沒解開，希望大家幫忙解答下

主要包含如下知識點

用數據說話證明慢日志的掃描行數到底是如何統計出來的
從 group by 執行原理找出優化方案
排序的實現細節
gdb 源碼調試

背景

需要分別統計本月、本周被訪問的文章的前10。日志表如下

CREATE TABLE `article_rank` (  
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,  
  `aid` int(11) unsigned NOT NULL,  
  `pv` int(11) unsigned NOT NULL DEFAULT '1',  
  `day` int(11) NOT NULL COMMENT '日期 例如 20171016',  
  PRIMARY KEY (`id`),  
  KEY `idx_day_aid_pv` (`day`,`aid`,`pv`),  
  KEY `idx_aid_day_pv` (`aid`,`day`,`pv`)  
) ENGINE=InnoDB DEFAULT CHARSET=utf8

準備工作

為了能夠清晰的驗證自己的一些猜想，在虛擬機里安裝了一個 debug 版的 mysql，然后開啟了慢日志收集，用于統計掃描行數

安裝

下載源碼
編譯安裝
創建 mysql 用戶
初始化數據庫
初始化 mysql 配置文件
修改密碼

如果你興趣，具體可以參考我的博客，一步步安裝 https://mengkang.net/1335.html

開啟慢日志

編輯配置文件，在[mysqld]塊下添加

slow_query_log=1  
slow_query_log_file=xxx  
long_query_time=0  
log_queries_not_using_indexes=1

性能分析

發現問題

假如我需要查詢2018-12-20 ~ 2018-12-24這5天瀏覽量多的10篇文章的 sql 如下，首先使用explain看下分析結果

mysql> explain select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;  
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+ 
 
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows   | filtered | Extra                                                | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+ 
 
|  1 | SIMPLE      | article_rank | NULL       | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4       | NULL | 404607 |   100.00 | Using where; Using index; Using temporary; Using filesort | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-----------------------------------------------------------+

系統默認會走的索引是idx_day_aid_pv，根據Extra信息我們可以看到，使用idx_day_aid_pv索引的時候，會走覆蓋索引，但是會使用臨時表，會有排序。

我們查看下慢日志里的記錄信息

# Time: 2019-03-17T03:02:27.984091Z  
# User@Host: root[root] @ localhost []  Id:     6  
# Query_time: 56.959484  Lock_time: 0.000195 Rows_sent: 10  Rows_examined: 1337315  
SET timestamp=1552791747;  
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

為什么掃描行數是 1337315

我們查詢兩個數據，一個是滿足條件的行數，一個是group by統計之后的行數。

mysql> select count(*) from article_rank where day>=20181220 and day<=20181224;  
+----------+  
| count(*) |  
+----------+  
|   785102 |  
+----------+  
mysql> select count(distinct aid) from article_rank where day>=20181220 and day<=20181224;  
+---------------------+  
| count(distinct aid) |  
+---------------------+  
|              552203 |  
+---------------------+

發現滿足條件的總行數（785102）+group by 之后的總行數（552203）+limit 的值 = 慢日志里統計的 Rows_examined。

要解答這個問題，就必須搞清楚上面這個 sql 到底分別都是如何運行的。

執行流程分析

索引示例

為了便于理解，我按照索引的規則先模擬idx_day_aid_pv索引的一小部分數據

day	aid	pv	id
20181220	1	23	1234
20181220	3	2	1231
20181220	4	1	1212
20181220	7	2	1221
20181221	1	5	1257
20181221	10	1	1251
20181221	11	8	1258

因為索引idx_day_aid_pv最左列是day，所以當我們需要查找20181220~20181224之間的文章的pv總和的時候，我們需要遍歷20181220~20181224這段數據的索引。

查看 optimizer trace 信息  
# 開啟 optimizer_trace  
set optimizer_trace='enabled=on';  
# 執行 sql   
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;  
# 查看 trace 信息  
select trace from `information_schema`.`optimizer_trace`\G;

摘取里面的執行結果如下

{  
  "join_execution": {  
    "select#": 1,  
    "steps": [  
      {  
        "creating_tmp_table": {  
          "tmp_table_info": {  
            "table": "intermediate_tmp_table",  
            "row_length": 20,  
            "key_length": 4,  
            "unique_constraint": false,  
            "location": "memory (heap)",  
            "row_limit_estimate": 838860  
          }  
        }  
      },  
      {  
        "converting_tmp_table_to_ondisk": {  
          "cause": "memory_table_size_exceeded",  
          "tmp_table_info": {  
            "table": "intermediate_tmp_table",  
            "row_length": 20,  
            "key_length": 4,  
            "unique_constraint": false,  
            "location": "disk (InnoDB)",  
            "record_format": "fixed"  
          }  
        }  
      },  
      {  
        "filesort_information": [  
          {  
            "direction": "desc",  
            "table": "intermediate_tmp_table",  
            "field": "num"  
          }  
        ],  
        "filesort_priority_queue_optimization": {  
          "limit": 10,  
          "rows_estimate": 1057,  
          "row_size": 36,  
          "memory_available": 262144,  
          "chosen": true  
        },  
        "filesort_execution": [  
        ],  
        "filesort_summary": {  
          "rows": 11,  
          "examined_rows": 552203,  
          "number_of_tmp_files": 0,  
          "sort_buffer_size": 488,  
          "sort_mode": "<sort_key, additional_fields>"  
        }  
      }  
    ]  
  }  
}

分析臨時表字段

mysql gdb 調試更多細節 https://mengkang.net/1336.html

通過gdb調試確認臨時表上的字段是aid和num

Breakpoint 1, trace_tmp_table (trace=0x7eff94003088, table=0x7eff94937200) at /root/newdb/mysql-server/sql/sql_tmp_table.cc:2306  
warning: Source file is more recent than executable.  
2306      trace_tmp.add("row_length",table->s->reclength).  
(gdb) p table->s->reclength  
$1 = 20  
(gdb) p table->s->fields  
$22 = 2  
(gdb) p (*(table->field+0))->field_name  
$3 = 0x7eff94010b0c "aid"  
(gdb) p (*(table->field+1))->field_name  
$4 = 0x7eff94007518 "num"  
(gdb) p (*(table->field+0))->row_pack_length()  
$5 = 4  
(gdb) p (*(table->field+1))->row_pack_length()  
$6 = 15  
(gdb) p (*(table->field+0))->type()  
$7 = MYSQL_TYPE_LONG  
(gdb) p (*(table->field+1))->type()  
$8 = MYSQL_TYPE_NEWDECIMAL  
(gdb)

通過上面的打印，確認了字段類型，一個aid是MYSQL_TYPE_LONG，占4字節，num是MYSQL_TYPE_NEWDECIMAL，占15字節。

The SUM() and AVG() functions return a DECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT or DOUBLE). (Before MySQL 5.0.3, SUM() and AVG() return DOUBLE for all numeric arguments.)

但是通過我們上面打印信息可以看到兩個字段的長度加起來是19，而optimizer_trace里的tmp_table_info.reclength是20。通過其他實驗也發現table->s->reclength的長度就是table->field數組里面所有字段的字段長度和再加1。

總結執行流程

嘗試在堆上使用memory的內存臨時表來存放group by的數據，發現內存不夠；
創建一張臨時表，臨時表上有兩個字段，aid和num字段（sum(pv) as num）；
從索引idx_day_aid_pv中取出1行，插入臨時表。插入規則是如果aid不存在則直接插入，如果存在，則把pv的值累加在num上；
循環遍歷索引idx_day_aid_pv上20181220~20181224之間的所有行，執行步驟3；
對臨時表根據num的值做優先隊列排序；
取出留在堆（優先隊列的堆）里面的10行數據，作為結果集直接返回，不需要再回表；

補充說明優先隊列排序執行步驟分析：

在臨時表（未排序）中取出前 10 行，把其中的num和aid作為10個元素構成一個小頂堆，也就是最小的 num 在堆頂。
取下一行，根據 num 的值和堆頂值作比較，如果該字大于堆頂的值，則替換掉。然后將新的堆做堆排序。
重復步驟2直到第 552203 行比較完成。

優化

方案1 使用 idx_aid_day_pv 索引

# Query_time: 4.406927  Lock_time: 0.000200 Rows_sent: 10  Rows_examined: 1337315  
SET timestamp=1552791804;  
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

掃描行數都是1337315，為什么執行消耗的時間上快了12倍呢？

索引示例

為了便于理解，同樣我也按照索引的規則先模擬idx_aid_day_pv索引的一小部分數據

aid	day	pv	id
1	20181220	23	1234
1	20181221	5	1257
3	20181220	2	1231
3	20181222	22	1331
3	20181224	13	1431
4	20181220	1	1212
7	20181220	2	1221
10	20181221	1	1251
11	20181221	8	1258

group by 不需要臨時表的情況

為什么性能上比 SQL1 高了，很多呢，原因之一是idx_aid_day_pv索引上aid是確定有序的，那么執行group by的時候，則不會創建臨時表，排序的時候才需要臨時表。如果印證這一點呢，我們通過下面的執行計劃就能看到

使用idx_day_aid_pv索引的效果：

mysql> explain select aid,sum(pv) as num from article_rank force index(idx_day_aid_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10; 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+ 
 
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows   | filtered | Extra                                     | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+ 
 
|  1 | SIMPLE      | article_rank | NULL       | range | idx_day_aid_pv,idx_aid_day_pv | idx_day_aid_pv | 4       | NULL | 404607 |   100.00 | Using where; Using index; Using temporary | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+--------+----------+-------------------------------------------+

注意我上面使用了order by null表示強制對group by的結果不做排序。如果不加order by null，上面的 sql 則會出現Using filesort

使用idx_aid_day_pv索引的效果：

mysql> explain select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by null limit 10; 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+ 
 
| id | select_type | table        | partitions | type  | possible_keys                 | key            | key_len | ref  | rows | filtered | Extra                    | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+ 
 
|  1 | SIMPLE      | article_rank | NULL       | index | idx_day_aid_pv,idx_aid_day_pv | idx_aid_day_pv | 12      | NULL |   10 |    11.11 | Using where; Using index | 
 
+----+-------------+--------------+------------+-------+-------------------------------+----------------+---------+------+------+----------+--------------------------+

查看 optimizer trace 信息

# 開啟optimizer_trace  
set optimizer_trace='enabled=on';  
# 執行 sql   
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;  
# 查看 trace 信息  
select trace from `information_schema`.`optimizer_trace`\G;

摘取里面的執行結果如下

{  
  "join_execution": {  
    "select#": 1,  
    "steps": [  
      {  
        "creating_tmp_table": {  
          "tmp_table_info": {  
            "table": "intermediate_tmp_table",  
            "row_length": 20,  
            "key_length": 0,  
            "unique_constraint": false,  
            "location": "memory (heap)",  
            "row_limit_estimate": 838860  
          }  
        }  
      },  
      {  
        "filesort_information": [  
          {  
            "direction": "desc",  
            "table": "intermediate_tmp_table",  
            "field": "num"  
          }  
        ],  
        "filesort_priority_queue_optimization": {  
          "limit": 10,  
          "rows_estimate": 552213,  
          "row_size": 24,  
          "memory_available": 262144,  
          "chosen": true  
        },  
        "filesort_execution": [  
        ],  
        "filesort_summary": {  
          "rows": 11,  
          "examined_rows": 552203,  
          "number_of_tmp_files": 0,  
          "sort_buffer_size": 352,  
          "sort_mode": "<sort_key, rowid>"  
        }  
      }  
    ]  
  }  
}

執行流程如下

1. 創建一張臨時表，臨時表上有兩個字段，aid和num字段（sum(pv) as num）；

2. 讀取索引idx_aid_day_pv中的一行，然后查看是否滿足條件，如果day字段不在條件范圍內（20181220~20181224之間），則讀取下一行；如果day字段在條件范圍內，則把pv值累加（不是在臨時表中操作）；

3. 讀取索引idx_aid_day_pv中的下一行，如果aid與步驟1中一致且滿足條件，則pv值累加（不是在臨時表中操作）。如果aid與步驟1中不一致，則把之前的結果集寫入臨時表；

4. 循環執行步驟2、3，直到掃描完整個idx_aid_day_pv索引；

5. 對臨時表根據num的值做優先隊列排序；

6. 根據查詢到的前10條的rowid回表（臨時表）返回結果集。

補充說明優先隊列排序執行步驟分析：

在臨時表（未排序）中取出前 10 行，把其中的num和rowid作為10個元素構成一個小頂堆，也就是最小的 num 在堆頂。
取下一行，根據 num 的值和堆頂值作比較，如果該字大于堆頂的值，則替換掉。然后將新的堆做堆排序。
重復步驟2直到第 552203 行比較完成。

該方案可行性

實驗發現，當我增加一行20181219的數據時，雖然這行記錄不滿足我們的需求，但是掃描索引的也會讀取這行。因為我做這個實驗，只弄了20181220~201812245天的數據，所以需要掃描的行數正好是全表數據行數。

那么如果該表的數據存儲的不是5天的數據，而是10天的數據呢，更或者是365天的數據呢？這個方案是否還可行呢？先模擬10天的數據，在現有時間基礎上往后加5天，行數與現在一樣785102行。

drop procedure if exists idata;  
delimiter ;;  
create procedure idata()  
begin  
  declare i int;  
  declare aid int;  
  declare pv int;  
  declare post_day int;  
  set i=1;  
  while(i<=785102)do  
    set aid = round(rand()*500000);  
    set pv = round(rand()*100);  
    set post_day = 20181225 + i%5;  
    insert into article_rank (`aid`,`pv`,`day`) values(aid, pv, post_day);  
    set ii=i+1;  
  end while;  
end;;  
delimiter ;  
call idata();  
# Query_time: 9.151270  Lock_time: 0.000508 Rows_sent: 10  Rows_examined: 2122417  
SET timestamp=1552889936;  
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

這里掃描行數2122417是因為掃描索引的時候需要遍歷整個索引，整個索引的行數就是全表行數，因為我剛剛又插入了785102行。

當我數據量翻倍之后，這里查詢時間明顯已經翻倍。所以這個優化方式不穩定。

方案2 擴充臨時表空間上限大小

默認的臨時表空間大小是16MB

mysql> show global variables like '%table_size';  
+---------------------+----------+  
| Variable_name       | Value    |  
+---------------------+----------+  
| max_heap_table_size | 16777216 |  
| tmp_table_size      | 16777216 |  
+---------------------+----------+

https://dev.mysql.com/doc/ref...

max_heap_table_size

This variable sets the maximum size to which user-created MEMORY tables are permitted to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existing MEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the global max_heap_table_size value.

tmp_table_size

The maximum size of internal in-memory temporary tables. This variable does not apply to user-created MEMORY tables.

The actual limit is determined from whichever of the values of tmp_table_size and max_heap_table_size is smaller. If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk temporary table. The internal_tmp_disk_storage_engine option defines the storage engine used for on-disk temporary tables.

也就是說這里臨時表的限制是16M，max_heap_table_size大小也受tmp_table_size大小的限制。

所以我們這里調整為32MB，然后執行原始的SQL

set tmp_table_size=33554432;  
set max_heap_table_size=33554432;

# Query_time: 5.910553  Lock_time: 0.000210 Rows_sent: 10  Rows_examined: 1337315  
SET timestamp=1552803869;  
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

方案3 使用 SQL_BIG_RESULT 優化

告訴優化器，查詢結果比較多，臨時表直接走磁盤存儲。

# Query_time: 6.144315  Lock_time: 0.000183 Rows_sent: 10  Rows_examined: 2122417  
SET timestamp=1552802804;  
select SQL_BIG_RESULT aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

掃描行數是 2x滿足條件的總行數（785102）+group by 之后的總行數（552203）+limit 的值。

順便值得一提的是：當我把數據量翻倍之后，使用該方式，查詢時間基本沒變。因為掃描的行數還是不變的。實際測試耗時6.197484

總結

方案1優化效果不穩定，當總表數據量與查詢范圍的總數相同時，且不超出內存臨時表大小限制時，性能達到更佳。當查詢數據量占據總表數據量越大，優化效果越不明顯；

方案2需要調整臨時表內存的大小，可行；不過當數據庫超過32MB時，如果使用該方式，還需要繼續提升臨時表大小；

方案3直接聲明使用磁盤來放臨時表，雖然掃描行數多了一次符合條件的總行數的掃描。但是整體響應時間比方案2就慢了0.1秒。因為我們這里數據量比較，我覺得這個時間差還能接受。

所以對比，選擇方案3比較合適。

問題與困惑

# SQL1  
select aid,sum(pv) as num from article_rank where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;  
# SQL2  
select aid,sum(pv) as num from article_rank force index(idx_aid_day_pv) where day>=20181220 and day<=20181224 group by aid order by num desc limit 10;

SQL1 執行過程中，使用的是全字段排序后不需要回表為什么總掃描行數還要加上10才對得上？
SQL1 與 SQL2 group by之后得到的行數都是552203，為什么會出現 SQL1 內存不夠，里面還有哪些細節呢？
trace 信息里的creating_tmp_table.tmp_table_info.row_limit_estimate都是838860；計算由來是臨時表的內存限制大小16MB，而一行需要占的空間是20字節，那么最多只能容納 floor(16777216/20) = 838860行，而實際我們需要放入臨時表的行數是785102。為什么呢？
SQL1 使用SQL_BIG_RESULT優化之后，原始表需要掃描的行數會乘以2，背后邏輯是什么呢？為什么僅僅是不再嘗試往內存臨時表里寫入這一步會相差10多倍的性能？
通過源碼看到 trace 信息里面很多掃描行數都不是實際的行數，既然是實際執行，為什么 trace 信息里不輸出真實的掃描行數和容量等呢，比如 filesort_priority_queue_optimization.rows_estimate在SQL1中的掃描行數我通過gdb看到計算規則如附錄圖 1
有沒有工具能夠統計 SQL 執行過程中的 I/O 次數？

責任編輯：龐桂玉來源： segmentfault

成人免费xxxxx在线视频软件_久久精品久久久_亚洲国产精品久久久_天天色天天色_亚洲人成一区_欧美一级欧美三级在线观看

一次Group By+Order By性能優化分析