Unlocking the Mystery of Multithreaded Deadlocks: A Deep Dive into Debugging with GDB

Author: Sun share, 2023-11-22 13:13:54


Multithreaded programming is an important technique in modern software development, but it brings its own challenges, and deadlock is one of them. A deadlock is a common bug in which threads wait on each other and get stuck in a state where none of them can proceed. Here we look at what multithreaded deadlock is and why it happens, and then walk through an example of using GDB (the GNU Debugger) to track down and fix a deadlock.

The Concept of Multithreaded Deadlock

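Deadlock is a key problem in multithreaded programming. It occurs when several threads try to acquire a set of resources (usually locks or other shared objects) and end up waiting on each other. Concretely, if thread 1 holds resource A and waits for resource B while thread 2 holds resource B and waits for resource A, neither can ever proceed: the two threads are deadlocked.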

Why Multithreaded Deadlock Happens

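To understand how deadlock arises, consider a simple example. Suppose there are two resources, A and B, and two threads (Thread 1 and Thread 2). Thread 1 needs to acquire A and then B, while Thread 2 needs to acquire B and then A. If Thread 1 acquires A and Thread 2 acquires B, neither can continue, because each needs the resource the other is holding. This is a classic deadlock.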

Deadlock typically arises under the following conditions:

A thread holds one resource while waiting for another.
Resources are acquired inconsistently: threads do not acquire them in the same order.

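Deadlock happens because threads depend on and wait for each other. When several threads share resources, they may acquire them in different orders; because each resource can be held by only one thread at a time, this can produce a cycle of waits that never resolves, which is a deadlock.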

Troubleshooting Multithreaded Deadlocks

GDB is a powerful debugger that can be used to track down deadlocks. Below, an example shows how to debug a deadlock with GDB; this is a skill I picked up recently while dealing with a lock problem of my own.

Here is the simple test code:


#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t exit_condition = PTHREAD_COND_INITIALIZER;
int should_exit = 0;

void *thread1_function(void *arg) {
    while (1) {
        printf("Thread 1: Attempting to acquire mutex1...\n");
        pthread_mutex_lock(&mutex1);
        printf("Thread 1: Acquired mutex1.\n");

        printf("Thread 1: Attempting to acquire mutex2...\n");
        pthread_mutex_lock(&mutex2);
        printf("Thread 1: Acquired mutex2.\n");

        // Check here whether we should exit
        if (should_exit) {
            pthread_mutex_unlock(&mutex2);
            pthread_mutex_unlock(&mutex1);
            break;
        }

        pthread_mutex_unlock(&mutex1);
        pthread_mutex_unlock(&mutex2);
    }
    printf("Thread 1 exit done!\n");
    pthread_exit(NULL);
}

void *thread2_function(void *arg) {
    sleep(5); // Let thread 2 sleep for 5 seconds

    printf("Thread 2: Attempting to acquire mutex2...\n");
    pthread_mutex_lock(&mutex2);
    printf("Thread 2: Acquired mutex2.\n");

    printf("Thread 2: Notifying Thread 1 to exit...\n");
    should_exit = 1;
    pthread_cond_signal(&exit_condition);

    // Commenting out the unlock below (never releasing mutex2) creates the deadlock
    pthread_mutex_unlock(&mutex2);

    printf("Thread 2 exit done!\n");
    // pthread_exit() does not return, so nothing after it in this function runs
    pthread_exit(NULL);
}

int main() {
    pthread_t thread1, thread2;

    pthread_create(&thread1, NULL, thread1_function, NULL);
    pthread_create(&thread2, NULL, thread2_function, NULL);

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    return 0;
}

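The code is simple: it creates two threads. Thread 2 sleeps for 5 s, locks mutex2, sets should_exit to tell thread 1 to exit, releases mutex2, and exits. Thread 1 runs a while loop that repeatedly locks mutex1 and then mutex2 and checks whether it should exit; if so, it releases both mutexes and leaves the loop, otherwise it releases both and goes around again.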

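Compile with gcc thread.c -g -lpthread -o thread; the -g flag is needed because we will debug the program with gdb. Run normally, the program terminates on its own after about 5 seconds, and its output includes the "Thread 1 exit done!" and "Thread 2 exit done!" messages.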

Now we comment out the line in thread 2 that releases mutex2, so that we have a deadlock to debug:

void *thread2_function(void *arg) {
    sleep(5); // Let thread 2 sleep for 5 seconds

    printf("Thread 2: Attempting to acquire mutex2...\n");
    pthread_mutex_lock(&mutex2);
    printf("Thread 2: Acquired mutex2.\n");

    printf("Thread 2: Notifying Thread 1 to exit...\n");
    should_exit = 1;
    pthread_cond_signal(&exit_condition);

    // Create the deadlock by not releasing this lock
    //pthread_mutex_unlock(&mutex2);

    printf("Thread 2 exit done!\n");
    // pthread_exit() does not return, so nothing after it in this function runs
    pthread_exit(NULL);
}

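In a real environment we would not know in advance that a deadlock occurs, so we first run the program under gdb until it hangs instead of exiting, interrupt it with Ctrl+C, and run bt to view the stack.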

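Because of the printf calls, it is easy to see here that the program is stuck trying to lock mutex2. In a real environment, however, many threads are running and we do not know where the problem is, so we have to rely on the stack printed by bt. It shows the thread stuck in __futex_abstimed_wait_common64, at line 57 of ./nptl/futex-internal.c.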

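All we need to know here is that __futex_abstimed_wait_common64 is an internal function of glibc's NPTL threading library (not of the Linux kernel): it blocks the calling thread in the futex system call, optionally with a timeout, until another thread wakes it.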

This strongly suggests that the code is deadlocked, so we keep digging.

Looking further at the bt output, we find that this wait was entered from #4 0x00005555555553c8 in main () at thread.c:59. Since that is frame #4, we select it with f 4 (short for frame 4).

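This shows that the call comes from main, which is executing pthread_join on thread1. So __futex_abstimed_wait_common64 is not the real problem we are looking for: the main thread is simply blocked in pthread_join, waiting for thread1 to finish, which means thread1 has not exited. Next we run thread apply all bt to print the stacks of all threads.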

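From the analysis above, GDB's Thread 1 is the main thread, blocked in pthread_join. (GDB numbers threads on its own, starting with the main thread, so its numbers do not match the names thread1 and thread2 used in the code.) GDB's Thread 2, the thread running thread1_function, is stuck in futex_wait; from the context it is clearly waiting for a futex-based lock, and further down its stack we find mutex2. So this thread is waiting for mutex2. Who locked mutex2 and never released it? Running p mutex2 prints the mutex's owner field, which tells us which thread holds the lock.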

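One caveat: in this example the thread running thread2_function locked mutex2 and then exited without unlocking it, so the owner value 23890 is the LWP of a thread that no longer exists and therefore does not appear in info threads. In a real program the owner is usually a live thread whose number matches one of the LWPs listed by info threads, which tells you exactly which thread is holding the lock without releasing it; from there, analyze that thread's code.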

I modified the code once more so that the thread stays alive in a while(1) loop instead of exiting. Following the same steps to find who holds mutex2 then shows the owning thread directly.