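Uncovering PUBG's "Chicken Dinner" Secrets with Data Visualization
Winner winner, chicken dinner! I played a few rounds of PUBG with friends today and died in all sorts of ways, enough for them to tease me about "100 ways a girl can die in PUBG": punched to death, parachuting onto the edge of a roof and falling off, turning the match into a racing game and getting killed by someone's fancy driving, and being burned to death by a teammate's Molotov cocktail. For me this game mostly exists to show that such ways of dying are even possible. But play is play, and I still have to pretend I'm addicted to studying, so today I'll use real PUBG match data to see how to raise your odds of winning.
So let's use Python and R to analyze the data and answer the soul-searching questions below.
First, a look at the data:
1. Where is it dangerous to land?
As a conscientious player who has always preferred to camp, and after countless painful experiences of turning into a loot crate the moment I land, I flatly refuse to drop into densely built towns like Pochinki ("P City"): being poor is fine, staying alive comes first. So let's count which places are most likely to get you killed on landing. We select the death locations of players who died within the first 100 seconds and visualize them. On the desert map Miramar, the power station, Pecado, the villa district and El Pozo are the most dangerous, while the train station and the thermal power plant are relatively safe. On the island map Erangel, Pochinki, the Military Base, the School, the Hospital, the nuclear power plant and the bunkers are all definite danger zones. Surprisingly, loot-rich Georgopol ("G Port") is relatively safe.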
- import numpy as np
- import matplotlib.pyplot as plt
- import pandas as pd
- import seaborn as sns
- import matplotlib.cm as cm
- # scipy.misc's imread has been removed from SciPy, so plt.imread loads the map images instead
- # Load part of the data
- deaths1 = pd.read_csv("deaths/kill_match_stats_final_0.csv")
- deaths2 = pd.read_csv("deaths/kill_match_stats_final_1.csv")
- deaths = pd.concat([deaths1, deaths2])
- # Print the first 5 rows and the row count to get a feel for the variables
- print(deaths.head(), '\n', len(deaths))
- # The two maps
- miramar = deaths[deaths["map"] == "MIRAMAR"]
- erangel = deaths[deaths["map"] == "ERANGEL"]
- # Heatmap of deaths in the first 100 seconds
- position_data = ["killer_position_x","killer_position_y","victim_position_x","victim_position_y"]
- # Drop rows with a zero coordinate, then rescale game coordinates (divided by 800000) to image pixels
- miramar = miramar[(miramar[position_data] != 0).all(axis=1)].copy()
- miramar[position_data] = miramar[position_data] * 1000 / 800000
- erangel = erangel[(erangel[position_data] != 0).all(axis=1)].copy()
- erangel[position_data] = erangel[position_data] * 4096 / 800000
- n = 50000
- mira_sample = miramar[miramar["time"] < 100].sample(n)
- eran_sample = erangel[erangel["time"] < 100].sample(n)
- # Miramar heatmap (the x=, y= and levels= keywords need seaborn 0.11 or later)
- bg = plt.imread("miramar.jpg")
- fig, ax = plt.subplots(1,1,figsize=(15,15))
- ax.imshow(bg)
- sns.kdeplot(x=mira_sample["victim_position_x"], y=mira_sample["victim_position_y"], levels=100, cmap=cm.Reds, alpha=0.9)
- # Erangel heatmap
- bg = plt.imread("erangel.jpg")
- fig, ax = plt.subplots(1,1,figsize=(15,15))
- ax.imshow(bg)
- sns.kdeplot(x=eran_sample["victim_position_x"], y=eran_sample["victim_position_y"], levels=100, cmap=cm.Reds, alpha=0.9)
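The heatmaps show hotspots visually; to read them off as numbers, one option is to bin the early deaths into a coarse grid and rank the cells. The sketch below is only an illustration: the helper `hottest_cells`, the 8 x 8 grid and the toy coordinates are all assumptions made for the example, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def hottest_cells(df, x_col, y_col, extent, bins=8, top=3):
    """Rank grid cells by death count; returns (x_bin, y_bin, count) tuples."""
    counts, _, _ = np.histogram2d(
        df[x_col], df[y_col], bins=bins, range=[[0, extent], [0, extent]]
    )
    # Flattened indices of the cells, largest count first
    order = np.argsort(counts, axis=None)[::-1][:top]
    xs, ys = np.unravel_index(order, counts.shape)
    return [(int(i), int(j), int(counts[i, j])) for i, j in zip(xs, ys)]

# Made-up early deaths, already in pixel coordinates of a hypothetical 4096-px map
toy = pd.DataFrame({
    "victim_position_x": [100, 120, 130, 3000, 3010, 2000],
    "victim_position_y": [100, 110, 140, 3000, 3020, 500],
})
top_cells = hottest_cells(toy, "victim_position_x", "victim_position_y",
                          extent=4096, bins=8, top=2)
print(top_cells)
```

On the real data the same call could be applied to `eran_sample` or `mira_sample`, with `extent` set to the pixel size of the corresponding map image.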
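2. Camp, or go out and fight?
Should I camp inside a room or go out and fight the enemy head-on? Because match sizes differ, I keep only matches with more than 90 players and then select the rows with team_placement equal to 1, that is, the teams that ended up winning. 1. First I computed the average number of kills per winning team. Four-player squads are excluded here, because with more members the kill counts within a team are so uneven that the average stops being meaningful. 2. Next I wanted to group by team and count the kills of the member who survived to the very end, but the survival-time variable turns out to be recorded as the team's final survival time, so that idea failed. 3. Finally I took, for each winning team, the highest kill count among its members. Solo matches are excluded here, because in solo mode a player's own kill count is already the team maximum. Surprisingly, some kill counts reach 60, which makes me suspect cheating. The takeaway: to win you still have to go out and practice your aim; camping alone won't do.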
- library(dplyr)
- library(tidyverse)
- library(data.table)
- library(ggplot2)
- pubg_full <- fread("../agg_match_stats.csv")
- # Average number of kills per winning team (squads excluded, matches with more than 90 players)
- pubg_winner <- pubg_full %>% filter(team_placement == 1 & party_size < 4 & game_size > 90)
- team_killed <- aggregate(pubg_winner$player_kills, by = list(pubg_winner$match_id, pubg_winner$team_id), FUN = "mean")
- team_killed$death_num <- ceiling(team_killed$x)
- ggplot(data = team_killed) + geom_bar(mapping = aes(x = death_num, y = ..count..), color = "steelblue") +
-   xlim(0, 70) + labs(title = "Number of Death that PUBG Winner team Killed", x = "Number of death")
- # Kills of the last surviving player on each winning team
- # (abandoned: player_survive_time is recorded per team, not per player)
- pubg_winner <- pubg_full %>% filter(team_placement == 1) %>% group_by(match_id, team_id)
- team_leader <- aggregate(player_survive_time ~ player_kills, data = pubg_winner, FUN = "max")
- # Highest kill count within each winning team (solo matches excluded)
- pubg_winner <- pubg_full %>% filter(team_placement == 1 & party_size > 1)
- team_leader <- aggregate(pubg_winner$player_kills, by = list(pubg_winner$match_id, pubg_winner$team_id), FUN = "max")
- ggplot(data = team_leader) + geom_bar(mapping = aes(x = x, y = ..count..), color = "steelblue") +
-   xlim(0, 70) + labs(title = "Number of Death that PUBG Winner Killed", x = "Number of death")
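For readers following along in Python, the third step (highest kill count within each winning team, solo matches excluded) could be sketched with pandas roughly as follows. The column names follow agg_match_stats; the rows are made up for the example, and this is one possible translation of the R code rather than the original implementation.

```python
import pandas as pd

# Made-up rows using the agg_match_stats column names
toy = pd.DataFrame({
    "match_id":       ["m1", "m1", "m1", "m1", "m2", "m2", "m3"],
    "team_id":        [1, 1, 2, 2, 7, 7, 9],
    "party_size":     [2, 2, 2, 2, 2, 2, 1],
    "team_placement": [1, 1, 2, 2, 1, 1, 1],
    "player_kills":   [3, 5, 1, 0, 0, 2, 4],
})

# Winning teams only, solo matches excluded
winners = toy[(toy["team_placement"] == 1) & (toy["party_size"] > 1)]

# Highest individual kill count within each winning team
top_kills = winners.groupby(["match_id", "team_id"])["player_kills"].max()

# How many winning teams had a top killer with k kills (the data behind the bar chart)
distribution = top_kills.value_counts().sort_index()
print(distribution)
```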
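3. Which weapon kills the most players?
When you are lucky enough to find good weapons, do you hesitate over which one to pick? Judging from the charts, the M416 and SCAR are good weapons that are also relatively easy to find. The Kar98k is widely agreed to be a great one-shot-kill rifle; it ranks lower because it is harder to come by in a match and landing that single hit takes real skill. A player like me, who picks up a 98k, fits an 8x scope and then loses it within a minute, does not deserve it. (facepalm)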
- # Ranking of weapons (causes of death)
- death_causes = deaths['killed_by'].value_counts()
- sns.set_context('talk')
- fig = plt.figure(figsize=(30, 10))
- ax = sns.barplot(x=death_causes.index, y=death_causes.values / death_causes.sum())
- ax.set_title('Rate of Death Causes')
- ax.tick_params(axis='x', rotation=90)
- # Top 20 weapons
- rank = 20
- fig = plt.figure(figsize=(20, 10))
- ax = sns.barplot(x=death_causes[:rank].index, y=death_causes[:rank].values / death_causes.sum())
- ax.set_title('Rate of Death Causes')
- ax.tick_params(axis='x', rotation=90)
- # The two maps separately
- f, axes = plt.subplots(1, 2, figsize=(30, 10))
- axes[0].set_title('Death Causes Rate: Erangel (Top {})'.format(rank))
- axes[1].set_title('Death Causes Rate: Miramar (Top {})'.format(rank))
- counts_er = erangel['killed_by'].value_counts()
- counts_mr = miramar['killed_by'].value_counts()
- sns.barplot(x=counts_er[:rank].index, y=counts_er[:rank].values / counts_er.sum(), ax=axes[0])
- sns.barplot(x=counts_mr[:rank].index, y=counts_mr[:rank].values / counts_mr.sum(), ax=axes[1])
- axes[0].set_ylim((0, 0.20))
- axes[0].tick_params(axis='x', rotation=90)
- axes[1].set_ylim((0, 0.20))
- axes[1].tick_params(axis='x', rotation=90)
- # Relationship between winning and weapons: kills made by players who finished first
- win = deaths[deaths["killer_placement"] == 1.0]
- win_causes = win['killed_by'].value_counts()
- sns.set_context('talk')
- fig = plt.figure(figsize=(20, 10))
- ax = sns.barplot(x=win_causes[:20].index, y=win_causes[:20].values / win_causes.sum())
- ax.set_title('Rate of Death Causes of Win')
- ax.tick_params(axis='x', rotation=90)
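4. Do teammates' assists help me win?
Sometimes I get knocked down in a moment of carelessness; luckily I crawl fast enough for a teammate to revive me. Here I select the teams that won: players with exactly one assist recorded make up 29% of the members of winning teams, so help between teammates really does matter. (Stop calling me a useless teammate; I can also choose not to revive you.) Someone even got up to 9. You are quite a talent. (smirk)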
- library(dplyr)
- library(tidyverse)
- library(data.table)
- library(ggplot2)
- pubg_full <- fread("E:/aggregate/agg_match_stats_0.csv")
- pubg_winner <- pubg_full %>% filter(team_placement == 1)
- # Number of winning players by assist count
- ggplot(data = pubg_winner) + geom_bar(mapping = aes(x = player_assists, y = ..count..), fill = "#E69F00") +
-   xlim(0, 10) + labs(title = "Number of Player assisted", x = "Number of assists")
- # Same chart as proportions; group = 1 makes ..prop.. relative to all winners instead of to each bar
- ggplot(data = pubg_winner) + geom_bar(mapping = aes(x = player_assists, y = ..prop.., group = 1), fill = "#56B4E9") +
-   xlim(0, 10) + labs(title = "Number of Player assisted", x = "Number of assists")
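The proportions behind the second chart can also be checked as plain numbers. A minimal pandas sketch, assuming the same column names as agg_match_stats and using made-up rows:

```python
import pandas as pd

# Made-up rows; only the two columns needed for the check
toy = pd.DataFrame({
    "team_placement": [1, 1, 1, 1, 2, 3, 1],
    "player_assists": [0, 1, 0, 2, 1, 0, 1],
})

# Players on winning teams only
winners = toy[toy["team_placement"] == 1]

# Share of winning players at each assist count; the shares sum to 1
share = winners["player_assists"].value_counts(normalize=True).sort_index()
print(share)
```

`share.loc[1]` is then the fraction of winning players with exactly one assist, the quantity quoted in the text for the real data.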
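5. The closer the enemy, the greater the danger?
I compute the Euclidean distance between the killer_position and victim_position variables and look at how the straight-line distance between killer and victim is distributed. The distribution is clearly right-skewed: most kills happen at short range. So keep watching for enemies nearby, or you will be eliminated without ever knowing where they were.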
- # Python code: relationship between kills and distance
- import math
- def get_dist(df):  # killer-to-victim distance for every row
-     dist = []
-     for row in df.itertuples():
-         subset = (row.killer_position_x - row.victim_position_x)**2 + (row.killer_position_y - row.victim_position_y)**2
-         if subset > 0:
-             # /100 yields metres if the inputs are raw game coordinates in centimetres (an assumption)
-             dist.append(math.sqrt(subset) / 100)
-         else:
-             dist.append(0)
-     return dist
- df_dist = pd.DataFrame.from_dict({'dist(m)': get_dist(erangel)})
- df_dist.index = erangel.index
- erangel_dist = pd.concat([erangel,df_dist], axis=1)
- df_dist = pd.DataFrame.from_dict({'dist(m)': get_dist(miramar)})
- df_dist.index = miramar.index
- miramar_dist = pd.concat([miramar,df_dist], axis=1)
- f, axes = plt.subplots(1, 2, figsize=(30, 10))
- plot_dist = 150
- axes[0].set_title('Engagement Dist. : Erangel')
- axes[1].set_title('Engagement Dist.: Miramar')
- plot_dist_er = erangel_dist[erangel_dist['dist(m)'] <= plot_dist]
- plot_dist_mr = miramar_dist[miramar_dist['dist(m)'] <= plot_dist]
- # sns.distplot is deprecated; histplot with a KDE overlay draws a comparable chart
- sns.histplot(plot_dist_er['dist(m)'], stat="density", kde=True, ax=axes[0])
- sns.histplot(plot_dist_mr['dist(m)'], stat="density", kde=True, ax=axes[1])
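On millions of rows the row-by-row loop is slow. A vectorized variant of the same Euclidean distance might look like the sketch below; `kill_distance_m` is a name introduced here for illustration, and the division by 100 assumes raw coordinates in centimetres, as the `dist(m)` label in the code above implies.

```python
import numpy as np
import pandas as pd

def kill_distance_m(df):
    """Straight-line killer-to-victim distance, computed for all rows at once."""
    dx = df["killer_position_x"] - df["victim_position_x"]
    dy = df["killer_position_y"] - df["victim_position_y"]
    # Assumption: raw coordinates are in centimetres, so /100 gives metres
    return np.hypot(dx, dy) / 100

# Made-up positions: a 3-4-5 triangle scaled by 100, and a zero-distance kill
toy = pd.DataFrame({
    "killer_position_x": [0.0, 1000.0],
    "killer_position_y": [0.0, 1000.0],
    "victim_position_x": [300.0, 1000.0],
    "victim_position_y": [400.0, 1000.0],
})
toy["dist(m)"] = kill_distance_m(toy)
print(toy["dist(m)"].tolist())
```

Unlike the loop, rows with missing coordinates come out as NaN here rather than 0, so they would need to be dropped or filled before plotting.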
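6. Do I live longer in a bigger team?
A survival analysis on the party_size variable shows that, at the same survival rate, four-player squads survive longer than duos, which in turn survive longer than solo players. There is some truth to "strength in numbers".
7. Does riding a vehicle help me live longer?
The cause-of-death analysis shows that quite a few players die to the Bluezone, naively thinking a few bandages will let them outrun it. A survival analysis on the player_dist_ride variable shows that, at the same survival rate, players who drove at some point survive longer than players who only walked. You cannot outrun the zone on foot.
8. Do I live longer when there are more people on the island?
A survival analysis on the game_size variable shows that smaller matches are easier to survive.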
- # The R code is as follows:
- library(magrittr)
- library(dplyr)
- library(survival)
- library(tidyverse)
- library(data.table)
- library(ggplot2)
- library(survminer)
- pubg_full <- fread("../agg_match_stats.csv")
- # Preprocessing: turn continuous variables into categorical ones
- pubg_sub <- pubg_full %>%
-   filter(player_survive_time < 2100) %>%
-   mutate(drive = ifelse(player_dist_ride > 0, 1, 0)) %>%
-   mutate(size = ifelse(game_size < 33, 1, ifelse(game_size >= 33 & game_size < 66, 2, 3)))
- # Create the survival object (no event indicator is passed, so every player counts as an observed death)
- surv_object <- Surv(time = pubg_sub$player_survive_time)
- fit1 <- survfit(surv_object ~ party_size, data = pubg_sub)
- # Plot the survival curves
- ggsurvplot(fit1, data = pubg_sub, pval = TRUE, xlab = "Playing time [s]", surv.median.line = "hv",
-            legend.labs = c("SOLO","DUO","SQUAD"), ggtheme = theme_light(), risk.table = "percentage")
- fit2 <- survfit(surv_object ~ drive, data = pubg_sub)
- ggsurvplot(fit2, data = pubg_sub, pval = TRUE, xlab = "Playing time [s]", surv.median.line = "hv",
-            legend.labs = c("walk","walk&drive"), ggtheme = theme_light(), risk.table = "percentage")
- fit3 <- survfit(surv_object ~ size, data = pubg_sub)
- ggsurvplot(fit3, data = pubg_sub, pval = TRUE, xlab = "Playing time [s]", surv.median.line = "hv",
-            legend.labs = c("small","medium","big"), ggtheme = theme_light(), risk.table = "percentage")
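Because the survival object above carries no censoring, each curve is simply the share of players still alive after time t. The Python sketch below illustrates that idea on made-up data; `survival_curve` and `median_survival` are names invented for the example, and the median rule used (first time at which the curve drops to 0.5 or below) is a simple convention that other software may apply differently when the curve sits exactly at 0.5.

```python
import numpy as np
import pandas as pd

def survival_curve(times):
    """Return (t, S(t)), where S(t) is the share of players alive after time t."""
    times = np.asarray(times)
    t = np.unique(times)  # sorted distinct death times
    s = np.array([(times > x).mean() for x in t])
    return t, s

def median_survival(times):
    """First time at which the survival share drops to 0.5 or below."""
    t, s = survival_curve(times)
    return t[np.argmax(s <= 0.5)]

# Made-up survival times in seconds for solo players and four-player squads
toy = pd.DataFrame({
    "party_size":          [1, 1, 1, 1, 4, 4, 4, 4],
    "player_survive_time": [100, 200, 300, 400, 300, 600, 900, 1200],
})
medians = toy.groupby("party_size")["player_survive_time"].apply(median_survival)
print(medians)
```

Comparing the medians per group is the numeric counterpart of reading the median lines off the survival plots.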
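9. Where is the final circle likely to be?
For someone like me who is capable of camping to the very end, how can the location of the final circle be predicted? From the agg_match_stats table I take the first-place teams, group them by match_id and find the largest player_survive_time in each group; I then match those matches against the kill_match_stats_final table and take the position where the second-place player died. The plots show that final circles on Miramar are noticeably more concentrated, most likely around Pecado, San Martín and the villa district. On Erangel they are more random, but the Military Base and the mountain areas still look like the more likely places for the final circle.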
- # Location of the final circle
- import matplotlib.pyplot as plt
- import pandas as pd
- import seaborn as sns
- import matplotlib.cm as cm
- # scipy.misc's imread has been removed from SciPy, so plt.imread loads the map images instead
- # Load part of the kill data
- deaths = pd.read_csv("deaths/kill_match_stats_final_0.csv")
- # Load the aggregate data
- aggregate = pd.read_csv("aggregate/agg_match_stats_0.csv")
- print(aggregate.head())
- # Find where the second-place player died
- team_win = aggregate[aggregate["team_placement"]==1]  # first-place teams
- # In each match, keep the longest-surviving player(s) of the first-place team
- max_time = team_win.groupby('match_id')['player_survive_time'].transform('max')
- grouped = team_win[team_win['player_survive_time'] == max_time]
- deaths_solo = deaths[deaths['match_id'].isin(grouped['match_id'].values)]
- deaths_solo_er = deaths_solo[deaths_solo['map'] == 'ERANGEL']
- deaths_solo_mr = deaths_solo[deaths_solo['map'] == 'MIRAMAR']
- df_second_er = deaths_solo_er[(deaths_solo_er['victim_placement'] == 2)].dropna()
- df_second_mr = deaths_solo_mr[(deaths_solo_mr['victim_placement'] == 2)].dropna()
- print(df_second_er)
- position_data = ["killer_position_x","killer_position_y","victim_position_x","victim_position_y"]
- # Drop rows with a zero coordinate, then rescale game coordinates to image pixels
- df_second_mr = df_second_mr[(df_second_mr[position_data] != 0).all(axis=1)].copy()
- df_second_mr[position_data] = df_second_mr[position_data] * 1000 / 800000
- df_second_er = df_second_er[(df_second_er[position_data] != 0).all(axis=1)].copy()
- df_second_er[position_data] = df_second_er[position_data] * 4096 / 800000
- # Erangel heatmap (the x=, y= and fill= keywords need seaborn 0.11 or later)
- sns.set_context('talk')
- bg = plt.imread("erangel.jpg")
- fig, ax = plt.subplots(1,1,figsize=(15,15))
- ax.imshow(bg)
- sns.kdeplot(x=df_second_er["victim_position_x"], y=df_second_er["victim_position_y"], cmap=cm.Blues, alpha=0.7, fill=True)
- # Miramar heatmap
- bg = plt.imread("miramar.jpg")
- fig, ax = plt.subplots(1,1,figsize=(15,15))
- ax.imshow(bg)
- sns.kdeplot(x=df_second_mr["victim_position_x"], y=df_second_mr["victim_position_y"], cmap=cm.Blues, alpha=0.8, fill=True)