Big Data Showdown: Spark vs. Impala vs. Hive vs. Presto
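As the big data wave sweeps in, we keep running into the same kind of question: which tool should we pick for the job? The same challenge applies to big data SQL engines. AtScale, a developer of big data reporting tools, offers the following answers from its benchmark tests: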
1. Spark 2.0 improved its large query performance by an average of 2.4X over Spark 1.6 (so upgrade!). Small query performance was already good and remained roughly the same.
2. Impala 2.6 is 2.8X as fast for large queries as version 2.3. Small query performance was already good and remained roughly the same.
3. Hive 2.1 with LLAP is over 3.4X faster than Hive 1.2 on large queries, and its small query performance doubled. If you're using Hive, this isn't an upgrade you can afford to skip.