PostgreSQL的最佳特性 你用了嗎?
SQL語(yǔ)句通常不是很容易理解,特別是你閱讀別人已經(jīng)寫好的語(yǔ)句。因此,很多人指出我們應(yīng)該遵循在其他語(yǔ)言中遵循的原則,像加上注釋和功能模塊化。 我***注意到一個(gè)很多人都沒(méi)有使用的Postgres關(guān)鍵特性,也就是 @timonk在AWS Re:Invent 大會(huì)關(guān)于數(shù)據(jù)倉(cāng)庫(kù)服務(wù)Redshift主題演講時(shí)指出的一個(gè)特性。這個(gè)特性實(shí)際上使得SQL兼具了可讀性和模塊性。在以前,我回頭閱讀自己的幾個(gè)月前的 SQL語(yǔ)句,通常很難理解,而現(xiàn)在我可以做到這一點(diǎn)。
這個(gè)特性就是CTEs,也就是公用表表達(dá)式,你有可能稱做它為WITH
語(yǔ)句。和數(shù)據(jù)庫(kù)中視圖一樣,它的主要好處就是,它允許你在當(dāng)前事務(wù)中創(chuàng)建臨時(shí)表。你可以大量使用它,因?yàn)樗试S你思路清晰的構(gòu)建模塊,別人很容易就理解你在做什么。
讓我們舉個(gè)簡(jiǎn)單的例子
- WITH users_tasks AS (
- SELECT
- users.email,
- array_agg(tasks.name) as task_list,
- projects.title
- FROM
- users,
- tasks,
- project
- WHERE
- users.id = tasks.user_id
- projects.title = tasks.project_id
- GROUP BY
- users.email,
- projects.title
- )
通過(guò)這樣定義臨時(shí)表users_tasks,我就可以在后面加上對(duì)users_tasks基本查詢語(yǔ)句,像:
- SELECT *
- FROM users_tasks;
有趣的是你可以將它們連在一起。當(dāng)我知道分配給每個(gè)用戶的任務(wù)量時(shí),也許我想知道在一個(gè)指定的任務(wù)上,誰(shuí)因?yàn)閷?duì)這個(gè)任務(wù)負(fù)責(zé)超過(guò)了50%而因此造成瓶頸。為了簡(jiǎn)化,我們可以使用多種方式,先計(jì)算每個(gè)任務(wù)的總量,然后是每人針對(duì)每個(gè)任務(wù)的負(fù)責(zé)總量。
- total_tasks_per_project AS (
- SELECT
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY project_id
- ),
- tasks_per_project_per_user AS (
- SELECT
- user_id,
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY user_id, project_id
- ),
現(xiàn)在我們將組合一下然后發(fā)現(xiàn)超過(guò)50%的用戶
- overloaded_users AS (
- SELECT tasks_per_project_per_user.user_id,
- FROM tasks_per_project_per_user,
- total_tasks_per_project
- WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
- )
最終目標(biāo),我想獲得超負(fù)荷工作這的用戶和任務(wù)的逗號(hào)分隔列表。我們只要簡(jiǎn)單地對(duì)overloaded_users和 users_tasks的初始列表進(jìn)行join操作。放在一起可能有點(diǎn)長(zhǎng),但是可讀性強(qiáng)。作為額外幫助,我又在每一層加了注釋。
- --- Created by Craig Kerstiens 11/18/2013
- --- Query highlights users that have over 50% of tasks on a given project
- --- Gives comma separated list of their tasks and the project
- --- Initial query to grab project title and tasks per user
- WITH users_tasks AS (
- SELECT
- users.id as user_id,
- users.email,
- array_agg(tasks.name) as task_list,
- projects.title
- FROM
- users,
- tasks,
- project
- WHERE
- users.id = tasks.user_id
- projects.title = tasks.project_id
- GROUP BY
- users.email,
- projects.title
- ),
- --- Calculates the total tasks per each project
- total_tasks_per_project AS (
- SELECT
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY project_id
- ),
- --- Calculates the projects per each user
- tasks_per_project_per_user AS (
- SELECT
- user_id,
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY user_id, project_id
- ),
- --- Gets user ids that have over 50% of tasks assigned
- overloaded_users AS (
- SELECT tasks_per_project_per_user.user_id,
- FROM tasks_per_project_per_user,
- total_tasks_per_project
- WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
- )
- SELECT
- email,
- task_list,
- title
- FROM
- users_tasks,
- overloaded_users
- WHERE
- users_tasks.user_id = overloaded_users.user_id
CTEs通常不如經(jīng)過(guò)精簡(jiǎn)優(yōu)化過(guò)的SQL語(yǔ)句性能高。大多數(shù)差距小于一倍差距。對(duì)我而言,這種為了可讀性作出的折中是毋庸置疑的。Postgres優(yōu)化器以后肯定會(huì)針對(duì)這點(diǎn)變的更好。
多說(shuō)一句,是的我可以用大約10-15行簡(jiǎn)短的SQL語(yǔ)句做同樣的事情,但是你也許不能很快的理解它。當(dāng)你碰到需要保證SQL做正確的事情時(shí),可讀性的優(yōu)勢(shì)就出來(lái)了。SQL語(yǔ)句總是有個(gè)結(jié)果,你對(duì)此毫無(wú)疑問(wèn)。確保你SQL語(yǔ)句容易推理是保證正確性的關(guān)鍵。
原文鏈接:http://www.craigkerstiens.com/2013/11/18/best-postgres-feature-youre-not-using/