Automating Data Analysis: The Magic of LIDA's Intelligent Visualization

Published 2024-11-1 09:09

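01 Overview

In this data-driven era, we generate and process massive amounts of data every day. Extracting valuable information from that data and presenting it in an intuitive, easy-to-understand form has become an important problem. This article introduces a powerful tool, Language-Integrated Data Analysis (LIDA), which automates the creation of visualization charts and puts data insights within easy reach.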

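02 Core Features of LIDA

Grammar-agnostic visualization

Whether you work in Python, R, or C++, LIDA can help you produce visual output without being locked into a particular programming language. This flexibility makes it easy for users from different programming backgrounds to get started.

Multi-stage generation pipeline

LIDA provides a seamless workflow that runs from data summarization to visualization creation, helping users handle complex datasets.

Hybrid user interface

LIDA offers both direct manipulation and a multilingual natural-language interface, which makes it accessible to a broad audience, from data scientists to business analysts. Users can interact through natural-language commands, so creating a visualization becomes intuitive and simple.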

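03 LIDA's Architecture

LIDA's architecture consists of the following key components:

Summarizer: converts a dataset into a concise natural-language description that covers all column names, distributions, and similar information.
Goal Explorer: identifies potential visualization or analysis goals based on the dataset and generates a user-specified number of goals.
Viz Generator: automatically generates the code that creates a visualization, based on the dataset context and a specified goal.
Infographer: creates, evaluates, refines, and executes visualization code to produce a fully stylized specification.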

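04 Main Capabilities of LIDA

Data summarization: LIDA compresses large datasets into dense natural-language summaries that serve as the basis for all later operations.
Automated data exploration: LIDA offers a fully automated mode that generates meaningful visualization goals for unfamiliar datasets.
Infographic generation: uses image generation models to turn data into stylized, engaging infographics for personalized storytelling.
VizOps (visualization operations): performs detailed operations on generated visualizations to improve accessibility, data literacy, and debugging.
Visualization explanation: provides in-depth descriptions of visualization code to support accessibility, education, and understanding.
Self-evaluation: uses large language models (LLMs) to produce multi-dimensional evaluation scores for visualizations based on best practices.
Visualization repair: automatically improves or repairs visualizations using self-evaluation or user-provided feedback.
Visualization recommendation: recommends additional visualizations based on the context or an existing visualization, for comparison or added perspective.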

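05 LIDA in Practice

Installation

Install with pip:

pip install lida

# Set the API key for your LLM provider
export OPENAI_API_KEY=<API_KEY>

You can also manage the API key with a .env file (this requires the python-dotenv package):

from dotenv import load_dotenv
import os

# Read the .env file
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")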
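Because os.getenv returns None when a variable is missing, a typo in the variable name or a missing .env entry tends to surface only later as an authentication error. The snippet below is a minimal sketch of a fail-fast check using only the standard library. The helper name require_api_key is hypothetical and is not part of LIDA or python-dotenv.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the named environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of LIDA or python-dotenv.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return value
```

Calling require_api_key() right after load_dotenv() would stop the script immediately with a readable message instead of failing on the first LLM request.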

LIDA Features in Detail

Initialization

from lida import Manager, TextGenerationConfig, llm
from lida.utils import plot_raster
import warnings
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

warnings.filterwarnings("ignore")

# Initialize LIDA (supply your OpenAI or other LLM provider API key)
lida = Manager(text_gen=llm("openai", api_key=OPENAI_API_KEY))
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)

lida.Manager is the controller of the LIDA library and determines which LLM is used. lida.TextGenerationConfig holds the detailed generation settings: the number of generations n, the sampling temperature, the model, and use_cache.

Loading the data

import pandas as pd

# This example uses cars, the dataset recommended in the official materials
data = pd.read_csv("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv")
data.head()

Data summary

Generate a brief summary of the dataset. For each column, the summary reports std, min, max, samples, unique, semantic_type, and description.

# Data summary: generate a short summary from the dataset
summary = lida.summarize("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv", summary_method="default", textgen_config=textgen_config)

print(summary)
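To make the per-column fields more concrete, here is a toy sketch that computes unique, samples, min, max, and std for a column using only the standard library. It illustrates the kind of information such a summary carries and is not LIDA's implementation. The function summarize_column and the sample rows are hypothetical, and semantic_type and description are left out.

```python
import statistics


def summarize_column(name, values, n_samples=3):
    """Build a small per-column summary dict (illustrative only, not LIDA's code).

    Numeric statistics are added only when every value parses as a float.
    """
    info = {
        "column": name,
        "unique": len(set(values)),
        "samples": list(dict.fromkeys(values))[:n_samples],  # first distinct values, in order
    }
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        return info  # non-numeric column: no min/max/std
    if not numbers:
        return info  # empty column: nothing to aggregate
    info["min"] = min(numbers)
    info["max"] = max(numbers)
    info["std"] = statistics.pstdev(numbers)  # population standard deviation
    return info


# Hypothetical sample rows, standing in for a few lines of a CSV file
rows = [
    {"Type": "Sedan", "Horsepower": "200"},
    {"Type": "SUV", "Horsepower": "300"},
    {"Type": "Sedan", "Horsepower": "250"},
]
for column in rows[0]:
    print(summarize_column(column, [row[column] for row in rows]))
```

A compact record like this per column is far shorter than the raw table, which is what makes it practical to hand to a language model as context.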

Goal generation

Goals are generated from the data summary. Each goal includes an index, a question, a visualization, and a rationale.

# Goal generation: derive visualization goals from the data summary; n=3 generates 3 goals
goals = lida.goals(summary, n=3, textgen_config=textgen_config)

# Inspect the generated goals
for goal in goals:
    print("=" * 20)
    print(f"Question: {goal.index}")
    # Print the question, visualization, and rationale of each goal
    print(goal.question)
    print(goal.visualization)
    print(goal.rationale)

# Output
====================
Question: 0
What is the distribution of Retail_Price?
histogram of Retail_Price
This tells about the spread of prices of cars in the dataset.
====================
Question: 1
What is the distribution of Engine_Size__l_ among different car types?
box plot of Engine_Size__l_ for each car type
This will help in identifying if there is any difference in engine size among different car types.
====================
Question: 2
What is the relationship between Horsepower_HP_ and City_Miles_Per_Gallon?
scatter plot of Horsepower_HP_ vs City_Miles_Per_Gallon
This will help in identifying if there is any correlation between horsepower and fuel efficiency of cars.

Generating visualizations

Charts are generated automatically from each goal's visualization suggestion.

library = "matplotlib"  # options: "altair", "seaborn", "plotly", "matplotlib"

textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
for i in range(len(goals)):
    # Print the question, visualization, and rationale of each goal
    print("Question: ", goals[i].question)
    print("Visualization: ", goals[i].visualization)
    print("Rationale: ", goals[i].rationale)
    charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)
    plot_raster(charts[0].raster)

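Editing charts

Use natural-language instructions to edit a chart, for example its color, font size, or even typeface. (This seems very handy when writing papers or research reports.)

# Change the chart color and font size
instructions = ["change the color to red", "scale the word size to 50%"]

# Pass library explicitly so the edit targets the same plotting library as the original chart
edited_charts = lida.edit(code=charts[0].code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
plot_raster(edited_charts[0].raster)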

Explaining visualizations

code = charts[0].code
explanations = lida.explain(code=code, library=library, textgen_config=textgen_config)

for row in explanations[0]:
    print(row["section"], " ** ", row["explanation"])

# Output
accessibility ** The code creates a scatter plot using the matplotlib.pyplot library to visualize the relationship between two variables - Horsepower_HP_ and City_Miles_Per_Gallon. The plot is colored blue with an alpha value of 0.5 to show the density of the data points. The x-axis is labeled 'Horsepower_HP_' and the y-axis is labeled 'City_Miles_Per_Gallon'. The title of the plot is 'What is the relationship between Horsepower_HP_ and City_Miles_Per_Gallon?'.
transformation ** There is no data transformation happening in this code. The plot is made using the original data as it is.
visualization ** The code first imports the required libraries - matplotlib.pyplot and pandas. The function plot() takes a pandas DataFrame as input and creates a scatter plot using the plt.scatter() method. The x-axis of the plot is the 'Horsepower_HP_' column of the input DataFrame and the y-axis is the 'City_Miles_Per_Gallon' column of the input DataFrame. The alpha parameter controls the transparency of the data points and the color parameter sets the color of the data points. The plt.xlabel() and plt.ylabel() methods add labels to the x-axis and y-axis respectively. The plt.title() method adds a title to the plot. The wrap parameter in plt.title() is set to True to wrap the title text if it exceeds the width of the plot. Finally, the function returns the plot object.

Visualization evaluation and repair

Evaluate whether a chart has problems. The scoring dimensions are bugs, transformation, compliance, type, encoding, and aesthetics. The most surprising part is that it can even score aesthetics.

# i still holds the last index from the visualization loop above, so goals[i] is the last goal
evaluations = lida.evaluate(code=code, goal=goals[i], library=library)[0]
for ev in evaluations:  # named ev to avoid shadowing the built-in eval
    print(ev["dimension"], "Score", ev["score"], "/ 10")
    print("\t", ev["rationale"][:200])
    print("\t**********************************")

# Output
bugs Score 10 / 10
   No bugs, syntax errors, or typos found.
   **********************************
transformation Score 10 / 10
   No data transformation needed for a scatter plot.
   **********************************
compliance Score 8 / 10
   The code meets the specified visualization goal, but the title could be improved by removing the question mark and rephrasing it as a statement.
   **********************************
type Score 9 / 10
   A scatter plot is an appropriate visualization type for exploring the relationship between two continuous variables.
   **********************************
encoding Score 9 / 10
   The data is encoded appropriately with Horsepower_HP_ on the x-axis and City_Miles_Per_Gallon on the y-axis.
   **********************************
aesthetics Score 9 / 10
   The aesthetics of the visualization are appropriate with a blue color and an alpha of 0.5 to show overlapping points.
   **********************************
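Once the scores are available, it can help to reduce them to an overall average and a short list of dimensions that need attention. The sketch below assumes each evaluation is a dict with "dimension", "score", and "rationale" keys, as in the output above. The function name flag_weak_dimensions and the threshold of 9 are illustrative choices and are not part of LIDA.

```python
def flag_weak_dimensions(evaluations, threshold=9):
    """Return (average score, [(dimension, score, rationale)] for scores below threshold).

    Hypothetical helper for illustration; the threshold is an arbitrary choice.
    """
    if not evaluations:
        raise ValueError("evaluations must not be empty")
    average = sum(item["score"] for item in evaluations) / len(evaluations)
    weak = [
        (item["dimension"], item["score"], item["rationale"])
        for item in evaluations
        if item["score"] < threshold
    ]
    return average, weak


# Scores copied from the sample output above; rationales shortened
sample = [
    {"dimension": "bugs", "score": 10, "rationale": "No bugs found."},
    {"dimension": "transformation", "score": 10, "rationale": "No transformation needed."},
    {"dimension": "compliance", "score": 8, "rationale": "The title could be rephrased as a statement."},
    {"dimension": "type", "score": 9, "rationale": "A scatter plot is appropriate."},
    {"dimension": "encoding", "score": 9, "rationale": "The data is encoded appropriately."},
    {"dimension": "aesthetics", "score": 9, "rationale": "The aesthetics are appropriate."},
]
average, weak = flag_weak_dimensions(sample)
print(f"average score: {average:.2f}")
for dimension, score, rationale in weak:
    print(f"{dimension} ({score}/10): {rationale}")
```

The rationales of the flagged dimensions are the kind of feedback the visualization repair capability described earlier is meant to act on.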

Visualization recommendation

Based on the context of the summary, the LLM recommends the requested number of additional charts.

textgen_config = TextGenerationConfig(n=1, temperature=0, use_cache=True)
recommended_charts = lida.recommend(code=code, summary=summary, n=3, textgen_config=textgen_config)

print(f"Recommended {len(recommended_charts)} charts")
for chart in recommended_charts:
    plot_raster(chart.raster)

Custom chart generation

# Import the Goal class from lida.datamodel
from lida.datamodel import Goal

# A Goal has 4 fields: index, question, visualization, and rationale
custom_goal = Goal(
    index=0,
    question="What is the distribution of the Type?",
    visualization="Bar Chart",
    rationale="The type of the car is an important feature of the dataset.",
)
# Generate the chart
custom_chart = lida.visualize(summary=summary, goal=custom_goal, textgen_config=textgen_config, library=library)
plot_raster(custom_chart[0].raster)
# Edit the custom chart
custom_instructions = ["change the color to blue tone on tone color"]  # change the bar chart's color
edited_custom_charts = lida.edit(code=custom_chart[0].code, summary=summary, instructions=custom_instructions, library=library, textgen_config=textgen_config)
plot_raster(edited_custom_charts[0].raster)

Web UI

LIDA also ships an official web UI where you can upload your own data for analysis. To use it:

pip install lida

export OPENAI_API_KEY=<your key>

lida ui --port=8080 --docs

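Notes:

Dataset size: LIDA currently suits small datasets, because LLMs cannot handle very long inputs (token length limits).
LLM choice: LIDA works best with GPT-3.5 and GPT-4, because summarizing higher-dimensional data and reasoning over it still requires a fairly large LLM to get good results.
Reliability: the paper reports an error rate below 3.5%, but you should still double-check that the output charts are reasonable.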
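One possible way to stay within the dataset-size limit is to shrink a CSV before handing it to the tool, by keeping only the columns of interest and a random sample of rows. The sketch below uses only the standard library. The function shrink_csv and its parameters are hypothetical, and whether this helps in practice depends on how much of the data actually ends up in the prompt.

```python
import csv
import io
import random


def shrink_csv(csv_text, keep_columns, max_rows, seed=0):
    """Return CSV text with only keep_columns and at most max_rows sampled data rows.

    Hypothetical preprocessing helper; assumes csv_text has a header row
    that contains every name in keep_columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{name: row[name] for name in keep_columns} for row in reader]
    if len(rows) > max_rows:
        # A fixed seed keeps the sample reproducible between runs
        rows = random.Random(seed).sample(rows, max_rows)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep_columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Synthetic data standing in for a large file
big = "Name,Type,Horsepower\n" + "\n".join(f"car{i},Sedan,{100 + i}" for i in range(1000))
small = shrink_csv(big, keep_columns=["Type", "Horsepower"], max_rows=50)
print(small.splitlines()[0])    # header of the reduced file
print(len(small.splitlines()))  # header line plus the sampled data rows
```

Random sampling can hide rare categories or outliers, so charts produced from a reduced file are best checked against the full data before drawing conclusions.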

References:

https://github.com/microsoft/lida
https://microsoft.github.io/lida/

This article is reposted from the WeChat official account Halo咯咯; author: 基咯咯

Original link: https://mp.weixin.qq.com/s/smeYr8cUi3yqXYm4jBz7Wg
