How to Use Dify for Efficient Content Moderation in a Q&A System: Source Code Analysis and a Practical Optimization Guide

Published 2025-6-13 09:36


Background

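With AI applications booming, content safety and compliance have become something developers cannot ignore. In a customer service scenario, for example, sensitive-word moderation can filter out abusive language from users and return a preset polite reply instead.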

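Dify is an open-source platform for building large language model applications, and its built-in sensitive content moderation gives developers a solid content-safety layer. This article explains how Dify's moderation module works and walks through its source code to show the implementation details, so that developers can better understand and apply the feature.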

How to enable sensitive content moderation in Dify

To enable sensitive content moderation, turn it on in the feature management panel at the lower right.

Dify offers three moderation strategies:

OpenAI's moderation model
Keywords
A custom API extension

Moderation can be configured along two dimensions:

Moderate user input
Moderate model output

Calling the OpenAI Moderation API

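Models from OpenAI and most other LLM providers come with content moderation features, intended to keep them from outputting controversial content such as violence, sexual content, and illegal activity.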

from openai import OpenAI
client = OpenAI()

response = client.moderations.create(
    model="omni-moderation-latest",
    input="...text to classify goes here...",
)

print(response)

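Below is a complete example output. Unlike the text request above, the input here is an image of a single frame from a war movie. The model correctly predicts indicators of violence in the image, with a violence category score above 0.8: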

{
  "id": "modr-970d409ef3bef3b70c73d8232df86e7d",
  "model": "omni-moderation-latest",
  "results": [
    {
      "flagged": true,
      "categories": {
        "sexual": false,
        "sexual/minors": false,
        "harassment": false,
        "harassment/threatening": false,
        "hate": false,
        "hate/threatening": false,
        "illicit": false,
        "illicit/violent": false,
        "self-harm": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "violence": true,
        "violence/graphic": false
      },
      "category_scores": {
        "sexual": 2.34135824776394e-7,
        "sexual/minors": 1.6346470245419304e-7,
        "harassment": 0.0011643905680426018,
        "harassment/threatening": 0.0022121340080906377,
        "hate": 3.1999824407395835e-7,
        "hate/threatening": 2.4923252458203563e-7,
        "illicit": 0.0005227032493135171,
        "illicit/violent": 3.682979260160596e-7,
        "self-harm": 0.0011175734280627694,
        "self-harm/intent": 0.0006264858507989037,
        "self-harm/instructions": 7.368592981140821e-8,
        "violence": 0.8599265510337075,
        "violence/graphic": 0.37701736389561064
      },
      "category_applied_input_types": {
        "sexual": [
          "image"
        ],
        "sexual/minors": [],
        "harassment": [],
        "harassment/threatening": [],
        "hate": [],
        "hate/threatening": [],
        "illicit": [],
        "illicit/violent": [],
        "self-harm": [
          "image"
        ],
        "self-harm/intent": [
          "image"
        ],
        "self-harm/instructions": [
          "image"
        ],
        "violence": [
          "image"
        ],
        "violence/graphic": [
          "image"
        ]
      }
    }
  ]
}
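As a minimal sketch of how such a response might be consumed, the helper below lists the categories whose score reaches a threshold. The function name `flagged_categories` and the 0.8 default are illustrative assumptions, and the sample dict is a trimmed copy of the response above rather than a live API call.

```python
def flagged_categories(result: dict, threshold: float = 0.8) -> dict[str, float]:
    """Return the categories whose score meets the threshold, highest score first."""
    scores = result["category_scores"]
    hits = {name: score for name, score in scores.items() if score >= threshold}
    return dict(sorted(hits.items(), key=lambda item: item[1], reverse=True))


# Trimmed copy of the first entry of "results" from the response above.
sample_result = {
    "flagged": True,
    "category_scores": {
        "harassment": 0.0011643905680426018,
        "violence": 0.8599265510337075,
        "violence/graphic": 0.37701736389561064,
    },
}

print(flagged_categories(sample_result))       # only "violence" reaches 0.8
print(flagged_categories(sample_result, 0.3))  # a lower threshold also includes "violence/graphic"
```

Reading `category_scores` instead of the boolean `flagged` field lets the application pick its own strictness per category.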

In Dify, you can choose to moderate the input, the output, or both. When content fails moderation, Dify returns the preset reply you configured.

[Figure: OpenAI Moderation API]

Custom keywords

Developers can define their own sensitive words. For example, set "kill" as a keyword, moderate user input, and set the preset reply to "The content is violating usage policies." As you would expect, when a user enters text containing "kill", the moderation tool is triggered and the preset reply is returned.

[Figure: Keywords]
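The behavior described above can be sketched in a few lines. This is only an illustration, assuming the same case-insensitive substring check used by the KeywordsModeration source shown later in this article; the function name `reply_or_none` is hypothetical.

```python
from typing import Optional

PRESET_RESPONSE = "The content is violating usage policies."


def reply_or_none(user_input: str, keywords: list[str]) -> Optional[str]:
    """Return the preset reply if any keyword occurs in the input, else None."""
    lowered = user_input.lower()
    if any(keyword.lower() in lowered for keyword in keywords):
        return PRESET_RESPONSE
    return None


print(reply_or_none("I will KILL the process", ["kill"]))  # preset reply
print(reply_or_none("Improve your skills", ["kill"]))      # preset reply too: "skills" contains "kill"
print(reply_or_none("Hello there", ["kill"]))              # None
```

The second call shows the main caveat of substring matching: harmless words that happen to contain a keyword are flagged as well, so keyword lists are worth testing against typical user input.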

Custom extension

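Different companies usually have their own internal moderation rules. A company building its own AI application, such as an internal knowledge-base ChatBot, needs to moderate the queries its employees enter. To support this, developers can write an API extension that implements the company's own moderation rules (see the Dify documentation on sensitive content moderation for details) and call it from Dify, which gives highly customizable moderation while keeping the rules private.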

[Figure: Moderation Settings]
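To make the idea of an API extension more concrete, here is a rough sketch of the request-handling logic such a service might contain. The payload field names (`point`, `params`, `inputs`, `query`, `text`) and the reply fields (`flagged`, `action`, `preset_response`) follow my recollection of Dify's API-extension documentation and should be treated as assumptions to verify there; `BLOCKED_TERMS` and `handle_extension_request` are hypothetical names, and the HTTP layer and authentication are left out.

```python
BLOCKED_TERMS = {"confidential", "salary"}  # hypothetical company word list


def handle_extension_request(payload: dict) -> dict:
    """Handle one moderation callback and build a JSON-serializable reply."""
    point = payload.get("point")
    if point == "ping":
        # Assumed health-check handshake
        return {"result": "pong"}

    params = payload.get("params", {})
    if point == "app.moderation.input":
        texts = [str(value) for value in params.get("inputs", {}).values()]
        texts.append(str(params.get("query", "")))
    elif point == "app.moderation.output":
        texts = [str(params.get("text", ""))]
    else:
        raise ValueError(f"unsupported extension point: {point!r}")

    # Case-insensitive substring check against the company word list
    flagged = any(term in text.lower() for text in texts for term in BLOCKED_TERMS)
    return {
        "flagged": flagged,
        "action": "direct_output",
        "preset_response": "Your content violates our usage policy." if flagged else "",
    }


print(handle_extension_request({"point": "ping"}))
print(handle_extension_request(
    {"point": "app.moderation.input", "params": {"inputs": {}, "query": "What is my salary?"}}
))
```

Keeping the decision logic in a plain function like this makes it easy to unit test before wiring it into whatever web framework hosts the endpoint.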

A walkthrough of Dify's moderation module

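Dify's moderation module is built around the factory pattern and offers several interchangeable moderation strategies, so developers can tailor the moderation flow to their needs. The core architecture of the module is as follows:

ModerationFactory - moderation factory class

As the module's main entry point, ModerationFactory creates the concrete moderation instance according to the configuration. Through the factory pattern, Dify makes moderation strategies easy to switch and extend.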

class ModerationFactory:
    __extension_instance: Moderation

    def __init__(self, name: str, app_id: str, tenant_id: str, config: dict) -> None:
        extension_class = code_based_extension.extension_class(ExtensionModule.MODERATION, name)
        self.__extension_instance = extension_class(app_id, tenant_id, config)

    def moderation_for_inputs(self, inputs: dict, query: str = "") -> ModerationInputsResult:
        """
        Moderation for inputs.
        After the user inputs, this method will be called to perform sensitive content review
        on the user inputs and return the processed results.

        :param inputs: user inputs
        :param query: query string (required in chat app)
        :return:
        """
        return self.__extension_instance.moderation_for_inputs(inputs, query)

    def moderation_for_outputs(self, text: str) -> ModerationOutputsResult:
        """
        Moderation for outputs.
        When LLM outputs content, the front end will pass the output content (may be segmented)
        to this method for sensitive content review, and the output content will be shielded if the review fails.

        :param text: LLM output content
        :return:
        """
        return self.__extension_instance.moderation_for_outputs(text)

The factory class exposes two core methods:

moderation_for_inputs(): performs input moderation
moderation_for_outputs(): performs output moderation
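The name-based lookup and delegation in the factory can be illustrated with a self-contained toy. Everything here (`Checker`, `KeywordChecker`, `LengthChecker`, `REGISTRY`, `CheckerFactory`) is a hypothetical stand-in, not Dify code; the dict registry plays the role that `code_based_extension.extension_class` plays in the real factory.

```python
from abc import ABC, abstractmethod


class Checker(ABC):
    """Hypothetical stand-in for the Moderation base class."""

    name: str

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def is_flagged(self, text: str) -> bool:
        raise NotImplementedError


class KeywordChecker(Checker):
    name = "keywords"

    def is_flagged(self, text: str) -> bool:
        return any(word.lower() in text.lower() for word in self.config["keywords"])


class LengthChecker(Checker):
    name = "max_length"

    def is_flagged(self, text: str) -> bool:
        return len(text) > self.config["limit"]


# Name-to-class registry used to resolve a strategy from its configured name
REGISTRY: dict[str, type[Checker]] = {cls.name: cls for cls in (KeywordChecker, LengthChecker)}


class CheckerFactory:
    def __init__(self, name: str, config: dict) -> None:
        self._instance = REGISTRY[name](config)

    def is_flagged(self, text: str) -> bool:
        # Delegate to whichever strategy was selected by name
        return self._instance.is_flagged(text)


print(CheckerFactory("keywords", {"keywords": ["kill"]}).is_flagged("Kill it"))  # True
print(CheckerFactory("max_length", {"limit": 5}).is_flagged("short"))            # False
```

Callers only ever talk to the factory, so adding a new strategy means registering one more class without touching the calling code.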

Moderation - moderation base class

As the base class for all concrete moderation classes, Moderation defines the basic contract and shared logic for moderation.

class Moderation(Extensible, ABC):
    module: ExtensionModule = ExtensionModule.MODERATION

    @abstractmethod
    def moderation_for_inputs(self, inputs: dict, query: str = "") -> ModerationInputsResult:
        raise NotImplementedError

    @abstractmethod
    def moderation_for_outputs(self, text: str) -> ModerationOutputsResult:
        raise NotImplementedError

The base class forces every subclass to implement both the input and the output moderation method, which keeps the moderation interface consistent.
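How that enforcement works in practice can be seen with a small standalone example. `ModerationSketch` and `InputOnly` are hypothetical names, and the return types are simplified to `bool`; the point is only that Python's `abc` module refuses to instantiate a subclass that leaves an abstract method unimplemented.

```python
from abc import ABC, abstractmethod


class ModerationSketch(ABC):
    """Simplified stand-in for the Moderation base class."""

    @abstractmethod
    def moderation_for_inputs(self, inputs: dict, query: str = "") -> bool:
        raise NotImplementedError

    @abstractmethod
    def moderation_for_outputs(self, text: str) -> bool:
        raise NotImplementedError


class InputOnly(ModerationSketch):
    # moderation_for_outputs is deliberately left unimplemented
    def moderation_for_inputs(self, inputs: dict, query: str = "") -> bool:
        return False


try:
    InputOnly()
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the message names the missing moderation_for_outputs method
```

The failure happens at instantiation time, so an incomplete strategy is caught as soon as the factory tries to create it rather than when the missing method is first called.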

Moderation implementation classes

KeywordsModeration - keyword moderation class

This is Dify's built-in local moderation option, which detects violations by matching content against preset sensitive keywords:

from collections.abc import Sequence
from typing import Any


class KeywordsModeration(Moderation):
    # Name of this moderation type
    name: str = "keywords"

    @classmethod
    def validate_config(cls, tenant_id: str, config: dict) -> None:
        """
        Validate the configuration data for keyword moderation.

        Ensures the config has the correct structure and meets the requirements.

        Args:
            tenant_id (str): workspace/tenant ID
            config (dict): configuration data to validate

        Raises:
            ValueError: if any validation check fails
        """
        # First validate the basic inputs/outputs config structure
        cls._validate_inputs_and_outputs_config(config, True)

        # Check that keywords were provided in the config
        if not config.get("keywords"):
            raise ValueError("keywords is required")

        # Validate the total length of the keywords string
        if len(config.get("keywords", [])) > 10000:
            raise ValueError("keywords length must be less than 10000")

        # Split the keywords on newlines and validate the number of rows
        keywords_row_len = config["keywords"].split("\n")
        if len(keywords_row_len) > 100:
            raise ValueError("the number of rows for the keywords must be less than 100")

    def moderation_for_inputs(self, inputs: dict, query: str = "") -> ModerationInputsResult:
        flagged = False
        preset_response = ""

        if self.config is None:
            raise ValueError("The config is not set.")

        # Continue only when input moderation is enabled in the config
        if self.config["inputs_config"]["enabled"]:
            # Preset response to use when moderation is triggered
            preset_response = self.config["inputs_config"]["preset_response"]

            # If a query was provided, add it to the inputs under a special key
            if query:
                inputs["query__"] = query

            # Process keywords: split on newlines and drop empty entries
            keywords_list = [keyword for keyword in self.config["keywords"].split("\n") if keyword]

            # Check whether any input value violates the keywords
            flagged = self._is_violated(inputs, keywords_list)

        # Return the moderation result
        return ModerationInputsResult(
            flagged=flagged,
            action=ModerationAction.DIRECT_OUTPUT,
            preset_response=preset_response,
        )

    def moderation_for_outputs(self, text: str) -> ModerationOutputsResult:
        flagged = False
        preset_response = ""

        if self.config is None:
            raise ValueError("The config is not set.")

        # Continue only when output moderation is enabled in the config
        if self.config["outputs_config"]["enabled"]:
            # Process keywords: split on newlines and drop empty entries
            keywords_list = [keyword for keyword in self.config["keywords"].split("\n") if keyword]

            # Check whether the text violates any keyword (wrapped in a dict for consistency)
            flagged = self._is_violated({"text": text}, keywords_list)

            # Preset response to use when moderation is triggered
            preset_response = self.config["outputs_config"]["preset_response"]

        # Return the moderation result
        return ModerationOutputsResult(
            flagged=flagged,
            action=ModerationAction.DIRECT_OUTPUT,
            preset_response=preset_response,
        )

    def _is_violated(self, inputs: dict, keywords_list: list) -> bool:
        """
        Check whether any input value contains a forbidden keyword.

        Args:
            inputs (dict): input data to check
            keywords_list (list): list of forbidden keywords

        Returns:
            bool: True if any keyword is found in any input value, otherwise False
        """
        # Check each input value for keywords
        return any(self._check_keywords_in_value(keywords_list, value) for value in inputs.values())

    def _check_keywords_in_value(self, keywords_list: Sequence[str], value: Any) -> bool:
        """
        Check whether any keyword is present in a single value.

        The comparison is case-insensitive: both the value and the keywords are lowercased.

        Args:
            keywords_list (Sequence[str]): list of forbidden keywords
            value (Any): value to check (converted to a string)

        Returns:
            bool: True if any keyword is found in the value, otherwise False
        """
        # Convert the value to a string and check each keyword (case-insensitive)
        return any(keyword.lower() in str(value).lower() for keyword in keywords_list)

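Characteristics of keyword moderation:

Runs entirely locally: no dependency on external services, which is good for privacy
Fast response: simple string matching keeps the overhead low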
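For reference, the config shape that the class above reads can be sketched as a plain dict, together with a standalone copy of the keyword limits from `validate_config`. The helper name `validate_keywords_config` is hypothetical, and the check on the inputs/outputs sections (`_validate_inputs_and_outputs_config`, not shown in this article) is omitted.

```python
def validate_keywords_config(config: dict) -> list[str]:
    """Apply the keyword limits from validate_config and return the parsed keyword list."""
    keywords = config.get("keywords")
    if not keywords:
        raise ValueError("keywords is required")
    # Limit on the total length of the newline-separated string
    if len(keywords) > 10000:
        raise ValueError("keywords length must be less than 10000")
    # Limit on the number of rows, counted before empty rows are dropped
    rows = keywords.split("\n")
    if len(rows) > 100:
        raise ValueError("the number of rows for the keywords must be less than 100")
    return [row for row in rows if row]


config = {
    "keywords": "kill\n\nattack",
    "inputs_config": {"enabled": True, "preset_response": "The content is violating usage policies."},
    "outputs_config": {"enabled": False},
}

print(validate_keywords_config(config))  # ['kill', 'attack']
```

Note that the row limit counts blank lines too, so a keyword list padded with empty rows can hit the limit before it reaches 100 real keywords.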

OpenAIModeration - OpenAI moderation class

For scenarios that need smarter detection, Dify integrates OpenAI's content moderation API:

class OpenAIModeration(Moderation):
    name: str = "openai_moderation"

    def _is_violated(self, inputs: dict):
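        # Join the stringified input values, one per line; str(inputs.values()) would
        # stringify the whole view and join its individual characters instead
        text = "\n".join(str(value) for value in inputs.values())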
        model_manager = ModelManager()
        model_instance = model_manager.get_model_instance(
            tenant_id=self.tenant_id, 
            provider="openai", 
            model_type=ModelType.MODERATION, 
            model="text-moderation-stable"
        )
        return model_instance.invoke_moderation(text=text)

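Advantages of OpenAI moderation:

Semantic understanding: can catch violating content in more complex forms, such as spelling variants and homophones
Multi-dimensional detection: can identify several kinds of violations, including hate, violence, sexual content, and self-harm
Multilingual support: can detect sensitive content in multiple languages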

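Summary

In real-world applications, I tend to use a fine-grained set of moderation optimizations. They come out of repeated practice and feedback, and they balance efficiency against accuracy and flexibility.

Multi-level moderation: combine keyword matching with AI moderation, filtering quickly on the local side first and then running the deeper AI analysis
Context awareness: for certain specialized scenarios, configure a context-dependent allowlist of sensitive words. Session variables (such as user role or domain tag) can be injected with the {{variable}} syntax and read by the moderation module to switch rules dynamically.
Score threshold: set a score threshold (for example score_threshold=0.8) to avoid false positives on near-match words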
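The three ideas above can be combined into one small pipeline. This is a sketch under stated assumptions rather than anything Dify ships: `moderate`, `allowlist`, and `fake_scores` are hypothetical names, the scorer is injected as a callable so that any moderation model could be plugged in, and `fake_scores` is a stand-in that makes no network call.

```python
from typing import Callable, Mapping


def moderate(
    text: str,
    keywords: list[str],
    score_fn: Callable[[str], Mapping[str, float]],
    score_threshold: float = 0.8,
    allowlist: frozenset[str] = frozenset(),
) -> tuple[bool, str]:
    """Return (flagged, reason). Keywords run first; the scorer runs only if they pass."""
    lowered = text.lower()
    # Level 1: cheap local keyword check, skipping keywords allowed in this context
    for keyword in keywords:
        if keyword.lower() in allowlist:
            continue
        if keyword.lower() in lowered:
            return True, f"keyword:{keyword}"
    # Level 2: model scores, flagged only at or above the threshold
    for category, score in score_fn(text).items():
        if score >= score_threshold:
            return True, f"category:{category}"
    return False, ""


def fake_scores(text: str) -> dict[str, float]:
    """Stand-in for a real moderation model call."""
    return {"violence": 0.86 if "fight" in text.lower() else 0.01}


print(moderate("kill the process", ["kill"], fake_scores))
print(moderate("kill the process", ["kill"], fake_scores, allowlist=frozenset({"kill"})))
print(moderate("they fight a lot", ["kill"], fake_scores))
```

In this arrangement the allowlist would typically be chosen per session, for example from a user-role variable, so an operations team could discuss "kill the process" while other users still get the stricter default.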

This article is reposted from AI 博物院. Author: longyunfeigu
