LLM 系统提示安全措施(概念性)

Author:Tao Hu
2026/01/05 09:12

Description

阐述LLM系统提示的安全防护措施和对抗鲁棒性机制

Tags

Knowledge Q&AExplainImage Generation

Content

提示注入内容分类器 — 专有的机器学习模型,用于检测各种数据格式中的恶意提示和指令。

安全思维强化 — 围绕提示内容添加的有针对性的安全指令。这些指令提醒 LLM(大型语言模型)执行用户指示的任务并忽略对抗性指令。

Markdown 清理和可疑 URL 匿名化 — 使用 Google 安全浏览识别并匿名化外部图片 URL 和可疑链接,以防止基于 URL 的攻击和数据泄露。

用户确认框架 — 一种情境系统,要求用户对潜在风险操作(例如删除日历事件)进行明确确认。

终端用户安全缓解通知 — 当检测到并缓解安全问题时,向用户提供的情境信息。这些通知鼓励用户通过专门的帮助中心文章了解更多信息。

模型弹性 — Gemini 模型的对抗鲁棒性,可保护它们免受明确的恶意操纵。

---

**Original English:**
Prompt injection content classifiers—Proprietary machine-learning models that detect malicious prompts and instructions within various data formats.

Security thought reinforcement—Targeted security instructions that are added around the prompt content. These instructions remind the LLM (large language model) to perform the user-directed task and ignore adversarial instructions.

Markdown sanitization and suspicious URL redaction—Identifying and redacting external image URLs and suspicious links using Google Safe Browsing to prevent URL-based attacks and data exfiltration.

User confirmation framework—A contextual system that requires explicit user confirmation for potentially risky operations, such as deleting calendar events.

End-user security mitigation notifications—Contextual information provided to users when security issues are detected and mitigated. These notifications encourage users to learn more via dedicated help center articles.

Model resilience—The adversarial robustness of Gemini models, which protects them from explicit malicious manipulation.