Prompt Injection Scanner
Scan user input for common prompt-injection and jailbreak patterns before passing it to an LLM. Catches instruction overrides ("ignore previous instructions..."), role-play exploits ("you are DAN"), system-prompt extraction attempts, encoded payloads, and other known patterns. Pattern-based — not foolproof, but cheap defense-in-depth.
How to use the Prompt Injection Scanner
Paste a user message (or any content destined for an LLM prompt). The scanner runs ~30 known prompt-injection patterns and reports matches with severity. Use as one layer of defense — combine with strict system prompts, output filtering, and tool-use allowlists.
What this catches (and what it misses)
Pattern-based detection catches the obvious attacks: "ignore previous instructions," common jailbreak preambles (DAN, Developer Mode), Markdown injection, system-prompt extraction patterns, base64-encoded instructions. It misses novel or obfuscated attacks — adversaries can rephrase the same intent in unlimited ways.
For production: use this as the first cheap filter, then add structural mitigations (don't pass tool-call outputs back to the same model that decides what tools to call, sanitize URLs the model returns, never eval model output). Comprehensive surveys live at llm-attacks.org and Anthropic's responsible-scaling docs.