Prompt injection detection tools compared: 5 real options
Prompt injection detection tools screen prompts and outputs for attacker instructions using heuristics, classifier models, vector similarity to known attacks, and canary tokens. Real options include Rebuff, LLM Guard, and Vigil (open source) plus Lakera Guard and Prompt Security (commercial). None is complete, so pair detection with least-privilege tool scoping.
Independent SEO consultant & AI practitioner who builds and tests these tools.
Prompt injection detection tools compared: 5 real options
Prompt injection detection tools screen prompts and outputs for attacker instructions using heuristics, classifier models, vector similarity to known attacks, and canary tokens. Real options include Rebuff, LLM Guard, and Vigil on the open-source side, plus Lakera Guard and Prompt Security as commercial platforms. None is complete on its own, so pair any detector with least-privilege tool scoping.
TL;DR:
- Detection is a signal, not a wall. Every tool below states that it cannot stop all prompt injection.
- Open source: Rebuff, LLM Guard, and Vigil. Commercial: Lakera Guard and Prompt Security.
- Approaches overlap: heuristics, fine-tuned classifier models, vector similarity to known attacks, and canary tokens.
- Indirect injection is the hard case. Detectors screen the text you hand them; they do not reason across an agent’s full tool chain.
- New to the concept? Start with prompt injection explained, then return here.
What is prompt injection detection?
Prompt injection detection is the practice of inspecting input, and sometimes model output, for text that tries to override a model’s instructions. The goal is to flag or block an attack before the model acts on it. Because a large language model cannot reliably tell a trusted command from untrusted data, detection tools sit in front of the model as a filter, scoring each prompt for risk.
This maps to LLM01 in the OWASP LLM Top 10 from the OWASP GenAI Security Project. Detection is one of several layers OWASP recommends, not a standalone fix.
What tools detect prompt injection?
Five real tools cover most of the field today. The table compares them by type, detection approach, and best fit, based on public documentation and stated features rather than measured benchmarks.
| Tool | Type | Detection approach | Best for |
|---|---|---|---|
| Rebuff | Open source (Apache-2.0, archived May 2025) | Heuristics, a dedicated LLM check, a vector database of known attacks, and canary tokens | Studying a layered design; note it is now read-only |
| LLM Guard | Open source (MIT) | PromptInjection scanner using the fine-tuned ProtectAI/deberta-v3-base-prompt-injection-v2 classifier, plus many other scanners | Teams wanting a maintained toolkit with input and output scanners |
| Vigil | Open source (Apache-2.0) | Vector database similarity, YARA heuristics, a DeBERTa transformer classifier, prompt-response similarity, and canary tokens | Self-hosting a multi-method scanner as a library or REST API |
| Lakera Guard | Commercial (SaaS REST API) | Hosted detectors for prompt attacks, data leakage, and tool misuse via the lakera-guard model | Teams wanting a managed API with regional endpoints |
| Prompt Security | Commercial (cloud or self-hosted) | Platform-level inspection of prompts, outputs, and agent actions, including an MCP gateway | Enterprises securing employee AI use, homegrown apps, and agents |
How do prompt injection detection tools work?
Most tools combine several of the same building blocks. No single method is reliable alone, so layering is the norm.
Heuristics and rule signatures
Heuristic filters look for known injection phrasing, such as “ignore previous instructions”, before the prompt reaches the model. Rebuff uses a heuristics layer for this, and Vigil applies YARA rules to match common injection signatures. These are cheap and fast, but novel phrasing slips past them easily.
Classifier models
A fine-tuned classifier scores a prompt as injection or not. LLM Guard uses the ProtectAI/deberta-v3-base-prompt-injection-v2 model with a configurable threshold, and Vigil uses a DeBERTa-based transformer for the same job. Classifiers generalise better than rules but still miss attacks unlike their training data.
Vector similarity to known attacks
Both Rebuff and Vigil store embeddings of past attacks in a vector database and flag prompts that sit close to them. This catches variants of seen attacks and can auto-update as new attacks arrive, but it cannot recognise a genuinely new pattern.
Canary tokens
Canary tokens are secret strings placed in the prompt. If the model’s output leaks the token, you know an injection persuaded the model to reveal hidden context. Rebuff and Vigil both support this. It is a detective signal after the fact, not prevention.
Open source versus commercial: which should you choose?
The split is about who maintains the defence, not which is inherently safer.
Open-source tools give you full control, on-premise running, and no per-call cost. LLM Guard and Vigil are both actively documented and self-hostable. The trade-off is that you tune thresholds, update attack data, and keep the models current yourself. Rebuff shows the canonical four-layer design but its repository was archived in May 2025, so treat it as a reference rather than a maintained dependency.
Commercial platforms shift that burden. Lakera Guard provides a hosted REST API with regional endpoints, and Prompt Security offers a broader platform spanning employee AI use, homegrown apps, code assistants, and agentic systems with an MCP gateway. You gain managed updates and support; you take on cost and, for the SaaS path, sending prompts to a third party.
Can detection tools catch indirect prompt injection?
Only partially, and this is the most important honest limit. Detectors screen the text you pass to them, not the full intent of your agent. If you route a retrieved web page or PDF through a scanner, it can flag obvious injection in that content. But indirect injection that arrives mid-chain, inside a tool result the scanner never sees, can still reach the model.
A scanner also cannot judge whether a flagged action is genuinely harmful in context. That judgement belongs to your architecture, which is why detection must sit alongside least-privilege for AI agents and the controls in MCP security best practices. For agents specifically, run through the AI agent hardening checklist so a missed injection cannot trigger a damaging tool call.
The honest verdict
There is no winner, because detection is one layer in a defence-in-depth stack. If you want a maintained open-source scanner, start with LLM Guard or Vigil. If you need a managed service and have budget, evaluate Lakera Guard or Prompt Security. Whatever you pick, assume injection will sometimes succeed and contain the blast radius with least privilege and output gating.
Where to go next
Read prompt injection explained for the underlying definition, then compare the broader AI agent guardrail tools that sit around detection. Browse more in the tools directory and apply the OWASP LLM Top 10 to see how detection connects to the wider risk list.
Frequently asked questions
What are prompt injection detection tools?
They are libraries or services that inspect prompts, and sometimes model outputs, for signs of attacker-controlled instructions. They use techniques like heuristic filters, fine-tuned classifier models, vector similarity to known attacks, and canary tokens to flag or block suspicious input.
What is the best open-source prompt injection detection tool?
There is no single best. LLM Guard offers a maintained DeBERTa classifier and many scanners, Vigil layers a transformer model with YARA rules and a vector database, and Rebuff combines heuristics, an LLM check, a vector database, and canary tokens but was archived in 2025. Choose by what you can run and maintain.
Can these tools catch indirect prompt injection?
Only partially. Most detectors screen the text they are given, so they can flag a poisoned document if you pass that document through them. They do not understand intent across an agent's full tool chain, so indirect injection that arrives through retrieved content can still slip past.
Are commercial prompt injection tools better than open source?
Not automatically. Commercial tools like Lakera Guard and Prompt Security add managed updates, hosted infrastructure, and broader policy controls, which suits teams without security engineers. Open-source tools give you control and no per-call cost but need you to maintain and tune them.
Do I still need other defences if I use a detection tool?
Yes. Detection is one layer and can be bypassed by novel phrasing. You still need instruction and data separation, output gating, and least-privilege tool scoping so that a missed injection cannot trigger a damaging action.