NODEGUARDER
Every prompt passes through three detection layers before it reaches the LLM.
--- detection pipeline -------------------------------------------------
NodeGuarder inspects each prompt in three stages:
1. Regex (built-in) — 3 categories, ~15 patterns
Matches API keys, database connection strings, and PII
with specific redaction tags ([REDACTED_AWS_KEY], etc.).
Runtime: ~2 ms.
2. ATR Community Rules — 652 rules across 7 categories
Detects prompt injection, code execution, social engineering,
skill compromise, excessive autonomy, model abuse, data poisoning.
Rules auto-update from the community registry every 7 days.
3. Semantic Verification — DeBERTa-v3 (184M params)
An ONNX model that confirms each flag before action is taken.
The false-positive marker system overturns flags raised on docs examples,
tutorial code, placeholders, and security discussions.
If a flag survives all three checks, the configured action mode takes effect (modal, auto-redact, or auto-block).
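The three stages above can be sketched as a small orchestration function. This is only an illustration of the flow described here: the Flag type, the stage signatures, and the inspect function are hypothetical names, and it assumes the regex and ATR stages both produce flags that the semantic stage then confirms or discards.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flag:
    category: str      # e.g. "api_keys" or "injection"
    source: str        # "regex" or "atr"
    matched_text: str


# Hypothetical stage signatures; the real internals are not documented here.
Detector = Callable[[str], List[Flag]]
Verifier = Callable[[str, Flag], bool]


def inspect(prompt: str, regex_stage: Detector, atr_stage: Detector,
            verify: Verifier) -> List[Flag]:
    """Return only the flags that survive semantic verification."""
    flags = regex_stage(prompt) + atr_stage(prompt)      # stages 1 and 2
    return [f for f in flags if verify(prompt, f)]       # stage 3


# Stub stages for demonstration only.
def demo_regex(prompt: str) -> List[Flag]:
    return [Flag("api_keys", "regex", "AKIA")] if "AKIA" in prompt else []


def demo_atr(prompt: str) -> List[Flag]:
    return []


def demo_verify(prompt: str, flag: Flag) -> bool:
    return "tutorial" not in prompt.lower()
```

If the returned list is non-empty, the configured action mode would apply; an empty list means the prompt passes through.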
--- built-in categories (regex) ---------------------------------------
*api_keys
AWS (AKIA...), GitHub (ghp_...), Stripe (sk_live_/pk_live_), generic secrets
*db_credentials
MongoDB, MySQL, PostgreSQL, Redis connection strings
*pii
Email addresses, SSNs (XXX-XX-XXXX), credit card numbers
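As a rough sketch of how regex-based redaction with tags could work, the snippet below uses three illustrative patterns. The patterns themselves, the token lengths, and every tag name other than [REDACTED_AWS_KEY] are assumptions; NodeGuarder's actual built-in patterns are not listed in this document.

```python
import re

# Illustrative patterns only (assumed, not NodeGuarder's real set).
PATTERNS = {
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),  # length is an assumption
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every match with a tag naming what was removed."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name}]", text)
    return text
```

Text with no match is returned unchanged, so the function is safe to apply to every prompt.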
--- atr community rules ------------------------------------------------
The ATR (Agent Threat Rules) community maintains 652+ regex patterns
covering 7 categories of agentic threats. NodeGuarder ships the full set
and updates it automatically every 7 days.
*injection
219 rules — prompt injection, jailbreaks, system prompt overrides
*code_execution
211 rules — shell commands, eval abuse, reverse shells
*social_engineering
106 rules — goal hijacking, authority escalation, consent bypass
*skill_compromise
41 rules — supply chain attacks, skill impersonation, hidden capabilities
*model_abuse
39 rules — model extraction, malicious fine-tuning, security boundary violations
*excessive_autonomy
30 rules — runaway loops, resource exhaustion, unauthorized agent actions
*data_poisoning
6 rules — training data contamination, memory manipulation
Total: 652 rules
Each rule includes a severity level (critical, high, medium), a human-readable title, and one or more regex patterns scoped to specific message fields (user input, tool response, tool args, content).
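One way to picture a rule with field-scoped patterns is the sketch below. The Rule class, the field keys (user_input, tool_response, tool_args, content), and the example rule are hypothetical and only mirror the description above; the actual ATR rule format may differ.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    title: str
    severity: str                     # "critical", "high", or "medium"
    category: str                     # e.g. "injection"
    patterns: Dict[str, List[str]]    # message field -> regex patterns


def matches(rule: Rule, message: Dict[str, str]) -> bool:
    """True if any pattern matches the message field it is scoped to."""
    for field, field_patterns in rule.patterns.items():
        value = message.get(field, "")
        if any(re.search(p, value, re.IGNORECASE) for p in field_patterns):
            return True
    return False


# Made-up rule for demonstration; not taken from the ATR registry.
EXAMPLE_RULE = Rule(
    title="System prompt override (example)",
    severity="critical",
    category="injection",
    patterns={"user_input": [r"ignore (all )?previous instructions"]},
)
```

Because the example pattern is scoped to user_input, the same text arriving in a tool response would not trigger this rule.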
--- false-positive marker system ---------------------------------------
To avoid interrupting legitimate workflows, NodeGuarder includes a two-level
false-positive overturn system:
Level 1 — Strong markers (always override without model):
Documentation examples, placeholder values (localhost, 127.0.0.1,
password123, ****), tutorial markers (tutorial, guide, quick start).
Level 2 — Weak markers (gated by DeBERTa confidence):
Code documentation ("npm install", "in your terminal"),
security discussions (CVE-, vulnerability, educational purposes),
creative context (story, novel, roleplay), code review (TODO, FIXME).
When a false positive is detected, the audit log records
detection_method: "FP_OVERTURN" and the prompt passes through
without modification.
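The two-level overturn logic might look roughly like the sketch below. Several things here are assumptions: plain substring matching, the 0.5 threshold, and the reading of "gated by DeBERTa confidence" as "a weak marker overturns the flag only when the model's threat confidence is below the threshold". Only the marker strings and the FP_OVERTURN value come from this document.

```python
from typing import Optional

STRONG_MARKERS = ("localhost", "127.0.0.1", "password123", "****",
                  "tutorial", "guide", "quick start")
WEAK_MARKERS = ("npm install", "in your terminal", "cve-", "vulnerability",
                "educational purposes", "story", "novel", "roleplay",
                "todo", "fixme")


def is_false_positive(prompt: str, threat_confidence: float,
                      threshold: float = 0.5) -> bool:
    """threat_confidence stands in for the DeBERTa score (assumed 0..1)."""
    text = prompt.lower()
    if any(marker in text for marker in STRONG_MARKERS):
        return True                              # Level 1: no model needed
    if any(marker in text for marker in WEAK_MARKERS):
        return threat_confidence < threshold     # Level 2: model-gated
    return False


def audit_entry(prompt: str, threat_confidence: float) -> Optional[dict]:
    """Return the overturn log entry, or None if the flag stands."""
    if is_false_positive(prompt, threat_confidence):
        return {"detection_method": "FP_OVERTURN"}
    return None
```

Substring matching is deliberately naive here; a real implementation would likely need word boundaries or context checks to avoid an attacker simply appending the word "tutorial" to a malicious prompt.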
--- action modes -------------------------------------------------------
When a prompt is flagged, NodeGuarder follows the configured action mode:
| Mode | Behavior |
|---|---|
| permissive | Show modal — user chooses Allow, Redact, or Block |
| enforced_redact | Show modal with Redact/Block only (no Allow) |
| enforced_block | Show modal with Block only |
| auto_redact | No modal — auto-redact and continue |
| auto_block | No modal — return 403 Forbidden |
The default mode is permissive. When enrolled in an enterprise portal, it auto-escalates to enforced_redact.
The HITL modal shows a 15-second countdown. On timeout, text is auto-redacted and attachments are auto-blocked.
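The mode table and timeout rule can be summarized in a short sketch. The function names are hypothetical, and two points are assumptions: that only the permissive default escalates on enterprise enrollment, and that the timeout fallback applies the same way in every modal mode (the text does not say whether it varies).

```python
from typing import Optional

# Choices offered by the modal in each modal mode (from the table above).
MODAL_CHOICES = {
    "permissive": ("allow", "redact", "block"),
    "enforced_redact": ("redact", "block"),
    "enforced_block": ("block",),
}
# Modes that act without a modal; auto_block corresponds to a 403 Forbidden.
AUTO_ACTIONS = {"auto_redact": "redact", "auto_block": "block"}


def effective_mode(configured: str = "permissive",
                   enterprise_enrolled: bool = False) -> str:
    """Assumption: only the permissive default is escalated."""
    if configured == "permissive" and enterprise_enrolled:
        return "enforced_redact"
    return configured


def resolve(mode: str, user_choice: Optional[str] = None,
            is_attachment: bool = False) -> str:
    """Return the action taken; user_choice=None models the 15 s timeout."""
    if mode in AUTO_ACTIONS:
        return AUTO_ACTIONS[mode]
    if user_choice is None:
        return "block" if is_attachment else "redact"
    if user_choice not in MODAL_CHOICES[mode]:
        raise ValueError(f"{user_choice!r} is not offered in mode {mode!r}")
    return user_choice
```

For example, resolve("enforced_redact", "allow") raises ValueError because that mode's modal has no Allow button.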