Detection Rules¶

Oktsec includes 175 detection rules across 15 categories, compiled into the binary. No external files to deploy.

Rule sources¶

Source	Count	Prefix	Description
Aguara	148	`PI-`, `CL-`, `EX-`, `CE-`, etc.	Open-source detection engine for AI security threats
Inter-agent protocol	12	`IAP-`	Oktsec-specific rules for agent-to-agent attacks
OpenClaw config	15	`OCLAW-`	Configuration security checks for OpenClaw installations

Aguara categories¶

The 148 Aguara rules cover these categories:

Category	Description	Example threat
`prompt-injection`	Direct and indirect prompt injection	"Ignore previous instructions and..."
`credential-leak`	API keys, tokens, passwords in transit	AWS keys, GitHub tokens, SSH keys
`exfiltration`	Data exfiltration patterns	Encoding data in URLs, DNS tunneling patterns
`command-execution`	Shell command injection	`$(rm -rf /)`, backtick injection
`mcp-attack`	MCP protocol-level attacks	Malicious tool descriptions, server impersonation
`mcp-config`	MCP configuration weaknesses	Overly permissive tool access, missing auth
`supply-chain`	Dependency and package attacks	Typosquatting, malicious install scripts
`ssrf-cloud`	SSRF targeting cloud metadata	`http://169.254.169.254/latest/meta-data`
`indirect-injection`	Injection via external content	Poisoned documents, hidden instructions in HTML
`unicode-attack`	Unicode-based evasion techniques	Homoglyph attacks, invisible characters
`third-party-content`	Risks from third-party data	Untrusted API responses with embedded instructions
`external-download`	Suspicious download patterns	Binary downloads, script execution from URLs

Inter-agent protocol rules (IAP)¶

These 12 rules are specific to agent-to-agent communication — the unique attack surface that oktsec was built to protect.

Agent message rules¶

Rule	Severity	Description
`IAP-001`	Critical	Relay injection — agent-to-agent hijacking via embedded instructions
`IAP-002`	High	PII in agent messages — SSNs, passport numbers, personal data in transit
`IAP-003`	Critical	Credentials in agent messages — API keys, tokens, passwords between agents
`IAP-004`	High	System prompt extraction — attempts to extract another agent's system prompt
`IAP-005`	High	Privilege escalation — an agent trying to gain elevated permissions
`IAP-006`	High	Data exfiltration via relay — using an agent as a proxy to leak data

Tool description rules¶

These catch attacks embedded in MCP tool descriptions — a vector where a compromised MCP server poisons tool metadata to hijack agents:

Rule	Severity	Description
`IAP-007`	Critical	Tool description prompt injection — hijacking instructions in tool descriptions
`IAP-008`	Critical	Tool description data exfiltration — exfil URLs embedded in tool descriptions
`IAP-009`	High	Tool description privilege escalation — privilege escalation in tool metadata
`IAP-010`	High	Tool description shadowing — a tool that mimics another tool's name/behavior
`IAP-011`	Critical	Tool description hidden commands — concealed execution instructions
`IAP-012`	High	Tool name typosquatting — tool names designed to confuse (`read_flie` vs `read_file`)

OpenClaw config rules (OCLAW)¶

15 rules for detecting security issues in OpenClaw installations.

Rule	Severity	Description
`OCLAW-001`	Critical	Full tool profile without restrictions
`OCLAW-002`	High	Gateway exposed to network
`OCLAW-003`	High	Open DM policy
`OCLAW-004`	Critical	Exec/shell tool without sandbox
`OCLAW-005`	Critical	Path traversal in `$include`
`OCLAW-006`	High	Gateway missing authentication
`OCLAW-007`	High	Hardcoded credentials in config
`OCLAW-008`	Critical	Dangerous security override flag
`OCLAW-009`	Critical	Sandbox mode disabled
`OCLAW-010`	High	Workspace-only restriction disabled
`OCLAW-011`	High	Wildcard in access allowlist
`OCLAW-012`	High	Dangerous tool grants
`OCLAW-013`	High	Sensitive file path in transit
`OCLAW-014`	Medium	mDNS full disclosure mode
`OCLAW-015`	High	Browser control host access

Verdict escalation¶

The pipeline maps findings to verdicts in four stages:

1. Severity mapping (default)¶

Severity	Default verdict
Critical	block (403)
High	quarantine (202)
Medium	flag (200, logged)
Low	clean (200)

2. Blocked content (per-agent)¶

If a finding's category matches the agent's blocked_content list, the verdict is escalated to block regardless of severity:

agents:
  researcher:
    blocked_content: [credentials, pii]
    # Any credentials or PII finding → block, even if medium severity

3. History escalation¶

Based on recent behavior within a 1-hour window:

Condition	Escalation
3+ blocks/quarantines + new flagged content	Flag → quarantine
5+ blocks/quarantines + new flagged content	Flag → block

This catches agents that repeatedly probe boundaries with slightly-below-threshold content.

4. Rule overrides (config)¶

Per-rule action in config can force any verdict, overriding all other logic:

rules:
  - id: "IAP-001"
    action: "block"       # always block, regardless of severity mapping
  - id: "PI-003"
    action: "ignore"      # disable this rule entirely
  - id: "CL-002"
    action: "allow-and-flag"  # deliver but log

Override actions:

Action	Effect
`block`	Reject the message (403)
`quarantine`	Hold for human review (202)
`allow-and-flag`	Deliver but log as flagged (200)
`ignore`	Remove the finding entirely — rule is disabled

Category webhooks¶

Set default webhook channels for all rules in a category:

category_webhooks:
  - category: credential-leak
    notify: [slack-security]
  - category: prompt-injection
    notify: [slack-security]
  - category: inter-agent
    notify: [slack-security, discord-alerts]

Rules with explicit notify take precedence over category-level webhooks.

Custom rules¶

Add org-specific detection rules by setting custom_rules_dir:

custom_rules_dir: ./custom-rules

Rules follow the Aguara YAML schema. Example custom rule:

id: ORG-001
name: "Internal API key pattern"
description: "Detects our org's internal API key format"
severity: critical
category: credentials
targets: ["*.md", "*.txt", "*.json"]
match_mode: any
patterns:
  - type: regex
    value: "(?i)orgkey_[a-z0-9]{32}"
examples:
  true_positive:
    - "Use this key: orgkey_a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
  false_positive:
    - "The orgkey format is documented in the wiki"

Guidelines:

Use IAP- prefix for inter-agent rules, org-specific prefix for custom rules
Always include true_positive and false_positive examples
Test with oktsec rules --explain ORG-001 after adding

CLI¶

oktsec rules                     # List all 175 rules with severity
oktsec rules --explain IAP-001   # Show rule patterns, examples, and description

Inline testing (dashboard)¶

The dashboard Rules page includes an inline tester — paste any content and test it against a specific rule to see if it matches. Useful for tuning custom rules and verifying false positive/negative behavior.