SECURITY ADVISORY

Agentic AI Framework Security: OpenClaw as a Case Study for Industry-Wide Standards

OpenClaw is the AI agent framework with 200,000+ GitHub stars and shell access to its users' machines. It represents a critical gap in the agentic AI ecosystem: powerful tools shipping without the security practices their capabilities demand. This advisory documents our first-party findings (two CVSS 9.9 critical vulnerabilities in the OpenClaw codebase), surveys the broader security crisis across agentic frameworks, and calls on the industry to adopt and enforce emerging standards before large-scale breaches become inevitable.
February 2026 · 18 min read
Jost

Tags: OpenClaw · Agentic AI Security · Command Injection · Path Traversal · OWASP · AI Framework Security · Responsible Disclosure

OpenClaw: From Viral Growth to Security Crisis

OpenClaw is an autonomous AI personal assistant created by Peter Steinberger (@steipete). It connects to messaging platforms (WhatsApp, Telegram, Slack, Discord, iMessage) and can execute shell commands, control browsers, read and write files, manage calendars, and send emails. All of this is triggered through chat messages, with LLMs acting as the reasoning engine.

The project underwent rapid iteration: Clawd (November 2025) → Clawdbot → Moltbot (after an Anthropic trademark complaint) → OpenClaw (January 29, 2026). On February 14, 2026, Steinberger announced he was joining OpenAI, with OpenClaw transitioning to an open-source foundation.

| Metric | Value |
|---|---|
| GitHub Stars | 200,000+ |
| Weekly npm Downloads | 720,000+ |
| Exposed Internet Instances | 135,000+ (SecurityScorecard) |
| GitHub Security Advisories | 6+ in first 3 weeks |
| Malicious Skills on ClawHub | 341 confirmed (incl. 335 from 'ClawHavoc' campaign) |
| External Vendor Advisories | Cisco, Bitdefender, CrowdStrike, Palo Alto, Sophos, Gartner |


This pace of adoption, combined with the tool's extraordinary access to shell, file system, messaging, and calendar, creates a security surface unlike anything the open-source ecosystem has seen before. When 720,000 npm downloads per week include root-level agent access, the security posture of the framework is not an academic concern.

Previously Disclosed Vulnerabilities

Three CVEs were patched before our assessment began. Our findings have zero overlap with any of these.

| CVE | Description | CVSS / Status |
|---|---|---|
| CVE-2026-25253 | One-click RCE via gateway token exfiltration. A crafted link exfiltrates the auth token; the attacker connects to the victim's local gateway, disables the sandbox, and achieves full RCE. | 8.8 - Patched v2026.1.29 |
| CVE-2026-25157 (GHSA-q284) | SSH command injection via unescaped project paths in sshNodeCommand. Also: SSH target flag injection (-oProxyCommand). | 7.8 - Patched v2026.1.29 |
| CVE-2026-24763 (GHSA-mc68) | Docker PATH injection. Unsafe handling of the PATH environment variable during shell command construction in a container context. | 8.8 - Patched v2026.1.29 |

All three were patched in a single release (v2026.1.29) on January 30, 2026. Belgium's Centre for Cybersecurity issued an emergency advisory with highest-priority patching recommendation. Hunt.io identified over 17,500 instances exposed to CVE-2026-25253 alone.

Kolega.dev Security Assessment: Critical Findings

We conducted an independent deep code analysis of the OpenClaw repository using Kolega.dev's semantic analysis engine. This is our 46th open-source security assessment. Across 45 prior projects we found 225 vulnerabilities with a 90.24% maintainer acceptance rate, including critical findings in vLLM, Phase, Agenta, NocoDB, Weaviate, Qdrant, Langfuse, and Cloudreve.

Assessment Scope

We identified 13 security issues in the OpenClaw codebase. 6 are substantive vulnerabilities; 7 are defence-in-depth improvements that reflect known trade-offs in the project's security-by-design architecture. This section details the two CVSS 9.9 Critical issues first, then summarises the remaining 11 findings and documents the maintainer response.

These two findings share a common root cause: OpenClaw's trust boundary model is conditional. The default sandbox enforces security constraints correctly. When users escalate beyond those defaults, often at the suggestion of an LLM troubleshooting their configuration, those constraints disappear entirely with no intermediate safeguards.

V1 - Command Injection via Unrestricted Elevated Execution Mode

🚨 CRITICAL - Command Injection in elevated=full Mode  |  CVSS 9.9

OpenClaw's exec tool has three distinct security tiers. Two are safe. The third, elevated=full, removes every security constraint simultaneously, with no allowlist, no approval step, and no human in the loop. A prompt-injected agent operating in this mode has unrestricted shell access to the host system.


How the Trust Tiers Work

OpenClaw's exec tool is architecturally sound at its defaults. The problem is the escalation path:

| Mode | Constraints | Risk Level |
|---|---|---|
| Default (sandbox + deny) | Docker sandbox, deny-by-default allowlist, all commands blocked unless explicitly permitted | Safe ✓ |
| Gateway + allowlist + approval | Human approval required per command, explicit allowlist enforced, gated execution | Safe ✓ |
| elevated=full | No allowlist. No approval prompt. No sandbox. No human in the loop. | CRITICAL ✗ |

The Code

The configuration that enables unrestricted mode requires three keys, but imposes no additional validation once set:

```javascript
// From the exec tool configuration.
// When security="full", ask="off", and bypassApprovals=true:
//   → no command allowlist check
//   → no human approval step
//   → no sandbox boundary enforcement
//   → the agent executes any shell command directly

if (config.security === "full" && config.ask === "off" && config.bypassApprovals) {
  // Execute without any validation
  return shell.exec(command);
}
```

Why This Is Exploitable in Practice

The technical mechanism matters less than the real-world attack path. OpenClaw's user base increasingly includes non-technical "vibe coders": people using AI to build and configure software without deep technical knowledge. When something doesn't work, their first instinct is to ask the LLM to fix it.

An LLM troubleshooting an approval friction issue will, in many configurations, suggest enabling elevated=full mode as the path of least resistance. Three config keys later, a prompt-injected agent has unrestricted shell access. The user who enabled this did not understand what they were removing.

This is not hypothetical. It is the documented pattern across vibe-coded applications: safe defaults, combined with AI-guided configuration changes, end up catastrophically insecure in production.

Impact

  • Unrestricted shell command execution on the host system

  • File system read/write without any path restrictions

  • Exfiltration of API keys, SSH keys, credentials, and any accessible data

  • Persistence mechanisms (cron jobs, .bashrc, SSH authorized_keys)

  • Lateral movement to any system accessible from the host

Remediation

  • Remove elevated=full as a single-flag escape from all security constraints

  • Require explicit per-command confirmation even in elevated modes. Eliminate bypassApprovals

  • Add prominent, unavoidable warnings when any security tier is escalated

  • Log all mode escalations with timestamps and triggering context

  • Consider making elevated=full require a secondary authentication step (e.g. sudo-style re-authentication)
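
The remediation above can be sketched as a single configuration guard. This is our suggestion, not OpenClaw code; the key names (security, ask, bypassApprovals) mirror the configuration discussed above:

```javascript
// Hypothetical sketch: validate an exec-tool config so that no
// combination of flags can silence every safeguard at once.
function validateExecConfig(config) {
  const allowlistDisabled = config.security === "full";
  const approvalDisabled = config.ask === "off" || config.bypassApprovals;

  // Graduated permissions: the allowlist and the approval step may be
  // relaxed individually, but never both in the same configuration.
  if (allowlistDisabled && approvalDisabled) {
    throw new Error(
      "Refusing config: elevated execution requires either a command " +
      "allowlist or per-command approval. Both cannot be disabled."
    );
  }
  return config;
}
```

The point of such a guard is that the all-constraints-off state becomes unrepresentable: a user (or an LLM suggesting config changes) can relax one safeguard, but the configuration that removes all of them is rejected outright.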

V3 - Path Traversal in Patch Application (Gateway Mode)

🚨 CRITICAL - Path Traversal via apply-patch in Gateway Mode  |  CVSS 9.9

The patch application tool has two code paths: one with sandbox enforcement (secure) and one without (gateway mode). In gateway mode, the function resolvePathFromCwd has no boundary enforcement. A prompt-injected agent can write patches to any file on the system, including shell configs, cron jobs, and SSH authorized keys.

The Two Code Paths

The vulnerability is architectural: the same function handles both modes, but only one mode validates path boundaries.

```javascript
// WITH sandbox (default mode) - secure:
// assertSandboxPath validates that the resolved path
// is within the sandboxRoot boundary.
if (sandboxRoot) {
  assertSandboxPath(sandboxRoot, resolvedPath);
}

// WITHOUT sandbox (gateway mode) - no boundary enforcement:
// resolvePathFromCwd resolves the path but performs
// no validation against any boundary.
// A path like "../../.ssh/authorized_keys" passes through.
const resolvedPath = resolvePathFromCwd(targetPath);
```


Concrete Attack Scenario

In gateway mode, a prompt-injected agent sends a patch application request with a path traversal payload. Because gateway mode skips boundary validation, the patch is applied to the target file without restriction:

| Target File | Payload Path | Consequence |
|---|---|---|
| ~/.bashrc | ../../.bashrc | Persistent shell command execution on every login |
| ~/.ssh/authorized_keys | ../../.ssh/authorized_keys | Attacker SSH key installed; permanent remote access |
| /etc/cron.d/malicious | ../../../etc/cron.d/malicious | Scheduled task execution as root (if applicable) |
| Any config file | Relative traversal | Arbitrary file modification anywhere on the filesystem |


Why Gateway Mode Is the Dangerous Case

Gateway mode is explicitly the higher-capability mode. It is how users run OpenClaw for remote access and automation scenarios. It is also exactly where the security validation is absent. This inversion (more capability, less validation) is the core of the vulnerability.

Users choosing gateway mode are specifically the users who have given the agent the most access and the most autonomy. Those are the users who most need path boundary enforcement, not least.

Impact

  • Write arbitrary content to any file the process can access

  • Establish persistence through shell configs, cron, or systemd

  • Install attacker SSH keys for permanent access

  • Modify application configs to exfiltrate future credentials

  • Chain with V1 (elevated=full) for full host compromise

Remediation

  • Apply assertSandboxPath validation in all code paths, not only when sandboxRoot is set

  • In gateway mode, define an explicit boundary (the user's home directory or project root) and enforce it

  • Reject any patch targeting files outside the established boundary, regardless of mode

  • Log all file write operations with source, target, and mode context


The Common Thread: Conditional Trust Boundaries

Root Cause Analysis

Both critical vulnerabilities share the same architectural pattern: security constraints that are conditional on configuration rather than enforced unconditionally. Safe defaults exist and work correctly. The problem is that escalating beyond those defaults strips constraints entirely rather than partially. There is no middle ground: you either have all the guardrails, or none of them.

This pattern is not unique to OpenClaw. It appears consistently across agentic frameworks because the design tension is real: users want to grant more capability to their agents, and the simplest implementation is to remove restrictions entirely. The security solution (graduated permissions, per-action approval chains, explicit boundary enforcement at every tier) adds friction that developers and users actively resist.

The difference between acceptable and unacceptable security is whether the removal of restrictions requires understanding what you are removing. Right now, in OpenClaw, it does not.

The Remaining 11 Findings

Beyond the two critical vulnerabilities, we identified 4 additional substantive vulnerabilities and 7 defence-in-depth improvements.

V2 - Weak Token Generation (CWE-330): Device authentication tokens were generated from UUIDv4 (122 bits of entropy) and compared using an ordinary string comparison, which is susceptible to timing side-channel attacks. Our fix replaced this with 256-bit entropy via crypto.randomBytes and timing-safe comparison using timingSafeEqual, matching the pattern already used in gateway auth. PR #16490 - merged.

V4 - Payload Log Sanitisation (CWE-532): The Anthropic payload logging feature writes full API request payloads to a JSONL file without redaction. When messaging channels are configured with open DM policies, third-party PII flows into these payloads in plaintext. Our fix added a functional sanitiser that redacts secrets (API keys from 8 providers, tokens, JWTs, private keys) and PII (SSNs, credit cards, emails, phone numbers) before writing, with 22 tests covering edge cases. PR #16474 - closed by the maintainer (see 'Maintainer Response and Security Posture' below).
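
A sanitiser of this kind can be illustrated in miniature. The two patterns below are hypothetical examples, far narrower than the PR's actual coverage:

```javascript
// Minimal pre-write log sanitiser sketch: redact obvious secret and
// PII patterns before a payload line reaches disk. Illustrative only.
const REDACTIONS = [
  { name: "api_key", pattern: /\bsk-[A-Za-z0-9]{20,}\b/g }, // OpenAI-style keys
  { name: "email", pattern: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g }, // email addresses
];

function sanitizeLogLine(line) {
  return REDACTIONS.reduce(
    (out, { name, pattern }) => out.replace(pattern, `[REDACTED:${name}]`),
    line
  );
}
```

Running every log line through such a function costs little and means a debug log pasted into a GitHub issue no longer leaks credentials or third-party PII.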

V5 and V6 - Additional substantive findings: Two further vulnerabilities involving trust boundary gaps in tool execution and input validation. Details in the full assessment at kolega.dev/security-wins/openclaw-security-assessment/.

The remaining 7 findings are defence-in-depth improvements that align directly with gaps documented in OpenClaw’s own published Trust & Threat Model (phases 2-3): trust boundaries, input validation, and context isolation. These are not surprises to the maintainers. They are known architectural gaps that the project has chosen to treat as design decisions rather than bugs. Our position is that at the scale OpenClaw has reached (200,000+ stars, 135,000+ exposed instances, root-level shell access), these design decisions carry genuine security consequences that should be addressed as features, not deferred indefinitely.

Maintainer Response and Security Posture

We submitted PRs for the findings that had clear code-level fixes. The response was mixed.

PR #16490 (token hardening) - merged. The community response was constructive. A reviewer (gumadeiras) self-assigned, fixed a stray package-lock file, and merged the security improvement within days. This is what good open-source security collaboration looks like.

PR #16474 (payload log sanitisation) - closed without merging. A core contributor (HenryLoenwind) dismissed the finding: "This finding is useless. The system already processes DM content extensively, keeping it in sessions and the agent memory. Hardening a single type of debugging log that isn’t even enabled by default makes no sense. This is a whole lot of code with no benefit at all." Our team responded with a defence citing CWE-532 and OWASP A09:2021, noting that logs travel in ways memory doesn’t: they get copy-pasted into GitHub issues, uploaded to support threads, committed to repositories. Steinberger closed the PR with: "logging is optional - I don’t see why we need to cripple that for local debug logging."

The dismissal reveals a pattern. Logging sensitive data is a well-documented vulnerability class. The argument that logging is "optional" misses the point: non-technical users enable debug logging on LLM advice without understanding PII implications. Log files are common targets for credential theft. The "optional" framing treats security as something only sophisticated users need, which is exactly backwards for a tool whose user base increasingly consists of non-engineers.

The broader posture. OpenClaw’s SECURITY.md states directly: "OpenClaw is a labor of love. There is no bug bounty program and no budget for paid reports." It also notes: "Given the volume of AI-generated scanner findings, we must ensure we’re receiving vetted reports from researchers who understand the issues." ZeroPath, another security firm that assessed OpenClaw, published a characterisation in their blog: "Because Openclaw does not prioritize security, researchers have been able to uncover a number of significant issues."

To be fair, Steinberger has taken positive steps: he published GHSAs personally, hired Jamieson O’Reilly (founder of Dvuln) as security lead, and published a blog post thanking security researchers. These are real and worth acknowledging. But the pattern across all interactions, ours and others, is a project whose security posture has not matched the scale of its access. When 135,000+ instances are running with shell access, "no budget for paid reports" is not a security policy. It is a gap.

This Is Not Just an OpenClaw Problem

OpenClaw is the most visible example of an industry-wide pattern. Every major agentic AI framework has been found to have critical security vulnerabilities. The following is not a comprehensive survey. It is a sample of documented findings from reputable security researchers and firms.

| Framework | Finding | Severity / Status |
|---|---|---|
| Cursor | CVE-2025-54135/54136: MCP trust bypass enabling persistent RCE. CVE-2025-59944: case-sensitivity file overwrite. 94 unpatched Chromium CVEs affecting ~1.8M developers (dismissed as out-of-scope). | Critical - partial patches |
| Devin (Cognition) | Prompt injection → RCE via the expose_port tool. Local files exposed to the public internet. No fix after 120+ days; Rehberger published after the embargo. | Critical - unpatched 120+ days |
| Claude Code | CVE-2025-52882: unauthenticated WebSocket (CVSS 8.8). CVE-2025-55284: DNS exfiltration. Zero-click RCE via Desktop Extensions (CVSS 10.0). Anthropic noted it 'falls outside current threat model'. | Critical - mixed response |
| GitHub Copilot | CVE-2025-53773: YOLO-mode RCE via prompt injection enabling the ZombAI self-replicating botnet. | Critical - patched |
| LangChain | CVE-2025-68664 (CVSS 9.3): serialization injection exposing secrets and workflows. | Critical - patched |
| AutoGPT | CVE-2026-26020 (CVSS 9.4): RCE. | Critical - patched |
| Windsurf | Prompt injection → data exfiltration of developer secrets and credentials. | High - patched |

The IDEsaster research by Ari Marzouk found 30+ vulnerabilities across 10+ AI coding products with 24 CVEs assigned, concluding that 100% of tested AI IDEs were vulnerable to a three-stage attack chain: prompt injection → tool invocation → base IDE exploitation.


The Prompt Injection Problem

All agentic AI frameworks face a threat that has no complete solution: prompt injection. An attacker embeds instructions in content the agent processes (a webpage, a file, a message) that redirect the agent's actions. OpenAI has stated publicly that prompt injection 'is unlikely to ever be fully solved.' The UK NCSC has warned it may never be fully mitigated. Johann Rehberger's 'Month of AI Bugs' (August 2025) demonstrated that 100% of tested coding agents were exploitable via indirect prompt injection, with attack success rates of 41–84% across 314 payloads.

The takeaway is simple: if prompt injection cannot be fully prevented, then the blast radius of a successful injection must be minimised. Every absent guardrail, every escalation path that removes constraints, every conditional trust boundary adds blast radius. OpenClaw's V1 and V3 are precisely such additions.

In November 2025, Anthropic disclosed that Chinese state-sponsored threat actor GTG-1002 used Claude Code to autonomously execute 80–90% of a sophisticated cyber espionage campaign against 30+ global organisations. This is not a future threat model. It is a documented present reality.

Standards Are Forming. The Window to Lead Is Now.

The agentic AI security ecosystem is consolidating rapidly around a set of emerging standards. The timing of this advisory is intentional: these frameworks are new enough that the industry can still shape them, but established enough that ignoring them carries regulatory and reputational risk.

OWASP Top 10 for Agentic Applications 2026

Released December 2025, developed with 100+ industry experts, and already referenced by AWS, Microsoft, and NVIDIA. The framework identifies ten risk categories directly relevant to OpenClaw and similar tools:

  • ASI01 - Agent Goal Hijack: Prompt injection redirecting agent objectives

  • ASI02 - Tool Misuse: Agents using legitimate tools for unintended purposes

  • ASI03 - Identity & Privilege Abuse: Privilege escalation and impersonation

  • ASI04 - Agentic Supply Chain Vulnerabilities: Malicious skills/plugins

  • ASI05 - Unexpected Code Execution: Sandbox escapes and unintended execution

  • ASI06 - Memory & Context Poisoning: Corrupting agent state

OpenClaw's V1 (Command Injection) directly maps to ASI02 and ASI05. V3 (Path Traversal) maps to ASI02. The malicious ClawHub skills campaign maps to ASI04. All of these were foreseeable using published frameworks that existed before OpenClaw shipped.

NIST AI Agent Standards Initiative

Announced February 17, 2026, the day before this advisory was drafted. NIST's Center for AI Standards and Innovation launched a formal initiative with three pillars: facilitating industry-led standards, fostering community-led protocols, and investing in research on agent authentication and identity infrastructure.

NIST also issued a Request for Information on AI Agent Security (January 8, 2026, comment period closes March 9, 2026) and is developing Control Overlays for Securing AI Systems (COSAiS) mapped to SP 800-53 controls. The regulatory direction is clear. Tools that can't show alignment with these frameworks will face growing enterprise and government resistance to deployment.

What the Standards Require

Across OWASP, NIST, MITRE ATLAS, and the Coalition for Secure AI (CoSAI), the consensus is converging on a set of concrete requirements for agentic frameworks:

| Requirement | Current OpenClaw Status |
|---|---|
| Unconditional path boundary enforcement | Conditional - absent in gateway mode (V3) |
| Minimum privilege by default with explicit escalation | Conditional - elevated=full removes all constraints (V1) |
| Per-action approval chains that cannot be globally bypassed | Bypassable via the bypassApprovals configuration |
| Prompt injection resistance / input sanitisation | Industry-wide unsolved problem; no OpenClaw-specific mitigations |
| Plugin/skill supply chain validation | VirusTotal partnership announced; 341 malicious skills found prior |
| Published security policy with response SLAs | SECURITY.md exists; no bug bounty; no paid reports |
| Security audit before major version release | No evidence of a pre-release audit; CVEs found post-launch |

What We Are Calling For

Heartbleed (2014) kicked off the Linux Foundation's Core Infrastructure Initiative and ultimately the Open Source Security Foundation. Log4Shell (2021) triggered a joint advisory from CISA, FBI, NSA, and five allied nations and led to a $147.9 million OpenSSF mobilisation plan. Neither of those incidents required shell access to run.

Agentic AI frameworks with OS-level access represent a fundamentally different threat surface. The industry cannot wait for the equivalent incident before establishing standards.

For OpenClaw Specifically

  • Remove elevated=full as a configuration that bypasses all security constraints simultaneously. Graduated permissions should be the only escalation path.

  • Apply path boundary enforcement unconditionally, in all modes, not only when sandboxRoot is set. Gateway mode users have more access, not less need for protection.

  • Commission an independent security audit before the next major release. The three patched CVEs and our 13 findings came from outside researchers working only with the public codebase. A dedicated audit will find more.

  • Establish a bug bounty programme. 'No budget for paid reports' is not a security policy for a tool running on hundreds of thousands of machines with shell access.

  • Adopt OWASP Top 10 for Agentic Applications as a published design checklist and make compliance visible in documentation.

For the Agentic AI Ecosystem

  • All agentic frameworks with OS-level access should treat OWASP Top 10 for Agentic Applications 2026 as a minimum security baseline, not a reference document.

  • Prompt injection resistance research should be treated as a first-class security investment, not an acknowledged limitation. OpenAI's statement that it 'may never be solved' should be a reason for more investment, not for acceptance.

  • Plugin/skill marketplaces require mandatory security scanning and a coordinated disclosure process before any skill is listed publicly. Cisco found 26% of 31,000 skills vulnerable; Bitdefender found 341 malicious packages.

  • Security researchers who disclose vulnerabilities in these frameworks deserve prompt, constructive responses and formal acknowledgement. A SECURITY.md that explicitly states 'no bug bounty and no budget for paid reports' discourages the exact researchers these projects most need.

  • NIST's AI Agent Standards Initiative comment period is open until March 9, 2026. Security researchers and framework maintainers should submit evidence-based input.

For Users

  • Run OpenClaw in Docker sandbox mode (the default). Do not disable it without understanding exactly what you are removing.

  • Never enable elevated=full mode, especially at the suggestion of an LLM. This is the configuration that removes all security constraints.

  • Do not expose your gateway to the public internet unless you understand the security implications and have applied compensating controls.

  • Audit any installed skills before granting them permissions. The malicious ClawHub campaign demonstrates that skill supply chain attacks are active and ongoing.

  • Keep OpenClaw updated. All three patched CVEs were addressed in a single release; the gap between disclosure and patching was under 24 hours. Update cadence matters.


About This Research

Kolega.dev is an automated security remediation platform that combines traditional SAST and SCA scanning with proprietary semantic code analysis. We have conducted security assessments of 40+ open-source projects, finding critical vulnerabilities in 4 out of 5, and our PRs have a 90.24% maintainer acceptance rate.

Every finding in this advisory is based on code-level analysis with specific file locations, line numbers, and supporting evidence. Every claim about our research is verifiable through the linked GitHub PRs and security advisories.

We follow coordinated disclosure practices: report privately, allow time for remediation, publish after the SLA period regardless of response. For OpenClaw, we are coordinating with Peter Steinberger and the team. One PR has already been merged at the time of publication.

