SECURITY WIN

Agenta Security Assessment

Identified 8 security vulnerabilities including 4 critical code execution flaws in the RestrictedPython sandbox implementation allowing arbitrary system command execution, 2 SSRF vulnerabilities in testset import and webhook evaluator endpoints, and authorization logic bugs enabling unauthorized modification of testsets. All 9 reported items were remediated and merged into release v0.77.1.
January 2026 · 12 min read
Faizan
Agenta Code Execution, Agenta, SSRF, Authorization Bypass, RestrictedPython Bypass

Executive Summary

This security assessment was conducted using Kolega.dev's automated security remediation platform, which combines traditional security scanning (SAST, SCA, secrets detection) with proprietary AI-powered deep code analysis. Our two-tier detection approach identified vulnerabilities that standard tools miss, including complex logic flaws and cross-service injection vectors.

Our analysis of the Agenta repository identified 8 vulnerabilities through Kolega.dev Deep Code Scan (Tier 2) that warrant attention.

Vulnerability Overview

ID | Title | PR Ticket
V1 | Authorization Logic Bug - Insecure Comparison in Testset Variant Edit | PR #3395
V2 | Typo in Permission Check May Bypass Authorization | PR #3395
V3 | Critical: Unsafe Code Execution in Sandbox via RestrictedPython Bypass | PR #3258
V4 | Code Execution Vulnerability in RestrictedPython Sandbox Bypass | PR #3258
V5 | SSRF Vulnerability in Testset Import Endpoint | PR #3395
V6 | Server-Side Request Forgery (SSRF) in Webhook Evaluator | PR #3395
V7 | Dangerous RestrictedPython Bypass via __import__ | PR #3258
V8 | Unsafe Variable Reference in Exception Handler | PR #3395

Responsible Disclosure Timeline

Kolega.dev follows responsible disclosure practices. We coordinated privately through Agenta's official security reporting channel.

January 6, 2026: Initial report sent to Agenta by email to security@agenta.ai

January 12, 2026: Response from Agenta confirming 9 of the reported items were remediated and merged into release v0.77.1

Vulnerabilities Detail

V1: Authorization Logic Bug - Insecure Comparison in Testset Variant Edit

CWE: CWE-863 (Incorrect Authorization)
Location: api/oss/src/apis/fastapi/testsets/router.py:590

Description
The edit_testset_variant endpoint contains a critical authorization flaw where it checks if testset_variant_id is in user_id instead of comparing against the actual testset variant ID.

Evidence
Line 590 uses 'not in' operator to check if testset_variant_id is contained within user_id string, which is incorrect authorization logic. This should verify that the user owns or has access to the testset variant.

Impact
Any user can edit any testset variant regardless of ownership. This completely bypasses authorization checks, allowing unauthorized modification of testsets belonging to other users/organizations.

Remediation
Replace with proper authorization check: Fetch the testset variant from database and verify the user has permission to edit it, similar to other endpoints in the same file.
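
As a minimal sketch of the difference, the example below contrasts a substring test between two identifiers with a lookup-then-compare check. The names `Variant`, `VARIANTS`, `flawed_check`, and `can_edit` are hypothetical and are not taken from the Agenta codebase; the in-memory dictionary stands in for a database query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    id: str
    owner_id: str


# Hypothetical in-memory store standing in for a database lookup.
VARIANTS = {
    "variant-1": Variant(id="variant-1", owner_id="user-a"),
}


def flawed_check(testset_variant_id: str, user_id: str) -> bool:
    # Substring test between two unrelated identifiers.
    # The result depends on string containment, not on ownership.
    return testset_variant_id in user_id


def can_edit(testset_variant_id: str, user_id: str) -> bool:
    # Fetch the record first, then compare its owner with the caller.
    variant = VARIANTS.get(testset_variant_id)
    return variant is not None and variant.owner_id == user_id
```

The flawed pattern returns the same answer for the owner and for any other user, so it cannot distinguish them, whereas `can_edit` allows only `user-a` to edit `variant-1`. A real implementation would likely check project or organization membership rather than a single owner field.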


V2: Typo in Permission Check May Bypass Authorization

CWE: CWE-682 (Incorrect Calculation)
Location: api/oss/src/routers/bases_router.py:48

Description
Typo in project_id attribute name causes permission check to potentially fail silently or use wrong project ID.

Evidence
Line 48 references 'app.project_i' instead of 'app.project_id'. This will raise AttributeError or potentially use a different attribute if one exists with that name.

Impact
Permission check may fail completely causing denial of service, or worse, may succeed with wrong project ID allowing unauthorized cross-project access if 'project_i' resolves to a valid attribute.

Remediation
1) Fix the typo to use 'app.project_id'. 2) Add type checking to catch such errors. 3) Implement comprehensive testing including permission checks. 4) Use linters and static analysis tools. 5) Add integration tests for authorization logic.
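
The failure mode can be reproduced in a few lines. This is an illustrative sketch only: the `App` dataclass and both helper functions are hypothetical stand-ins, not the model or router code from the Agenta repository.

```python
from dataclasses import dataclass


@dataclass
class App:
    project_id: str


def project_id_with_typo(app: App) -> str:
    # Typo: 'project_i' is not a field of App, so this raises AttributeError
    # at runtime. A static type checker would typically flag this line.
    return app.project_i


def project_id_fixed(app: App) -> str:
    return app.project_id
```

A unit test that simply exercises the permission path with a real `App` instance would surface the `AttributeError` before release.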


V3: Critical: Unsafe Code Execution in Sandbox via RestrictedPython Bypass

CWE: CWE-94 (Improper Control of Generation of Code ('Code Injection')), CWE-95 (Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection'))
Location: api/oss/src/services/security/sandbox.py:14-119

Description
The sandbox implementation uses RestrictedPython to execute user-provided Python code, but the security model is fundamentally flawed and can be bypassed. The code has multiple critical weaknesses that allow arbitrary code execution.

Evidence

  1. String-based import checking (lines 24-28) only checks if 'os' string appears in code - can be bypassed by obfuscation like: __import__('os'), 'os'[0], or module introspection.

  2. __import__ is explicitly added to safe_builtins (line 58), allowing unrestricted module imports despite the blocklist.

  3. RestrictedPython is known to have bypasses. Allowed modules (requests, math, random, datetime, json, typing) can be exploited for SSRF via 'requests' module.

  4. No limitation on execution time, memory, or I/O operations.

  5. The disallowed imports list is easily circumvented.
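
To show why text matching is a weak control, here is a small sketch of a substring blocklist. `DISALLOWED_SNIPPETS` and `naive_is_allowed` are hypothetical, modeled on the description above rather than copied from sandbox.py.

```python
# Hypothetical substring blocklist, modeled on the description above.
# It is not Agenta's actual code.
DISALLOWED_SNIPPETS = ("import os", "import subprocess")


def naive_is_allowed(source: str) -> bool:
    """Return True when none of the blocked snippets appear in the source text."""
    return not any(snippet in source for snippet in DISALLOWED_SNIPPETS)


direct = "import os"
# The module name is assembled at runtime, so the blocked text never appears.
obfuscated = "module = __import__('o' + 's')"

print(naive_is_allowed(direct))      # False
print(naive_is_allowed(obfuscated))  # True
```

The check inspects the text of the program, not what the program does, so any construction that builds the module name at runtime passes it.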


Impact
An attacker can achieve Remote Code Execution (RCE) by:

  • Using __import__ to import blocked modules (os, subprocess)

  • Executing arbitrary shell commands through subprocess module

  • Exfiltrating data via requests module (SSRF attacks)

  • Reading environment variables and local files

  • Launching denial of service attacks

This fully compromises the application and allows unauthorized access to system resources.

Remediation

  1. IMMEDIATE: Disable code execution features entirely if not absolutely necessary.

  2. If code execution is required, implement proper sandboxing using OS-level isolation (containers, VMs, seccomp) instead of RestrictedPython.

  3. If RestrictedPython must be used:

    • Remove __import__ completely

    • Use a whitelist-only approach for allowed modules

    • Implement timeout and resource limits (CPU, memory, I/O)

    • Use subprocess.run with shell=False and explicit arguments

    • Disable file access entirely

    • Validate all code before compilation

    • Log all code execution attempts

  4. Consider using a proper sandboxing library like PyPy's sandbox or nsjail.
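
The timeout recommendation can be sketched with the standard library alone. The helper name `run_with_timeout` and the two-second default are assumptions made for illustration. Running code in a child interpreter is not a sandbox by itself; it only demonstrates the wall-clock limit and would still need the OS-level isolation described above.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(source: str, timeout_s: float = 2.0) -> Optional[str]:
    """Run source in a separate interpreter; return stdout, or None on timeout."""
    try:
        result = subprocess.run(
            # -I starts Python in isolated mode (ignores environment variables
            # and the user site directory). Arguments are passed as a list,
            # so no shell is involved.
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return result.stdout
```

For example, `run_with_timeout("print(1 + 1)")` returns the child's output, while an infinite loop such as `run_with_timeout("while True: pass", timeout_s=0.5)` returns `None` once the limit is reached. Memory and CPU limits would need additional mechanisms such as container or cgroup settings.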


V4: Code Execution Vulnerability in RestrictedPython Sandbox Bypass

CWE: CWE-94 (Code Injection)
Location: api/oss/src/services/security/sandbox.py:54-73

Description
The code sandbox implementation imports arbitrary modules including 'requests' which can be used to bypass restrictions. More critically, the __import__ builtin is explicitly added to the sandbox environment (line 58), allowing users to import any Python module and execute arbitrary code.

Evidence
Line 58 directly exposes __import__ to user code, which completely bypasses RestrictedPython restrictions. Even though code is compiled with compile_restricted(), users can use __import__ to access any module. Line 66 includes 'requests' module which allows arbitrary HTTP requests. A user can write evaluator code like: __import__('os').system('command') or __import__('subprocess').call(['command']) to execute arbitrary system commands.

Impact
An authenticated user with evaluator creation permissions can execute arbitrary Python code and system commands on the server, leading to complete system compromise. This includes data theft, lateral movement, denial of service, and reverse shell access.

Remediation
CRITICAL: Remove __import__ from local_builtins. Use RestrictedPython's safe import mechanism or allowlist approach. Remove 'requests' from allowed_imports. Whitelist only safe modules (math, json, datetime). Use RestrictedImporter or implement custom import that checks against a whitelist. Consider using a containerized execution environment instead.
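
A custom import that checks against an allowlist could look like the following sketch. `ALLOWED_MODULES` and `guarded_import` are hypothetical names. The demonstration uses plain `exec` with a reduced builtins dictionary only to show the hook being called; that arrangement is not a sandbox, and in practice the guard would be supplied as the `__import__` entry of the builtins handed to the restricted code.

```python
import builtins

# Assumed allowlist, taken from the modules named in the remediation text.
ALLOWED_MODULES = frozenset({"math", "json", "datetime"})


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement for __import__ that only permits allowlisted top-level modules."""
    root = name.partition(".")[0]
    # Reject relative imports (level != 0) and anything outside the allowlist.
    if level != 0 or root not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


# Demonstration only: the import statement looks up __import__ in __builtins__.
env = {"__builtins__": {"__import__": guarded_import}}
exec("import math\nresult = math.sqrt(16)", env)
print(env["result"])  # 4.0

try:
    exec("import os", env)
except ImportError as error:
    print(error)  # import of 'os' is not allowed
```

An allowlist fails closed: a module nobody thought about is rejected by default, which is the opposite of the blocklist behavior described in the evidence.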


V5: SSRF Vulnerability in Testset Import Endpoint

CWE: CWE-918 (Server-Side Request Forgery)
Location: api/oss/src/routers/testset_router.py:241-245

Description
The import_testset endpoint accepts arbitrary URLs and fetches them without validation, allowing attackers to perform SSRF attacks to scan internal networks or access internal services.

Evidence
User-controlled 'endpoint' parameter is directly passed to requests.get() without URL validation, scheme restriction, or IP address filtering. The 10-second timeout is insufficient protection.

Impact
Attackers can scan internal network, access cloud metadata endpoints (AWS, Azure, GCP), exfiltrate data from internal services, or probe for open ports on internal infrastructure.

Remediation
1) Implement URL allowlist for trusted domains only. 2) Block access to private IP ranges (RFC1918, link-local, loopback). 3) Use a proxy service for external requests. 4) Validate URL scheme (only https). 5) Implement rate limiting.
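
Points 1 and 4 can be combined into a single pre-flight check, sketched below with the standard library. The host `datasets.example.com` and the function name `is_allowed_import_url` are placeholders chosen for illustration, not values from the Agenta codebase.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of trusted hosts.
ALLOWED_HOSTS = frozenset({"datasets.example.com"})


def is_allowed_import_url(url: str) -> bool:
    """Accept only https URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # Reject embedded credentials such as https://trusted@evil.test/.
    if parts.username is not None or parts.password is not None:
        return False
    return host is not None and host in ALLOWED_HOSTS
```

The check compares the parsed hostname rather than searching the URL string, so a trusted name appearing in the path, query, or userinfo portion does not satisfy it. Redirects would also need to be disabled or re-validated, since an allowed host can redirect to an internal address.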


V6: Server-Side Request Forgery (SSRF) in Webhook Evaluator

CWE: CWE-918 (Server-Side Request Forgery)
Location: api/oss/src/services/evaluators_service.py:402-413

Description
The webhook_test function accepts user-supplied webhook_url from evaluation settings without validation or restrictions. An attacker can configure evaluations to send requests to arbitrary internal or external URLs, potentially compromising internal services, exfiltrating data, or performing port scanning.

Evidence
Line 409 in evaluators_service.py directly uses input.settings['webhook_url'] to make HTTP POST requests without any validation. There is no allowlist/blocklist of acceptable URLs, no IP range restrictions, and no validation to prevent requests to internal services like 169.254.169.254 (AWS metadata) or internal IPs.

Impact
An attacker with permission to configure evaluations can use this to: 1) Scan internal network for open ports, 2) Access internal services (metadata servers, admin panels), 3) Exfiltrate sensitive data by sending it to attacker-controlled servers, 4) Perform attacks against internal infrastructure.

Remediation
Implement strict URL validation: 1) Maintain a whitelist of allowed webhook domains, 2) Reject URLs with internal IP addresses (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16), 3) Use DNS rebinding protection, 4) Set request timeouts and response size limits, 5) Log all webhook requests for audit purposes.
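
The IP range rejection in point 2 can be sketched with the `ipaddress` module. The CIDR blocks are the ones listed above; the function names are hypothetical, and the IPv6 handling is an added assumption beyond what the remediation text specifies.

```python
import ipaddress
import socket

BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
    )
]


def is_blocked_ip(address: str) -> bool:
    """Return True if the address falls in a loopback, private, or link-local range."""
    ip = ipaddress.ip_address(address)
    # Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as their IPv4 form.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip.version == 4:
        return any(ip in network for network in BLOCKED_NETWORKS)
    # Assumption: apply the equivalent IPv6 categories.
    return ip.is_loopback or ip.is_link_local or ip.is_private


def host_resolves_to_blocked(host: str) -> bool:
    """Resolve the host and return True if any resulting address is blocked."""
    for info in socket.getaddrinfo(host, None):
        address = info[4][0].split("%")[0]  # drop any IPv6 scope id
        if is_blocked_ip(address):
            return True
    return False
```

Validating the resolved address only helps against DNS rebinding if the subsequent request connects to that same address; resolving once for the check and again for the request leaves a window in which the answer can change.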


V7: Dangerous RestrictedPython Bypass via __import__

CWE: CWE-96 (Improper Neutralization of Directives in Statically Saved Code ('Static Code Injection'))
Location: sdk/agenta/sdk/workflows/runners/local.py:45, sdk/agenta/sdk/workflows/runners/local.py:59-60

Description
The LocalRunner explicitly adds the __import__ built-in function to the restricted environment, bypassing RestrictedPython's security model. Additionally, it pre-imports a whitelist of modules and makes them available globally. This allows malicious user-supplied code to import arbitrary modules beyond the whitelist.

Evidence
The code explicitly enables __import__, a built-in function that allows importing any module. The RestrictedPython library specifically guards against this. Additionally, the requests module is whitelisted, which can be used to perform SSRF attacks.

Impact
A user can supply custom evaluation code that imports dangerous modules like 'os', 'subprocess', 'socket', etc., completely bypassing the RestrictedPython sandbox. This enables arbitrary code execution, system command execution, file system access, and network-based attacks.

Remediation
Remove the __import__ built-in from local_builtins. If dynamic imports are required, implement a strict import hook that checks against a whitelist of modules using RestrictedPython's import guards. Remove 'requests' from the whitelist unless absolutely necessary, as it enables SSRF attacks.


V8: Unsafe Variable Reference in Exception Handler

CWE: CWE-476 (NULL Pointer Dereference)
Location: api/oss/src/routers/variants_router.py:294-299

Description
Exception handler references undefined variable 'e' that may not exist in scope, causing additional exceptions and potentially exposing stack traces.

Evidence
Line 294 uses bare except without capturing exception, then line 298 references undefined variable 'e'. This will raise NameError, potentially exposing internal implementation details in error responses.

Impact
Unhandled NameError exceptions may expose internal stack traces, file paths, and implementation details to attackers. Also causes misleading error messages and makes debugging difficult.

Remediation
Replace bare except with 'except Exception as e:' to properly capture the exception. Consider removing print_exc() in production or logging to secure location instead of stdout.
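
The pattern and its fix can be reproduced in isolation. In this sketch, `risky`, `flawed_handler`, and `fixed_handler` are hypothetical functions that only mirror the shape of the bug described above; the logging call is one possible replacement for printing the traceback to stdout.

```python
import logging

logger = logging.getLogger(__name__)


def risky() -> None:
    raise ValueError("boom")


def flawed_handler() -> str:
    try:
        risky()
    except:  # bare except: the exception is never bound to a name
        # 'e' was never defined, so this line raises NameError.
        return f"failed: {e}"
    return "ok"


def fixed_handler() -> str:
    try:
        risky()
    except Exception as e:
        # Record details server-side; return a generic message to the caller.
        logger.exception("operation failed: %s", e)
        return "operation failed"
    return "ok"
```

Calling `flawed_handler()` replaces the original `ValueError` with a `NameError`, which hides the real cause, while `fixed_handler()` logs the original exception and returns a message that reveals no internals.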

