Control Drift: Why Your SOC 2 Compliance Can't Keep Up With AI-Written Code
I've been thinking about compliance lately. A lot. Not in the way most people think about it - not the annual audit, not the checklist, not the badge on your website. I've been thinking about what compliance is supposed to prove. And whether it still does.
Because here's the thing. We scanned 45 open-source repositories over the past few months. Found 225 vulnerabilities. Maintainers accepted 90.24% of them. And when we ran traditional SAST tools head-to-head with semantic analysis on a sample of 10 of those repositories, the traditional tooling generated 1,183 alerts - 87% of them false positives - and missed the critical stuff entirely.
When we ran Semgrep against NocoDB, it found 222 potential issues. 208 were noise. And it completely missed the SQL injection in the Oracle client. The one that could have compromised the entire database.
So imagine you're an auditor looking at that. The tool ran. Alerts were generated. Tickets were closed. Everything looks clean. Except the actual vulnerability shipped to production undetected.
What exactly did that compliance process prove?
I'm calling this Control Drift - the gap between how fast code gets written and your ability to actually govern it. And it's getting worse.
The numbers that should worry you
I've written about the 87% false positive problem and the 225 vulnerabilities we found before. But I haven't really connected those numbers to compliance until now. And honestly, the more I think about it, the worse it looks.
Here's what we saw when we compared traditional SAST against semantic analysis on 10 of those repositories:
Traditional SAST (Semgrep):
1,183 findings
152 real vulnerabilities (12.9%)
1,030 false positives (87%)
That works out to roughly 120 alerts to surface about 15 real vulnerabilities per repo scanned.
Semantic analysis (our approach):
346 findings
346 real vulnerabilities (100%)
0 false positives
1,030 fewer false alarms. And the stuff traditional tools missed? Authorization bypasses, race conditions, cross-service injection, logic errors in parameterized queries. Not simple bugs.
The NocoDB case is the one I keep coming back to. Semgrep flagged 222 things, 208 were false positives, and it completely missed the SQL injection in the Oracle client - 17 injection points across OracleClient.ts where user input was concatenated directly into queries. Pattern matching saw a query builder wrapper and assumed everything was parameterized. It wasn't. We found it, submitted PR #12748, and NocoDB fixed it.
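To make that concrete, here's a minimal TypeScript sketch of the two shapes involved - hypothetical code, not NocoDB's actual client. Both functions look like tidy query-builder wrappers to a pattern matcher; only the data flow shows that one of them splices user input into the SQL text.

```typescript
type Query = { sql: string; binds: unknown[] };

// The unsafe shape: looks like a parameterized helper, but the value is
// concatenated into the SQL, so input like `' OR '1'='1` rewrites the query.
function findByEmailUnsafe(email: string): Query {
  return { sql: `SELECT * FROM users WHERE email = '${email}'`, binds: [] };
}

// The safe shape: the SQL text is constant and the value travels as a bind parameter.
function findByEmailSafe(email: string): Query {
  return { sql: 'SELECT * FROM users WHERE email = :email', binds: [email] };
}
```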
Same pattern across all 45 projects. Noise where it doesn't matter, silence where it does.
The assumptions that don't hold anymore
Compliance frameworks like SOC 2 and ISO 27001 were designed with a particular world in mind. Code gets written by humans at human speed. Other humans review it. Security tools catch what review misses. Teams fix what gets caught.
None of that works the way it used to.
AI generates code faster than anyone can review. Veracode tested more than 100 LLMs across 80 coding tasks and found 45% of AI-generated code failed security tests. XSS defenses didn't work 86% of the time. Log injection succeeded 88% of the time. Bigger models didn't help - security performance was flat regardless of model size.
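Log injection is worth pausing on, because it's so easy to ship. A hypothetical sketch of the pattern, assuming nothing fancier than a console logger:

```typescript
// Unsafe: a username like "alice\n2026-01-01 INFO admin login OK" forges a
// second, fake log entry that downstream tooling will treat as real.
function logLoginUnsafe(username: string): void {
  console.log(`LOGIN attempt for ${username}`);
}

// Safer: strip control characters (or log structured JSON) so user input
// can't break out of the entry it belongs to.
function logLoginSafe(username: string): void {
  console.log(`LOGIN attempt for ${username.replace(/[\r\n\t]/g, ' ')}`);
}
```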
The Stanford research (Perry et al., 2022) is the one that really gets me. Developers with AI access didn't just produce more flaws - they were more confident they'd written secure code. That's backwards. And it's exactly the kind of thing compliance frameworks aren't designed to catch. The process looks fine. The developer feels good. The code is broken.
GitClear analyzed 211 million changed lines of code to measure AI's impact: refactoring rates going down, copy-paste going up. The codebase is growing faster than ever, and the portion of it that anyone actually understands is shrinking.
Meanwhile the security tools meant to fill the gap are generating 87% noise. And when that happens, developers do the rational thing: they stop paying attention. Can you blame them? If 87 out of 100 alerts are wrong, ignoring all of them starts to look like a reasonable strategy.
That's Control Drift. Your code generation velocity has outrun your governance. And your audit trail can't tell the difference.
What your audit trail actually proves
Let me walk through what happens after a typical SAST scan.
Your team runs the tool. It produces 120 alerts. Your engineers spend around 10 minutes per alert on triage, so at $100/hr each project burns roughly 20 hours, or about $2,000, investigating alerts that are roughly 87% noise. The remaining findings get marked as resolved or accepted risk.
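The arithmetic, spelled out with those assumed rates:

```typescript
// Back-of-the-envelope triage cost per repository, using the assumptions above.
const alertsPerRepo = 120;
const minutesPerAlert = 10;
const hourlyRate = 100; // USD

const triageHours = (alertsPerRepo * minutesPerAlert) / 60; // 20 hours
const triageCost = triageHours * hourlyRate; // $2,000

console.log(`${triageHours} hours, roughly $${triageCost} per repo - most of it spent on noise`);
```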
The audit trail looks great:
✓ Security tool ran
✓ Alerts generated
✓ Team investigated
✓ Issues resolved
What it doesn't show:
✗ The critical vulnerabilities the tool never detected
✗ Whether the 105 false positives were actually verified as false
✗ Whether anyone reviewed the AI-generated code that shipped between scans
Your audit trail proves you have a process. It doesn't prove the process works. And every quarter, as AI generates more code and SAST tools generate more noise, the gap between those two things gets wider.
This isn't hypothetical
We covered these in our vibe coding piece, but they're worth revisiting through a compliance lens, because every single one of these would have been a SOC 2 disaster.
CVE-2025-48757 - Lovable
303 insecure Supabase endpoints across 170 sites. No row-level security. Emails, payment info, API keys, password reset tokens sitting in the open. CVSS 8.26. And this wasn't one careless developer - it was what the platform's defaults produced, site after site.
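A minimal sketch of why that matters, assuming a standard supabase-js client and an invented users table: the anon key ships in every visitor's browser, so with row-level security disabled it grants the whole internet read access.

```typescript
import { createClient } from '@supabase/supabase-js';

// Hypothetical reproduction of the vulnerability class - project URL, key, and
// table are invented. The anon key is public by design; it is only as safe as
// the row-level security policies sitting behind it.
const supabase = createClient('https://example-project.supabase.co', 'public-anon-key');

async function dumpUsers(): Promise<void> {
  // With RLS disabled on the table, this returns every row to an anonymous caller.
  const { data, error } = await supabase
    .from('users')
    .select('email, payment_details, reset_token');
  console.log(error ?? data);
}

void dumpUsers();
```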
EnrichLead
The founder bragged publicly that he'd built it with "Cursor AI, zero hand-written code." Then researchers found hardcoded API keys on the client side, no auth on endpoints, no rate limiting. Someone ran up $14,000 on his OpenAI keys. He shut the whole thing down.
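The anti-pattern is easy to show. A hypothetical sketch, not EnrichLead's actual code: anything bundled into the frontend - API keys included - is readable by anyone who opens dev tools.

```typescript
// Hypothetical frontend module. The key ships to every visitor's browser,
// where it can be lifted from the bundle and billed against at will.
const OPENAI_API_KEY = 'sk-REDACTED'; // hardcoded client-side: this is the bug

export async function askModel(prompt: string): Promise<string> {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content;
}

// The structural fix: keep the key behind a server-side proxy, and put auth and
// rate limiting on that proxy - none of which a client bundle can provide.
```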
Tea dating app
A women-only platform where safety was the whole point. 72,000 photos including 13,000 government IDs. Over a million private messages. GPS data. 59.3 GB from an unsecured Firebase database. Went to 4chan. Ten lawsuits.
In every one of these cases, you could have had a SAST tool running. It would have generated alerts. The audit trail would have looked compliant. And the actual vulnerabilities would have shipped anyway, because the tool was looking for the wrong things.
What actual evidence looks like
This is where our 90.24% acceptance rate starts to matter for reasons beyond bragging rights.
When we submit a fix, it comes with a root cause analysis, a remediation that respects the project's architecture, tests that prove the fix works, and documentation an auditor could actually follow. That's why maintainers merge them.
And that's what creates a compliance record that actually means something. Not "tool ran, tickets closed" - but a clear chain showing what the vulnerability was, why it mattered, how it was fixed, and proof the fix works. With a qualified reviewer (the maintainer) signing off.
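To give a sense of what "tests that prove the fix works" means in practice, here's a hypothetical regression test for a SQL injection fix - invented names, not one of our actual submissions. The assertion is blunt: hostile input must land in the bind list, never in the SQL text.

```typescript
import assert from 'node:assert/strict';

// Stand-in for a patched query path (illustrative only).
function findUserByName(name: string): { sql: string; binds: string[] } {
  return { sql: 'SELECT * FROM users WHERE name = :1', binds: [name] };
}

const hostile = `x'; DROP TABLE users; --`;
const { sql, binds } = findUserByName(hostile);

assert.ok(!sql.includes(hostile), 'user input must never appear in the SQL text');
assert.deepEqual(binds, [hostile], 'user input must travel as a bind parameter');
console.log('regression check passed');
```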
The projects:
Weaviate - AWS credential injection, GitHub issue #10146, confirmed in 24 hours
vLLM - Remote code execution via unsafe deserialization, PR #32045, merged in 3 days
Qdrant - Memory access without bounds validation, PR #7884, merged same day
Phase - 9 authorization bypasses, PRs #722-731, all fixed within a week
Langfuse - SSRF + unencrypted secrets, PRs #11311 and #11395, both deployed
Cloudreve - 5 vulnerabilities including timing attacks, release 4.11.0
Agenta - 8 vulnerabilities including sandbox escape, release v0.77.1
NocoDB - 5 vulnerabilities including critical SQL injection, PRs #12748-12752
Every one of those is verifiable at kolega.dev/security-wins. Real GitHub PRs with real maintainer responses.
Compare that to "Semgrep ran on Tuesday, 1,183 tickets created, 1,030 marked false positive by a junior engineer who was trying to clear the backlog before standup."
One of those is evidence of security. The other is evidence of activity.
Where this goes
Compliance actually requires four things: evidence of what vulnerabilities exist, evidence of how they were fixed, evidence the fixes work, and approval from someone qualified.
Traditional SAST gives you the first one - badly, buried in 87% noise. It doesn't give you the other three.
Semantic analysis with automated remediation gives you all four. Real detection, real fixes, test verification, maintainer approval.
90.24% acceptance rate across 45 projects. That's not a benchmark - it's proof.
At AI code scale, running a scanner and closing tickets isn't compliance. It's theatre.
The full results are public: kolega.dev/security-wins. Technical details, root cause analysis, fix implementations, PR numbers, maintainer responses. Everything an auditor would actually want to see.
Because compliance should require proof, not just process.