The Hidden Risks of Modern Code: Security Patterns Modern Tools Still Miss
The AI coding revolution is already here. Developers are shipping code faster than ever, powered by tools like GitHub Copilot, Cursor, Claude, and ChatGPT. Cycle times are shrinking. Merge velocity is up. Productivity metrics look great, and engineering leaders are celebrating.
But there’s a problem few want to acknowledge.
Over the past several months, we analyzed 45 open-source projects and uncovered 225 security vulnerabilities. Maintainers accepted 90.24% of our findings. The pattern was hard to ignore: infrastructure projects built for LLM applications contained the highest concentration of issues.
Projects like Langfuse, LiteLLM, ChromaDB, Ollama, and Milvus (the backbone of today's AI application ecosystem) all exposed serious security weaknesses that traditional SAST tools failed to detect.
This isn't random. Modern development workflows are introducing vulnerabilities that follow repeatable, predictable patterns that existing security tooling simply wasn't designed to catch.
The Scale of AI-Generated Code
GitHub's 2025 Octoverse report found that 40% of all code merged on the platform now involves AI assistance. Not just autocomplete suggestions: entire functions, classes, and features generated in one pass.
| Metric | Value |
|---|---|
| Code on GitHub using AI assistance | 40% |
For many teams, the ratio is even higher. Junior developers increasingly rely on Copilot. Proof-of-concept code gets pushed forward with minimal review. AI-generated code often carries subtle structural patterns that traditional security tools were never designed to detect.
So what’s happening?
We're seeing the emergence of a new class of vulnerabilities: one that spreads faster than most teams can identify it and evolves in ways existing tooling struggles to understand.
The Security Patterns AI Gets Wrong
After analyzing hundreds of code samples in our research, we've found five security anti-patterns that keep coming up:
1. Validation Shortcuts
LLMs optimize for code that "works", not code that is safe. They cut corners on input validation:
A common example is URL validation that only checks whether a URL is parseable, without blocking private IP ranges. Copilot knows how to parse a URL. It doesn't know why you need to block internal IP ranges. The result: SSRF flaws in production.
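A minimal Python sketch of the shape we keep seeing (function names are illustrative): the naive check accepts any parseable http(s) URL, while a safer version also resolves the host and rejects private, loopback, and link-local ranges.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def url_is_allowed_naive(url: str) -> bool:
    # What assistants tend to generate: "is this a valid http(s) URL?"
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.hostname)

def url_is_allowed(url: str) -> bool:
    # What SSRF defense actually needs: resolve the host and reject
    # private, loopback, and link-local address ranges.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addr_infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in addr_infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```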
2. Insecure Defaults
LLMs learn from open-source codebases, and many of those codebases put convenience ahead of safety:
This isn't a new mistake. The model learned this pattern from thousands of examples where developers used math/rand because it was easier, and LLMs reliably reproduce it.
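The example above is Go's math/rand where crypto/rand was needed; the same default shows up just as often in Python, sketched here with illustrative function names:

```python
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def reset_token_insecure(length: int = 32) -> str:
    # The pattern the model copies: random.choices appears everywhere in
    # training data, but it is predictable and unfit for security tokens.
    return "".join(random.choices(ALPHABET, k=length))

def reset_token_secure(length: int = 32) -> str:
    # The safe default: a CSPRNG via the secrets module.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```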
Projects with public disclosures: Cloudreve
3. Missing Async/Await
This one surprised us. AI-generated async code is often missing the await keywords it needs, a mistake that quietly weakens security and is hard to spot in code review:
The function looks safe: it calls auth_request(). Without await, though, the authorization check never runs.
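A minimal Python sketch of the bug (auth_request and the handler are illustrative names): calling a coroutine function without await creates a coroutine object that is never scheduled, so the check silently does nothing.

```python
import asyncio

async def auth_request(user: str, doc_id: str) -> None:
    # Illustrative check: raise if the user may not touch the document.
    if user != "owner":
        raise PermissionError(f"{user} may not delete {doc_id}")

async def delete_document(user: str, doc_id: str) -> None:
    # BUG: without `await`, this only creates a coroutine object that is
    # never scheduled, so the authorization check never executes.
    auth_request(user, doc_id)
    # Correct: await auth_request(user, doc_id)
    print(f"deleted {doc_id}")  # stand-in for the real deletion

asyncio.run(delete_document("attacker", "doc-1"))  # deletes despite failed auth
```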
4. Copy-Paste Context Loss
LLMs generate code in isolation. They don't understand the security context of your application:
The model generated authentication middleware in the wrong part of the request lifecycle. The auth check was present; it just didn't run before the request was handled. Pattern-matching tools found verifyAuth and moved on.
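The disclosed cases involved JavaScript-style onRequest/onResponse hooks; here is the same mistake sketched as FastAPI-style middleware in Python (framework choice and names are illustrative):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

def verify_auth(request: Request) -> bool:
    # Illustrative check: accept only requests carrying a bearer token.
    return request.headers.get("authorization", "").startswith("Bearer ")

@app.middleware("http")
async def auth_middleware(request: Request, call_next):
    # BUG: the handler has already run (and performed its side effects)
    # by the time the auth check happens -- the "onResponse" mistake.
    response = await call_next(request)
    if not verify_auth(request):
        return JSONResponse({"error": "unauthorized"}, status_code=401)
    return response
    # Correct: call verify_auth(request) *before* call_next(request).
```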
5. Race Condition Blindness
LLMs don't reason about concurrent execution. Every race condition we found could plausibly have been AI-generated:
The logic looks right: check whether the record has already been approved, then begin the transaction. But two concurrent requests can both pass the check before either one starts the transaction. LLMs simply don't account for concurrency.
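A minimal sketch of the check-then-act shape in Python, using SQLite and an illustrative refunds table: the racy version checks and updates in separate steps, while the fix folds the check into a single atomic update.

```python
import sqlite3

def approve_refund_racy(conn: sqlite3.Connection, refund_id: int) -> bool:
    # BUG: two concurrent requests can both see status = 'pending' before
    # either one writes -- the classic check-then-act race window.
    row = conn.execute(
        "SELECT status FROM refunds WHERE id = ?", (refund_id,)
    ).fetchone()
    if row is None or row[0] != "pending":
        return False
    conn.execute(
        "UPDATE refunds SET status = 'approved' WHERE id = ?", (refund_id,)
    )
    conn.commit()
    return True  # both callers can reach here and pay out twice

def approve_refund_atomic(conn: sqlite3.Connection, refund_id: int) -> bool:
    # The check and the state change happen in one statement, so only one
    # caller can win the transition from 'pending' to 'approved'.
    cur = conn.execute(
        "UPDATE refunds SET status = 'approved' "
        "WHERE id = ? AND status = 'pending'",
        (refund_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```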
Projects with public disclosures: Cloudreve, Phase
Why Code Review Doesn't Catch This
Why doesn't code review catch predictable flaws in AI-generated code?
Generation speed: When Copilot produces 100 lines in a few seconds, the review mindset shifts from "check everything" to "look for obvious problems." The non-obvious problems get through.
Reviewer fatigue: Developers reviewing AI-assisted PRs report a bias: "The AI wrote it, so it probably handles the edge cases." That is exactly the wrong assumption.
Pattern-matching limits: Human reviewers pattern-match just like SAST tools do. verifyAuth() looks like an auth check; the fact that it sits in onResponse instead of onRequest doesn't register as alarming.
Quantity over quality: When code is cheap to produce, more of it gets written. More code means more to review, and review quality drops.
What We Found in Our Scans
Our research turned up vulnerabilities across modern, fast-moving development projects. These projects ship new features every week, and they have the vulnerability density to prove it.
The pattern doesn't say "AI-generated code is bad." It says fast-moving projects need security automation that can keep up.
Why Traditional Tools Fail
The truth is that traditional SAST tools weren't built for AI-generated code.
Pattern matching works when vulnerabilities have a predictable syntax:
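For example, a regex-style rule reliably flags string-built SQL, because the taint sits right next to the sink (Python sketch, names illustrative):

```python
# Easy for pattern matching: the tainted value is concatenated into the
# SQL string right where a regex or AST rule expects to see it.
def get_user_vulnerable(cursor, username: str):
    cursor.execute("SELECT * FROM users WHERE name = '" + username + "'")
    return cursor.fetchone()

# The parameterized form most rules recommend instead.
def get_user_safe(cursor, username: str):
    cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
    return cursor.fetchone()
```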
But AI-introduced weaknesses are semantic:
- The auth check is there but never runs
- The validation is there but incomplete
- The code is syntactically correct but logically wrong
When we scanned NocoDB with Semgrep, it reported 222 findings, 208 of which were false positives. It completely missed the critical SQL injection in the Oracle client, because detecting that flaw required understanding how data moved between files.
That's roughly a 94% false positive rate on the noise and a 100% miss rate on the vulnerability that mattered.
What Semantic Analysis Does Differently
To find AI-generated vulnerabilities, you need to know what the code means, not just how it looks.
Cross-boundary data flow: We track user input as it moves from the API endpoint to the database query, across files, functions, and modules. A SQL injection in one file that is fed by user input in another file becomes visible (a minimal sketch appears below).
Lifecycle analysis: We understand how framework lifecycle hooks work. The difference between onResponse and onRequest is more than a string; it determines when code runs relative to request processing.
Concurrency modeling: We analyze which operations need to happen atomically. A check-then-act pattern outside a transaction opens a race window.
Context-aware filtering: Instead of flagging everything that matches a pattern, we check whether the flagged code is reachable and exploitable. This weeds out false positives.
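As a hypothetical two-file illustration (Flask-style request object, illustrative names): the sink looks harmless in isolation, and only tracking the value from the endpoint across the module boundary reveals the injection.

```python
# reports/query.py -- the sink, which looks harmless in isolation
def run_report(cursor, order_clause: str):
    # ORDER BY cannot be bound as a parameter, so it is interpolated.
    cursor.execute(f"SELECT id, total FROM orders ORDER BY {order_clause}")
    return cursor.fetchall()

# api/routes.py -- the source, in a different file
# from reports.query import run_report
def orders_endpoint(request, cursor):
    # User-controlled `sort` flows across the module boundary into the SQL
    # string above; neither file looks alarming on its own.
    return run_report(cursor, request.args.get("sort", "id"))
```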
A Safe Workflow for AI-Assisted Development
AI-assisted coding isn't going away. Here's how to use it safely:
1. Treat AI Code as Untrusted Input
Mindset matters. AI-generated code is a suggestion from a third party that doesn't know your security requirements. Review it with that in mind.
2. Automate What Humans Miss
Human reviewers miss async/await bugs, lifecycle-hook mistakes, and race conditions. Automated semantic analysis catches them consistently. Put it in your CI/CD pipeline.
3. Scan Before Merge, Not After Deploy
The cheapest vulnerability is the one found before it ships. Run security scans on every PR. Block merges that introduce serious issues.
4. Focus on Business Logic
AI is great at boilerplate. It struggles with security-critical business logic. Authorization flows, payment processing, and data validation deserve extra scrutiny, whether or not AI was involved.
5. Test the Edge Cases
Modern code often works perfectly on the happy path. Edge cases, such as empty inputs, maximum values, and concurrent requests, are where the holes show up. Add security edge cases to your test suites.
The Bottom Line
Modern development practices are here to stay. The productivity gains are real. So are the security risks.
The weaknesses we found across 45 repositories weren't random. They followed patterns: validation shortcuts, insecure defaults, async mistakes, context loss, and race conditions. These patterns are predictable and detectable.
Traditional SAST tools weren't built for this. They match patterns on syntax. They don't understand code semantics, lifecycle hooks, or concurrent execution.
Semantic analysis does. It understands what code does, not just what it looks like, and it finds the security holes that traditional tools miss.
225 vulnerabilities found. 45 repositories analyzed. 90.24% acceptance rate.
Modern code security requires tools that understand what code actually does.
Check Your Code
We found serious security holes across the major development projects we analyzed: modern frameworks, popular libraries, and production codebases. All of them had problems that traditional SAST didn't find.
What is hiding in your code?
This post is based on our ongoing security research across more than 45 open-source repositories. For our full methodology and results, see our companion post, "Why We Found 225 Vulnerabilities That SAST Missed in 45 Open Source Projects."