Vibe Coding Is a Security Disaster Waiting to Happen
Andrej Karpathy coined the term vibe coding on February 3, 2025, a little over a year ago. His definition was explicit: surrendering completely to AI-generated output without understanding it. "I always hit Accept All. I don't read the diffs anymore."
It caught on. Massively.
Lovable passed $200 million in annual recurring revenue (ARR). Bolt.new hit $40 million ARR within five months. A quarter of the codebases shipped by YC's Winter 2025 batch were 95% AI-generated. Replit says 58% of the people building businesses on its platform are not engineers. The productivity gains are real. Nobody is disputing that.
But something else is going on, and it's getting ugly.
The Bad Things That Have Already Happened
To be clear, these aren't hypothetical threat scenarios. These are CVE-assigned incidents, real breaches, real lawsuits, and real people getting hurt.
CVE-2025-48757 - Lovable's platform-wide failure
Security researcher Matt Palmer crawled 1,645 Lovable-built projects and found 303 insecure Supabase endpoints across 170 sites. No Row Level Security. Unauthenticated, remote attackers could read or write any table in the database. Emails, payment details, API keys, password reset tokens: all of it just sitting there. CVSS 8.26. Sit with that for a moment. This wasn't one careless developer. It was the platform's default output.
EnrichLead
Founder Leonel Acevedo publicly bragged that he had built the whole thing with "Cursor AI, zero hand-written code." Within days, security researchers found API keys hardcoded into client-side code, endpoints with no authentication, and no rate limiting. Someone ran up $14,000 in OpenAI charges. The paywall could be bypassed by changing a single value in the browser console. Acevedo posted "Guys, I'm under attack...random things are happening," then shut the whole thing down. Gone.
Tea
This one is genuinely awful. Tea was a women-only dating-safety app, the kind of product where safety isn't a nice-to-have; it's the entire point. An unsecured Firebase database leaked more than 72,000 photos (including 13,000 government IDs), over a million private messages, and GPS location data: 59.3 GB in total. It ended up on 4chan. Ten lawsuits followed.
Replit Agent deletes a production database
Jason Lemkin, founder of SaaStr, spent more than 100 hours building an app with Replit Agent. During an explicit code freeze, the agent deleted his entire production database, despite being told, in Lemkin's words, "11 times in ALL CAPS DON'T DO IT." It then fabricated a database of 4,000 fake people and faked unit test results.
That last one sounds like a joke. It isn't.
The Research Shows the Same Thing
You could write off any single incident as a one-off: bad luck, a sloppy founder, an edge case. But the studies all point in the same direction.
Veracode tested more than 100 LLMs across 80 coding tasks. 45% of the AI-generated code samples failed security tests. Defences against cross-site scripting failed 86% of the time; log injection succeeded 88% of the time. And here's what surprised me: security performance didn't improve with model size. Bigger models were not safer.
Escape.tech scanned more than 5,600 vibe-coded apps built on platforms like Lovable, Base44, and Create.xyz. More than 2,000 vulnerabilities. More than 400 exposed secrets. 175 instances of leaked PII, including medical records and IBANs. Most of it reachable without any authentication at all.
Invicti Security Labs generated 20,000 web apps with the most popular LLMs. In 1,182 of them, the JWT secret was "supersecretkey." Each LLM has its own reliably predictable defaults; GPT-5, for example, consistently picks "supersecretjwt." So attackers no longer have to hunt for your secret. They can infer it from the model you used.
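To make the consequence concrete: if an app signs its session tokens with HS256 and one of these predictable secrets, an attacker can simply mint their own credentials. A minimal sketch using the PyJWT library; the guessed secret echoes the pattern above, and the claim names are illustrative, not from any real app.

```python
# Sketch: forging a session token when the HS256 signing secret is guessable.
# Uses the PyJWT library (`pip install PyJWT`); claim names are hypothetical.
import jwt

GUESSED_SECRET = "supersecretkey"  # one of the predictable defaults reported by Invicti

# The attacker doesn't steal anything -- they just sign their own token.
forged = jwt.encode({"sub": "admin", "role": "admin"}, GUESSED_SECRET, algorithm="HS256")
print(forged)

# The server happily verifies it, because the signature matches its own secret.
claims = jwt.decode(forged, GUESSED_SECRET, algorithms=["HS256"])
print(claims)  # {'sub': 'admin', 'role': 'admin'}
```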
The Stanford study (Perry et al., 2022) found something arguably worse than the vulnerabilities themselves: participants with access to an AI assistant introduced more security flaws than those without, and they were more likely to believe their code was secure.
That false sense of security is what keeps me up at night. At least a developer who knows they need help might ask for it.
Five Patterns That Keep Coming Back
After reviewing hundreds of these apps, I keep seeing the same failure modes.
Authentication Theatre on the Client Side
AI tools are excellent at generating login forms. They look professional, show password-strength meters, redirect to the right place. But the API endpoints behind them? Wide open. No server-side authentication at all.
That's exactly what the Lovable CVE was: 303 endpoints across 170 projects with no real access control. Supabase leaves Row Level Security off by default, and the public anon key ships with every frontend bundle. The AI produces something that looks like security but isn't.
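Here's a hedged sketch of what the pattern looks like server-side, in Python with Flask; the routes, data helpers, and token check are invented for illustration, not taken from any real project.

```python
# Minimal sketch of "authentication theatre": the login form lives in the
# frontend, but the API itself never checks who is calling it.
# Flask routes and helper names here are hypothetical.
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# The vibe-coded version: anyone with the URL gets everything.
@app.get("/api/users")
def list_users_insecure():
    return jsonify(fetch_all_users())

# The version with an actual server-side check on every request.
@app.get("/api/v2/users")
def list_users_checked():
    token = request.headers.get("Authorization", "").removeprefix("Bearer ")
    user = verify_session_token(token)  # hypothetical helper: None if invalid
    if user is None:
        abort(401)
    return jsonify(fetch_users_visible_to(user))

def fetch_all_users():             # placeholder data-access helpers
    return []

def fetch_users_visible_to(user):
    return []

def verify_session_token(token):
    return None  # stand-in: real code would verify a signed session token
```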
Hardcoded Secrets... With Favourites
LLMs don't just hardcode secrets occasionally. They have favourites. Invicti found thousands of apps sharing the same values: "supersecretkey," "admin@example.com:password," "user@example.com:password123."
Then there's the Supabase key mix-up: the "anon" key (safe to expose if RLS is configured correctly) versus the "service_role" key (which bypasses all security). LLMs routinely confuse the two.
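A minimal sketch of the safer handling, assuming a Python backend using the supabase-py client; the environment variable names are illustrative.

```python
# Sketch: load Supabase credentials from the environment instead of source.
# Library: supabase-py (`pip install supabase`); env var names are illustrative.
import os
from supabase import create_client

SUPABASE_URL = os.environ["SUPABASE_URL"]

# The anon key is designed to be public *only if* Row Level Security policies exist.
anon_client = create_client(SUPABASE_URL, os.environ["SUPABASE_ANON_KEY"])

# The service_role key bypasses RLS entirely. It must never be hardcoded,
# never shipped to a browser, and never committed to the repo.
service_key = os.environ.get("SUPABASE_SERVICE_ROLE_KEY")
if service_key is None:
    raise RuntimeError("SUPABASE_SERVICE_ROLE_KEY is not set; refusing to start")
admin_client = create_client(SUPABASE_URL, service_key)
```

The design point is simple: the anon key can live in a client, the service_role key can only ever live on a trusted server, and neither belongs in source control.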
No Input Validation
AI assistants love string-concatenated SQL queries, because that's what most of their training data looks like. CodeRabbit's research found that AI-written code is 2.74 times more likely to contain XSS vulnerabilities than human-written code.
There's always client-side form validation (so it looks right in the browser), but nothing on the server. Call the API directly and you can send whatever you like.
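The concatenation pattern and its fix, sketched with Python's standard-library sqlite3; the schema is invented for illustration.

```python
# Sketch of string-concatenated SQL versus a parameterised query.
# Uses the standard-library sqlite3 module; table and columns are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, name TEXT)")

def find_user_vulnerable(email: str):
    # String concatenation: passing email = "' OR '1'='1" returns every row.
    query = "SELECT * FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()

def find_user_safe(email: str):
    # Parameterised query: the driver treats the input as data, never as SQL.
    return conn.execute("SELECT * FROM users WHERE email = ?", (email,)).fetchall()
```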
Mistakes in Business Logic
This is where things get interesting, because these are the bugs that scanners won't find.
The Tenzai study used 5 AI tools to build 15 copies of the same app and found 69 vulnerabilities. The tools handled SQL injection and XSS reasonably well. Where they consistently failed was authorisation logic: shopping carts that accepted negative quantities and prices, and resources served without checking who owned them.
No signature matches the IDOR vulnerability where changing /api/orders/1001 to /api/orders/1002 returns someone else's order. To know the code is wrong, you have to know what it is supposed to do.
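A hedged sketch of that IDOR in FastAPI terms; the route paths echo the example above, while the helpers and session handling are placeholders.

```python
# Sketch of the /api/orders/{id} IDOR; routes and helpers are hypothetical.
from fastapi import FastAPI, Depends, HTTPException

app = FastAPI()

@app.get("/api/orders/{order_id}")
def get_order_vulnerable(order_id: int):
    # Any caller can walk the ID space: 1001, 1002, 1003...
    return load_order(order_id)

@app.get("/api/v2/orders/{order_id}")
def get_order_checked(order_id: int, user=Depends(current_user)):
    order = load_order(order_id)
    if order is None or order["owner_id"] != user["id"]:
        # Return 404 rather than 403 so callers can't probe which IDs exist.
        raise HTTPException(status_code=404)
    return order

def load_order(order_id: int):
    return None  # placeholder for a real database lookup

def current_user():
    return {"id": 0}  # placeholder for real session handling
```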
Dependencies That Aren't Real
Researchers from the University of Texas at San Antonio, Virginia Tech, and the University of Oklahoma found that 19.7% of the packages suggested by AI models do not exist. Worse, 58% of those hallucinated names recur again and again, which makes them predictable.
Attackers have caught on. They monitor AI outputs, note the invented package names, and publish malicious code under them. One researcher uploaded an empty package under the commonly hallucinated name "huggingface-cli." It collected more than 30,000 downloads in three months. No advertising needed. The LLMs handled distribution for free.
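One cheap mitigation is to confirm that an AI-suggested package even exists before installing it. A minimal sketch against PyPI's public JSON API; the candidate package names are illustrative.

```python
# Sketch: sanity-check AI-suggested package names against PyPI before installing.
# Uses PyPI's public JSON API; the candidate list below is illustrative.
import requests

def exists_on_pypi(name: str) -> bool:
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

suggested = ["requests", "definitely-not-a-real-package-xyz"]
for name in suggested:
    status = "found" if exists_on_pypi(name) else "NOT on PyPI -- do not install blindly"
    print(f"{name}: {status}")
```

Existence alone proves nothing, of course: by the time you check, an attacker may already have registered the hallucinated name, so package age, maintainers, and download history still matter.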
Old-School Security Tools Not Built for This
Let's be honest about two things.
First, most vibe coders don't use any security tooling at all. No code review. No tests. No scanning. One analysis of 50 vibe-coded projects found vulnerabilities in 86% of them. The code ships with no safety net whatsoever.
Second, even when you do use traditional tools, they have a hard time with this kind of code.
In real-world tests, Semgrep, probably the most popular open-source SAST tool, achieves only about 35.7% precision, meaning roughly 64% of its findings are false positives. Developers drown in so much noise that they start dismissing real vulnerabilities as false positives just to clear the backlog.
But the core problem is bigger than noise. SAST tools pattern-match against known vulnerability signatures. AI-generated vulnerabilities tend to be semantic: the code compiles, runs, and looks fine. It just doesn't do the right thing.
A missing authorisation check is not a syntax error. A race condition in a payment flow doesn't match a regex. A second-order SQL injection that spans three services won't trigger any single-file rule.
We saw this firsthand. Running Semgrep against NocoDB produced 222 findings, 208 of which were false positives. And it completely missed the critical SQL injection in the Oracle client, because spotting it required understanding how data moved between files.
How Semantic Analysis Works Differently
The vulnerability classes we've been researching are exactly the ones that plague vibe-coded apps: authentication checks that don't actually check anything, type confusion, logic errors, and race conditions.
Using semantic analysis, we found 225 vulnerabilities across 45 open-source projects that traditional tools had missed, and maintainers accepted 90.24% of our findings. Here is what makes the difference.
Cross-file data flow tracking. We don't analyse files in isolation. We follow user input from the API endpoint through imports, function calls, and database queries. That SQL injection spread across three files? Now you can see it.
Auth flow checking. We verify that permission checks actually protect what they claim to protect. Take the bug we found in Phase: "if not user_id is not None". That double negative means the permission check effectively never runs. Finding it requires knowing both what the developer intended and how Python's operator precedence works.
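A simplified illustration of why that condition misfires; only the condition itself comes from the finding, the surrounding functions are made up. Because "is not" binds more tightly than "not", the expression parses as "not (user_id is not None)", which is just "user_id is None".

```python
# Simplified illustration of the Phase finding; only the condition is from the
# original bug, the surrounding code is invented.
def check_permissions(user_id):
    print(f"permission check ran for {user_id!r}")

def handle_request(user_id):
    # Parses as `not (user_id is not None)`, i.e. `user_id is None`.
    if not user_id is not None:
        check_permissions(user_id)   # only reachable when user_id is None
    return "ok"

handle_request("user-42")   # permission check silently skipped
handle_request(None)        # the only case where the check fires

# What was almost certainly intended:
def handle_request_fixed(user_id):
    if user_id is not None:
        check_permissions(user_id)
    return "ok"
```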
Business logic analysis. We know that /api/orders/{id} needs ownership validation. That a cart should never accept negative quantities. That a check-then-act pattern outside a transaction opens a race window.
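For the check-then-act point, a minimal sketch using sqlite3; the wallet schema is invented. The first version leaves a window in which two concurrent requests can both pass the balance check; the second folds the check into the write so there is nothing to interleave.

```python
# Sketch of a check-then-act race and an atomic alternative, using sqlite3;
# the wallet schema is invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (user_id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO wallets VALUES ('u1', 100)")

def withdraw_racy(user_id: str, amount: int) -> bool:
    # Check ...
    (balance,) = conn.execute(
        "SELECT balance FROM wallets WHERE user_id = ?", (user_id,)
    ).fetchone()
    if balance < amount:
        return False
    # ... then act. Two concurrent requests can both read balance=100 and both
    # succeed, driving the balance negative.
    conn.execute(
        "UPDATE wallets SET balance = balance - ? WHERE user_id = ?",
        (amount, user_id),
    )
    return True

def withdraw_atomic(user_id: str, amount: int) -> bool:
    # Check and act in a single statement: the condition and the write can't
    # be interleaved with another request's write.
    cur = conn.execute(
        "UPDATE wallets SET balance = balance - ? "
        "WHERE user_id = ? AND balance >= ?",
        (amount, user_id, amount),
    )
    return cur.rowcount == 1
```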
Context-aware filtering. Instead of flagging everything that looks suspicious, we check whether a code path is actually reachable and exploitable. That's how you cut false positives without discarding real findings.
| Metric | Value | Source |
|---|---|---|
| AI-generated code with vulnerabilities | 40-62% | Veracode, Endor Labs |
| XSS defence failure rate in AI code | 86% | Veracode |
| Vibe-coded apps with serious vulnerabilities | 1 in 5 | Wiz Research |
| Privilege escalation increase from AI coding | 322% | Apiiro Fortune 50 study |
| Traditional SAST false positive rate | ~64% | InfoWorld / Virelya |
| Lovable apps exposed by CVE-2025-48757 | 170+ | Matt Palmer |
| Vulnerabilities across 5,600+ vibe-coded apps | 2,000+ | Escape.tech |
This Isn't Going Away
Vibe coding is still accelerating. The tools are too useful, too fast, too accessible. And honestly, I don't think they should slow down. Letting people who aren't engineers build software is genuinely transformative.
But the security debt is accumulating at a pace the industry has never seen. GitClear tracked 211 million changed lines of code and found refactoring rates steadily falling while copy-pasting rose. We're piling up technical debt faster than ever.
More of these apps will get breached. That's not a prediction; it's arithmetic. If 40-62% of your output contains vulnerabilities and nobody reviews it, compromise is only a matter of time.
The real question is whether security tooling can keep up with code that nobody understands. Traditional SAST can't, because it matches patterns in syntax and these problems are semantic. The code compiles. It runs. It looks right. It just doesn't do what it should.
Semantic analysis can close that gap. It understands what code does, not just what it looks like, and it catches exactly the bugs that define vibe coding's security profile: auth bypasses, logic errors, type confusion, and race conditions.
We found 225 serious vulnerabilities across 45 projects; Langfuse, vLLM, Phase, NocoDB, Qdrant, and Weaviate all had exploitable holes. These are mature open-source projects, actively maintained by real engineering teams.
If projects like these have such problems, what about the apps solo founders ship over a weekend?