Introduction
At Databricks, our AI Red Team regularly explores how new software paradigms can introduce unexpected security risks. One recent trend we have been monitoring closely is "vibe coding", the casual, rapid use of generative AI to scaffold code. While this approach accelerates development, we have found that it can also introduce subtle, dangerous vulnerabilities that go unnoticed until it is too late.
In this post, we explore some real-world examples from our red team efforts, showing how vibe coding can lead to serious vulnerabilities. We also demonstrate some prompting practices that can help mitigate these risks.
Vibe Coding Gone Wrong: Multiplayer Gaming
In one of our initial experiments exploring vibe coding risks, we tasked Claude with creating a third-person snake battle arena, where users would control the snake from an overhead camera perspective using the mouse. In keeping with the vibe-coding methodology, we allowed the model substantial control over the project's architecture, incrementally prompting it to generate each component. Although the resulting application functioned as intended, this process inadvertently introduced a critical security vulnerability that, if left unchecked, could have led to arbitrary code execution.
The Vulnerability
The network layer of the Snake game transmits Python objects serialized and deserialized using `pickle`, a module known to be vulnerable to arbitrary remote code execution (RCE). As a result, a malicious client or server could craft and send payloads that execute arbitrary code on any other instance of the game.
The code below, taken directly from Claude's generated network code, clearly illustrates the problem: objects received from the network are directly deserialized without any validation or security checks.
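As a minimal sketch of that pattern, assuming a simple length-prefixed TCP protocol (the function name and framing are illustrative, not Claude's actual output):

```python
import pickle
import socket

# Illustrative sketch of the vulnerable pattern described above, not the
# verbatim generated code: a length-prefixed message is read off the socket
# and handed straight to pickle.loads().
def receive_game_state(sock: socket.socket):
    header = sock.recv(4)
    length = int.from_bytes(header, "big")
    payload = sock.recv(length)
    # Deserializing attacker-controlled bytes with pickle allows arbitrary
    # code execution via a crafted __reduce__ payload.
    return pickle.loads(payload)
```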
Although this type of vulnerability is classic and well-documented, the nature of vibe coding makes it easy to overlook potential risks when the generated code appears to "just work."
However, by prompting Claude to implement the code securely, we observed that the model proactively identified and resolved the following security issues:
As shown in the code excerpt below, the issue was resolved by switching from pickle to JSON for data serialization. A size limit was also imposed to mitigate against denial-of-service attacks.
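A receive path along these lines illustrates the described fix, under the same assumptions as the earlier sketch (not the model's actual code; the 64 KB cap is an arbitrary illustrative value):

```python
import json
import socket

MAX_MESSAGE_SIZE = 64 * 1024  # illustrative cap to limit denial-of-service

def receive_game_state(sock: socket.socket):
    header = sock.recv(4)
    length = int.from_bytes(header, "big")
    if length > MAX_MESSAGE_SIZE:
        raise ValueError("message too large")
    payload = sock.recv(length)
    # JSON only yields plain data types (dicts, lists, strings, numbers),
    # so deserialization cannot trigger code execution by itself.
    return json.loads(payload.decode("utf-8"))
```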
ChatGPT and Memory Corruption: Binary File Parsing
In another experiment, we tasked ChatGPT with generating a parser for the GGUF binary format, widely recognized as challenging to parse securely. GGUF files store model weights for modules implemented in C and C++, and we specifically chose this format because Databricks has previously found several vulnerabilities in the official GGUF library.
ChatGPT quickly produced a working implementation that correctly handled file parsing and metadata extraction, which is shown in the source code below.
However, upon closer examination, we discovered significant security flaws related to unsafe memory handling. The generated C/C++ code included unchecked buffer reads and instances of type confusion, both of which could lead to memory corruption vulnerabilities if exploited.
In this GGUF parser, several memory corruption vulnerabilities exist due to unchecked input and unsafe pointer arithmetic. The primary issues included:
- Insufficient bounds checking when reading integers or strings from the GGUF file. These could lead to buffer overreads or buffer overflows if the file was truncated or maliciously crafted.
- Unsafe memory allocation, such as allocating memory for a metadata key using an unvalidated key length with 1 added to it. This length calculation can integer overflow, resulting in a heap overflow.
An attacker could exploit the second of these issues by crafting a GGUF file with a fake header, an extremely large or negative length for a key or value field, and arbitrary payload data. For example, a key length of 0xFFFFFFFFFFFFFFFF (the maximum unsigned 64-bit value) could cause an unchecked malloc() to return a small buffer, but the subsequent memcpy() would still write past it, resulting in a classic heap-based buffer overflow. Similarly, if the parser assumes a valid string or array length and reads it into memory without validating the available space, it could leak memory contents. These flaws could potentially be used to achieve arbitrary code execution.
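The pattern looks roughly like the sketch below. This is an illustration of the flaw described, not ChatGPT's verbatim output; the function name and buffer handling are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Vulnerable pattern (illustrative). `cur` points into a memory-mapped GGUF
 * file; `end` marks the end of the mapping. */
static char *read_metadata_key(const uint8_t **cur, const uint8_t *end)
{
    uint64_t key_len;
    memcpy(&key_len, *cur, sizeof(key_len));   /* no check that 8 bytes remain */
    *cur += sizeof(key_len);

    /* key_len is attacker-controlled: 0xFFFFFFFFFFFFFFFF + 1 wraps to 0,
     * so malloc() can succeed with a tiny (or zero-byte) allocation. */
    char *key = malloc(key_len + 1);
    if (!key)
        return NULL;

    /* memcpy still copies key_len bytes, overflowing the heap buffer
     * and reading far past the end of the mapped file. */
    memcpy(key, *cur, key_len);
    key[key_len] = '\0';
    *cur += key_len;
    (void)end;                                 /* bounds never consulted */
    return key;
}
```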
To validate this concern, we tasked ChatGPT with generating a proof-of-concept that creates a malicious GGUF file and passes it into the vulnerable parser. The resulting output shows the program crashing inside the memmove function, which executes the logic corresponding to the unsafe memcpy call. The crash occurs when the program reaches the end of a mapped memory page and attempts to write beyond it into an unmapped page, triggering a segmentation fault due to an out-of-bounds memory access.
Once again, we followed up by asking ChatGPT for suggestions on fixing the code, and it was able to suggest the following improvements:
We then took the updated code and passed the proof-of-concept GGUF file to it, and the code detected the malformed record.
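The essential change is to validate attacker-supplied lengths before allocating or copying. A sketch of that class of fix is shown below; it is our illustration rather than ChatGPT's exact revision, and MAX_KEY_LEN is an assumed cap.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEY_LEN 4096   /* illustrative upper bound on metadata key size */

/* Hardened version of the earlier sketch (illustrative). */
static char *read_metadata_key_safe(const uint8_t **cur, const uint8_t *end)
{
    /* Ensure the 8-byte length field itself is inside the buffer. */
    if ((size_t)(end - *cur) < sizeof(uint64_t))
        return NULL;

    uint64_t key_len;
    memcpy(&key_len, *cur, sizeof(key_len));
    *cur += sizeof(key_len);

    /* Reject absurd lengths and anything that would run past the file,
     * which also rules out the key_len + 1 integer overflow. */
    if (key_len > MAX_KEY_LEN || key_len > (uint64_t)(end - *cur))
        return NULL;

    char *key = malloc((size_t)key_len + 1);
    if (!key)
        return NULL;

    memcpy(key, *cur, (size_t)key_len);
    key[key_len] = '\0';
    *cur += key_len;
    return key;
}
```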
Again, the core issue wasn't ChatGPT's ability to generate functional code, but rather that the casual approach inherent to vibe coding allowed dangerous assumptions to go unnoticed in the generated implementation.
Prompting as a Security Mitigation
While there is no substitute for a security expert reviewing your code to ensure it is not vulnerable, several practical, low-effort strategies can help mitigate risks during a vibe coding session. In this section, we describe three simple techniques that can significantly reduce the likelihood of generating insecure code. Each of the prompts presented in this post was generated using ChatGPT, demonstrating that any vibe coder can easily create effective security-oriented prompts without extensive security expertise.
General Security-Oriented System Prompts
The first approach involves using a generic, security-focused system prompt to steer the LLM toward secure coding behaviors from the outset. Such prompts provide baseline security guidance, potentially improving the safety of the generated code. In our experiments, we used the following prompt:
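As an illustration of this style of prompt (not the exact prompt used in our experiments), a generic security-oriented system prompt might read:

```
You are a careful software engineer who writes secure code by default.
For all code you produce: validate and bounds-check untrusted input,
avoid unsafe deserialization and unbounded buffer operations, handle
errors explicitly, avoid hard-coded secrets, and use well-vetted
libraries for cryptography and parsing. If a request would require
insecure code, point this out and suggest a safer alternative.
```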
Language or Application-Specific Prompts
When the programming language or application context is known in advance, another effective strategy is to provide the LLM with a tailored, language-specific or application-specific security prompt. This method directly targets known vulnerabilities or common pitfalls relevant to the task at hand. Notably, it is not even necessary to be aware of these vulnerability classes explicitly, as an LLM itself can generate suitable system prompts. In our experiments, we instructed ChatGPT to generate language-specific prompts using the following request:
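A request of roughly this shape (illustrative only, not our exact wording) is enough to have the model produce a usable language-specific prompt:

```
Write a system prompt I can give a coding assistant so that any C code it
generates is secure. Cover the most common C vulnerability classes, such as
buffer overflows, integer overflows, format string bugs, and unchecked
return values, and phrase the guidance as concrete rules.
```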
Self-Reflection for Security Review
The third method incorporates a self-reflective review step immediately after code generation. Initially, no special system prompt is used, but once the LLM produces a code section, the output is fed back into the model to explicitly identify and address security vulnerabilities. This approach leverages the model's inherent capabilities to detect and correct security issues that may have been initially missed. In our experiments, we provided the original code output as a user prompt and guided the security review process using the following system prompt:
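An illustrative review prompt in this spirit (again, not the exact prompt from our experiments) looks like:

```
You are a security reviewer. Examine the code provided by the user and
identify any security vulnerabilities, including memory-safety issues,
injection flaws, unsafe deserialization, and missing input validation.
For each finding, explain the risk and rewrite the affected code with
the vulnerability fixed.
```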
Empirical Results: Evaluating Model Behavior on Security Tasks
To quantitatively evaluate the effectiveness of each prompting approach, we conducted experiments using the Secure Coding Benchmark from PurpleLlama's Cybersecurity Benchmark testing suite. This benchmark consists of two types of tests designed to measure an LLM's tendency to generate insecure code in scenarios directly relevant to vibe coding workflows:
- Instruct Tests: Models generate code based on explicit instructions.
- Autocomplete Tests: Models predict subsequent code given a preceding context.
Testing both scenarios is particularly useful since, during a typical vibe coding session, developers often first instruct the model to produce code and then subsequently paste this code back into the model to address issues, closely mirroring the instruct and autocomplete scenarios respectively. We evaluated two models, Claude 3.7 Sonnet and GPT-4o, across all programming languages included in the Secure Coding Benchmark. The following plots illustrate the percentage change in vulnerable code generation rates for each of the three prompting strategies compared to the baseline scenario with no system prompt. Negative values indicate an improvement, meaning the prompting strategy decreased the rate of insecure code generation.
Claude 3.7 Sonnet Results
When generating code with Claude 3.7 Sonnet, all three prompting strategies provided improvements, although their effectiveness varied significantly:
- Self Reflection was the most effective strategy overall. It decreased insecure code generation rates by an average of 48% in the instruct scenario and 50% in the autocomplete scenario. In common programming languages such as Java, Python, and C++, this strategy notably decreased vulnerability rates by roughly 60% to 80%.
- Language-Specific System Prompts also resulted in meaningful improvements, decreasing insecure code generation by 37% and 24%, on average, in the two evaluation settings. In nearly all cases, these prompts were more effective than the generic security system prompt.
- Generic Security System Prompts provided modest improvements of 16% and 8%, on average. However, given the greater effectiveness of the other two approaches, this method would generally not be the recommended choice.
Although the Self Reflection strategy yielded the largest reductions in vulnerabilities, it can sometimes be challenging to have an LLM review each individual section it generates. In such cases, leveraging Language-Specific System Prompts may offer a more practical alternative.
GPT-4o Results
- Self Reflection was again the most effective strategy overall, decreasing insecure code generation by an average of 30% in the instruct scenario and 51% in the autocomplete scenario.
- Language-Specific System Prompts were also highly effective, decreasing insecure code generation by roughly 24%, on average, across both scenarios. Notably, this strategy sometimes outperformed self reflection in the instruct tests with GPT-4o.
- Generic Security System Prompts performed better with GPT-4o than with Claude 3.7 Sonnet, decreasing insecure code generation by an average of 13% and 19% in the instruct and autocomplete scenarios respectively.
Overall, these results clearly demonstrate that targeted prompting is a practical and effective approach for improving security outcomes when generating code with LLMs. Although prompting alone is not a complete security solution, it provides meaningful reductions in code vulnerabilities and can easily be customized or expanded according to specific use cases.
Impact of Security Strategies on Code Generation
To better understand the practical trade-offs of applying these security-focused prompting strategies, we evaluated their impact on the LLMs' general code-generation abilities. For this purpose, we used the HumanEval benchmark, a well-known evaluation framework designed to assess an LLM's capability to produce functional Python code in the autocomplete context.
| Model | Generic System Prompt | Python System Prompt | Self Reflection |
|---|---|---|---|
| Claude 3.7 Sonnet | 0% | +1.9% | +1.3% |
| GPT-4o | -2.0% | 0% | -5.4% |
The table above shows the percentage change in HumanEval success rates for each security prompting strategy compared to the baseline (no system prompt). For Claude 3.7 Sonnet, all three mitigations either matched or slightly improved baseline performance. For GPT-4o, security prompts moderately decreased performance, except for the Python-specific prompt, which matched baseline results. However, given these relatively small differences compared to the substantial reduction in vulnerable code generation, adopting these prompting strategies remains practical and beneficial.
The Rise of Agentic Coding Assistants
A growing number of developers are moving beyond traditional IDEs and into new, AI-powered environments that offer deeply integrated agentic assistance. Tools like Cursor, Cline, and Claude Code are part of this emerging wave. They go beyond autocomplete by integrating linters, test runners, documentation parsers, and even runtime analysis tools, all orchestrated by LLMs that act more like agents than static copilot models.
These assistants are designed to reason about your entire codebase, make intelligent suggestions, and fix errors in real time. In principle, this interconnected toolchain should improve code correctness and security. In practice, however, our red team testing shows that security vulnerabilities still persist, especially when these assistants generate or refactor complex logic, handle input/output routines, or interface with external APIs.
We evaluated Cursor in a security-focused test similar to our earlier analysis. Starting from scratch, we prompted Claude 4 Sonnet with: "Write me a basic parser for the GGUF format in C, with the ability to load or write a file from memory." Cursor autonomously browsed the web to gather details about the format, then generated a complete library that handled GGUF file I/O as requested. The result was significantly more robust and comprehensive than code produced without the agentic flow. However, during a review of the code's security posture, several vulnerabilities were identified, including the one present in the read_str() function shown below.
Here, the `str->n` attribute is populated directly from the GGUF buffer and used, without validation, to allocate a heap buffer. An attacker could supply a maximum-size value for this field which, when incremented by one, wraps around to zero due to integer overflow. This causes `malloc()` to succeed, returning a minimal allocation (depending on the allocator's behavior), which is then overrun by the subsequent `memcpy()` operation, leading to a classic heap-based buffer overflow.
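A sketch consistent with that description follows; the struct layout and surrounding code are assumptions, and this is not Cursor's verbatim read_str().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout for the sketch; only the field name `n` comes from the
 * generated code discussed above. */
typedef struct {
    uint64_t n;     /* string length, read straight from the file */
    char    *data;
} gguf_str;

/* Illustrative version of the flawed read path. */
static int read_str(const uint8_t **cur, gguf_str *str)
{
    memcpy(&str->n, *cur, sizeof(str->n));  /* attacker-controlled length */
    *cur += sizeof(str->n);

    /* str->n == UINT64_MAX makes str->n + 1 wrap to 0, so malloc() returns
     * a minimal allocation instead of failing. */
    str->data = malloc(str->n + 1);
    if (!str->data)
        return -1;

    memcpy(str->data, *cur, str->n);        /* heap buffer overflow */
    str->data[str->n] = '\0';
    *cur += str->n;
    return 0;
}
```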
Mitigations
Importantly, the same mitigations we explored earlier in this post (security-focused prompting, self-reflection loops, and application-specific guidance) proved effective at reducing vulnerable code generation even in these environments. Whether you are vibe coding with a standalone model or using a full agentic IDE, intentional prompting and post-generation review remain critical for securing the output.
Self Reflection
Testing self-reflection within the Cursor IDE was straightforward: we simply pasted our earlier self-reflection prompt directly into the chat window.
This triggered the agent to process the code tree and search for vulnerabilities, then iterate and remediate the ones it identified. The diff below shows the outcome of this process in relation to the vulnerability we discussed earlier.
Leveraging .cursorrules for Secure-by-Default Generation
One of Cursor's more powerful but lesser-known features is its support for a `.cursorrules` file within the source tree. This configuration file allows developers to define custom guidance or behavioral constraints for the coding assistant, including language-specific prompts that influence how code is generated or refactored.
To test the impact of this feature on security outcomes, we created a `.cursorrules` file containing a C-specific secure coding prompt, as per our earlier work above. This prompt emphasized safe memory handling, bounds checking, and validation of untrusted input.
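The kind of rules involved looks roughly like the excerpt below; this is illustrative of the approach, not the exact file we used.

```
# .cursorrules (illustrative excerpt)
All C code in this project must be written defensively:
- Validate every length, offset, and size read from a file or socket
  before using it for allocation or copying.
- Check for integer overflow in all size arithmetic.
- Prefer bounded functions (snprintf, strnlen) over unbounded ones
  (sprintf, strcpy, strcat).
- Check the return value of every allocation and I/O call.
```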
After placing the file in the root of the project and prompting Cursor to regenerate the GGUF parser from scratch, we found that many of the vulnerabilities present in the original version were proactively avoided. Specifically, previously unchecked values like `str->n` were now validated before use, buffer allocations were size-checked, and the use of unsafe functions was replaced with safer alternatives.
For comparison, here is the function that was generated to read string types from the file.
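That generated function is not reproduced here; a hardened version consistent with the checks described above might look like the following sketch (our illustration, not the actual Cursor output; the `remaining` parameter and struct layout are assumptions):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint64_t n;
    char    *data;
} gguf_str;   /* assumed layout, as in the earlier sketch */

/* Illustrative hardened read; `remaining` is the number of unread bytes
 * left in the mapped file. */
static int read_str_safe(const uint8_t **cur, size_t remaining, gguf_str *str)
{
    if (remaining < sizeof(str->n))
        return -1;
    memcpy(&str->n, *cur, sizeof(str->n));
    *cur += sizeof(str->n);
    remaining -= sizeof(str->n);

    /* Validate the length against the bytes actually available before
     * allocating; this also prevents the n + 1 wrap-around. */
    if (str->n >= SIZE_MAX || str->n > remaining)
        return -1;

    str->data = malloc((size_t)str->n + 1);
    if (!str->data)
        return -1;

    memcpy(str->data, *cur, (size_t)str->n);
    str->data[str->n] = '\0';
    *cur += str->n;
    return 0;
}
```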
This experiment highlights an important point: by codifying secure coding expectations directly into the development environment, tools like Cursor can generate safer code by default, reducing the need for reactive review. It also reinforces the broader lesson of this post that intentional prompting and structured guardrails are effective mitigations even in more sophisticated agentic workflows.
Interestingly, however, when running the self-reflection test described above on the code tree generated in this manner, Cursor was still able to detect and remediate some vulnerable code that had been missed during generation.
Integration of Security Tools (semgrep-mcp)
Many agentic coding environments now support the integration of external tools to enhance the development and review process. One of the most versatile methods for doing this is through the Model Context Protocol (MCP), an open standard introduced by Anthropic that allows LLMs to interface with structured tools and services during a coding session.
To explore this, we ran a local instance of the Semgrep MCP server and connected it directly to Cursor. This integration allowed the LLM to invoke static analysis checks on newly generated code in real time, surfacing security issues such as the use of unsafe functions, unchecked input, and insecure deserialization patterns.
To accomplish this, we ran the server locally with the command `uv run mcp run server.py -t sse` and then added the following JSON to the file ~/.cursor/mcp.json:
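An entry of roughly this shape registers the server with Cursor; the server name and URL are illustrative and depend on how the local SSE endpoint is configured, so treat them as assumptions rather than the exact values we used.

```json
{
  "mcpServers": {
    "semgrep": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```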
Finally, we created a .customrules file within the project containing the prompt: "Perform a security scan of all generated code using the semgrep tool". After this, we used the original prompt for generating the GGUF library, and as can be seen in the screenshot below, Cursor automatically invokes the tool when needed.
The results were encouraging. Semgrep successfully flagged several of the vulnerabilities in earlier iterations of our GGUF parser. However, what stood out was that even after the Semgrep automated review, applying self-reflection prompting still uncovered additional issues that had not been flagged by static analysis alone. These included edge cases involving integer overflows and subtle misuses of pointer arithmetic, bugs that require a deeper semantic understanding of the code and context.
This dual-layer approach, combining automated scanning with structured LLM-based reflection, proved especially powerful. It highlights that while integrated tools like Semgrep raise the baseline for security during code generation, agentic prompting strategies remain essential for catching the full spectrum of vulnerabilities, especially those that involve logic, state assumptions, or nuanced memory behavior.
Conclusion: Vibes Aren't Enough
Vibe coding is appealing. It is fast, satisfying, and often surprisingly effective. However, when it comes to security, relying solely on intuition or casual prompting is not sufficient. As we move toward a future where AI-driven coding becomes commonplace, developers must learn to prompt with intention, especially when building systems that are networked, written in unmanaged code, or highly privileged.
At Databricks, we are optimistic about the power of generative AI, but we are also realistic about the risks. Through code review, testing, and secure prompt engineering, we are building processes that make vibe coding safer for our teams and our customers. We encourage the industry to adopt similar practices to ensure that speed does not come at the cost of security.
To learn more about other best practices from the Databricks Red Team, see our blogs on how to securely deploy third-party AI models and GGML GGUF File Format Vulnerabilities.