Critical vLLM Vulnerability Exposes Systems to Remote Code Execution via Malicious Payloads
A significant security flaw has been identified in vLLM, an open-source library for high-throughput inference and serving of large language models. This vulnerability, present in versions 0.10.2 and later, allows attackers to execute arbitrary code remotely by sending specially crafted prompt embeddings to the Completions API endpoint.
Understanding the Vulnerability
The core of this issue lies in the tensor deserialization process in vLLM's `entrypoints/renderer.py` file, specifically at line 148. When processing user-supplied prompt embeddings, the system deserializes tensors with `torch.load()` without sufficient validation. This becomes particularly problematic due to a change introduced in PyTorch 2.8.0, which disabled sparse tensor integrity checks by default. Consequently, attackers can craft sparse tensors that bypass internal bounds checks, leading to an out-of-bounds memory write during the `to_dense()` conversion. This memory corruption can crash the vLLM server and potentially enable arbitrary code execution within the server process.
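To illustrate the failure mode, the sketch below shows how deserialization of an untrusted embedding payload can be wrapped in PyTorch's `torch.sparse.check_sparse_tensor_invariants` context manager so that a malformed sparse tensor is rejected at load time rather than corrupting memory during `to_dense()`. The function name and surrounding structure are illustrative assumptions, not the code from `renderer.py` or the upstream fix.

```python
import io

import torch

# Illustrative sketch only, not the vLLM patch: the function name and the idea of
# wrapping deserialization in the invariant-check context manager are assumptions
# made for this example.
def load_prompt_embedding(payload: bytes) -> torch.Tensor:
    # Since PyTorch 2.8.0, sparse tensor invariant checks are disabled by default,
    # so a crafted sparse tensor with out-of-range indices can be rebuilt silently.
    # Re-enabling the checks makes deserialization raise instead of accepting it.
    with torch.sparse.check_sparse_tensor_invariants():
        tensor = torch.load(io.BytesIO(payload), weights_only=True)

    # Densify only after the tensor has been rebuilt under invariant checking;
    # without that guard, to_dense() on a malformed sparse tensor can trigger
    # the out-of-bounds write described above.
    if tensor.layout != torch.strided:
        tensor = tensor.to_dense()
    return tensor
```

The key point is that `weights_only=True` alone only restricts what the unpickler may construct; it is the invariant check that catches inconsistent sparse index metadata before the dense conversion runs.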
Technical Details
– CVE ID: CVE-2025-62164
– Severity: High
– CVSS Score: 8.8/10
– Affected Product: vLLM (pip)
– Affected Versions: ≥ 0.10.2
This vulnerability impacts all deployments running vLLM as a server, especially those deserializing untrusted or model-provided payloads. Any user with API access can exploit this flaw to achieve denial-of-service conditions and potentially gain remote code execution capabilities. The attack requires no special privileges, making it accessible to both authenticated and unauthenticated users, depending on the API configuration.
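For context on the attack surface, the hedged sketch below shows what a client-supplied prompt-embedding request might look like: the embedding is serialized with `torch.save`, base64-encoded, and posted to the Completions endpoint, so the bytes that eventually reach `torch.load` on the server are entirely client-controlled. The `prompt_embeds` field name, the encoding, and the endpoint URL are assumptions made for illustration rather than a verified copy of the vLLM request schema.

```python
import base64
import io

import requests
import torch

# A benign request of the shape described above, to show why any API client is in
# scope: the serialized tensor bytes come straight from the caller.
embedding = torch.randn(8, 4096)          # an ordinary dense prompt embedding
buffer = io.BytesIO()
torch.save(embedding, buffer)

resp = requests.post(
    "http://localhost:8000/v1/completions",   # local vLLM server (assumed URL)
    json={
        "model": "my-model",
        "prompt_embeds": base64.b64encode(buffer.getvalue()).decode("ascii"),
        "max_tokens": 16,
    },
    timeout=30,
)
print(resp.status_code)
```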
Potential Impact
Organizations utilizing vLLM in production environments, cloud deployments, or shared infrastructure are at significant risk. Successful exploitation could compromise the entire server and adjacent systems, leading to unauthorized access, data breaches, and service disruptions. Given the widespread adoption of vLLM for deploying large language models, the potential for damage is substantial.
Mitigation Measures
The vLLM project has addressed this vulnerability in pull request #27204. Users are strongly advised to upgrade to the patched version immediately to mitigate the risk. As a temporary measure, administrators should restrict API access to trusted users only and implement input validation layers that inspect prompt embeddings before they reach the vLLM processing pipeline. These steps can help prevent malicious payloads from exploiting the vulnerability.
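As a rough illustration of such a validation layer, the sketch below decodes a prompt-embedding payload, caps its size, deserializes it with sparse invariant checks enabled, and accepts only dense floating-point matrices of bounded shape before the request is forwarded to vLLM. The field handling, limits, and acceptance policy are assumptions chosen for the example, not part of the upstream fix.

```python
import base64
import io

import torch

# Hedged sketch of a pre-filter that an API gateway or proxy could apply before
# forwarding a request to vLLM. The size limits and acceptance policy below are
# illustrative assumptions, not settings from the vLLM project.
MAX_PAYLOAD_BYTES = 8 * 1024 * 1024        # cap on the serialized tensor size
MAX_TOKENS, MAX_HIDDEN = 4096, 8192        # bounds on the embedding shape

def prompt_embeds_allowed(encoded: str) -> bool:
    """Return True only for payloads that decode to a bounded dense float matrix."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except ValueError:
        return False
    if len(raw) > MAX_PAYLOAD_BYTES:
        return False
    # Deserialize with sparse invariant checks enabled so a malformed sparse
    # tensor is rejected here rather than inside the serving process.
    try:
        with torch.sparse.check_sparse_tensor_invariants():
            tensor = torch.load(io.BytesIO(raw), weights_only=True)
    except Exception:
        return False
    # Accept only plain dense floating-point matrices of bounded size.
    return (
        isinstance(tensor, torch.Tensor)
        and tensor.layout == torch.strided
        and tensor.dtype in (torch.float16, torch.bfloat16, torch.float32)
        and tensor.dim() == 2
        and tensor.shape[0] <= MAX_TOKENS
        and tensor.shape[1] <= MAX_HIDDEN
    )
```

A filter along these lines does not replace upgrading; it simply narrows what untrusted payloads can reach the serving process while the patch is rolled out.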
Discovery and Disclosure
The vulnerability was discovered and responsibly disclosed by the AXION Security Research Team. Their prompt action underscores the importance of coordinated vulnerability disclosure in maintaining the security of AI infrastructure ecosystems. Users and organizations are encouraged to stay vigilant and apply security updates promptly to protect their systems from potential exploits.
Conclusion
The discovery of this critical vulnerability in vLLM highlights the ongoing challenges in securing AI deployment frameworks. Organizations must prioritize updating their systems and implementing robust security measures to safeguard against such threats. Staying informed about vulnerabilities and applying patches promptly are essential steps in maintaining a secure operational environment.