Critical SGLang Vulnerability (CVE-2026-5760) Exposes Systems to Remote Code Execution via Malicious GGUF Model Files
A critical security flaw has been identified in SGLang, an open-source framework for serving large language and multimodal models. The vulnerability, cataloged as CVE-2026-5760 with a CVSS score of 9.8, can allow remote code execution (RCE) on affected systems.
Understanding the Vulnerability
SGLang is renowned for its high-performance capabilities in deploying large-scale language models. Its GitHub repository boasts over 5,500 forks and 26,100 stars, reflecting its widespread adoption in the AI community.
The core of this vulnerability lies in the reranking endpoint `/v1/rerank`. Attackers can exploit this by crafting a malicious GPT-Generated Unified Format (GGUF) model file. This file contains a specially designed `tokenizer.chat_template` parameter embedded with a Jinja2 server-side template injection (SSTI) payload. When this malicious model is loaded into SGLang and the reranking endpoint is triggered, the payload executes arbitrary Python code on the server, leading to RCE.
Technical Breakdown of the Exploit
1. Creation of Malicious GGUF Model: An attacker designs a GGUF model file with a `tokenizer.chat_template` that includes a Jinja2 SSTI payload.
2. Incorporation of Trigger Phrase: The template is embedded with the Qwen3 reranker trigger phrase to activate the vulnerable code path located in `entrypoints/openai/serving_rerank.py`.
3. Deployment of the Malicious Model: The victim downloads and integrates the model into SGLang, typically from a public model hub such as Hugging Face, where attacker-uploaded models can masquerade as legitimate ones.
4. Execution of the Payload: Upon receiving a request at the `/v1/rerank` endpoint, SGLang processes the `chat_template` using `jinja2.Environment()`. This renders the malicious template, executing the attacker’s arbitrary Python code on the server.
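The steps above can be sketched in a few lines. The handler below is a hypothetical reduction of the vulnerable rendering pattern, not SGLang's actual code; the payload uses Jinja2's built-in `cycler` global, a well-known SSTI gadget whose `__init__.__globals__` exposes the `os` module.

```python
from jinja2 import Environment


def render_rerank_prompt(chat_template: str, query: str, document: str) -> str:
    # Hypothetical reduction of the vulnerable code path: the chat template
    # comes straight from model metadata and is rendered without sandboxing.
    # jinja2.Environment() imposes no restrictions, so template expressions
    # can walk Python attribute chains to reach arbitrary objects.
    env = Environment()
    return env.from_string(chat_template).render(query=query, document=document)


# A benign template behaves as expected...
benign = "<query>{{ query }}</query><doc>{{ document }}</doc>"
print(render_rerank_prompt(benign, "q", "d"))

# ...but a template carrying an SSTI payload executes attacker-chosen code.
payload = "{{ cycler.__init__.__globals__.os.popen('echo pwned').read() }}"
print(render_rerank_prompt(payload, "q", "d"))
```

The second call never "runs a model" at all: the shell command executes purely as a side effect of template rendering, which is why a poisoned `tokenizer.chat_template` is sufficient for RCE.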
Root Cause Analysis
Security researcher Stuart Beck pinpointed the vulnerability to the use of `jinja2.Environment()` without proper sandboxing. Instead of employing `ImmutableSandboxedEnvironment`, which restricts the execution of arbitrary code, the current implementation allows malicious models to execute unrestricted Python code on the inference server.
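The difference between the two environments can be shown directly. In the sketch below, legitimate templates still render under the sandbox, while the standard `cycler`-based SSTI gadget raises `SecurityError` instead of executing:

```python
from jinja2.sandbox import ImmutableSandboxedEnvironment, SecurityError

env = ImmutableSandboxedEnvironment()

# Legitimate chat templates render normally under the sandbox.
benign = "User: {{ query }}"
print(env.from_string(benign).render(query="hello"))  # User: hello

# The SSTI gadget is stopped: the sandbox refuses underscore attribute
# access (e.g. __init__), so rendering raises SecurityError rather than
# executing attacker-controlled Python.
payload = "{{ cycler.__init__.__globals__.os.popen('id').read() }}"
try:
    env.from_string(payload).render()
    blocked = False
except SecurityError:
    blocked = True
print(blocked)  # True
```

Because the sandbox only restricts what template expressions may touch, swapping it in is usually a drop-in change for chat-template rendering.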
Comparative Vulnerabilities
CVE-2026-5760 shares similarities with previous vulnerabilities:
– CVE-2024-34359 (Llama Drama): A critical flaw in the `llama_cpp_python` package that permitted arbitrary code execution. This issue has since been patched.
– CVE-2025-61620: A vulnerability in vLLM that exposed systems to similar risks, which was addressed late last year.
Mitigation Strategies
To safeguard against this vulnerability, it is recommended to:
– Implement Sandboxed Environments: Replace `jinja2.Environment()` with `ImmutableSandboxedEnvironment` when rendering chat templates. This change prevents the execution of arbitrary Python code on the server.
– Monitor for Patches: As of the latest advisory, no official patch has been released. Users should stay vigilant for updates from the SGLang development team.
– Exercise Caution with Model Files: Be cautious when downloading and integrating model files from external sources. Ensure their integrity and authenticity to prevent the introduction of malicious code.
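As a minimal integrity check for the last point, the SHA-256 digest of a downloaded model file can be compared against a hash published by the model author. The file name and expected hash below are placeholders:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte GGUF files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder values: substitute the real file and the publisher's hash.
model_path = Path("model.gguf")
published_sha256 = "..."

if model_path.exists():
    if sha256_of_file(str(model_path)) != published_sha256:
        raise RuntimeError("model checksum mismatch; refusing to load")
```

A checksum only proves the file was not altered in transit; it does not make the model's embedded chat template trustworthy, so it complements rather than replaces sandboxed rendering.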
Conclusion
The discovery of CVE-2026-5760 underscores the critical importance of secure coding practices, especially when handling user-generated content in AI frameworks. Developers and users of SGLang must take immediate action to mitigate this vulnerability to protect their systems from potential exploitation.