I. Targeted Entities

  • Organizations, researchers, and developers leveraging Meta’s Llama-Stack for AI model inference and deployment. 

II. Introduction

A critical security vulnerability, CVE-2024-50050, has been identified in Meta’s Llama-Stack framework, which is widely used for developing and deploying generative AI applications. The flaw allows attackers to achieve remote code execution (RCE) by exploiting unsafe deserialization of untrusted data via the pyzmq library (the Python bindings for ZeroMQ). Specifically, the vulnerability arises from the use of the recv_pyobj method, which automatically deserializes Python objects with the pickle module, a mechanism known to be unsafe when handling untrusted input. 

If exploited, this vulnerability could compromise AI inference servers, leading to data breaches, resource hijacking, unauthorized model manipulation, or full system compromise. Meta has assigned the flaw a CVSS score of 6.3 (medium), while Snyk and Oligo Security have categorized it as critical, assigning it scores of 9.3 and 9.8, respectively. 

This advisory provides details on the vulnerability and remediation steps to mitigate the risk. 

III. Additional Background Information

Llama-Stack is an open-source framework developed by Meta to streamline the development, deployment, and optimization of generative AI (GenAI) applications. It is primarily designed to support Meta’s Llama family of models, offering a comprehensive set of tools and APIs for the entire AI development lifecycle, including: 

  • Model training and inference 
  • Memory management 
  • Evaluation and optimization

The framework is intended to accelerate innovation in the AI space by providing a standardized foundation for developers and enterprises working on Llama-based AI solutions. Since its introduction in July 2024, Llama-Stack has been backed by major AI ecosystem partners such as AWS, NVIDIA, Groq, Ollama, Together AI, and Dell. 

However, the discovery of CVE-2024-50050 has revealed a critical security flaw in Llama-Stack’s default inference implementation, raising concerns about the security of AI frameworks that handle sensitive model deployments.

Technical Breakdown of the Vulnerability:

Insecure Deserialization:

  • The run_inference method in llama-stack uses recv_pyobj to receive serialized Python objects over a ZeroMQ socket. 
  • recv_pyobj automatically deserializes the received data using Python’s pickle.loads method. 
  • The pickle module is inherently insecure when processing untrusted data, as it can execute arbitrary code during deserialization.
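The danger described above can be shown in a few lines: any class may define __reduce__, which tells pickle.loads what callable to invoke during deserialization. The sketch below is a deliberately harmless illustration (the payload merely reads the process ID via eval, where a real attacker would supply something like os.system with a shell command):

```python
import pickle

class Payload:
    """Illustrative malicious object: __reduce__ instructs pickle to
    call eval() during deserialization. A real payload would run a
    harmful command; this one just reads the process ID."""
    def __reduce__(self):
        return (eval, ("__import__('os').getpid()",))

# The "attacker" serializes the crafted object...
blob = pickle.dumps(Payload())

# ...and deserialization alone executes the embedded call. Note the
# result is the eval output, not a Payload instance.
result = pickle.loads(blob)
print(result)  # the current process ID
```

This is why pickle.loads must never be applied to data from untrusted sources: execution happens during deserialization itself, before any application-level validation can run.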

Exploitation Scenario:

If the ZeroMQ socket is exposed over the network, an attacker can send a maliciously crafted serialized object to the socket. When recv_pyobj unpickles the object using pickle.loads, the attacker’s payload is executed, leading to arbitrary code execution on the host.
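This flow can be sketched without a live network by reproducing what recv_pyobj does internally (calling pickle.loads on the received message bytes). The helper names below are illustrative, not part of llama-stack or pyzmq; against a real exposed socket, the attacker would simply connect with pyzmq and call send_pyobj:

```python
import pickle

executed = []  # evidence that attacker-controlled code ran on the "server"

def attacker_callback(note: str) -> None:
    """Harmless stand-in for a real payload (e.g. a reverse shell)."""
    executed.append(note)

class MaliciousObject:
    # __reduce__ instructs pickle to call attacker_callback during load.
    def __reduce__(self):
        return (attacker_callback, ("code ran during deserialization",))

# Attacker side: the bytes that send_pyobj would put on the wire.
wire_bytes = pickle.dumps(MaliciousObject())

# Server side: recv_pyobj reduces to pickle.loads on the raw message,
# so the callback fires before the server ever inspects the object.
def simulated_recv_pyobj(msg: bytes):
    return pickle.loads(msg)  # mirrors self._deserialize(msg, pickle.loads)

simulated_recv_pyobj(wire_bytes)
print(executed)  # ['code ran during deserialization']
```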

Code Analysis:

The recv_pyobj method in pyzmq is defined as follows:

def recv_pyobj(self, flags: int = 0) -> Any:
    msg = self.recv(flags)
    return self._deserialize(msg, pickle.loads)

This method:

  • Receives pickled data from the socket.
  • Passes the data to _deserialize along with pickle.loads as the deserialization function.
  • _deserialize then executes pickle.loads, which deserializes the data without any validation.

Unsafe Design:

The use of pickle.loads in recv_pyobj is unsafe by design, as it deserializes data from unverified sources.

The maintainer of pyzmq has acknowledged that recv_pyobj should only be used with trusted sources, similar to pickle itself.

Impact

Severity: Critical

Consequences:

  • An attacker could craft a malicious serialized object using pickle and send it to the exposed ZeroMQ socket.
  • This can lead to full system compromise, data exfiltration, or further lateral movement within the network.

Vulnerability Discovery, Disclosure, and Patching

The vulnerability in llama-stack was discovered by Oligo, which leverages its advanced runtime detection capabilities to identify threats that traditional Software Composition Analysis (SCA) tools often miss. Oligo’s Application Detection and Response (ADR) platform maintains an extensive database of runtime profiles for third-party libraries, enabling it to detect unusual behavior indicative of exploitation.

In the case of llama-stack, Oligo’s prebuilt profiles flagged the use of pickle for deserialization as anomalous, since no legitimate instance of code execution within the pickle processing flow had ever been recorded. This triggered an automatic incident report in the Oligo ADR platform, highlighting the potential for remote code execution (RCE) even though no CVE for llama-stack existed at the time. The attack graph and evidence, including Python call stack deviations captured via eBPF, were documented in the Oligo platform, confirming the exploit.

Oligo followed a responsible disclosure process to report the vulnerability to Meta, the maintainers of llama-stack. Meta’s security team responded promptly, providing clear guidelines for disclosure through a GitHub issue. The flaw was assigned CVE-2024-50050; although Meta scored it 6.3 (medium) under CVSS, third-party assessments by Snyk and Oligo rated it as critical. Meta acknowledged the issue and worked collaboratively with Oligo to address it.

Meta released a patch in version 0.0.41 of llama-stack (llama-stack>=0.0.41), which replaced the insecure pickle serialization implementation with a type-safe Pydantic JSON implementation across the API. This change eliminated the risk of arbitrary code execution by ensuring safe deserialization of data. Additionally, pyzmq issued a fix and added a clear warning in its documentation about the risks of using recv_pyobj with untrusted data, emphasizing that it should only be used with trusted sources. The patch and warning can be found in the following commit: pyzmq commit f4e9f17.

Responsible Disclosure Timeline

29 Sep, 2024: Oligo reported the vulnerability to Meta.

30 Sep, 2024: Meta performed an initial evaluation of the report.

1 Oct, 2024: Meta confirmed that their teams were working on a fix.

10 Oct, 2024: Meta released the fix on GitHub and published version 0.0.41 to PyPI.

24 Oct, 2024: Meta issued CVE-2024-50050 to formally document the vulnerability.

This coordinated effort between Oligo and Meta ensured the timely identification, disclosure, and patching of the vulnerability, mitigating the risk of exploitation for users of llama-stack.

IV. MITRE ATT&CK

  • T1059.007 – Command and Scripting Interpreter: Python
    • The vulnerability allows attackers to execute arbitrary Python code via insecure deserialization using the pickle module.
  • T1190 – Exploit Public-Facing Application
    • Attackers can exploit the exposed ZeroMQ socket to send malicious payloads and gain initial access to the system.
  • T1068 – Exploitation for Privilege Escalation
    • Successful exploitation could allow attackers to execute code with the privileges of the llama-stack process, potentially escalating privileges.
  • T1531 – Account Access Removal
    • Attackers could disrupt operations by deleting or locking user accounts, causing denial of service.

V. Recommendations

  • Upgrade to Llama-Stack 0.0.41 or Later
    Organizations should immediately upgrade to Llama-Stack version 0.0.41 or later, as this update replaces the insecure pickle-based deserialization with a safer Pydantic JSON implementation. This eliminates the risk of arbitrary code execution by ensuring that only validated and structured data is processed. Additionally, ensure that all instances of pyzmq are updated to the latest version, as it now includes security advisories on using recv_pyobj with untrusted sources. Keeping software dependencies up to date is crucial to prevent attackers from exploiting known vulnerabilities.
  • Restrict Network Exposure
    ZeroMQ sockets should never be exposed to the internet or untrusted networks, as this dramatically increases the risk of exploitation. Organizations should apply firewall rules and access control lists (ACLs) to restrict access to inference servers, ensuring that only authorized systems and users can interact with them. Additionally, using VPNs, network segmentation, and private subnets can provide an added layer of security, further reducing the risk of unauthorized access.
  • Implement Secure Serialization Practices
    The use of unsafe deserialization methods like pickle.loads should be strictly prohibited, especially when handling untrusted data. Instead, organizations should adopt secure serialization formats such as JSON with Pydantic, which enforces strict type validation and eliminates the possibility of arbitrary code execution. Developers should also follow best practices by validating all incoming serialized data and ensuring that no dynamic code execution is allowed during deserialization.
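The principle behind the Pydantic fix can be sketched with the standard library alone: JSON parsing produces only dicts, lists, strings, and numbers, so it cannot trigger code execution, and an explicit schema check rejects malformed input. The InferenceRequest schema below is illustrative, not llama-stack’s actual request model:

```python
import json
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    # Hypothetical schema for illustration only.
    model: str
    prompt: str
    max_tokens: int

def safe_deserialize(raw: bytes) -> InferenceRequest:
    """Parse JSON, then validate fields and types explicitly.
    Unlike pickle.loads, nothing here can execute attacker code."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"model": str, "prompt": str, "max_tokens": int}
    if set(data) != set(expected):
        raise ValueError(f"unexpected fields: {set(data) ^ set(expected)}")
    for field, typ in expected.items():
        if not isinstance(data[field], typ):
            raise ValueError(f"{field} must be {typ.__name__}")
    return InferenceRequest(**data)

req = safe_deserialize(b'{"model": "llama", "prompt": "hi", "max_tokens": 8}')
print(req.model)  # llama
```

Libraries such as Pydantic perform this kind of validation automatically from type annotations, which is what the llama-stack 0.0.41 patch adopts across its API.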

VI. IOCs (Indicators of Compromise)

[Figure: the vulnerable inference method in llama-stack, derived from the Oligo Security blog.]

[Figure: proof-of-concept RCE payload that executes when the crafted object is unpickled, derived from the Oligo Security blog.]

VII. Additional OSINT Information

To detect this class of vulnerability, real-time runtime detection is essential for identifying and eliminating the risk. This requires maintaining an extensive, regularly updated database of runtime profiles for third-party libraries, so that anomalous behavior such as code execution during pickle deserialization can be flagged.

Patch 0.0.41 addresses the root cause: it replaces the pickle-based serialization implementation with a type-safe Pydantic JSON implementation across the API.

VIII. References

Oligo Security. (January 23, 2025). CVE-2024-50050: Critical Vulnerability in meta llama/llama-stack by Meta. https://www.oligo.security/blog/cve-2024-50050-critical-vulnerability-in-meta-llama-llama-stack 

The Hacker News. (January 26, 2025). Meta’s Llama Framework Flaw Exposes AI Systems to Remote Code Execution Risks. https://thehackernews.com/2025/01/metas-llama-framework-flaw-exposes-ai.html 

SC Media. (January 27, 2025). Severe Meta Llama issue risks RCE in AI systems. https://www.scworld.com/brief/severe-meta-llama-issue-risks-rce-in-ai-systems 

Threat Advisory created by The Cyber Florida Security Operations Center. 

Contributing Security Analysts: Thiago Reis Pagliaroni, Nahyan Jamil

To learn more about Cyber Florida visit: www.cyberflorida.org