Article Title: Unveiling the Sora 2 Vulnerability: Audio Transcripts Expose AI System Prompts
OpenAI’s advanced video generation model, Sora 2, has been found susceptible to a vulnerability that allows its concealed system prompt to be extracted through audio transcripts. The discovery raises significant concerns about the security of multimodal AI systems and their exposure to prompt leakage.
The Discovery
On November 12, 2025, AI security firm Mindgard published a detailed analysis highlighting how creative prompting across various modalities—text, images, video, and audio—can circumvent safeguards designed to keep internal instructions confidential. The research underscores the challenges in protecting AI models from prompt leakage, even as companies invest heavily in red-teaming and alignment training.
Chaining Modalities to Uncover Hidden Instructions
Mindgard’s team, led by Aaron Portnoy, initiated experiments with Sora 2 on November 3, 2025. Their objective was to explore how semantic drift in multimodal transformations could expose the model’s foundational rules. Traditional text-to-text extraction methods often rely on linguistic tricks like role-playing or repeating preceding context to coax large language models (LLMs) into revealing prompts. However, Sora 2’s video capabilities introduced new vectors for exploration.
Attempts to render text as still images or video frames frequently failed due to glyph distortions and frame inconsistencies. Legible text in one frame often devolved into unreadable approximations in the next. Encoded formats like QR codes or barcodes proved equally unreliable, producing visually plausible but indecipherable gibberish because the model prioritizes pixel realism over precise data encoding.
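To make that failure mode concrete, the following minimal sketch (not Mindgard’s actual tooling; the helper name and frame filename are hypothetical) shows the kind of decode check a generated frame would have to pass. Frames that merely look like QR codes fail it, because no standard decoder can recover a payload from them.

```python
import cv2  # opencv-python

def frame_has_valid_qr(frame_path: str) -> str | None:
    """Return the decoded QR payload from a saved video frame, or None."""
    frame = cv2.imread(frame_path)
    if frame is None:
        return None
    data, points, _ = cv2.QRCodeDetector().detectAndDecode(frame)
    return data if data else None

# Hypothetical usage: frames extracted from a generated clip typically
# return None here, matching the "visually plausible but indecipherable"
# failure mode described above.
print(frame_has_valid_qr("sora_frame_042.png"))
```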
The Audio Breakthrough
The breakthrough came with audio. By prompting Sora 2 to generate speech in short, 15-second clips—often sped up to fit more content—researchers were able to transcribe outputs with high fidelity. Stitching these fragments together resulted in a near-complete system prompt. This stepwise approach outperformed visual methods, as audio avoids the noise of image generation and naturally sequences information.
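A rough sketch of that stitching step, assuming each 15-second clip has already been transcribed by an off-the-shelf speech-to-text model (the function and the sample fragments are illustrative, not Mindgard’s code or the actual Sora 2 prompt):

```python
from difflib import SequenceMatcher

def stitch_transcripts(fragments: list[str], min_overlap: int = 20) -> str:
    """Merge ordered transcript fragments, trimming the text that the
    tail of one clip shares with the head of the next."""
    merged = fragments[0]
    for nxt in fragments[1:]:
        tail = merged[-200:]  # only compare against the recent tail
        match = SequenceMatcher(None, tail, nxt).find_longest_match(
            0, len(tail), 0, len(nxt)
        )
        if match.size >= min_overlap:
            merged += nxt[match.b + match.size:]  # drop the duplicated overlap
        else:
            merged += " " + nxt
    return merged

# Illustrative fragments, as if transcribed from two consecutive clips.
clips = [
    "You are a video generation model. Always produce metadata first and",
    "produce metadata first and avoid copyrighted characters unless asked.",
]
print(stitch_transcripts(clips))
```

This only captures the reassembly idea; coaxing each successive clip to continue reciting the next portion of the prompt is the stepwise part described above.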
Revealed System Prompts
The recovered prompt reveals several rules:
– Generating metadata first.
– Avoiding copyrighted characters unless explicitly requested.
– Prohibiting sexually suggestive content without precise user direction.
– Mandating fixed video parameters, such as a 15-second length and a 1.78 (16:9) aspect ratio.
These instructions enforce behavioral guardrails within the model.
Implications of System Prompt Exposure
System prompts, while not always containing sensitive data, define model safety boundaries. If leaked, they can enable follow-up attacks, such as crafting prompts to evade guardrails. Mindgard argues that these instructions should be treated as configuration secrets, akin to firewall rules, rather than harmless metadata.
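One way to operationalize that stance, sketched here as a suggestion rather than anything OpenAI or Mindgard describe, is to source the prompt from a secret store and plant a per-deployment canary token that output monitoring can watch for (the environment variable names below are hypothetical):

```python
import os

# Load the system prompt like any other configuration secret instead of
# hardcoding it, and pair it with a unique canary token so a verbatim
# leak in generated output can be detected and traced.
SYSTEM_PROMPT = os.environ["VIDEO_MODEL_SYSTEM_PROMPT"]  # hypothetical secret name
LEAK_CANARY = os.environ["VIDEO_MODEL_PROMPT_CANARY"]    # e.g. a random UUID per deployment

def output_echoes_canary(transcript: str) -> bool:
    """Return True if a transcribed generation contains the canary token."""
    return LEAK_CANARY in transcript
```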
The vulnerability exploits inherent weaknesses in multimodal models, where transformations compound errors, creating “lost in translation” effects that amplify leakage risks. OpenAI’s extensive training resists direct extraction, but indirect requests and cross-modal prompts framed in varied ways still succeed, as in adversarial examples that ask the model to walk through its refusal logic step by step without quoting the prompt verbatim.
Recommendations for Users and Developers
For users and developers, this discovery underscores the need for:
– Rigorously testing audio and video outputs for prompt leakage (a hypothetical check is sketched after this list).
– Implementing length limits on generations.
– Treating prompts as proprietary information.
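As one concrete, hypothetical example of the first two points, a red-teaming or CI job could transcribe a batch of generated clips and flag any transcript that resembles the system prompt; the similarity threshold below is a placeholder to be tuned against real outputs:

```python
from difflib import SequenceMatcher

def leakage_score(transcript: str, system_prompt: str) -> float:
    """Similarity (0..1) between a transcribed output and the system prompt."""
    return SequenceMatcher(None, transcript.lower(), system_prompt.lower()).ratio()

def flag_leaky_outputs(transcripts: list[str], system_prompt: str,
                       threshold: float = 0.35) -> list[int]:
    """Return indices of transcripts suspiciously similar to the prompt,
    suitable for running over audio/video generations in a test suite."""
    return [i for i, t in enumerate(transcripts)
            if leakage_score(t, system_prompt) > threshold]
```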
While Sora 2’s prompt itself poses low immediate risk, the technique could apply to more sensitive targets, potentially exposing tools or agent integrations.
OpenAI’s Response
OpenAI acknowledged the issue after Mindgard’s disclosure, noting general awareness of prompt extraction vulnerabilities. The company emphasized ongoing efforts to enhance the security of its AI models and mitigate such risks.
Conclusion
The Sora 2 vulnerability serves as a stark reminder of the complexities involved in securing multimodal AI systems. As AI continues to evolve and integrate into various sectors, ensuring the confidentiality and integrity of system prompts becomes paramount. Continuous research, robust testing, and proactive security measures are essential to safeguard against potential exploits and maintain user trust in AI technologies.