Critical Vulnerability in Apache Parquet Java Allows Arbitrary Code Execution

A significant security vulnerability has been identified in Apache Parquet Java, the Java implementation of the widely used Parquet columnar storage format. Designated CVE-2025-46762, the flaw affects all versions up to and including 1.15.1 and could allow attackers to execute arbitrary code through specially crafted Parquet files.

Understanding Apache Parquet and Its Significance

Apache Parquet is an open-source, column-oriented data storage format designed for efficient data storage and retrieval. It is integral to various data processing frameworks such as Apache Hadoop, Spark, and Flink. Its widespread adoption underscores the critical nature of this vulnerability, as it could impact numerous data analytics infrastructures globally.

Details of the Vulnerability

The core of this security issue lies within the parquet-avro module, responsible for processing Avro schemas embedded in Parquet file metadata. In March 2025, Apache Parquet version 1.15.1 introduced a fix intended to restrict untrusted packages. However, security researchers discovered that the default configuration of trusted packages remained overly permissive, still allowing the execution of malicious classes from these packages.

According to the advisory released by the Apache Software Foundation, "Schema parsing in the parquet-avro module of Apache Parquet 1.15.0 and previous versions allows bad actors to execute arbitrary code." Because the default list of trusted packages in 1.15.1 was still too permissive, the vulnerability persists in that release despite the earlier mitigation.
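To make the idea of a trusted-package list concrete, here is a minimal sketch of how such a check might work. It is an illustration only and is not taken from the Parquet source: the class `PackageAllowList`, its methods, and the `com.example` package names are all hypothetical. It shows why a permissive default list can still admit unwanted classes, and why an empty list admits none.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is NOT Parquet's actual implementation.
class PackageAllowList {
    // Parses a comma-separated list of package names.
    // An empty string yields an empty list, i.e. nothing is trusted.
    static List<String> parse(String propertyValue) {
        return Arrays.stream(propertyValue.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // A class is trusted if its package equals an allowed package
    // or is nested below one (matched on package boundaries).
    static boolean isTrusted(String className, List<String> allowedPackages) {
        int lastDot = className.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : className.substring(0, lastDot);
        for (String allowed : allowedPackages) {
            if (pkg.equals(allowed) || pkg.startsWith(allowed + ".")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed, made-up package names standing in for a permissive default.
        List<String> permissive = parse("com.example.util,com.example.net");
        List<String> empty = parse("");
        System.out.println(isTrusted("com.example.net.Fetcher", permissive)); // true
        System.out.println(isTrusted("com.example.net.Fetcher", empty));      // false
    }
}
```

The point of the sketch is that any class living in an allowed package passes the check, so the safety of the mechanism depends entirely on how narrow the default list is.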

Exploitation Mechanism

The exploit applies only to applications that use the "specific" or "reflect" Avro data models when reading Parquet files. The "generic" model is not affected. This distinction matters for developers and system administrators because it determines whether a given application is exposed and which mitigation applies.

Applications that use the parquet-avro module of Apache Parquet Java to deserialize data from Parquet files are at risk of arbitrary code execution if they process untrusted files. The vulnerability arises from how Avro schemas embedded in the file metadata are handled during deserialization: a malicious schema can cause classes from the trusted packages to be executed during schema parsing.

Historical Context and Reporting

This vulnerability follows a similar deserialization flaw (CVE-2025-30065), publicly disclosed in April 2025, which also affected the parquet-avro module. The recurrence of such issues highlights the need for continuous vigilance and prompt updates of software dependencies.

The current vulnerability was responsibly reported by security researchers Andrew Pikler, David Handermann, and Nándor Kollár. Their efforts in identifying and disclosing the issue are commendable, as they contribute to the collective security of the open-source community.

Risk Assessment

The risk factors associated with this vulnerability are as follows:

– Affected Products: Apache Parquet Java through version 1.15.1, specifically the parquet-avro module.
– Impact: Arbitrary code execution.
– Exploit Prerequisites:
  – The application uses Apache Parquet Java ≤ 1.15.1.
  – The parquet-avro module is used.
  – The specific or reflect Avro model is used for reading Parquet files.
  – An attacker supplies a crafted Parquet file with a malicious Avro schema.
– CVSS 3.1 Severity: Critical.
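The prerequisites above can be turned into a quick self-assessment. The following is a minimal sketch, not an official tool: the class `ParquetVersionCheck` and its methods are hypothetical, and the version parser assumes plain dotted numeric versions such as `1.15.1` (suffixes like `-SNAPSHOT` are not handled).

```java
// Hypothetical helper for illustration; names are not part of any Parquet API.
class ParquetVersionCheck {
    // Compares dotted numeric versions such as "1.15.1"; missing parts count as 0.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when the version predates the fixed release, 1.15.2.
    static boolean isAffectedVersion(String version) {
        return compare(version, "1.15.2") < 0;
    }

    // All prerequisites listed in the article must hold for an application to be exposed.
    static boolean isExposed(String version, boolean usesParquetAvro,
                             boolean usesSpecificOrReflectModel, boolean readsUntrustedFiles) {
        return isAffectedVersion(version) && usesParquetAvro
                && usesSpecificOrReflectModel && readsUntrustedFiles;
    }

    public static void main(String[] args) {
        System.out.println(isAffectedVersion("1.15.1")); // true
        System.out.println(isAffectedVersion("1.15.2")); // false
        System.out.println(isExposed("1.15.1", true, false, true)); // false: generic model only
    }
}
```

Versions are compared numerically, part by part, because a plain string comparison would rank `1.9.0` above `1.15.2`.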

Recommended Actions

Organizations using affected versions of Apache Parquet Java are strongly advised to take immediate action to mitigate this vulnerability. The Apache Parquet team released version 1.15.2 on May 1, 2025, which fully addresses the issue.

Remediation Options:

1. Upgrade to Apache Parquet Java 1.15.2: This version includes comprehensive fixes for the vulnerability and is the recommended course of action.

2. Temporary Mitigation for Version 1.15.1 Users: Those who cannot upgrade immediately but are running version 1.15.1 can mitigate the vulnerability by setting the system property `org.apache.parquet.avro.SERIALIZABLE_PACKAGES` to an empty string. With this setting, no packages are trusted by default, which prevents malicious classes in those packages from being executed.
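As a rough sketch of the temporary mitigation, the property can be set programmatically at application startup. The property name comes from the text above; the class `ParquetAvroMitigation` and its `apply` method are hypothetical. The sketch assumes the library reads the property when its classes are first loaded, so the call should happen before any Parquet file is opened.

```java
// Hypothetical startup hook for illustration; only the property name comes from the advisory.
class ParquetAvroMitigation {
    static final String SERIALIZABLE_PACKAGES_PROP =
            "org.apache.parquet.avro.SERIALIZABLE_PACKAGES";

    // Sets the trusted-package list to the empty string.
    // Assumption: this runs before any parquet-avro class is loaded.
    static void apply() {
        System.setProperty(SERIALIZABLE_PACKAGES_PROP, "");
    }

    public static void main(String[] args) {
        apply();
        // Brackets make the empty value visible when printed.
        System.out.println("[" + System.getProperty(SERIALIZABLE_PACKAGES_PROP) + "]"); // prints []
    }
}
```

The same effect can be achieved without code changes by passing `-Dorg.apache.parquet.avro.SERIALIZABLE_PACKAGES=` as a JVM argument when launching the application, which avoids any dependence on class-loading order.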

Conclusion

The discovery of CVE-2025-46762 in Apache Parquet Java underscores the importance of proactive security measures in software development and maintenance. Organizations must remain vigilant, promptly apply security patches, and regularly review their systems for potential vulnerabilities. By upgrading to the latest version or implementing the recommended mitigation strategies, users can protect their data processing pipelines from potential exploitation.