license: apache-2.0
tags:
  - pyarrow
  - deserialization
  - rce
  - vulnerability
  - cve-2023-47248
cve: CVE-2023-47248

PyArrow Unsafe Deserialization — CVE-2023-47248 PoC

WARNING: MALICIOUS FILE — SECURITY RESEARCH ONLY

Vulnerability

  • File: python/pyarrow/types.pxi
  • Function: PyExtensionType autoload
  • CVE: CVE-2023-47248

Description

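PyArrow 0.14.0 through 14.0.0 autoloads PyExtensionType-based extension types: py_extension_type_auto_load = True by default. When an affected version reads an IPC, Feather, or Parquet file, PyExtensionType.__arrow_ext_deserialize__ calls pickle.loads() on the extension metadata stored in that file. The metadata is fully controlled by whoever produced the file, so reading an untrusted file is enough to achieve remote code execution (RCE).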

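import pickle

# Simplified view of the vulnerable path when reading IPC/Parquet files
@classmethod
def __arrow_ext_deserialize__(cls, storage_type, serialized):
    return pickle.loads(serialized)  # unpickles attacker-controlled bytes: RCE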
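
The danger comes from pickle resolving arbitrary globals (classes and callables) named inside the byte stream. As a minimal illustration, and not PyArrow's actual code or fix, the sketch below follows the restricted-unpickler pattern from the Python `pickle` documentation and rejects any pickle that references a global. The names `NoGlobalsUnpickler` and `safe_loads` are hypothetical.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuse every pickle that references a global."""

    def find_class(self, module, name):
        # pickle resolves classes and callables through find_class;
        # this lookup is what lets a crafted pickle reach arbitrary code.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Unpickle plain data only (no classes, functions, or instances)."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain containers round-trip without ever calling find_class.
print(safe_loads(pickle.dumps({"storage": "int64"})))

# A pickle that merely references a callable (here the builtin len) is rejected.
try:
    safe_loads(pickle.dumps(len))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A real PyExtensionType instance cannot be restored under such a restriction, which is why this is only an illustration of the mechanism and not a drop-in mitigation.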

Impact

  • Severity: Critical (CVSS 9.8)
  • Attack Vector: Victim reads malicious .arrow/.feather file → RCE
  • Fix: PyArrow 14.0.1+ sets py_extension_type_auto_load = False
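
A quick way to tell whether an environment falls in the affected range is to compare the installed version against the bounds given above. The following is a rough sketch using only the standard library; the helper names `_release` and `is_affected` are hypothetical, and the parsing only looks at the leading `X.Y.Z` of the version string.

```python
from importlib import metadata


def _release(version: str) -> tuple:
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if `version` lies in the stated range: 0.14.0 through 14.0.0."""
    return (0, 14, 0) <= _release(version) <= (14, 0, 0)


try:
    installed = metadata.version("pyarrow")
except metadata.PackageNotFoundError:
    print("pyarrow is not installed")
else:
    status = "affected" if is_affected(installed) else "not in the affected range"
    print(f"pyarrow {installed}: {status}")
```

Where upgrading is not possible, the Apache Arrow advisory for this CVE points to the `pyarrow-hotfix` package, which disables the vulnerable path once `import pyarrow_hotfix` has run before any file is read.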

Reproduction

import pyarrow.ipc as ipc
reader = ipc.open_file("malicious_pyarrow.arrow")
table = reader.read_all()  # RCE triggered

References