---
license: apache-2.0
tags:
- pyarrow
- deserialization
- rce
- vulnerability
- cve-2023-47248
cve: "CVE-2023-47248"
---
# PyArrow Unsafe Deserialization — CVE-2023-47248 PoC
## WARNING: MALICIOUS FILE — SECURITY RESEARCH ONLY
## Vulnerability
- **File:** `python/pyarrow/types.pxi`
- **Function:** `PyExtensionType` autoload
- **CVE:** CVE-2023-47248
## Description
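PyArrow versions 0.14.0 through 14.0.0 enable `PyExtensionType` autoloading by
default (`py_extension_type_auto_load = True`). When an IPC, Parquet, or Feather
file is read, `__arrow_ext_deserialize__` calls `pickle.loads()` on the extension
metadata stored in the file. That metadata is attacker-controlled when the file
comes from an untrusted source, so reading the file is enough to run arbitrary
code (RCE) in the reading process.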
```python
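# Vulnerable path when reading IPC/Parquet files (simplified)
import pickle

class PyExtensionType:
    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return pickle.loads(serialized)  # RCE: bytes come from the file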
```
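For contrast, here is a minimal defensive sketch of why `pickle.loads()` is the problem: a pickle stream can name any importable global, and the unpickler resolves it on load. The `NoGlobalsUnpickler` and `restricted_loads` names are hypothetical and are not part of PyArrow. This is also not how PyArrow 14.0.1 fixes the issue (it disables autoloading instead); it only illustrates the mechanism using the standard-library `pickle.Unpickler.find_class` hook.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        # pickle calls find_class for every global named in the byte stream.
        # Rejecting all of them blocks the usual code-execution route.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but only plain containers and scalars can load."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Plain data contains no globals, so it round-trips normally.
plain = restricted_loads(pickle.dumps({"a": [1, 2, 3]}))

# A pickle that references a global (here the harmless built-in `print`)
# is rejected before anything is resolved or called.
try:
    restricted_loads(pickle.dumps(print))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

An unrestricted `pickle.loads()` would have resolved that global without complaint, which is why extension metadata from a file must never reach it.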
## Impact
- **Severity:** Critical (CVSS 9.8)
- **Attack Vector:** Victim reads a malicious .arrow/.feather/.parquet file → RCE
- **Fix:** PyArrow 14.0.1+ sets `py_extension_type_auto_load = False`
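The affected range stated above (0.14.0 through 14.0.0) can be checked with a short sketch like the one below. The helper names `parse_version`, `is_affected`, and `installed_pyarrow_is_affected` are hypothetical, and the parser is deliberately simple: it looks only at the first three numeric components, so a pre-release such as `14.0.0.dev1` is treated as `14.0.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(version_string: str) -> tuple:
    """Return the first three numeric components, e.g. '14.0.1' -> (14, 0, 1)."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '0.14'
    return tuple(parts)


def is_affected(version_string: str) -> bool:
    """True if the version falls in the range this README lists as vulnerable."""
    return (0, 14, 0) <= parse_version(version_string) <= (14, 0, 0)


def installed_pyarrow_is_affected():
    """Check the installed pyarrow, or return None if it is not installed."""
    try:
        return is_affected(version("pyarrow"))
    except PackageNotFoundError:
        return None
```

Calling `installed_pyarrow_is_affected()` before opening files from untrusted sources gives a quick signal that an upgrade to 14.0.1 or later is needed.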
## Reproduction
```python
import pyarrow.ipc as ipc
reader = ipc.open_file("malicious_pyarrow.arrow")
table = reader.read_all() # RCE triggered
```
## References
- https://nvd.nist.gov/vuln/detail/CVE-2023-47248
- https://huntr.com/repos/apache/arrow