---
license: apache-2.0
tags:
- pyarrow
- deserialization
- rce
- vulnerability
- cve-2023-47248
cve: "CVE-2023-47248"
---
# PyArrow Unsafe Deserialization: CVE-2023-47248 PoC

## WARNING: MALICIOUS FILE, SECURITY RESEARCH ONLY
## Vulnerability

- **File:** `python/pyarrow/types.pxi`
- **Function:** `PyExtensionType` autoload
- **CVE:** CVE-2023-47248
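## Description

PyArrow 0.14.0 through 14.0.0 ships with `py_extension_type_auto_load = True` by
default. When PyArrow reads an IPC, Parquet, or Feather file whose schema declares a
`PyExtensionType`, `__arrow_ext_deserialize__` calls `pickle.loads()` on the extension
metadata stored in the file. That metadata is attacker-controlled, so opening an
untrusted file enables remote code execution (RCE).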
```python
import pickle

# Simplified sketch of the vulnerable path hit when reading IPC/Parquet files
class PyExtensionType:
    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return pickle.loads(serialized)  # attacker-controlled bytes: RCE!
```
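
To illustrate why an unrestricted `pickle.loads()` is the core problem, the sketch below follows the "restricting globals" pattern from the Python `pickle` documentation. It is not how PyArrow fixed the issue (the fix disables autoload, as noted under Impact). The names `NoGlobalsUnpickler` and `safe_loads` are hypothetical and exist only for this example.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        # pickle calls find_class for every global a payload references.
        # Refusing here blocks the callable lookup that code execution relies on.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    """Hypothetical helper: unpickle plain containers and scalars only."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

With this helper, plain data such as dicts, lists, and strings still round-trips, while any payload that references an importable object is rejected with `pickle.UnpicklingError`. Plain `pickle.loads()` offers no such gate, which is why unpickling file-supplied metadata is unsafe.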
## Impact

- **Severity:** Critical (CVSS 9.8)
- **Attack Vector:** Victim reads a malicious `.arrow`, `.feather`, or Parquet file → RCE
- **Fix:** PyArrow 14.0.1+ sets `py_extension_type_auto_load = False` by default
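
The affected range stated above (0.14.0 through 14.0.0, fixed in 14.0.1) can be checked mechanically. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `is_affected` are hypothetical, and the parser only looks at the leading numeric release components, so pre-release suffixes are ignored.

```python
import re

FIRST_VULNERABLE = (0, 14, 0)  # first affected release, per this document
FIRST_FIXED = (14, 0, 1)       # first release with autoload disabled


def parse_version(version: str) -> tuple:
    """Extract leading numeric components, e.g. '14.0.1' -> (14, 0, 1)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (e.g. '14.0') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if `version` falls inside the affected range."""
    return FIRST_VULNERABLE <= parse_version(version) < FIRST_FIXED
```

In practice the installed version could be passed in via `importlib.metadata.version("pyarrow")`. Where upgrading is not possible, the Arrow advisory points to the separate `pyarrow-hotfix` package, which disables the vulnerable path on older releases.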
## Reproduction

```python
import pyarrow.ipc as ipc

reader = ipc.open_file("malicious_pyarrow.arrow")
table = reader.read_all()  # RCE triggered
```
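
Before opening an untrusted file with a vulnerable PyArrow, one rough triage step is to scan its raw bytes without importing PyArrow at all. The sketch below assumes that `PyExtensionType` is registered under the extension name `arrow.py_extension_type` and that this name appears verbatim in the file. That is only a heuristic: Parquet files typically embed the Arrow schema in encoded form, so a clean result does not prove a file is safe. The function name `mentions_py_extension_type` is hypothetical.

```python
# Assumption: extension name under which PyExtensionType is registered.
MARKER = b"arrow.py_extension_type"


def mentions_py_extension_type(path, chunk_size=1 << 20) -> bool:
    """Return True if the raw bytes of `path` contain MARKER."""
    overlap = len(MARKER) - 1
    tail = b""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            # Prepend the previous tail so a marker split across
            # two chunks is still detected.
            window = tail + chunk
            if MARKER in window:
                return True
            tail = window[-overlap:]
    return False
```

A positive result is a reason to quarantine the file. A negative result should not replace upgrading to 14.0.1 or later.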
## References

- https://nvd.nist.gov/vuln/detail/CVE-2023-47248
- https://huntr.com/repos/apache/arrow