LightGBM duplicate leaf_value: reader differential PoC
Vulnerability class: Model File Vulnerability (MFV), format-level reader differential
Target: microsoft/LightGBM
Tested version: lightgbm 4.6.0
Summary
LightGBM's C++ text model parser uses last-wins semantics when a tree block
contains duplicate leaf_value entries. A reviewer or inspection script that
reads only the first matching leaf_value line therefore sees different values
from the ones the LightGBM runtime actually uses.
An attacker can place a benign-looking leaf_value line first in a tree block
(the one reviewers see) and append a malicious leaf_value line after it (the
one the C++ runtime uses at inference time).
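The differential can be illustrated without LightGBM at all. The following is a minimal sketch, assuming a tree block is a sequence of `key=value` lines. `first_wins` and `last_wins` are hypothetical helpers that model a naive inspection script and a last-wins reader respectively; they are not LightGBM's actual parsing code.

```python
from typing import Dict, Optional

# Illustrative tree block using the two leaf_value lines from this PoC.
TREE_BLOCK = """\
Tree=0
leaf_value=-0.59999999999999976 0.59999999999999976 0
leaf_value=9.9 -9.9 9.9
"""


def first_wins(block: str, key: str) -> Optional[str]:
    """Return the value of the first `key=` line, as a naive reviewer script might."""
    for line in block.splitlines():
        name, sep, value = line.partition("=")
        if sep and name == key:
            return value
    return None


def last_wins(block: str, key: str) -> Optional[str]:
    """Return the value of the last `key=` line by overwriting earlier entries."""
    parsed: Dict[str, str] = {}
    for line in block.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            parsed[name] = value  # later duplicates silently replace earlier ones
    return parsed.get(key)


reviewer_view = first_wins(TREE_BLOCK, "leaf_value")
runtime_view = last_wins(TREE_BLOCK, "leaf_value")
print("reviewer sees:", reviewer_view)
print("runtime uses: ", runtime_view)
```

Both helpers read the same bytes and return different answers, which is the whole of the reader differential.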
ModelScan 0.8.8 has no LightGBM scanner, so the format is not covered.
Files
| File | Description |
|---|---|
| `benign.txt` | Clean 3-tree binary classifier |
| `dup_leaf_malicious_last.txt` | Duplicate leaf_value in Tree=0: benign first, malicious last |
| `dup_leaf_malicious_first.txt` | Reversed control: malicious first, benign last |
| `make_lightgbm_duplicate_leaf.py` | Generates all three model files from scratch |
| `check_lightgbm_duplicate_leaf.py` | Verifies the reader differential and output flip |
Reproduction
pip install -r requirements.txt
python make_lightgbm_duplicate_leaf.py
python check_lightgbm_duplicate_leaf.py
Expected output
LIGHTGBM_LAST_WINS=True
PYTHON_FIRST_WINS=True
OUTPUT_FLIP_CONFIRMED=True
benign=0.812768 malicious_last=0.000120 delta=0.812649
MODELSCAN_RESULT=0 issues / not scanned (no LightGBM scanner)
=== Prediction table ===
Model Pred
benign.txt 0.812768
dup_leaf_malicious_last.txt 0.000120 <- C++ used last (malicious)
dup_leaf_malicious_first.txt 0.812768 <- C++ used last (benign)
First leaf_value Python sees: leaf_value=-0.59999999999999976 0.59999999999999976 0 (BENIGN)
[PASS] All checks passed.
Mechanism
In dup_leaf_malicious_last.txt, the Tree=0 block contains:
leaf_value=-0.59999999999999976 0.59999999999999976 0 <- FIRST (reviewer sees)
leaf_value=9.9 -9.9 9.9 <- LAST (C++ uses)
tree_sizes[0] is updated to reflect the enlarged block. The model loads
without error or warning.
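Because the load is silent, a reviewer would have to look for repeated keys explicitly. The following is a rough sketch of such a check, not part of this PoC's scripts: `find_duplicate_keys` is a hypothetical helper, and it assumes that each tree block starts with a `Tree=N` line, that the tree section ends with an `end of trees` line, and that fields are `key=value` lines. The sample text is illustrative and much shorter than a real model file.

```python
from collections import Counter
from typing import Dict, List, Optional


def find_duplicate_keys(model_text: str) -> Dict[str, List[str]]:
    """Map each 'Tree=N' header to the keys that occur more than once in its block."""
    blocks: Dict[str, Counter] = {}
    current: Optional[str] = None
    for raw in model_text.splitlines():
        line = raw.strip()
        if line.startswith("Tree="):
            current = line
            blocks[current] = Counter()
            continue
        if line == "end of trees":
            current = None  # assumed terminator of the tree section
            continue
        if current is None or "=" not in line:
            continue  # header, trailer, or blank line
        blocks[current][line.split("=", 1)[0]] += 1
    return {
        tree: sorted(key for key, n in counts.items() if n > 1)
        for tree, counts in blocks.items()
        if any(n > 1 for n in counts.values())
    }


# Illustrative sample: Tree=0 carries the duplicate, Tree=1 is clean.
SAMPLE = """\
Tree=0
num_leaves=3
leaf_value=-0.59999999999999976 0.59999999999999976 0
leaf_value=9.9 -9.9 9.9

Tree=1
num_leaves=3
leaf_value=0.1 -0.1 0

end of trees
"""

result = find_duplicate_keys(SAMPLE)
print(result)
```

Flagging every repeated key, rather than comparing the duplicated values, avoids having to decide which occurrence a given reader would honor.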
Control experiment
With malicious values FIRST and benign values LAST, the C++ runtime uses the
benign (last) value, so the prediction matches benign.txt. This confirms that
last-wins parsing, not the magnitude of the values, is the differential mechanism.