LightGBM duplicate `leaf_value` — reader differential PoC

Vulnerability class: Model File Vulnerability (MFV) — format-level reader differential
Target: microsoft/LightGBM
Tested version: lightgbm 4.6.0

Summary

LightGBM's C++ text model parser uses last-wins semantics when a tree block contains duplicate leaf_value entries. A reviewer or inspection script that checks only the first matching leaf_value line sees the benign value, while the LightGBM runtime uses the later value.

An attacker can therefore place a benign-looking leaf_value line first in a tree block, where a reviewer will see it, and add a malicious leaf_value line after it, which the C++ runtime uses at inference time.
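
The two reading strategies can be illustrated with a toy sketch. Neither function below is LightGBM's parser or any particular review tool; `first_wins`, `last_wins`, and the shortened `TREE_BLOCK` are hypothetical names and data introduced only to show how the same text yields two different answers.

```python
from typing import Dict, Optional

# Shortened, illustrative tree block; a real block has many more keys.
TREE_BLOCK = """Tree=0
num_leaves=3
leaf_value=-0.59999999999999976 0.59999999999999976 0
leaf_value=9.9 -9.9 9.9
shrinkage=1"""


def first_wins(block: str, key: str) -> Optional[str]:
    """Return the value on the first line that starts with `key=` (reviewer-style scan)."""
    prefix = key + "="
    for line in block.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def last_wins(block: str) -> Dict[str, str]:
    """Collect key=value lines into a dict; a later duplicate overwrites the earlier one."""
    values: Dict[str, str] = {}
    for line in block.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key] = value
    return values


print("first-wins:", first_wins(TREE_BLOCK, "leaf_value"))
print("last-wins: ", last_wins(TREE_BLOCK)["leaf_value"])
```

The first call reports the benign values and the second reports `9.9 -9.9 9.9`, which mirrors the reviewer/runtime split described above.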

ModelScan 0.8.8 has no LightGBM scanner — the format is not covered.

Files

| File | Description |
| --- | --- |
| `benign.txt` | Clean 3-tree binary classifier |
| `dup_leaf_malicious_last.txt` | Duplicate `leaf_value` in Tree=0: benign first, malicious last |
| `dup_leaf_malicious_first.txt` | Reversed control: malicious first, benign last |
| `make_lightgbm_duplicate_leaf.py` | Generates all three model files from scratch |
| `check_lightgbm_duplicate_leaf.py` | Verifies the reader differential and output flip |

Reproduction

pip install -r requirements.txt
python make_lightgbm_duplicate_leaf.py
python check_lightgbm_duplicate_leaf.py

Expected output

LIGHTGBM_LAST_WINS=True
PYTHON_FIRST_WINS=True
OUTPUT_FLIP_CONFIRMED=True
  benign=0.812768  malicious_last=0.000120  delta=0.812649
MODELSCAN_RESULT=0 issues / not scanned (no LightGBM scanner)

=== Prediction table ===
Model                                    Pred
benign.txt                           0.812768
dup_leaf_malicious_last.txt          0.000120  <- C++ used last (malicious)
dup_leaf_malicious_first.txt         0.812768  <- C++ used last (benign)

First leaf_value Python sees: leaf_value=-0.59999999999999976 0.59999999999999976 0 (BENIGN)

[PASS] All checks passed.

Mechanism

In dup_leaf_malicious_last.txt, the Tree=0 block contains:

leaf_value=-0.59999999999999976 0.59999999999999976 0   ← FIRST (reviewer sees)
leaf_value=9.9 -9.9 9.9                                ← LAST  (C++ uses)

tree_sizes[0] is updated to reflect the enlarged block. The model loads without error or warning.
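
Because the file loads silently, a reviewer-side check has to look for repeated keys itself. The following is a minimal sketch of such a check, not part of this repository or of ModelScan. It assumes that each tree block starts with a `Tree=<n>` line and that the tree section ends with an `end of trees` line; `find_duplicate_keys` and `SAMPLE` are hypothetical, and the `tree_sizes` numbers in the sample are placeholders.

```python
from collections import Counter
from typing import Dict, List, Optional


def _record(found: Dict[str, List[str]], header: Optional[str], counts: Counter) -> None:
    """Store the keys seen more than once in the block that just ended."""
    repeated = sorted(key for key, n in counts.items() if n > 1)
    if header is not None and repeated:
        found[header] = repeated


def find_duplicate_keys(model_text: str) -> Dict[str, List[str]]:
    """Map each tree header (e.g. 'Tree=0') to the keys repeated inside that block."""
    found: Dict[str, List[str]] = {}
    header: Optional[str] = None   # None while outside any tree block
    counts: Counter = Counter()
    for raw in model_text.splitlines():
        line = raw.strip()
        if line.startswith("Tree=") or line == "end of trees":
            _record(found, header, counts)
            header = line if line.startswith("Tree=") else None
            counts = Counter()
        elif header is not None and "=" in line:
            counts[line.split("=", 1)[0]] += 1
    _record(found, header, counts)  # handles a file that lacks "end of trees"
    return found


# Shortened sample; the tree_sizes values are placeholders, not real byte counts.
SAMPLE = """tree
version=v4
tree_sizes=120 80

Tree=0
num_leaves=3
leaf_value=-0.59999999999999976 0.59999999999999976 0
leaf_value=9.9 -9.9 9.9
shrinkage=1

Tree=1
num_leaves=2
leaf_value=0.1 -0.1
shrinkage=1

end of trees
"""

print(find_duplicate_keys(SAMPLE))  # {'Tree=0': ['leaf_value']}
```

Flagging every repeated key, rather than only `leaf_value`, is a deliberate choice in this sketch: whether other keys show the same last-wins behavior was not tested here, so any duplicate is treated as suspicious.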

Control experiment

With the malicious values first and the benign values last, the C++ runtime again uses the last (benign) value, and the prediction matches benign.txt. This confirms that the differential comes from last-wins ordering rather than from the magnitude of the values.
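
The reasoning behind the control can be written down as a small decision rule. This is a sketch under stated assumptions, not the logic of `check_lightgbm_duplicate_leaf.py`: `classify_semantics` and its tolerance are hypothetical, and the numbers passed in are the predictions listed under Expected output.

```python
import math


def classify_semantics(benign: float, malicious_last: float,
                       malicious_first: float, tol: float = 1e-6) -> str:
    """Infer which duplicate the loader used from three predictions on the same input."""
    # If the file whose LAST line is benign reproduces the benign prediction,
    # the loader is consistent with last-wins.
    last_line_used = math.isclose(malicious_first, benign, abs_tol=tol)
    # If the file whose FIRST line is benign reproduces the benign prediction,
    # the loader is consistent with first-wins.
    first_line_used = math.isclose(malicious_last, benign, abs_tol=tol)
    if last_line_used and not first_line_used:
        return "last-wins"
    if first_line_used and not last_line_used:
        return "first-wins"
    # Both or neither match: the duplicate had no effect on this input,
    # or something other than line ordering is driving the output.
    return "inconclusive"


# Predictions copied from the Expected output section.
print(classify_semantics(benign=0.812768,
                         malicious_last=0.000120,
                         malicious_first=0.812768))  # last-wins
```

The "inconclusive" branch is what the control experiment rules out: if both modified files had shifted the prediction, the effect could not be attributed to ordering alone.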
