explainable-book-reranker-ko

v2.0.0

A Korean book-recommendation reranker that returns both a relevance score and the evidence sentences behind it, not just a number. It is a LoRA adaptation of BAAI/bge-reranker-v2-m3 that uses a select-then-predict architecture and is distilled from an LLM teacher.

What it does

Given a query and a candidate book (with its synopsis/review sentences), the model runs two stages:

1. Generator (LoRA on the encoder + a selection head): picks the 1–3 sentences that justify the match. These sentences are the explanation.
2. Predictor (LoRA + classification head): scores the book using only those selected sentences.

Because the score is computed only from the selected evidence, the highlighted sentences are a faithful reason for the ranking — the model cannot secretly rank on something it didn't show.
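The two-stage flow can be illustrated with a toy sketch. This is not the model's code: `select_then_predict`, `overlap`, and `toy_predictor` are hypothetical names, the token-overlap scorers stand in for the LoRA-adapted encoder, and top-k selection is an assumed stand-in for whatever rule the selection head actually applies. The point it shows is structural: the predictor receives only the selected sentences.

```python
from typing import Callable, List, Tuple


def select_then_predict(
    query: str,
    sentences: List[str],
    select_score: Callable[[str, str], float],
    predict_score: Callable[[str, List[str]], float],
    max_evidence: int = 3,
) -> Tuple[float, List[str]]:
    """Pick up to max_evidence sentences, then score using only those."""
    if not sentences:
        return 0.0, []
    # Stage 1 (generator stand-in): rank sentence indices by selection score.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: select_score(query, sentences[i]),
        reverse=True,
    )
    keep = sorted(ranked[:max_evidence])  # restore original sentence order
    evidence = [sentences[i] for i in keep]
    # Stage 2 (predictor stand-in): sees the evidence and nothing else.
    return predict_score(query, evidence), evidence


def overlap(query: str, sentence: str) -> float:
    """Toy selection score: number of shared whitespace tokens."""
    return float(len(set(query.split()) & set(sentence.split())))


def toy_predictor(query: str, evidence: List[str]) -> float:
    """Toy relevance score: mean token overlap over the evidence."""
    if not evidence:
        return 0.0
    return sum(overlap(query, s) for s in evidence) / len(evidence)


sentences = [
    "a quiet novel about grief",
    "printed in hardcover",
    "a story of family and grief",
    "winner of a literary prize",
]
score, evidence = select_then_predict(
    "novel about family grief", sentences, overlap, toy_predictor, max_evidence=2
)
# evidence holds the two sentences mentioning grief; score is 2.5
```

Editing a sentence that was not selected leaves the score unchanged in this sketch, which is the property the paragraph above describes.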

This is not a drop-in CrossEncoder: it has two adapters and a custom inference path (see Usage).

Results

Metrics are computed against held-out teacher labels, so they measure how well the distillation generalizes (gold = teacher labels, not human relevance judgments).

split	NDCG@5	NDCG@10	MRR	rationale F1	rationale IoU
test (unbiased)	0.581	0.606	0.880	0.434	0.317
valid (selected epoch 4)	0.597	0.618	0.928	0.432	0.312
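For readers unfamiliar with the ranking columns, here is a minimal sketch of NDCG@k and reciprocal rank over teacher grades (0–3). The exponential gain `2**g - 1` and the `min_grade` relevance threshold are assumptions; the evaluation code in the source repo may use different conventions.

```python
import math
from typing import List


def dcg_at_k(grades: List[int], k: int) -> float:
    """Discounted cumulative gain over the first k grades (exponential gain)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades_in_model_order: List[int], k: int) -> float:
    """DCG of the model's ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(grades_in_model_order, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(grades_in_model_order, k) / ideal


def reciprocal_rank(grades_in_model_order: List[int], min_grade: int = 1) -> float:
    """1 / rank of the first candidate whose grade reaches min_grade, else 0."""
    for rank, grade in enumerate(grades_in_model_order, start=1):
        if grade >= min_grade:
            return 1.0 / rank
    return 0.0


# Teacher grades of the candidates, listed in the order the model ranked them.
model_order_grades = [0, 3]
print(ndcg_at_k(model_order_grades, k=5), reciprocal_rank(model_order_grades))
```

MRR is the mean of `reciprocal_rank` over all queries; NDCG@5 and NDCG@10 are the per-query means of `ndcg_at_k` with `k=5` and `k=10`.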

vs v1.0.0 (989 labeled queries): test NDCG@10 0.577 → 0.606 (+5% relative), rationale F1 0.197 → 0.434 (2.2×), rationale IoU 0.142 → 0.317 (2.2×).

Two levers drove the gain: (1) roughly 2× the teacher data (989 → 1,907 labeled queries) lifted ranking, and the data–quality curve was still rising; (2) capping each training query's pool to its top-24 candidates + hard negatives roughly doubled rationale faithfulness at equal ranking, because it sharpens the generator's evidence selection.
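The candidate cap can be sketched as follows. The field names (`teacher_score`, `hard_negative`) are hypothetical, and the sketch assumes the cap is a total budget in which hard negatives are kept first and the remainder is filled by teacher score; the actual training code may compose the pool differently.

```python
from typing import Dict, List


def cap_training_pool(candidates: List[Dict], cap: int = 24) -> List[Dict]:
    """Keep hard negatives, then fill the remaining slots by teacher score."""
    hard = [c for c in candidates if c["hard_negative"]]
    rest = sorted(
        (c for c in candidates if not c["hard_negative"]),
        key=lambda c: c["teacher_score"],
        reverse=True,
    )
    kept_hard = hard[:cap]
    return kept_hard + rest[: cap - len(kept_hard)]


# Synthetic 50-candidate pool: grades cycle 0..3, three flagged hard negatives.
pool = [
    {"id": i, "teacher_score": i % 4, "hard_negative": i in (5, 17, 40)}
    for i in range(50)
]
capped = cap_training_pool(pool)
```

Validation and test queries would skip this step and keep the full pool, as stated in the Training section.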

Model selection: training ran for 5 epochs with per-epoch checkpoints, and epoch 4 maximized validation NDCG@5. The final epoch reliably collapses (it overfits into a degenerate ranking), so best-epoch selection is required; never default to the last epoch.
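Best-epoch selection amounts to an argmax over per-epoch validation scores. A minimal sketch, with a hypothetical helper name and placeholder numbers that are not measurements from this model:

```python
from typing import Dict


def select_best_epoch(valid_ndcg5_by_epoch: Dict[int, float]) -> int:
    """Return the epoch with the highest validation NDCG@5 (earliest on ties)."""
    if not valid_ndcg5_by_epoch:
        raise ValueError("no per-epoch validation scores to choose from")
    # Iterating epochs in ascending order makes max() prefer the earliest tie.
    return max(sorted(valid_ndcg5_by_epoch), key=lambda e: valid_ndcg5_by_epoch[e])


# Placeholder scores shaped like the behaviour described above:
# steady improvement, then a collapse in the final epoch.
placeholder_scores = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.05}
best = select_best_epoch(placeholder_scores)  # 4
```

Keeping every per-epoch checkpoint until selection is done is what makes this possible; a run that saves only the final weights cannot recover the better epoch.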

Files

generator_adapter/      LoRA_G — evidence-sentence selection (PEFT)
generator_head.pt       selection head (fp32 linear)
predictor_adapter/      LoRA_P — relevance scoring (PEFT)
lora_target_modules.yaml  LoRA config required by the loader

Usage

The architecture lives in the source repo. Install it, download this model, and load with the select-then-predict loader:

git clone https://github.com/reranker-master/explainable-reranker
cd explainable-reranker && pip install -e '.[gpu]'
huggingface-cli download zettascope/explainable-book-reranker-ko --local-dir ./ckpt

from explainable_reranker.models.select_predict.neural_model import load_neural_model

# Optional: select_fp32=True makes the rationale fully deterministic (no bf16/padding
# jitter) at ~50% extra latency. The default (False) is faster and leaves the ranking unchanged.
model = load_neural_model("./ckpt", "./ckpt/lora_target_modules.yaml", device="cuda")
# model.rerank_batch(batch) -> ranked books with score + selected evidence sentences

Or serve the /rerank HTTP endpoint:

PYTHONPATH=src python3 scripts/serve_rerank.py \
  --checkpoint ./ckpt --lora-config ./ckpt/lora_target_modules.yaml
# add --select-fp32 for deterministic rationale

Latency (GB10, bf16, 50-candidate pool): ~1.8 s/query, scaling ~linearly at ~38 ms/candidate. Candidates are scored independently, so batching them is safe (ranking unchanged).
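Because latency scales roughly linearly, the per-candidate figure can be turned into a quick pool-size budget. This is back-of-the-envelope arithmetic only: it assumes the ~38 ms/candidate figure holds on your hardware and ignores any fixed per-query overhead. The helper names are hypothetical.

```python
MS_PER_CANDIDATE = 38  # approximate figure quoted above (GB10, bf16)


def estimated_latency_ms(num_candidates: int) -> int:
    """Linear latency estimate for one query."""
    return num_candidates * MS_PER_CANDIDATE


def max_pool_for_budget(budget_ms: int) -> int:
    """Largest candidate pool that fits inside a latency budget."""
    return max(budget_ms // MS_PER_CANDIDATE, 0)


print(estimated_latency_ms(50))   # 1900, same ballpark as the ~1.8 s quoted above
print(max_pool_for_budget(1000))  # 26
```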

Training

Base: BAAI/bge-reranker-v2-m3 (XLM-RoBERTa-large cross-encoder), LoRA r=16.
Teacher labels: 1,907 Korean book-recommendation queries, each with ~50 candidates graded 0–3, top-10 grounded rationale sentences, and in-pool hard negatives. Labeled by an LLM teacher via grounded 2-pass prompting.
Recipe: 5 epochs, lr 1e-4, train-candidate cap 24 (top-by-score + hard negatives; valid/test use the full pool), best-epoch selection on validation NDCG@5.
Losses: listwise KD (ranking) + binary selection (which sentences are evidence) + hard-negative anchor + sparsity/continuity. Generator and Predictor trained jointly with a teacher→generator selection-packing schedule.
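To make the listwise KD term concrete, here is one common formulation: the KL divergence between a softmax over the teacher's grades and a softmax over the student's scores within a query's candidate list. The card does not state the exact loss, so the form, the temperature parameter, and the function names are assumptions, and the sketch uses plain Python instead of the training framework.

```python
import math
from typing import List


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kd_loss(
    student_scores: List[float],
    teacher_grades: List[float],
    temperature: float = 1.0,
) -> float:
    """KL(teacher || student) over one query's candidate list."""
    p = softmax([float(g) for g in teacher_grades], temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [3, 1, 0]
agree = listwise_kd_loss([3.0, 1.0, 0.0], teacher)     # ordering matches teacher
disagree = listwise_kd_loss([0.0, 1.0, 3.0], teacher)  # ordering reversed
```

The loss is near zero when the student's score distribution matches the teacher's and grows as the orderings diverge. The selection, hard-negative anchor, and sparsity/continuity terms would be added to this as separate weighted terms.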

Limitations

Quality is capped by the teacher (LLM-distilled); metrics measure agreement with the teacher on unseen queries, not absolute human relevance.
Trained on 1,907 queries — the data curve was still rising, so more/better-teacher data should raise quality further.
rationale IoU (0.32) means the selected sentences are reasonable evidence but do not exactly match the teacher's picks.
Korean book-recommendation domain; not validated elsewhere.
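To help interpret the rationale numbers, here is a sketch of F1 and IoU computed over sets of selected sentence indices for a single candidate. Sentence-level set overlap is an assumption about how the metrics are defined; the evaluation code in the source repo may work at a different granularity or aggregate differently.

```python
from typing import Set, Tuple


def rationale_overlap(pred: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Return (F1, IoU) between predicted and teacher sentence-index sets."""
    if not pred or not gold:
        return 0.0, 0.0
    inter = len(pred & gold)
    f1 = 2 * inter / (len(pred) + len(gold))
    iou = inter / len(pred | gold)
    return f1, iou


# The model picks sentences 0 and 2; the teacher marked 0, 1, 2 and 5.
f1, iou = rationale_overlap({0, 2}, {0, 1, 2, 5})
# f1 is 2/3 and iou is 0.5: every pick is correct, but half the teacher's
# picks are missing.
```

Since the generator selects 1–3 sentences while the teacher provides up to 10 rationale sentences, overlap well below 1.0 is expected even when each selected sentence is a valid piece of evidence.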

Versions

v2.0.0 — 1,907 LLM-distilled teacher labels (2×), train-candidate cap 24, epoch-4 (valid-NDCG@5-selected). Test NDCG@10 = 0.606, rationale F1 = 0.434. Adds a select_fp32 option for deterministic rationale.
v1.0.0 — first release. bge-reranker-v2-m3 + LoRA, 989 LLM-distilled teacher labels, epoch-3 (valid-NDCG@5-selected). Test NDCG@5 = 0.550, NDCG@10 = 0.577.

License

MIT, inheriting BAAI/bge-reranker-v2-m3 (MIT). Verify the base license before redistribution.
