MultiEvalSumViet2: Multi-Criteria Evaluation and Reward Modeling for Vietnamese Summarization
MultiEvalSumViet2 is a Vietnamese-native learned evaluator that scores a candidate summary given its source document on three criteria:
- Faithfulness (F): factual consistency w.r.t. the source document
- Coherence (C): readability and logical flow
- Relevance (R): topical alignment and coverage of key information
The model outputs criterion-wise scores in [0, 1] for a (document, summary) pair and is designed to be used as:
- an automatic Vietnamese summarization evaluator,
- a scorer for dataset curation / preference construction,
- a reward model in PPO/GRPO-style optimization.
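For the reward-modeling use case, the criterion scores can be collapsed into a scalar reward. A minimal sketch, where `score_fn` is a stand-in for any `(doc, summary) → (faith, coherence, relevance)` scorer (the wrapper and its names are illustrative, not part of the repo API; the weights follow the paper-default aggregate below):

```python
def make_reward_fn(score_fn, w_faith=0.5, w_rel=0.3, w_coh=0.2):
    """Wrap a (doc, summary) -> (faith, coherence, relevance) scorer
    into a scalar reward for PPO/GRPO-style optimization."""
    def reward(doc: str, summary: str) -> float:
        faith, coh, rel = score_fn(doc, summary)
        return w_faith * faith + w_rel * rel + w_coh * coh
    return reward

# toy scorer, for illustration only
r = make_reward_fn(lambda d, s: (0.8, 0.6, 0.7))
print(round(r("doc", "sum"), 2))  # 0.73
```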
What’s in this repository
This repo contains:
- Backbone encoder weights: `config.json`, `model.safetensors`, tokenizer files
- Lightweight heads: `trunk.pt`, `head_faith.pt`, `head_coh.pt`, `head_rel.pt`
- Configs: `arch_config.json`, `training_args.json`, `loss_config.json`, `package_versions.json`
- Inference helpers: `modeling_summary_evaluator.py` (recommended loader + pair encoding)
Output format
Given (doc, summary), the model returns:
- `pred_faith` ∈ [0, 1]
- `pred_coherence` ∈ [0, 1]
- `pred_relevance` ∈ [0, 1]
Optional mappings:
- Likert 1–5: `score_1to5 = 4 * score_0to1 + 1`
- Aggregate (paper default): `pred_overall = 0.5*pred_faith + 0.3*pred_relevance + 0.2*pred_coherence`
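Both mappings are simple affine transforms of the raw scores; as plain Python (helper names are illustrative, not part of the repo API):

```python
def to_1to5(score_0to1: float) -> float:
    """Map a [0, 1] criterion score onto the 1-5 Likert scale."""
    return 4.0 * score_0to1 + 1.0

def overall(faith: float, coherence: float, relevance: float) -> float:
    """Paper-default weighted aggregate of the three criterion scores."""
    return 0.5 * faith + 0.3 * relevance + 0.2 * coherence

print(to_1to5(0.5))            # 3.0
print(round(overall(0.9, 0.6, 0.8), 2))
```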
Reproducibility-critical tokenization policy
To match training-time preprocessing, MultiEvalSumViet2 uses:
- Summary pre-trim (default `SUM_MAX_LEN = 256`)
- Pair encoding of (doc, summary) with `MAX_LEN = 512` and `truncation="only_first"`, so truncation primarily affects the document, not the summary

The recommended `encode_pair()` helper in `modeling_summary_evaluator.py` implements the same policy.
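To make the policy concrete, here is a pure-Python sketch of the truncation semantics over token-id lists. The real helper delegates to the Hugging Face tokenizer; this only illustrates why the summary survives intact, and `NUM_SPECIAL = 4` is an assumption about the pair's special-token overhead, not a repo constant:

```python
SUM_MAX_LEN = 256   # summary pre-trim budget (tokens)
MAX_LEN = 512       # total pair budget (tokens)
NUM_SPECIAL = 4     # assumed special-token overhead for a sentence pair

def truncate_pair(doc_ids, sum_ids):
    """Illustrative re-implementation of the training-time policy:
    pre-trim the summary, then cut only the document ('only_first')
    so the summary is preserved whenever possible."""
    sum_ids = sum_ids[:SUM_MAX_LEN]                # summary pre-trim
    budget = MAX_LEN - NUM_SPECIAL - len(sum_ids)  # room left for the doc
    doc_ids = doc_ids[:max(budget, 0)]             # only_first truncation
    return doc_ids, sum_ids

doc, summ = truncate_pair(list(range(1000)), list(range(300)))
print(len(doc), len(summ))  # 252 256
```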
Quickstart (minimal usage)
Install
pip install -U torch transformers huggingface_hub numpy
Score a few pairs (minimal batch)
import os
import numpy as np
import torch
import importlib.util
from huggingface_hub import snapshot_download
from transformers import DataCollatorWithPadding
REPO_ID = "phuongntc/Multi_EvalSumViet2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# 1) Download repo snapshot
repo_dir = snapshot_download(repo_id=REPO_ID, repo_type="model")
# 2) Import repo inference helper
loader_path = os.path.join(repo_dir, "modeling_summary_evaluator.py")
spec = importlib.util.spec_from_file_location("mse", loader_path)
mse = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mse) # type: ignore
# 3) Load model + tokenizer
model, tokenizer, _ = mse.load_for_inference(repo_dir, device=DEVICE)
model.eval()
collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True, pad_to_multiple_of=8)
@torch.inference_mode()
def score_pairs(docs, sums, batch_size=8):
outs = []
for i in range(0, len(docs), batch_size):
d = docs[i:i+batch_size]
s = sums[i:i+batch_size]
enc = mse.encode_pair(tokenizer, d, s) # training-matched encoding
features = [{k: enc[k][j] for k in enc.keys()} for j in range(len(d))]
batch = collator(features)
y = model(batch["input_ids"].to(DEVICE), batch["attention_mask"].to(DEVICE)) # [B,3]
outs.append(y.detach().cpu().numpy())
y = np.clip(np.vstack(outs), 0.0, 1.0)
return y # columns: [faith, coherence, relevance]
docs = ["Văn bản gốc ..."]
sums = ["Bản tóm tắt ..."]
scores = score_pairs(docs, sums)
faith, coh, rel = scores[0].tolist()
print({"faith": faith, "coherence": coh, "relevance": rel})
Batch scoring (CSV/XLSX)
Input
A CSV/XLSX file with two required columns: `doc` and `summary`.
Output
An XLSX file adding:
- `pred_faith`, `pred_coherence`, `pred_relevance` (0–1)
- optional: `pred_*_1to5`
- optional: `pred_overall`, `pred_overall_1to5`
Ready-to-run script
Copy this into a file such as `examples/score_batch_xlsx.py` (recommended), or run it directly in a notebook.
# pip install -U torch transformers huggingface_hub numpy pandas openpyxl tqdm
import os, math, importlib.util
import numpy as np
import pandas as pd
import torch
from tqdm import tqdm
from huggingface_hub import snapshot_download
from transformers import DataCollatorWithPadding
REPO_ID = "phuongntc/Multi_EvalSumViet2"
INPUT_FILE = "/content/test.xlsx" # .xlsx or .csv with columns: doc, summary
OUTPUT_XLSX = "/content/output_scored.xlsx"
BATCH_SIZE = 8
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
repo_dir = snapshot_download(repo_id=REPO_ID, repo_type="model")
loader_path = os.path.join(repo_dir, "modeling_summary_evaluator.py")
spec = importlib.util.spec_from_file_location("mse", loader_path)
mse = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mse) # type: ignore
model, tokenizer, _ = mse.load_for_inference(repo_dir, device=DEVICE)
model.eval()
# Load data
df = pd.read_excel(INPUT_FILE) if INPUT_FILE.lower().endswith((".xlsx",".xls",".xlsm")) else pd.read_csv(INPUT_FILE)
df.columns = [c.strip() for c in df.columns]
for c in ["doc", "summary"]:
if c not in df.columns:
raise ValueError(f"Missing required column: {c}")
df = df.dropna(subset=["doc","summary"]).reset_index(drop=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True, pad_to_multiple_of=8)
def iter_batches(n, bs):
for i in range(0, n, bs):
yield i, min(i+bs, n)
preds = []
with torch.inference_mode():
for a, b in tqdm(iter_batches(len(df), BATCH_SIZE), total=math.ceil(len(df)/BATCH_SIZE), desc="Scoring"):
docs = df.loc[a:b-1, "doc"].astype(str).tolist()
sums = df.loc[a:b-1, "summary"].astype(str).tolist()
enc = mse.encode_pair(tokenizer, docs, sums) # training-matched encoding
features = [{k: enc[k][i] for k in enc.keys()} for i in range(len(docs))]
batch = collator(features)
y = model(batch["input_ids"].to(DEVICE), batch["attention_mask"].to(DEVICE)) # [B,3]
preds.append(y.detach().cpu().numpy())
preds = np.clip(np.vstack(preds), 0.0, 1.0)
out = df.copy()
out["pred_faith"] = preds[:, 0]
out["pred_coherence"] = preds[:, 1]
out["pred_relevance"] = preds[:, 2]
# Optional: map to 1–5
out["pred_faith_1to5"] = 4.0 * out["pred_faith"] + 1.0
out["pred_coherence_1to5"] = 4.0 * out["pred_coherence"] + 1.0
out["pred_relevance_1to5"] = 4.0 * out["pred_relevance"] + 1.0
# Optional: aggregate score
out["pred_overall"] = 0.5*out["pred_faith"] + 0.3*out["pred_relevance"] + 0.2*out["pred_coherence"]
out["pred_overall_1to5"] = 4.0*out["pred_overall"] + 1.0
out.to_excel(OUTPUT_XLSX, index=False)
print("Saved:", OUTPUT_XLSX)
Model description (paper-aligned)
Architecture
Vietnamese encoder backbone → masked mean pooling → shared MLP trunk → three regression heads (F/C/R).
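The head stack above can be sketched in PyTorch as follows. Hidden sizes and the trunk width are illustrative assumptions, not the released configuration (see `arch_config.json` for the actual one):

```python
import torch
import torch.nn as nn

class FCRHeadsSketch(nn.Module):
    """Sketch of the head stack: masked mean pooling over encoder
    states -> shared MLP trunk -> three sigmoid regression heads (F/C/R)."""
    def __init__(self, hidden=768, trunk_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(hidden, trunk_dim), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(trunk_dim, 1) for _ in range(3)]  # F / C / R
        )

    def forward(self, hidden_states, attention_mask):
        # masked mean pooling: average token states where the mask is 1
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        z = self.trunk(pooled)
        return torch.sigmoid(torch.cat([h(z) for h in self.heads], dim=-1))

states = torch.randn(2, 16, 768)            # [B, T, H] encoder output
mask = torch.ones(2, 16, dtype=torch.long)  # [B, T] attention mask
print(FCRHeadsSketch()(states, mask).shape)  # torch.Size([2, 3])
```

The sigmoid keeps each head's output in [0, 1], matching the score range described above.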
Training objective
Hybrid training:
- multi-task regression (calibrated absolute scoring),
- intra-document pairwise ranking (preserve within-document preferences among multiple candidate summaries).
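The two terms can be combined as sketched below. The hinge formulation, margin, and weighting are illustrative assumptions, not the published loss (see `loss_config.json` for the actual settings):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred, target, doc_ids, margin=0.1, rank_weight=0.5):
    """Sketch of a hybrid objective: multi-task MSE regression plus a
    hinge ranking term over candidate summaries sharing a source document."""
    reg = F.mse_loss(pred, target)  # calibrated absolute scoring
    rank_terms = []
    n = pred.shape[0]
    for i in range(n):
        for j in range(n):
            better = target[i] > target[j]  # per-criterion preference
            if doc_ids[i] == doc_ids[j] and bool(better.any()):
                # hinge: the preferred summary should score higher by `margin`
                gap = (pred[j] - pred[i] + margin)[better]
                rank_terms.append(torch.clamp(gap, min=0.0).mean())
    rank = torch.stack(rank_terms).mean() if rank_terms else pred.new_zeros(())
    return reg + rank_weight * rank
```

Pairs from different documents contribute no ranking term, which is what preserves the within-document preference structure.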
Data & supervision
We construct a calibrated dataset of 80,856 labeled (document, summary) pairs from 13,476 Vietnamese news articles (2022–2024) using an LLM-assisted annotation pipeline with human verification, and normalize criterion-wise scores to [0,1].
Intended use
- Vietnamese summarization evaluation beyond lexical overlap metrics
- Preference dataset construction from multiple candidate summaries per document
- Reward modeling for PPO/GRPO-style fine-tuning and iterative data curation
Reproducibility & version pinning
Pin a specific revision/commit when reproducing paper results:
from huggingface_hub import snapshot_download
repo_dir = snapshot_download(
repo_id="phuongntc/Multi_EvalSumViet2",
repo_type="model",
revision="<COMMIT_HASH_OR_TAG>"
)
Repository license (code/model files)
This Hugging Face repository is released under Apache License 2.0 (apache-2.0).
If you prefer MIT or non-commercial terms for the repository contents, change the `license:` field in the YAML front matter and add the corresponding `LICENSE` file.
Citation
Model DOI (recommended)
- DOI: 10.57967/hf/7956
BibTeX (model)
@misc{multievalsumviet2_model,
title = {MultiEvalSumViet2: Vietnamese Multi-Criteria Summary Evaluator},
author = {Thu Phuong Tran Thi},
year = {2026},
howpublished = {Hugging Face Hub},
url = {https://huggingface.co/phuongntc/Multi_EvalSumViet2},
doi = {10.57967/hf/7956}
}
Contact
- Maintainer: Thu Phuong Tran Thi
- Affiliation: Hanoi Metropolitan University; VNU University of Engineering and Technology, Vietnam National University
- Email: tttphuong2@daihocthudo.edu.vn
- Please use the Hugging Face Community tab for questions and bug reports.
(Keep the repository license above unchanged unless you also intend to relicense the repository contents.)