bge-m3-traffic-ft

Fine-tuned from BAAI/bge-m3 on Vietnamese traffic-law QA pairs.

Used as the dense retrieval encoder for a RAG + LoRA pipeline on Vietnamese road-traffic law.

Training

Base: BAAI/bge-m3 (full weights, 560M params).
Loss: MultipleNegativesRankingLoss (sentence-transformers).
Data: splits_filtered/qa_train.jsonl (1762 pairs) + penalty_training_pairs.jsonl (hard negatives from legal_sanction_facts.jsonl).
Schedule: 3 epochs, lr 2e-5, batch 16, max_seq 256.
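To make the loss listed above concrete, here is a minimal pure-Python sketch of MultipleNegativesRankingLoss over one batch of embeddings. It follows our understanding of the sentence-transformers defaults: cosine similarity scaled by 20, then cross-entropy with each anchor's own positive as the target. The function names are hypothetical, and the toy 2-D vectors stand in for real 1024-dimensional BGE-M3 embeddings. We assume the hard negatives from penalty_training_pairs.jsonl enter as an optional third column, which simply enlarges the candidate pool.

```python
import math
from typing import List, Optional

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnrl_loss(
    anchors: List[Vector],
    positives: List[Vector],
    hard_negatives: Optional[List[Vector]] = None,  # assumed optional third column
    scale: float = 20.0,
) -> float:
    """Multiple-negatives ranking loss, averaged over the batch.

    Anchor i must score positives[i] above every other candidate: the other
    anchors' positives (in-batch negatives) plus any hard negatives.
    """
    candidates = positives + (hard_negatives or [])
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of anchor i to every candidate in the batch
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Cross-entropy with target index i, using log-sum-exp for stability
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch: each anchor is closest to its own positive
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
aligned = mnrl_loss(anchors, positives)
# Reversing the positives makes every anchor prefer the wrong candidate
swapped = mnrl_loss(anchors, positives[::-1])
# A hard negative pointing exactly along anchor 0 competes with its positive
with_hard = mnrl_loss(anchors, positives, hard_negatives=[[1.0, 0.0]])
```

With batch 16, each anchor is scored against its own positive, the 15 other positives in the batch, and the batch's hard negatives, so the loss only becomes small once the correct legal passage outranks near-miss sanctions.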

Usage

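from sentence_transformers import SentenceTransformer

# Download the fine-tuned encoder from the Hugging Face Hub
model = SentenceTransformer("Anakonkai/bge-m3-traffic-ft")
# Encode a Vietnamese query ("What is the fine for not wearing a helmet?")
embeddings = model.encode(["Phạt tiền đối với hành vi không đội mũ bảo hiểm là bao nhiêu?"])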
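As a dense retrieval encoder, the model's output is typically used by ranking passages by cosine similarity to the query. The sketch below shows that ranking step on toy vectors so it runs without downloading the model. In practice, the query and passage vectors would come from model.encode(..., normalize_embeddings=True). The function names, the passage store, and k=5 (chosen to match the @5 metrics below) are assumptions, not the pipeline's actual code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (as normalize_embeddings=True would)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: Vector, docs: List[Vector], k: int = 5) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k passages most similar to the query.

    For unit-length vectors the dot product equals cosine similarity.
    """
    q = normalize(query)
    scored = []
    for idx, d in enumerate(docs):
        dn = normalize(d)
        scored.append((idx, sum(a * b for a, b in zip(q, dn))))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings" for three passages and one query
docs = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
hits = top_k([1.0, 0.1], docs, k=2)
```

Brute-force scoring is fine for a corpus of a few thousand law passages. Larger collections would usually swap in an approximate nearest-neighbour index.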

Results (end-to-end)

Evaluated end-to-end on data/eval_manual_labeled_v5.jsonl (145 samples) using the full pipeline (config D):

source_recall@5: 0.9643
context_recall@5: 0.9750
ROUGE-L: 0.5149
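The card does not define these metrics. The sketch below assumes one common reading so the numbers can be reproduced in spirit. Recall@k is taken as the fraction of gold items (for example, source article IDs) found among the top-k retrieved items for one query. ROUGE-L is taken as the LCS-based F-measure between the generated and reference answers. The function names are hypothetical, whitespace tokenization is an assumption (tokenizer choice changes ROUGE scores, especially for Vietnamese), and per-query scores would then be averaged over the 145 samples.

```python
from typing import List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of gold items that appear among the top-k retrieved items."""
    if not gold:
        return 0.0
    hits = gold.intersection(retrieved[:k])
    return len(hits) / len(gold)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) on whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


# "d5" is retrieved at rank 6, so only one of the two gold sources counts at k=5
r = recall_at_k(["d1", "d7", "d3", "d9", "d2", "d5"], {"d1", "d5"}, k=5)
# LCS of the two token sequences is "a c", length 2
f = rouge_l_f1("a b c d", "a c e")
```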

Related repos

Code: github.com/Anakonkai01/nlp-traffic-laws
LoRA generator: Anakonkai/qwen3.5-9b-lora-traffic-v2


Weights: Safetensors, 0.6B params, tensor type F32.
