bge-m3-traffic-ft

Fine-tuned from BAAI/bge-m3 on Vietnamese traffic-law QA pairs.

Used as the dense retrieval encoder for a RAG + LoRA pipeline on Vietnamese road-traffic law.

Training

  • Base: BAAI/bge-m3 (full weights, 560M params).
  • Loss: MultipleNegativesRankingLoss (sentence-transformers).
  • Data: splits_filtered/qa_train.jsonl (1762 pairs) + penalty_training_pairs.jsonl (hard negatives from legal_sanction_facts.jsonl).
  • Schedule: 3 epochs, learning rate 2e-5, batch size 16, max sequence length 256 tokens.
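
For intuition about the loss listed above, here is a minimal pure-Python sketch of the in-batch-negatives objective that MultipleNegativesRankingLoss is built on: each query should score its own positive passage higher than every other passage in the batch, with hard negatives appended as extra candidates. This is an illustration, not the training script for this model. The function names (cosine, mnr_loss) and the toy vectors are hypothetical, and the scale of 20 is an assumption (it matches the sentence-transformers default) rather than a confirmed setting of this run.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i.

    passage_vecs may be longer than query_vecs: any extra entries act as
    hard negatives shared by every query (assumed layout for this sketch).
    """
    if len(passage_vecs) < len(query_vecs):
        raise ValueError("need at least one passage per query")
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, p) for p in passage_vecs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy 2-d "embeddings" standing in for encoder outputs.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
hard_negative = [[0.9, 0.1]]  # close to the first query, but not its positive

loss_easy = mnr_loss(queries, positives)
loss_hard = mnr_loss(queries, positives + hard_negative)
loss_swapped = mnr_loss(queries, list(reversed(positives)))
```

Adding the hard negative raises the loss for the first query, which is the signal that pushes the encoder to separate near-miss passages from the true one.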

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Anakonkai/bge-m3-traffic-ft")
embeddings = model.encode(["Phạt tiền đối với hành vi không đội mũ bảo hiểm là bao nhiêu?"])
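
To use the encoder for retrieval, passages are typically ranked by cosine similarity to the query embedding. The following is a minimal sketch under that assumption; the helper names (cosine, top_k) are hypothetical, and the short toy vectors stand in for the arrays returned by model.encode so the example runs without downloading the model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, passage_vecs, k=5):
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(cosine(query_vec, p), idx) for idx, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy vectors; in practice these would come from model.encode(...).
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]

ranked = top_k(query_vec, passage_vecs, k=2)
```

For a large corpus, passage embeddings would normally be computed once and stored in a vector index instead of being scanned linearly as here.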

Results (end-to-end)

Measured with the full pipeline (config D) on data/eval_manual_labeled_v5.jsonl (145 samples):

  • source_recall@5: 0.9643
  • context_recall@5: 0.9750
  • ROUGE-L: 0.5149
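
This card does not define how source_recall@5 and context_recall@5 are computed. One common reading of recall@k, sketched below as an assumption only, is the fraction of gold items that appear among the top k retrieved items, averaged over samples. The function names and the sample data are hypothetical and are not taken from the evaluation file.

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of gold ids found among the top-k retrieved ids (one sample)."""
    if not gold:
        raise ValueError("gold set must not be empty")
    top = set(retrieved[:k])
    return sum(1 for g in gold if g in top) / len(gold)

def mean_recall_at_k(samples, k=5):
    """Average recall@k over (retrieved_ids, gold_ids) pairs."""
    return sum(recall_at_k(r, g, k) for r, g in samples) / len(samples)

# Hypothetical samples: ranked retrieved ids paired with the gold id set.
samples = [
    (["a", "b", "c"], {"a"}),                      # gold found at rank 1
    (["x", "y", "z", "w", "v", "u"], {"u", "x"}),  # "u" sits at rank 6, outside top 5
]

score = mean_recall_at_k(samples, k=5)
```

Under this definition a sample with several gold sources only scores 1.0 when all of them are retrieved; a hit-rate variant (at least one gold source in the top k) would give higher numbers on the same data.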

Related repos
