bge-m3-traffic-ft
A sentence-embedding model fine-tuned from BAAI/bge-m3 on Vietnamese traffic-law QA pairs.
It serves as the dense retrieval encoder in a RAG + LoRA pipeline for Vietnamese road-traffic law.
Training
- Base: BAAI/bge-m3 (full weights, 560M params).
- Loss: MultipleNegativesRankingLoss (sentence-transformers).
- Data: splits_filtered/qa_train.jsonl (1762 pairs) + penalty_training_pairs.jsonl (hard negatives from legal_sanction_facts.jsonl).
- Schedule: 3 epochs, lr 2e-5, batch 16, max_seq 256.
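To make the loss choice concrete, the following is a minimal pure-Python sketch of the in-batch-negatives objective that MultipleNegativesRankingLoss is built on: each question is scored against every answer in the batch, and cross-entropy pushes the matching answer to the top. It is an illustration only, not the training script from this repo. The function names and toy vectors are hypothetical, and the scale of 20 with cosine similarity is assumed to match the sentence-transformers defaults.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, hard_negatives=None, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch, plus any optional hard negatives,
    is treated as a wrong candidate. `scale` is an assumed default.
    """
    candidates = list(positives) + list(hard_negatives or [])
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Negative log-probability of the correct candidate (index i)
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D vectors standing in for question/answer embeddings
aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is near zero when each question already sits closest to its own answer and grows large when the pairing is wrong, which is why larger batches and hard negatives both give the encoder more to discriminate against.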
Usage
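from sentence_transformers import SentenceTransformer

# Load the fine-tuned encoder from the Hugging Face Hub
model = SentenceTransformer("Anakonkai/bge-m3-traffic-ft")
# Query: "How much is the fine for not wearing a helmet?"
embeddings = model.encode(["Phạt tiền đối với hành vi không đội mũ bảo hiểm là bao nhiêu?"])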
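Since the model is meant for dense retrieval, the embeddings would typically be compared against precomputed passage embeddings. The sketch below shows one simple way to rank passages by cosine similarity. It uses toy vectors in place of model.encode output so that it runs without downloading the model. The top_k helper and the sample data are hypothetical, and the pipeline in the linked repo may well use a vector index instead.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, passage_vecs, k=5):
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(cosine(query_vec, p), idx) for idx, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy vectors; in practice these would come from model.encode(...)
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
ranked = top_k(query, passages, k=2)
print(ranked)
```

With real embeddings, the same function could be called as top_k(query_embedding, passage_embeddings, k=5), which lines up with the @5 cutoff used in the results.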
Results (end-to-end)
Measured on data/eval_manual_labeled_v5.jsonl (145 samples) with the full pipeline (config D):
- source_recall@5: 0.9643
- context_recall@5: 0.9750
- ROUGE-L: 0.5149
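The card does not define how source_recall@5 and context_recall@5 are computed. As a rough illustration, the sketch below implements one common reading of recall@k: the fraction of samples for which at least one gold item appears among the top k retrieved items. This definition is an assumption, the evaluation code in the repo may differ, and the IDs and helper name are hypothetical.

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of samples whose top-k retrieved IDs include a gold ID.

    retrieved: list of ranked ID lists, one per sample (best first).
    gold: list of gold ID lists, one per sample.
    """
    hits = 0
    for ranked_ids, gold_ids in zip(retrieved, gold):
        # A sample counts as a hit if any gold ID is in the top k
        if set(ranked_ids[:k]) & set(gold_ids):
            hits += 1
    return hits / len(gold)


# Hypothetical ranked results and gold labels for three questions
retrieved = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
gold = [["b"], ["q"], ["m", "p"]]
score = recall_at_k(retrieved, gold, k=2)
print(round(score, 4))
```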
Related repos
- Code: github.com/Anakonkai01/nlp-traffic-laws
- LoRA generator: Anakonkai/qwen3.5-9b-lora-traffic-v2