# RAMT-LaBSE English → Malay Legal Translation Model
RAMT-LaBSE is a Retrieval-Augmented Machine Translation (RAMT) model based on
Facebook M2M100, fine-tuned specifically for English → Malay legal text translation.
This model improves translation quality in:
- statutes
- contracts
- court documents
- legal terms
- policy & regulatory content
It uses LaBSE embeddings during training for retrieval-augmented sentence matching.
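The retrieval step can be illustrated with a small sketch: given sentence embeddings (such as those produced by LaBSE), the closest corpus sentence is found by cosine similarity. The function name and toy vectors below are illustrative assumptions, not part of the released model.

```python
import numpy as np

def retrieve_nearest(query_emb, corpus_embs, corpus):
    """Return the corpus sentence whose embedding is most similar to the query."""
    # Normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    best = int(np.argmax(scores))
    return corpus[best], float(scores[best])

# Toy 3-dimensional embeddings standing in for real LaBSE vectors
corpus = ["This contract is binding.", "The court adjourned."]
corpus_embs = np.array([[0.9, 0.1, 0.0],
                        [0.0, 0.2, 0.9]])
query_emb = np.array([0.8, 0.2, 0.1])

sentence, score = retrieve_nearest(query_emb, corpus_embs, corpus)
print(sentence)  # nearest neighbour: the contract sentence
```

In practice the corpus embeddings would come from encoding the legal parallel corpus with LaBSE once, then reusing them for every query.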
## Model Architecture
- Base model: facebook/m2m100_418M
- Tokenizer: M2M100 SentencePiece
- Fine-tuned on: English→Malay legal parallel corpus
- Technique: Retrieval-Augmented MT + Contrastive Alignment
- Optimized for: Precision, legal terminology, formal structure
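One common way to apply retrieval augmentation is to prepend a retrieved translation pair to the source sentence so the model can copy terminology from it. The exact input format this model was trained with is not documented here; the helper and separator below are an assumption for illustration only.

```python
def build_augmented_input(source, retrieved_src, retrieved_tgt, sep=" </s> "):
    # Hypothetical format: retrieved source, its translation, then the
    # sentence to translate. The real model's training format may differ.
    return f"{retrieved_src}{sep}{retrieved_tgt}{sep}{source}"

example = build_augmented_input(
    "The agreement shall be legally binding upon both parties.",
    "This contract is binding.",
    "Kontrak ini adalah mengikat.",
)
print(example)
```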
## Quick Use (Transformers)
```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "mytranslatenisa/ramt-labse"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

text = "The agreement shall be legally binding upon both parties."

# Set the source language before encoding
tokenizer.src_lang = "en"
encoded = tokenizer(text, return_tensors="pt")

# Force the decoder to start generating in Malay ("ms")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("ms"),
)

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```