πŸ›οΈ RAMT-LaBSE English β†’ Malay Legal Translation Model

RAMT-LaBSE is a Retrieval-Augmented Machine Translation (RAMT) model based on
Facebook M2M100, fine-tuned specifically for English β†’ Malay legal text translation.

This model improves translation quality for:

  • statutes
  • contracts
  • court documents
  • legal terms
  • policy & regulatory content

During training, LaBSE sentence embeddings are used to retrieve semantically similar translation pairs for retrieval-augmented sentence matching.
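The retrieval step above can be sketched as nearest-neighbour search over sentence embeddings. This is a minimal illustration using toy 3-dimensional vectors as stand-ins for real 768-dimensional LaBSE embeddings (which would come from a model such as sentence-transformers' LaBSE); the memory sentences and all numbers here are invented for demonstration.

```python
import numpy as np

# Toy stand-ins for LaBSE embeddings of a small translation memory.
# Real LaBSE vectors are 768-dimensional.
memory_embeddings = np.array([
    [0.9, 0.1, 0.0],   # "The contract is binding."
    [0.1, 0.8, 0.1],   # "The court dismissed the appeal."
    [0.0, 0.2, 0.9],   # "The policy takes effect immediately."
])
memory_pairs = [
    ("The contract is binding.", "Kontrak itu mengikat."),
    ("The court dismissed the appeal.", "Mahkamah menolak rayuan itu."),
    ("The policy takes effect immediately.", "Dasar itu berkuat kuasa serta-merta."),
]

def retrieve(query_emb, k=1):
    """Return the k translation-memory pairs most similar to the query
    embedding, ranked by cosine similarity."""
    norms = np.linalg.norm(memory_embeddings, axis=1) * np.linalg.norm(query_emb)
    sims = memory_embeddings @ query_emb / norms
    top = np.argsort(sims)[::-1][:k]
    return [memory_pairs[i] for i in top]

# Embedding of a contract-like query sentence: retrieval returns the
# contract pair from the memory.
query = np.array([0.85, 0.15, 0.05])
print(retrieve(query))
```

At scale this search is typically done with an approximate-nearest-neighbour index rather than a dense matrix product.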


πŸ“‚ Model Architecture

  • Base model: facebook/m2m100_418M
  • Tokenizer: M2M100 SentencePiece
  • Fine-tuned on: English–Malay legal parallel corpus
  • Technique: Retrieval-Augmented MT + Contrastive Alignment
  • Optimized for: Precision, legal terminology, formal structure
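One common way to feed retrieved evidence to a seq2seq model is to concatenate the retrieved source–target pair with the new source sentence as a context prefix. The sketch below shows that recipe only as an illustration; the separator token and ordering are assumptions, since the exact fine-tuning format used by ramt-labse is not documented in this card.

```python
def augment_source(src, retrieved_src, retrieved_tgt, sep=" </s> "):
    """Prepend a retrieved translation pair to the source sentence as
    in-context evidence for the translation model.

    NOTE: the separator and field order here are illustrative assumptions,
    not the documented ramt-labse training format.
    """
    return retrieved_src + sep + retrieved_tgt + sep + src

example = augment_source(
    "The agreement shall be legally binding upon both parties.",
    "The contract is binding.",
    "Kontrak itu mengikat.",
)
print(example)
```

The augmented string would then be tokenized and translated exactly like a plain source sentence.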

πŸš€ Quick Use (Transformers)

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "mytranslatenisa/ramt-labse"

tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

text = "The agreement shall be legally binding upon both parties."

# M2M100 is multilingual: set the source language before encoding.
tokenizer.src_lang = "en"
encoded = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the Malay ("ms") language token.
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("ms"),
)

print(tokenizer.decode(generated[0], skip_special_tokens=True))