Open machine translation for the Chechen language
How to use NM-development/nllb-ce-rus-v0 with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "translation" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline
pipe = pipeline("translation", model="NM-development/nllb-ce-rus-v0")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("NM-development/nllb-ce-rus-v0")
model = AutoModelForSeq2SeqLM.from_pretrained("NM-development/nllb-ce-rus-v0")

This is a fine-tuned NLLB-200 model for Chechen-Russian translation, presented in the paper "The first open machine translation system for the Chechen language".
The language token for Chechen is ce_Cyrl, with a two-letter language code, whereas all other languages included in NLLB-200 use three-letter codes (e.g. rus_Cyrl for Russian).
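Because the Chechen token does not follow the three-letter pattern, a typo such as che_Cyrl is easy to make. A quick vocabulary check before generation can catch it. The following is a minimal sketch: `has_lang_token` is a hypothetical helper introduced here, not part of transformers, and it assumes the tokenizer maps out-of-vocabulary tokens to `unk_token_id`.

```python
def has_lang_token(tokenizer, lang_token):
    """Return True if lang_token maps to a real vocabulary entry.

    Hypothetical helper: assumes convert_tokens_to_ids returns
    unk_token_id (or None) for tokens missing from the vocabulary.
    """
    token_id = tokenizer.convert_tokens_to_ids(lang_token)
    return token_id is not None and token_id != tokenizer.unk_token_id


# Usage with the tokenizer from this model card (downloads the tokenizer files):
# from transformers import NllbTokenizer
# tokenizer = NllbTokenizer.from_pretrained("NM-development/nllb-ce-rus-v0")
# has_lang_token(tokenizer, "ce_Cyrl")
# has_lang_token(tokenizer, "che_Cyrl")
```

Running such a check on both `src_lang` and `tgt_lang` before generation would surface a misspelled code early, rather than letting an unknown-token id be passed as `forced_bos_token_id`.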
Here is an example of how to use the model in code:
import torch
from transformers import AutoModelForSeq2SeqLM
from transformers import NllbTokenizer

# .cuda() requires a CUDA-capable GPU; drop it to run on CPU.
model_nllb = AutoModelForSeq2SeqLM.from_pretrained('NM-development/nllb-ce-rus-v0').cuda()
tokenizer_nllb = NllbTokenizer.from_pretrained('NM-development/nllb-ce-rus-v0')

def translate(text, model, tokenizer, src_lang='rus_Cyrl', tgt_lang='eng_Latn', a=16, b=1.5, max_input_length=1024, **kwargs):
    model.eval()
    with torch.no_grad():
        tokenizer.src_lang = src_lang
        tokenizer.tgt_lang = tgt_lang
        inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=max_input_length)
        result = model.generate(
            **inputs.to(model.device),
            # Force the decoder to start with the target language token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            # Cap the output length as a linear function of the input length.
            max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
            **kwargs
        )
    return tokenizer.batch_decode(result, skip_special_tokens=True)

text = "Стигална кӀел къахьоьгуш, ша мел динчу хӀуманах буьсун болу хӀун пайда оьцу адамо?"
translate(text, model_nllb, tokenizer_nllb, src_lang='ce_Cyrl', tgt_lang='rus_Cyrl')[0]
# 'Что пользы человеку от того, что он трудился под солнцем и что сделал?'
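The `a` and `b` arguments of `translate` cap the output length as a linear function of the input length: `max_new_tokens = int(a + b * n)`, where `n` is the number of input tokens after padding. The sketch below only reproduces that arithmetic; `output_token_budget` is a hypothetical name introduced here for illustration.

```python
def output_token_budget(n_input_tokens, a=16, b=1.5):
    """Mirror the max_new_tokens expression used in translate()."""
    return int(a + b * n_input_tokens)


print(output_token_budget(20))    # 16 + 1.5 * 20 = 46
print(output_token_budget(1024))  # 16 + 1.5 * 1024 = 1552
```

With the defaults, a 20-token input allows at most 46 new tokens. If translations appear cut off, passing a larger `b` to `translate` is one thing to try.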
Base model
facebook/nllb-200-distilled-600M