# opus-mt-mul-en-big (Core ML)
Core ML conversion of opus-mt-mul-en-big (multilingual-to-English, Marian NMT, ~0.3B parameters) for deployment on iOS and macOS.
- Source: Helsinki-NLP OPUS-MT; original PyTorch model vonjack/opus-mt-mul-en-big
- Conversion: aoiandroid/opus-mt-mul-en-big → Core ML encoder/decoder (fixed sequence length 128)
## Contents
- `OpusMT_Encoder_128.mlpackage` → Marian encoder (input_ids, attention_mask → hidden states)
- `OpusMT_Decoder_128.mlpackage` → Marian decoder (input_ids, encoder_hidden_states, encoder_attention_mask → logits)
- `tokenizer/` → Marian tokenizer (SentencePiece model + config)
- `config.json` → model config
- Optional: `test.txt` (multilingual sample sentences); `test_results_*.csv` / `test_results_*.json` (evaluation outputs)
## Input format
Source text must be prefixed with a language code so the model routes correctly:
>>jpn<< これはテストです。
>>fra<< Ceci est un test.
>>deu<< Das ist ein Test.
Use the opus-mt language code (e.g. `jpn`, `fra`, `deu`, `eng`). For unsupported languages the accompanying notebook falls back to `eng`; that output is for reference only.
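The prefixing rule above can be sketched as a small helper. This is a minimal illustration, not part of the repo: `SUPPORTED_CODES` is a hypothetical subset of the opus-mt language inventory, and the `eng` fallback mirrors the notebook behavior described above.

```python
# Hypothetical subset of opus-mt language codes, for illustration only.
SUPPORTED_CODES = {"jpn", "fra", "deu", "eng"}


def format_source(lang: str, text: str) -> str:
    """Prefix the source text with its opus-mt language token.

    Unsupported codes fall back to eng (reference-quality output only).
    """
    code = lang if lang in SUPPORTED_CODES else "eng"
    return f">>{code}<< {text}"
```

For example, `format_source("fra", "Ceci est un test.")` produces `">>fra<< Ceci est un test."`.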
## Usage on iOS / macOS
- Add the `.mlpackage` files and tokenizer assets to your Xcode project (or load them from disk).
- Tokenize input with the included tokenizer; format it as `>>{lang}<< {text}`.
- Run the encoder with `input_ids` (int32, shape `[1, 128]`) and `attention_mask` (int32, shape `[1, 128]`).
- Run the decoder in a loop: feed it the encoder hidden states and the decoder input ids, then take the argmax of the logits as the next token, until EOS or the maximum length is reached.
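The greedy decode loop above can be sketched as follows. This is a stand-alone illustration of the loop logic only: `run_encoder` / `run_decoder` are hypothetical stand-ins for the Core ML model calls, and `EOS_ID` / `PAD_ID` are assumptions (Marian models conventionally use `</s>` = 0 as EOS and start decoding from `pad_token_id`; check `config.json` for this model's actual ids).

```python
EOS_ID = 0        # assumption: Marian </s> token id (verify in config.json)
PAD_ID = 65000    # assumption: Marian pad token id (verify in config.json)
MAX_LEN = 128     # fixed sequence length of the converted models


def greedy_decode(run_encoder, run_decoder, input_ids, attention_mask):
    """Greedy decoding: repeatedly pick the argmax token until EOS."""
    hidden = run_encoder(input_ids, attention_mask)
    decoder_ids = [PAD_ID]  # Marian starts decoding from pad_token_id
    for _ in range(MAX_LEN):
        logits = run_decoder(decoder_ids, hidden, attention_mask)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        decoder_ids.append(next_id)
        if next_id == EOS_ID:
            break
    return decoder_ids[1:]  # drop the start token


# Toy stand-ins so the loop can be exercised without the real models:
# the fake decoder emits token 7 twice, then EOS.
def fake_encoder(input_ids, attention_mask):
    return "hidden"


def fake_decoder(decoder_ids, hidden, attention_mask):
    target = 7 if len(decoder_ids) < 3 else EOS_ID
    logits = [0.0] * 10
    logits[target] = 1.0
    return logits
```

On device, `run_encoder` and `run_decoder` would be predictions on the two `.mlpackage` models; the loop structure is the same.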
Both the encoder and decoder use float32 precision: float16 caused quality loss (e.g. for Korean/CJK text) and decoder output collapse.
## Evaluation
Sample sentences and test results are provided in this repo:
- `test.txt` → one sentence per line, in the form `N. Language name (English)` followed by the source text
- `test_results_*.csv` / `test_results_*.json` → batch translation results (index, code, source, translation)
## License
Apache 2.0 (inherited from OPUS-MT / Marian).