# opus-mt-mul-en-big (Core ML)
Core ML conversion of opus-mt-mul-en-big (multilingual-to-English, Marian NMT, ~0.3B parameters) for deployment on iOS and macOS.
- Source: Helsinki-NLP OPUS-MT; original PyTorch model vonjack/opus-mt-mul-en-big
- Conversion: aoiandroid/opus-mt-mul-en-big → Core ML encoder/decoder (fixed sequence length 128)
## Contents
- `OpusMT_Encoder_128.mlpackage` → Marian encoder (input_ids, attention_mask → hidden states)
- `OpusMT_Decoder_128.mlpackage` → Marian decoder (input_ids, encoder_hidden_states, encoder_attention_mask → logits)
- `tokenizer/` → Marian tokenizer (SentencePiece model + config)
- `config.json` → model config
- Optional: `test.txt` (multilingual sample sentences); `test_results_*.csv` / `test_results_*.json` (evaluation outputs)
## Input format
Source text must be prefixed with a language code so the model routes correctly:
>>jpn<< これはテストです。
>>fra<< Ceci est un test.
>>deu<< Das ist ein Test.
Use the opus-mt language code (e.g. `jpn`, `fra`, `deu`, `eng`). For unsupported languages the accompanying notebook falls back to `eng`; that output is for reference only.
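The prefixing rule above can be sketched as a small helper. This is a minimal illustration, not part of the repo: `SUPPORTED_CODES` is a hypothetical subset of the opus-mt language inventory, and the `eng` fallback mirrors the notebook behavior described above.

```python
# Hypothetical subset of opus-mt language codes, for illustration only.
SUPPORTED_CODES = {"jpn", "fra", "deu", "eng"}


def format_source(lang: str, text: str) -> str:
    """Prefix the source text with its opus-mt language token.

    Unsupported codes fall back to eng (reference-quality output only).
    """
    code = lang if lang in SUPPORTED_CODES else "eng"
    return f">>{code}<< {text}"
```

For example, `format_source("fra", "Ceci est un test.")` produces `">>fra<< Ceci est un test."`.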
## Usage on iOS / macOS
- Add the `.mlpackage` files and tokenizer assets to your Xcode project (or load them from disk).
- Tokenize input with the included tokenizer; format it as `>>{lang}<< {text}`.
- Run the encoder with `input_ids` (int32, shape `[1, 128]`) and `attention_mask` (int32, shape `[1, 128]`).
- Run the decoder in a loop: feed it the encoder hidden states and the decoder input ids, then take the argmax of the logits as the next token, until EOS or the maximum length is reached.
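The greedy decode loop above can be sketched as follows. This is a stand-alone illustration of the loop logic only: `run_encoder` / `run_decoder` are hypothetical stand-ins for the Core ML model calls, and `EOS_ID` / `PAD_ID` are assumptions (Marian models conventionally use `</s>` = 0 as EOS and start decoding from `pad_token_id`; check `config.json` for this model's actual ids).

```python
EOS_ID = 0        # assumption: Marian </s> token id (verify in config.json)
PAD_ID = 65000    # assumption: Marian pad token id (verify in config.json)
MAX_LEN = 128     # fixed sequence length of the converted models


def greedy_decode(run_encoder, run_decoder, input_ids, attention_mask):
    """Greedy decoding: repeatedly pick the argmax token until EOS."""
    hidden = run_encoder(input_ids, attention_mask)
    decoder_ids = [PAD_ID]  # Marian starts decoding from pad_token_id
    for _ in range(MAX_LEN):
        logits = run_decoder(decoder_ids, hidden, attention_mask)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        decoder_ids.append(next_id)
        if next_id == EOS_ID:
            break
    return decoder_ids[1:]  # drop the start token


# Toy stand-ins so the loop can be exercised without the real models:
# the fake decoder emits token 7 twice, then EOS.
def fake_encoder(input_ids, attention_mask):
    return "hidden"


def fake_decoder(decoder_ids, hidden, attention_mask):
    target = 7 if len(decoder_ids) < 3 else EOS_ID
    logits = [0.0] * 10
    logits[target] = 1.0
    return logits
```

On device, `run_encoder` and `run_decoder` would be predictions on the two `.mlpackage` models; the loop structure is the same.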
Both the encoder and decoder use float32 precision: float16 caused quality loss (e.g. for Korean/CJK text) and decoder output collapse.
## Evaluation
Sample sentences and test results are provided in this repo:
- `test.txt` → one sentence per line, in the form `N. Language name (English)` followed by the source text
- `test_results_*.csv` / `test_results_*.json` → batch translation results (index, code, source, translation)
## License
Apache 2.0 (inherited from OPUS-MT / Marian).