opus-mt-en-it-ct2
CTranslate2 int8 quantized conversion of Helsinki-NLP/opus-mt-en-it, packaged for offline use in Playto.
Purpose
Lightweight en→it (Italian) NMT for the Playto desktop game-language-learning tool, used as a low-spec / no-GPU fallback when local LLM inference is not available.
Source model
- Upstream: Helsinki-NLP/opus-mt-en-it (MarianMT, transformer architecture)
- License: CC-BY 4.0
- No fine-tuning, no weight modification — purely a format conversion + int8 quantization of the upstream weights
Conversion
pip install ctranslate2 transformers sentencepiece
ct2-transformers-converter \
--model Helsinki-NLP/opus-mt-en-it \
--output_dir opus-mt-en-it \
--quantization int8 \
--copy_files source.spm target.spm
tar czf opus-mt-en-it-ct2.tar.gz opus-mt-en-it
File layout (inside opus-mt-en-it-ct2.tar.gz)
| File | Approx. size | Purpose |
|---|---|---|
| model.bin | ~62 MB | CTranslate2 int8 quantized weights |
| shared_vocabulary.json | ~1.5 MB | CTranslate2 vocab |
| source.spm | ~800 KB | SentencePiece source tokenizer |
| target.spm | ~800 KB | SentencePiece target tokenizer |
| config.json | ~250 B | CTranslate2 config |
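One way to confirm that a downloaded archive matches this layout is to list its members and compare them against the expected file names. The sketch below uses only the Python standard library. The helper name `missing_files` is hypothetical, and the code assumes the files sit either at the archive root or under a single top-level directory.

```python
import tarfile
from pathlib import PurePosixPath

# File names taken from the layout table above.
EXPECTED_FILES = {
    "model.bin",
    "shared_vocabulary.json",
    "source.spm",
    "target.spm",
    "config.json",
}


def missing_files(archive_path):
    """Return the expected file names that are absent from the archive, sorted."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Compare on base names so a top-level directory prefix does not matter.
        present = {
            PurePosixPath(member.name).name
            for member in archive.getmembers()
            if member.isfile()
        }
    return sorted(EXPECTED_FILES - present)
```

Calling `missing_files("opus-mt-en-it-ct2.tar.gz")` should return an empty list for a complete archive; any names it does return point at files that were not copied during conversion.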
Usage
With ctranslate2 (Python)
import ctranslate2
import sentencepiece
translator = ctranslate2.Translator("opus-mt-en-it", device="cpu", compute_type="int8")
sp_source = sentencepiece.SentencePieceProcessor("opus-mt-en-it/source.spm")
sp_target = sentencepiece.SentencePieceProcessor("opus-mt-en-it/target.spm")
source_tokens = sp_source.encode("Hello, how are you?", out_type=str) + ["</s>"]
results = translator.translate_batch([source_tokens])
print(sp_target.decode(results[0].hypotheses[0]))
# → "Ciao, come stai?"
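When several lines need translating at once, the same steps (encode, append the end-of-sequence marker, translate, decode) can be wrapped in one function so the translator receives a single batch. This is a minimal sketch, assuming `translator`, `sp_source`, and `sp_target` are the objects created in the snippet above. `translate_lines` is a hypothetical helper name, not part of ctranslate2 or sentencepiece.

```python
def translate_lines(lines, translator, sp_source, sp_target):
    """Translate a list of English strings and return Italian strings in order."""
    # Tokenize each line into SentencePiece pieces and append the </s> marker
    # that MarianMT models expect at the end of the source sequence.
    batch = [sp_source.encode(line, out_type=str) + ["</s>"] for line in lines]
    # A single call translates the whole batch.
    results = translator.translate_batch(batch)
    # Take the first hypothesis for each input and detokenize it.
    return [sp_target.decode(result.hypotheses[0]) for result in results]
```

Passing the translator and tokenizers as arguments keeps the helper free of global state, which also makes it easy to exercise with stand-in objects in tests.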
With ct2rs (Rust)
use ct2rs::{Translator, Tokenizer};
let tokenizer = Tokenizer::new("opus-mt-en-it")?;
let translator = Translator::with_tokenizer("opus-mt-en-it", tokenizer, /* config */)?;
let result = translator.translate_batch(&["Hello, how are you?".to_string()], /* options */)?;
Important: MarianMT models require </s> appended to source token sequences. The ct2rs::Tokenizer wrapper handles this automatically; raw SentencePiece calls must add it manually.
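For code paths that tokenize with raw SentencePiece, a small guard can make the end-of-sequence rule harder to forget. The sketch below is an illustration only: `ensure_eos` is a hypothetical helper, and it assumes tokens are handled as a list of string pieces, as in the Python example above.

```python
EOS = "</s>"


def ensure_eos(tokens):
    """Return a copy of tokens that ends with </s>, appending it only if missing."""
    tokens = list(tokens)
    if not tokens or tokens[-1] != EOS:
        tokens.append(EOS)
    return tokens
```

Because the function leaves already-terminated sequences unchanged, it can be applied to every sequence before translation without risk of adding the marker twice.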
Quality
~54% Good rate on the Playto fixture corpus. The base opus-mt model is comparable in quality to the tc-big variant; the base model was selected for its smaller size (67 MB vs 206 MB).
For higher-quality translation, Playto's LLM-based translation path is recommended.
Attribution
This is a derivative work of Helsinki-NLP/opus-mt-en-it. The license, CC-BY 4.0, is inherited from upstream.
Helsinki-NLP. OPUS-MT — Open Machine Translation Models.
https://github.com/Helsinki-NLP/Opus-MT
Disclaimer
NMT output quality is significantly lower than modern LLM-based translation. This model is intended as a lightweight fallback for environments where LLM inference is not viable (for example, low VRAM, mobile devices, or a slow connection).