CA byte-LM translator: en2x
A vocabulary-free, decoder-only translator that operates directly on UTF-8 bytes, with a causal
Neural-Cellular-Automaton front-end. It is fine-tuned (prompted continuation, prefix-LM, target-span loss)
from the pretrained base sujayrittikar/ca-byte-lm-indic6 for the en2x direction of the
WMT-2026 Indic MT research track. It covers Assamese, Khasi, Manipuri, Meitei-Mayek, Mizo and Nyishi,
each paired with English, and is the only system for Khasi, Nyishi and Meitei-Mayek.
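To illustrate what "vocabulary-free, UTF-8-byte" and "target-span loss" mean in practice, here is a minimal sketch in plain Python. It is not taken from ca_byte_lm.py: the helper names `encode_bytes` and `build_example` and the prompt layout are hypothetical, chosen only for illustration. The idea shown is that every input ID is a raw byte (0 to 255), and the loss mask is zero over the prompt and one over the target continuation.

```python
def encode_bytes(text):
    """Map text to a list of UTF-8 byte values (0..255); no learned vocabulary is involved."""
    return list(text.encode("utf-8"))


def build_example(prompt, target):
    """Concatenate prompt and target bytes and mark only the target span for the loss.

    Hypothetical helper: the real prompt format and any special bytes used by the
    model are not documented here.
    """
    prompt_ids = encode_bytes(prompt)
    target_ids = encode_bytes(target)
    input_ids = prompt_ids + target_ids
    # 0 = position ignored by the loss (prompt), 1 = position scored (target span)
    loss_mask = [0] * len(prompt_ids) + [1] * len(target_ids)
    return input_ids, loss_mask


# Assumed prompt layout, for illustration only.
prompt = "English: Hello\nAssamese: "
target = "নমস্কাৰ"  # an Assamese greeting; each code point takes 3 bytes in UTF-8

input_ids, loss_mask = build_example(prompt, target)
print(len(prompt), "prompt bytes,", sum(loss_mask), "target bytes")
```

Because non-Latin scripts use several bytes per character, target sequences in these languages are considerably longer in bytes than in characters, which is the usual trade-off of byte-level models.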
Dev chrF++ (best checkpoint @ step 60000)
| language | chrF++ |
|---|---|
| Assamese | 14.85 |
| Manipuri | 16.06 |
| Mizo | 35.62 |
| Khasi | 25.58 |
| Nyishi | 68.76 |
Dev numbers are measured on the 2025 test set, which was folded into training, so they are indicative rather than held-out.
Usage
from ca_byte_lm import from_hub, translate
model, cfg, meta = from_hub("sujayrittikar/ca-byte-mt-en2x", device="cuda")
print(translate(model, cfg, meta, "The government announced a new policy.", "English", "Khasi", device="cuda"))
Weights: ca_byte_lm.pt; architecture: ca_byte_lm.py; config: config.json.
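To translate one sentence into several target languages, the call above can be wrapped in a small loop. The sketch below is an assumption-laden convenience, not part of ca_byte_lm: `translate_to_many` is a hypothetical helper, and it takes the translation function as an argument so it does not depend on the package itself. Only the language name "Khasi" appears in the usage above; other name strings (such as "Mizo") are assumed to follow the same convention.

```python
def translate_to_many(translate_fn, text, targets, source="English"):
    """Call translate_fn(text, source, target) once per target and collect the outputs.

    translate_fn is any callable with that three-argument shape (hypothetical wrapper).
    """
    return {target: translate_fn(text, source, target) for target in targets}


# Possible wiring with the documented API (left commented out; needs the package and weights):
# from ca_byte_lm import from_hub, translate
# model, cfg, meta = from_hub("sujayrittikar/ca-byte-mt-en2x", device="cuda")
# fn = lambda text, src, tgt: translate(model, cfg, meta, text, src, tgt, device="cuda")
# outputs = translate_to_many(fn, "The government announced a new policy.", ["Khasi", "Mizo"])
```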