metadata
license: apache-2.0
pipeline_tag: translation
tags:
- SimulMT
- Mamba-2
- Cross-attention
- en-ru
datasets:
- OldestSalt/translation_enru
language:
- en
- ru
Bimba
Bimba is almost linear SimulMT model trained with wait-k policy (k = 3, 5, 7, 9, 11) on en-ru translation dataset.
Architecture
The model has encoder-decoder architecture, where self-attention blocks are Mamba-2 blocks instead. It means that encoder is linear, but cross-attention's input is all outputs of encoder, and this means that complexity of Bimba is O(S * T), which is not exactly linear
Bimba was developed and trained as a part of master's thesis, and I hope that I will continue research in the Linear SimulMT field.
Using
To download Bimba you can clone the GitHub repository and use the HybridMamba2MT class:
from model_classes import HybridMamba2MT
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("OldestSalt/Bimba")
model = HybridMamba2MT.from_pretrained("OldestSalt/Bimba")
Translation
Maybe someday I will write here an example of simultaneous translation.
Tokenizer
This model was distilled from NLLB-200-1.3B, so Bimba uses its' tokenizer.
- Code: https://github.com/OldestSalt/LinearSimultMT
- Paper: Soon (I hope)
