---
license: mit
language:
- ja
- en
base_model:
- sbintuitions/sarashina2.2-0.5b
---
# CAT-Translate 🐱
[License: MIT](https://opensource.org/licenses/MIT)
[Hugging Face Model](https://huggingface.co/cyberagent/CAT-Translate-0.8b/)
Tiny Language Model For Japanese and English Bidirectional Translation
- **Purrs on your lap** 🐱: Small and efficient! 0.8B–7B models that run on edge devices.
- **Swift and Feline Sharp** 🐾: Beats TranslateGemma-12B on text-to-text translation quality.
- **Adopt and adapt** 🐈: Open source (MIT License) models you can customize and extend.
<div align="center">
<img src="CAT-logo.png" alt="Cat sleeping on top of a laptop." width="200">
</div>
## Models
All models are available on Hugging Face:
- [CAT-Translate-0.8B](https://huggingface.co/cyberagent/CAT-Translate-0.8b/)
- [CAT-Translate-1.4B](https://huggingface.co/cyberagent/CAT-Translate-1.4b/)
- [CAT-Translate-3.3B](https://huggingface.co/cyberagent/CAT-Translate-3.3b/)
- [CAT-Translate-7B](https://huggingface.co/cyberagent/CAT-Translate-7b/)
## Evaluation
We evaluated the models on the translation subsets of the following benchmarks:
- [The Business Scene Dialogue corpus](https://github.com/tsuruoka-lab/BSD) (BSD)
  - Each conversation is given to the model as a whole, rather than being translated sentence by sentence.
- [Court Interpreter](https://github.com/mynlp/court_interpreter) (Court)
- [JMedBench](https://huggingface.co/datasets/Coldog2333/JMedBench) (JMed)
- ejmmt subsets are used.
- [pfmt-bench-fin-ja](https://github.com/pfnet-research/pfmt-bench-fin-ja) (PFMT)
- [WAT 2025 Patent Translation](https://sites.google.com/view/pat-claims-trans-2025/) (wat-pat-2025)
We chose these tasks as benchmarks because (1) they are derived from real-world applications and (2) they are less over-optimized against than popular datasets (e.g., WMT).
The results are below.
Within each size class, the CAT-Translate models achieved the best scores among all evaluated models (including closed-source ones) on both En-Ja and Ja-En translation.
| Model | Avg. BLEU | Avg. BLEU Ja→En | Avg. BLEU En→Ja | BSD (Ja→En) | Court (Ja→En) | JMed (Ja→En) | PFMT (Ja→En) | wat-pat-2025 (Ja→En) | BSD (En→Ja) | JMed (En→Ja) | PFMT (En→Ja) | wat-pat-2025 (En→Ja) |
|:-------------------------------------------------|----------:|-----------------:|-----------------:|------------:|--------------:|-------------:|-------------:|------------------:|------------:|-------------:|-------------:|------------------:|
| CyberAgent/CAT-Translate-7B | 37.68 | 41.06 | 34.31 | 33.75 | 45.29 | 30.65 | 49.86 | 45.74 | 16.29 | 29.62 | 52.94 | 38.37 |
| CyberAgent/CAT-Translate-3.3B | 36.16 | 37.51 | 34.80 | 26.51 | 42.44 | 24.47 | 49.93 | 44.23 | 17.21 | 28.67 | 53.88 | 39.44 |
| CyberAgent/CAT-Translate-1.4B | 33.73 | 33.26 | 34.19 | 31.28 | 43.84 | 24.08 | 36.55 | 30.57 | 15.71 | 26.92 | 51.53 | 42.58 |
| Unbabel/Tower-Plus-9B | 32.41 | 36.84 | 27.99 | 15.43 | 40.54 | 29.13 | 58.00 | 41.10 | 10.00 | 18.80 | 53.00 | 30.16 |
| google/translategemma-12b-it | 32.24 | 35.81 | 28.68 | 31.58 | 34.30 | 23.46 | 48.75 | 40.97 | 15.92 | 21.79 | 52.53 | 24.47 |
| CyberAgent/CAT-Translate-3.3B-beta | 30.60 | 30.32 | 30.88 | 17.20 | 38.65 | 23.96 | 40.58 | 31.22 | 16.63 | 26.68 | 53.40 | 26.80 |
| CyberAgent/CAT-Translate-0.8B | 30.42 | 29.71 | 30.68 | 29.63 | 33.19 | 22.96 | 32.51 | 30.56 | 14.60 | 26.22 | 50.62 | 32.87 |
| google/translategemma-4b-it | 28.09 | 29.41 | 26.76 | 28.86 | 25.89 | 21.50 | 42.65 | 28.16 | 14.14 | 20.68 | 51.99 | 20.23 |
| LiquidAI/LFM2.5-1.2B-JP | 25.47 | 24.51 | 26.43 | 19.06 | 29.99 | 22.10 | 43.61 | 7.80 | 14.57 | 23.85 | 54.77 | 12.54 |
| pfnet/plamo-2-translate | 25.24 | 25.92 | 24.57 | 25.55 | 28.63 | 22.90 | 29.02 | 23.48 | 17.35 | 24.98 | 32.04 | 23.89 |
| LiquidAI/LFM2-350M-ENJP-MT | 24.95 | 24.91 | 25.00 | 10.94 | 29.56 | 21.48 | 41.40 | 21.17 | 8.11 | 22.84 | 47.53 | 21.52 |
| mistralai/Ministral-8B-Instruct-2410 | 24.12 | 27.52 | 20.71 | 19.23 | 29.21 | 16.25 | 50.23 | 22.69 | 12.91 | 16.49 | 41.66 | 11.80 |
| nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese | 22.97 | 22.77 | 23.18 | 9.62 | 34.98 | 18.01 | 38.44 | 12.81 | 10.62 | 20.41 | 42.55 | 19.13 |
| Rakuten/RakutenAI-2.0-mini-instruct | 18.43 | 17.24 | 19.62 | 0.11 | 30.62 | 18.21 | 29.34 | 7.90 | 5.19 | 20.36 | 45.70 | 7.23 |
| SakanaAI/TinySwallow-1.5B-Instruct | 15.74 | 14.99 | 16.49 | 4.96 | 18.93 | 15.83 | 26.67 | 8.58 | 6.30 | 17.58 | 34.07 | 8.00 |
| llm-jp/llm-jp-3.1-1.8b-instruct4 | 15.18 | 16.26 | 14.11 | 18.82 | 2.44 | 15.67 | 30.65 | 13.72 | 15.38 | 4.91 | 25.47 | 10.65 |
| tencent/HY-MT1.5-1.8B | 14.49 | 8.95 | 20.04 | 5.50 | 4.59 | 4.00 | 15.67 | 14.98 | 6.33 | 18.13 | 37.75 | 17.96 |
| shisa-ai/shisa-v2.1-llama3.2-3b | 14.27 | 14.26 | 14.28 | 17.08 | 3.70 | 8.26 | 26.86 | 15.42 | 13.18 | 5.54 | 25.97 | 12.41 |
| google/gemma-2-2b-jpn-it | 14.15 | 16.98 | 11.32 | 20.04 | 8.08 | 11.27 | 31.49 | 14.01 | 12.37 | 4.48 | 16.24 | 12.21 |
| shisa-ai/shisa-v2.1-lfm2-1.2b | 13.08 | 14.02 | 12.14 | 20.93 | 4.95 | 7.68 | 26.72 | 9.80 | 12.11 | 5.54 | 17.60 | 13.30 |
| microsoft/phi-4 | 11.92 | 13.48 | 10.36 | 6.10 | 18.66 | 2.81 | 24.86 | 14.98 | 3.24 | 6.97 | 14.36 | 16.87 |
| tencent/HY-MT1.5-7B | 10.56 | 13.46 | 7.67 | 4.99 | 12.32 | 5.72 | 29.53 | 14.76 | 0.82 | 7.80 | 14.30 | 7.74 |
| tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 | 10.35 | 12.42 | 8.28 | 24.25 | 2.30 | 3.69 | 14.11 | 17.74 | 6.82 | 2.37 | 11.21 | 12.71 |
| Qwen/Qwen2.5-14B-Instruct | 8.39 | 9.88 | 6.89 | 10.81 | 4.70 | 4.27 | 11.18 | 18.46 | 4.01 | 3.69 | 13.42 | 6.42 |
| meta-llama/Llama-3.2-3B-Instruct | 6.06 | 9.90 | 2.23 | 18.60 | 0.41 | 2.72 | 16.62 | 11.17 | 1.44 | 1.10 | 4.50 | 1.87 |
A detailed experimental evaluation will be presented in a forthcoming technical report.
## Usage
The model supports English to Japanese and Japanese to English translation with the following prompt format:
```python
from transformers import pipeline
# Load the model
chat_pipeline = pipeline("text-generation", model="CyberAgent/CAT-Translate-0.8b")
# Define the prompt template
prompt = "Translate the following {src_lang} text into {tgt_lang}.\n\n{src_text}"
# Example: Japanese to English
src_lang = "Japanese"
tgt_lang = "English"
src_text = "🐈はとてもかわいいの。おててがまるくてふわふわなの。"
user_input = [{"role": "user", "content": prompt.format(src_lang=src_lang, tgt_lang=tgt_lang, src_text=src_text)}]
response = chat_pipeline(user_input, max_new_tokens=512)
print("-" * 20)
print("Source Text:")
print(src_text)
print("Translation:")
print(response[0]['generated_text'][-1]['content'])
```
**Important**: You need to apply the chat template to run the model correctly. The template is the same as [sarashina2.2-0.5b-instruct-v0.1](https://huggingface.co/sbintuitions/sarashina2.2-0.5b-instruct-v0.1).
### Why Use Instructions?
Although the model is specialized for machine translation, an instruction prompt is required to invoke its translation capability. This design choice makes the model easier to customize: extending and merging it is simpler this way. Since the model is open source, any extensions are welcome!
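The single instruction template covers both translation directions; only the language names are swapped. A minimal sketch (the `build_messages` helper is our own illustration, not part of the model's API):

```python
# The prompt template from the usage example above.
PROMPT = "Translate the following {src_lang} text into {tgt_lang}.\n\n{src_text}"

def build_messages(src_lang, tgt_lang, src_text):
    """Return chat-format messages for the text-generation pipeline."""
    return [{"role": "user",
             "content": PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang,
                                      src_text=src_text)}]

# The same template handles both directions:
en_ja = build_messages("English", "Japanese", "Cats are very cute.")
ja_en = build_messages("Japanese", "English", "猫はとてもかわいい。")
print(en_ja[0]["content"].splitlines()[0])
# -> Translate the following English text into Japanese.
```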
## Training
We used the [sarashina2.2 series](https://huggingface.co/collections/sbintuitions/sarashina22) ([MIT LICENSE](https://huggingface.co/sbintuitions/sarashina2.2-0.5b/blob/main/LICENSE)) as our pretrained model. While Qwen-3 showed higher benchmark scores, we found that sarashina generated more natural Japanese text that avoided "translationese" patterns. We hypothesized that naturalness is more difficult to learn than translation accuracy, leading us to choose sarashina as our base model.
Our training process involved:
- Synthesizing parallel corpora from monolingual data using large language models
- A two-stage supervised fine-tuning (SFT) approach
- Reinforcement learning with [Multi-Objective GRPO (Ichihara et al. 2025)](https://arxiv.org/abs/2509.22047)
- LoRA for efficient training
For detailed information about our training methodology, data preparation, and technical specifications, please see [TRAINING.md](TRAINING.md).
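As a rough illustration of the LoRA step, a configuration of this kind can be expressed with the `peft` library. The hyperparameters and target modules below are purely illustrative assumptions, not the values actually used (see TRAINING.md for those):

```python
from peft import LoraConfig

# Illustrative LoRA configuration; all values here are assumptions,
# not the actual training setup.
lora_config = LoraConfig(
    r=16,                    # low-rank adapter dimension
    lora_alpha=32,           # scaling factor for adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# The adapter would then be attached with peft.get_peft_model(model, lora_config).
```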
## License
The model is licensed under the [MIT License](LICENSE).
## Citation
```bibtex
@misc{cat-translate-2026,
  title={CAT-Translate: Tiny Language Model For Japanese and English Bidirectional Translation},
  author={Yuu Jinnai},
  year={2026},
  url={https://huggingface.co/collections/cyberagent/cat-translate}
}
```
## Acknowledgments
This project stands on the shoulders of giants. In particular, the following resources significantly helped us develop the model:
- [sarashina](https://huggingface.co/sbintuitions) by SB Intuitions
- [gpt-oss](https://huggingface.co/openai/gpt-oss-20b) by OpenAI
- [MetricX](https://huggingface.co/google/metricx-24-hybrid-xl-v2p6-bfloat16) by Juraj Juraska et al.
- [Duplodocus](https://github.com/allenai/duplodocus) by AllenAI
- [fastText](https://github.com/facebookresearch/fastText) by Facebook Research
- [COMET](https://huggingface.co/Unbabel/wmt22-comet-da) by Ricardo Rei et al.
- [sacrebleu](https://github.com/mjpost/sacrebleu) by Matt Post
- Mitsuki Sakamoto for deploying the model with a UI for internal testing