---
title: OpenSakura
emoji: 🌸
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---

🌸 OpenSakura 🌸


---

🌸 About Us

We build datasets, train models, run evaluations, and publish benchmarks for **literary and niche-domain translation**. Light novels, visual novels, galgames, web fiction: these are the domains where generic machine translation collapses into polite nonsense.

At **OpenSakura**, we believe in open science and reproducible results. Everything we do is done in public, with receipts.

### 🎯 What We Do

- 📚 **Datasets**: Curated, schema-validated, with documented lineage and licensing. High-quality parallel corpora designed specifically for our target domains.
- 🔬 **Experiments**: Fully reproducible training runs with pinned revisions, deterministic setups, and meticulously logged metrics.
- 📊 **Benchmarks**: Comprehensive LLM-as-a-judge pairwise evaluations, establishing Elo-style rankings for translation models.
- ⚔️ **Arena**: Blind A/B human evaluation platform to gather high-fidelity preference data for RLHF and DPO.

---

### 🚀 Quick Links

| Resource | URL |
|---|---|
| 📊 **Benchmark Dashboard** | [bench.opensakura.com](https://bench.opensakura.com/) |
| ⚔️ **Translation Arena** | [arena.opensakura.com](https://arena.opensakura.com/) |
| 🐙 **GitHub** | [github.com/OpenSakura](https://github.com/OpenSakura) |

---

### 🤝 Get Involved

We welcome contributions of all kinds! Whether you want to contribute **datasets, tooling, evaluation scripts, documentation**, or just **file issues** when something breaks, we need your help!

👉 **Start here: [Contribution Guidelines](https://github.com/OpenSakura/Contributors-You-Dare-Not-Watch-This)**

*If you use OpenSakura artifacts and they help you, tell a friend. If they don't, tell us what failed (with examples).*

---

### 🌱 Origins

OpenSakura grew out of the [SakuraLLM](https://huggingface.co/SakuraLLM) community, the pioneering open-source project for Japanese-to-Chinese ACGN translation models. Members of that community came together to push the mission further: broader language pairs, rigorous reproducibility, public benchmarks, and a more open development process. We stand on their shoulders and are deeply grateful for the foundation they built.

---
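For readers unfamiliar with the Elo-style rankings mentioned under Benchmarks, here is a minimal sketch of how pairwise verdicts could be turned into ratings. It is not OpenSakura's actual pipeline: the function names, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions chosen for illustration. It uses the standard Elo expected-score formula and applies one update per comparison.

```python
from typing import Dict, Iterable, Tuple

# One pairwise verdict: (model_a, model_b, score_a), where score_a is
# 1.0 if A was preferred, 0.0 if B was preferred, and 0.5 for a tie.
Comparison = Tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Comparison],
    k: float = 32.0,          # assumed K-factor, not a project setting
    initial: float = 1000.0,  # assumed starting rating for unseen models
) -> Dict[str, float]:
    """Apply sequential Elo updates and return a new ratings dict."""
    ratings = dict(ratings)  # copy so the caller's dict is not mutated
    for model_a, model_b, score_a in comparisons:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ea = expected_score(ra, rb)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = ra + k * (score_a - ea)
        ratings[model_b] = rb - k * (score_a - ea)
    return ratings


if __name__ == "__main__":
    # Hypothetical verdicts from a judge (LLM or human); names are made up.
    verdicts = [
        ("model_x", "model_y", 1.0),
        ("model_y", "model_z", 0.5),
        ("model_x", "model_z", 1.0),
    ]
    for name, rating in sorted(update_elo({}, verdicts).items()):
        print(f"{name}: {rating:.1f}")
```

Sequential updates like this depend on the order in which comparisons arrive, so a real leaderboard may shuffle and average over several passes or fit all comparisons jointly instead.

---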

🌸 Chinese Introduction 🌸


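OpenSakura is an open-source community project focused on **domain-specific large language model (LLM) translation**. We build high-quality datasets, train dedicated models, run rigorous evaluations, and publish authoritative benchmarks, covering **light novels, visual novels, galgames, and web fiction**: vertical domains where general-purpose machine translation very easily goes off the rails.

### 🎯 What We Do

- 📚 **Datasets**: Carefully curated and strictly validated, with complete provenance tracking and clear open-source licensing.
- 🔬 **Experiments**: Fully reproducible training pipelines, with pinned dataset and model versions and complete training metrics recorded.
- 📊 **Benchmarks**: Pairwise evaluation with an LLM as the judge (LLM-as-a-judge), used to build a principled Elo ranking system.
- ⚔️ **Arena**: Blind A/B human evaluation that supplies high-quality human preference data for subsequent RLHF and DPO training.

---

### 🚀 Quick Links

| Resource | URL |
|---|---|
| 📊 **Benchmark Dashboard** | [bench.opensakura.com](https://bench.opensakura.com/) |
| ⚔️ **Translation Arena** | [arena.opensakura.com](https://arena.opensakura.com/) |
| 🐙 **GitHub** | [github.com/OpenSakura](https://github.com/OpenSakura) |

---

### 🤝 Get Involved

We warmly welcome open-source contributions of every kind! Whether you **provide datasets, build tooling, write evaluation scripts, improve documentation**, or simply **file an issue** to report a bug, we need your help!

👉 **Start here: [Contribution Guidelines](https://github.com/OpenSakura/Contributors-You-Dare-Not-Watch-This)**

*If you find OpenSakura's output helpful, please recommend it to your friends! If you run into bad translation results, send us feedback with concrete examples to help us keep improving.*

---

### 🌱 Origins

OpenSakura grew out of the [SakuraLLM](https://huggingface.co/SakuraLLM) community, the open-source pioneer in Japanese-to-Chinese ACGN translation. With the vision of taking this work further, community members came together again: we are committed to **supporting more language pairs, applying stricter reproducibility standards, building open and transparent benchmark evaluations, and adopting a more open development process**. We stand on the shoulders of those who came before us, and we offer our deepest respect and gratitude for the solid foundation SakuraLLM laid.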