---
title: OpenSakura
emoji: 🌸
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---

<div align="center">
<img src="https://raw.githubusercontent.com/OpenSakura/.github/main/profile/OpenSakura_512.png" width="200" alt="OpenSakura Logo">

<h1>🌸 OpenSakura 🌸</h1>

<p>
<a href="https://github.com/OpenSakura"><img src="https://img.shields.io/badge/GitHub-OpenSakura-181717?logo=github&logoColor=white" alt="GitHub"></a>
<a href="https://bench.opensakura.com/"><img src="https://img.shields.io/badge/📊_Benchmark-Dashboard-blue" alt="Benchmark Dashboard"></a>
<a href="https://arena.opensakura.com/"><img src="https://img.shields.io/badge/⚔️_Arena-Translation-red" alt="Translation Arena"></a>
</p>

<p>
<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=500&size=20&pause=1000&color=F194B4&center=true&vCenter=true&width=600&lines=Domain-specific+LLM+translation;Done+in+public%2C+with+receipts;Light+novels%2C+visual+novels%2C+and+more" alt="Typing SVG" />
</p>
</div>

---

<h2 id="english">🌸 About Us</h2>

We build datasets, train models, run evaluations, and publish benchmarks for **literary and niche-domain translation**: light novels, visual novels, galgames, and web fiction, the domains where generic machine translation collapses into polite nonsense.

At **OpenSakura**, we believe in open science and reproducible results. Everything we do is done in public, with receipts.

### 🎯 What We Do

- 📚 **Datasets**: Curated and schema-validated, with documented lineage and licensing. High-quality parallel corpora built specifically for our target domains.
- 🔬 **Experiments**: Fully reproducible training runs with pinned revisions, deterministic setups, and meticulously logged metrics.
- 📊 **Benchmarks**: Pairwise LLM-as-a-judge evaluations that produce Elo-style rankings of translation models.
- ⚔️ **Arena**: A blind A/B human evaluation platform that collects high-fidelity preference data for RLHF and DPO.

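This README does not say exactly how the Benchmark Dashboard turns pairwise judgments into rankings. As a rough illustration only, the sketch below applies the classic sequential Elo update, assuming each judgment (from an LLM judge or an Arena vote) is stored as a hypothetical `(model_a, model_b, score_a)` tuple, where `score_a` is 1.0 for an A win, 0.0 for a B win, and 0.5 for a tie. The function names, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions, not OpenSakura's published settings. Note that sequential Elo depends on the order in which judgments arrive; fitting a Bradley-Terry model over all comparisons at once is a common order-independent alternative.

```python
from collections import defaultdict

# Hypothetical record format (an assumption, not OpenSakura's schema):
# (model_a, model_b, score_a), where score_a is 1.0 if A's translation was
# preferred, 0.0 if B's was preferred, and 0.5 for a tie.
Judgment = tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard 400-point Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(
    judgments: list[Judgment], k: float = 32.0, initial: float = 1000.0
) -> dict[str, float]:
    """Apply one Elo update per pairwise judgment, in the order given."""
    # Every model not seen before starts at the same (illustrative) initial rating.
    ratings: defaultdict[str, float] = defaultdict(lambda: initial)
    for model_a, model_b, score_a in judgments:
        expected_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - expected_a)
        ratings[model_a] += delta
        ratings[model_b] -= delta  # zero-sum: B loses exactly what A gains
    return dict(ratings)


if __name__ == "__main__":
    # Made-up judgments between placeholder model names, for illustration only.
    sample = [
        ("model-x", "model-y", 1.0),
        ("model-y", "model-z", 0.5),
        ("model-x", "model-z", 1.0),
    ]
    for name, rating in sorted(elo_ratings(sample).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```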
---

### 🚀 Quick Links

| Resource | URL |
|---|---|
| 📊 **Benchmark Dashboard** | [bench.opensakura.com](https://bench.opensakura.com/) |
| ⚔️ **Translation Arena** | [arena.opensakura.com](https://arena.opensakura.com/) |
| 🐙 **GitHub** | [github.com/OpenSakura](https://github.com/OpenSakura) |

---

### 🤝 Get Involved

We welcome contributions of all kinds: **datasets, tooling, evaluation scripts, documentation**, or simply **filing issues** when something breaks. We need your help!

👉 **Start here: [Contribution Guidelines](https://github.com/OpenSakura/Contributors-You-Dare-Not-Watch-This)**

*If you use OpenSakura artifacts and they help you, tell a friend. If they don't, tell us what failed (with examples).*

---

### 🌱 Origins

OpenSakura grew out of the [SakuraLLM](https://huggingface.co/SakuraLLM) community, the pioneering open-source project for Japanese-to-Chinese ACGN translation models. Members of that community came together to push the mission further: broader language pairs, rigorous reproducibility, public benchmarks, and a more open development process. We stand on their shoulders and are deeply grateful for the foundation they built.

---

<h2 id="中文介绍" align="center">🌸 Chinese Introduction 🌸</h2>

<p align="center">
<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=500&size=20&pause=1000&color=F194B4&center=true&vCenter=true&width=600&lines=%E4%B8%93%E6%B3%A8%E4%BA%8E%E7%89%B9%E5%AE%9A%E9%A2%86%E5%9F%9F%E7%9A%84+LLM+%E7%BF%BB%E8%AF%91;%E5%85%AC%E5%BC%80%E9%80%8F%E6%98%8E%EF%BC%8C%E6%9C%89%E8%BF%B9%E5%8F%AF%E5%BE%AA;%E8%BD%BB%E5%B0%8F%E8%AF%B4%E3%80%81%E8%A7%86%E8%A7%89%E5%B0%8F%E8%AF%B4%E5%8F%8A%E6%9B%B4%E5%A4%9A" alt="Typing SVG" />
</p>

OpenSakura is an open-source community project focused on **domain-specific large language model (LLM) translation**. We build high-quality datasets, train dedicated models, run rigorous evaluations, and publish authoritative benchmarks, covering **light novels, visual novels, galgames, and web fiction**: the vertical domains where general-purpose machine translation most easily goes off the rails.

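### 🎯 What We Do

- 📚 **Datasets**: Carefully curated and rigorously validated, with full provenance tracking and clear open-source licensing.
- 🔬 **Experiments**: Fully reproducible training pipelines with pinned dataset and model versions and completely logged training metrics.
- 📊 **Benchmarks**: A pairwise evaluation mechanism that uses an LLM as the judge (LLM-as-a-judge) to build a principled Elo ranking system.
- ⚔️ **Arena**: Blind A/B human evaluation that supplies high-quality human preference data for downstream RLHF and DPO.
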
---

### 🚀 Quick Links

| Resource | Link |
|---|---|
| 📊 **Benchmark Dashboard** | [bench.opensakura.com](https://bench.opensakura.com/) |
| ⚔️ **Translation Arena** | [arena.opensakura.com](https://arena.opensakura.com/) |
| 🐙 **GitHub** | [github.com/OpenSakura](https://github.com/OpenSakura) |

---

### 🤝 Contributing

We warmly welcome open-source contributions of every kind. Whether you **contribute datasets, develop tooling, write evaluation scripts, or improve documentation**, or simply **open an issue** to report a bug, we need your help!

👉 **Start here: [Contribution Guidelines](https://github.com/OpenSakura/Contributors-You-Dare-Not-Watch-This)**

*If you find OpenSakura's output helpful, please recommend it to your friends! If you run into bad translations, send us feedback with concrete examples to help us keep improving.*

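---

### 🌱 Community Origins

OpenSakura grew out of the [SakuraLLM](https://huggingface.co/SakuraLLM) community, the open-source pioneer of Japanese-to-Chinese ACGN translation. With the goal of taking this work even further, community members came together again, committed to **supporting more language pairs, enforcing stricter reproducibility standards, building open and transparent benchmarks, and adopting a more open development process**.

We stand on the shoulders of those who came before us and extend our deepest respect and gratitude to SakuraLLM for the solid foundation it laid.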