Add link to technical report and GitHub, fix citation, and refine metadata (#4)

- Add link to technical report and GitHub, fix citation, and refine metadata (a1a729b263c88654dad6d50601f269613ab40717)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
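Most of the metadata refinement happens in the README's YAML front matter. The stdlib-only Python sketch below illustrates what a quick check of that front matter could look like. It extracts the block between the `---` markers and flags two things: top-level license keys that are missing, and unquoted list items such as `no`. A YAML 1.1 loader (PyYAML, for example) reads an unquoted `no` as the boolean `false` rather than as the Norwegian language code. The helper names and the `REQUIRED_KEYS` tuple are hypothetical, and the line-based parsing is only a heuristic, not a YAML parser.

```python
"""Stdlib-only sanity checks for Hugging Face model-card front matter (sketch)."""

# YAML 1.1 boolean literals in their spec casings. A YAML 1.1 loader such as
# PyYAML reads an unquoted `no` as False, not as the language code "no".
_BOOL_WORDS = ("y", "yes", "n", "no", "true", "false", "on", "off")
YAML11_BOOLS = {f for w in _BOOL_WORDS for f in (w, w.capitalize(), w.upper())}

# Hypothetical: keys this sketch expects to find at the top level.
REQUIRED_KEYS = ("license", "license_name", "license_link", "pipeline_tag")


def front_matter(text: str) -> list[str]:
    """Return the lines between the opening and closing `---` markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with YAML front matter")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed with ---")
    return lines[1:end]


def missing_keys(block: list[str]) -> list[str]:
    """Return REQUIRED_KEYS that do not appear as unindented `key:` lines."""
    present = {
        line.split(":", 1)[0]
        for line in block
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line
    }
    return [key for key in REQUIRED_KEYS if key not in present]


def unquoted_bool_items(block: list[str]) -> list[str]:
    """Return `- item` values that a YAML 1.1 loader would parse as booleans."""
    flagged = []
    for line in block:
        item = line.strip()
        if item.startswith("- ") and item[2:].strip() in YAML11_BOOLS:
            flagged.append(item[2:].strip())
    return flagged


card = "---\nlanguage:\n- en\n- 'no'\n- no\nlicense: other\n---\n# Title\n"
block = front_matter(card)
print(missing_keys(block))         # ['license_name', 'license_link', 'pipeline_tag']
print(unquoted_bool_items(block))  # ['no']
```

The old `"no"` and the new `'no'` spellings are both quoted, so either side of this diff passes the boolean check; only the bare `no` in the sample card is flagged.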

Files changed (1)

README.md +15 -12

README.md CHANGED

@@ -1,9 +1,4 @@
 ---
-tags:
-- text-to-speech
-license: other
-license_name: fish-audio-research-license
-license_link: LICENSE.md
 language:
 - zh
 - en
@@ -18,7 +13,7 @@ language:
 - sv
 - it
 - tr
-- "no"
 - nl
 - cy
 - eu
@@ -88,22 +83,29 @@ language:
 - as
 - gu
 - fo
 pipeline_tag: text-to-speech
 inference: false
-extra_gated_prompt: >-
-  You agree to not use the model to generate contents that violate DMCA or local
-  laws.
 extra_gated_fields:
   Country: country
   Specific date: date_picker
   I agree to use this model for non-commercial use ONLY: checkbox
 ---
 # Fish Audio S2 Pro
 <img src="overview.png" alt="Fish Audio S2 Pro overview — fine-grained control, multi-speaker multi-turn generation, low-latency streaming, and long-context inference." width="100%">
 **Fish Audio S2 Pro** is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. Trained on more than 10M hours of audio data across 80+ languages, the system combines reinforcement learning alignment with a dual-autoregressive architecture. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.
 ## Architecture
@@ -131,7 +133,7 @@ S2 Pro supports 80+ languages.
 **Tier 2:** Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)
-**Other supported languages:** sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
 ## Production Streaming Performance
@@ -160,8 +162,9 @@ If you find our work useful, please consider citing our report:
       archivePrefix={arXiv},
       primaryClass={cs.SD},
       url={https://arxiv.org/abs/2603.08823},
 ```
 ## License
-This model is licensed under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio — contact business@fish.audio.

 ---
 language:
 - zh
 - en
 - sv
 - it
 - tr
+- 'no'
 - nl
 - cy
 - eu
 - as
 - gu
 - fo
+license: other
+license_name: fish-audio-research-license
+license_link: LICENSE.md
 pipeline_tag: text-to-speech
+tags:
+- text-to-speech
+- instruction-following
+- multilingual
 inference: false
+extra_gated_prompt: You agree to not use the model to generate contents that violate
+  DMCA or local laws.
 extra_gated_fields:
   Country: country
   Specific date: date_picker
   I agree to use this model for non-commercial use ONLY: checkbox
 ---
 # Fish Audio S2 Pro
 <img src="overview.png" alt="Fish Audio S2 Pro overview — fine-grained control, multi-speaker multi-turn generation, low-latency streaming, and long-context inference." width="100%">
+[**Technical Report**](https://huggingface.co/papers/2603.08823) | [**GitHub**](https://github.com/fishaudio/fish-speech) | [**Playground**](https://fish.audio)
 **Fish Audio S2 Pro** is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. Trained on more than 10M hours of audio data across 80+ languages, the system combines reinforcement learning alignment with a dual-autoregressive architecture. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.
 ## Architecture
 **Tier 2:** Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)
+**Other supported languages:** sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
 ## Production Streaming Performance
       archivePrefix={arXiv},
       primaryClass={cs.SD},
       url={https://arxiv.org/abs/2603.08823},
+}
 ```
 ## License
+This model is licensed under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio — contact business@fish.audio.
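The citation fix adds the closing `}` that the BibTeX entry was missing, which left the entry unterminated. The sketch below shows a minimal brace-depth check that would catch this kind of error. It assumes only that a BibTeX entry must have balanced braces. `brace_depth` is a hypothetical helper, the entry key is a placeholder, and the remaining fields are copied from the diff context.

```python
def brace_depth(entry: str) -> int:
    """Return the unmatched-brace depth of a BibTeX entry.

    0 means balanced, a positive value counts unclosed `{`, and a negative
    value means a `}` appeared before its matching `{`.
    """
    depth = 0
    for ch in entry:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return depth  # stray closing brace; stop early
    return depth


# Hypothetical entry: placeholder key, fields copied from the diff context.
before = """@misc{example-key,
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2603.08823},
"""
after = before + "}\n"

print(brace_depth(before))  # 1 (one unclosed brace, as in the old README)
print(brace_depth(after))   # 0 (balanced, as in the fixed README)
```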