Improve model card: add paper link and fix citation

Hi! I'm Niels from the community science team at Hugging Face. I'm opening this PR to improve your model card by adding a link to the [Fish Audio S2 Technical Report](https://huggingface.co/papers/2603.08823) and fixing a small syntax error in the BibTeX citation. This will help users better discover and cite your work.
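
To illustrate the citation fix: the BibTeX entry in the README was missing its closing `}`. Below is a minimal sketch of a brace-balance check that would catch this kind of error. It assumes a naive character count that ignores escaped braces, so it is not a full BibTeX parser. The function name `unclosed_braces` and the entry key `example_key` are hypothetical; only the three fields shown come from the diff.

```python
def unclosed_braces(entry: str) -> int:
    """Return how many '{' in a BibTeX entry are never closed (0 means balanced)."""
    depth = 0
    for ch in entry:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A '}' appeared with no matching '{' before it.
                raise ValueError("unmatched closing brace")
    return depth


# Hypothetical entry: the key is a placeholder, the fields are the ones visible in the diff.
broken = (
    "@misc{example_key,\n"
    "      archivePrefix={arXiv},\n"
    "      primaryClass={cs.SD},\n"
    "      url={https://arxiv.org/abs/2603.08823},\n"
)
fixed = broken + "}\n"  # the PR adds this closing brace

print(unclosed_braces(broken))
print(unclosed_braces(fixed))
```

A nonzero result for the original entry and zero for the patched one matches the single `+}` line in the diff.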

Files changed (1)

README.md +11 -12

@@ -1,9 +1,4 @@
 ---
-tags:
-- text-to-speech
-license: other
-license_name: fish-audio-research-license
-license_link: LICENSE.md
 language:
 - zh
 - en
@@ -18,7 +13,7 @@ language:
 - sv
 - it
 - tr
-- "no"
 - nl
 - cy
 - eu
@@ -88,23 +83,26 @@ language:
 - as
 - gu
 - fo
 pipeline_tag: text-to-speech
 inference: false
-extra_gated_prompt: >-
-  You agree to not use the model to generate contents that violate DMCA or local
-  laws.
 extra_gated_fields:
   Country: country
   Specific date: date_picker
   I agree to use this model for non-commercial use ONLY: checkbox
 ---
 # Fish Audio S2 Pro
 <img src="overview.png" alt="Fish Audio S2 Pro overview — fine-grained control, multi-speaker multi-turn generation, low-latency streaming, and long-context inference." width="100%">
-**Fish Audio S2 Pro** is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. Trained on over 10M+ hours of audio data across 80+ languages, the system combines reinforcement learning alignment with a dual-autoregressive architecture. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.
 ## Architecture
@@ -160,8 +158,9 @@ If you find our work useful, please consider citing our report:
       archivePrefix={arXiv},
       primaryClass={cs.SD},
       url={https://arxiv.org/abs/2603.08823},
 ```
 ## License
-This model is licensed under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio — contact business@fish.audio.

 ---
 language:
 - zh
 - en
 - sv
 - it
 - tr
+- 'no'
 - nl
 - cy
 - eu
 - as
 - gu
 - fo
+license: other
+license_name: fish-audio-research-license
+license_link: LICENSE.md
 pipeline_tag: text-to-speech
+tags:
+- text-to-speech
 inference: false
+extra_gated_prompt: You agree to not use the model to generate contents that violate
+  DMCA or local laws.
 extra_gated_fields:
   Country: country
   Specific date: date_picker
   I agree to use this model for non-commercial use ONLY: checkbox
 ---
 # Fish Audio S2 Pro
 <img src="overview.png" alt="Fish Audio S2 Pro overview — fine-grained control, multi-speaker multi-turn generation, low-latency streaming, and long-context inference." width="100%">
+**Fish Audio S2 Pro** is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion, introduced in the [Fish Audio S2 Technical Report](https://huggingface.co/papers/2603.08823). Trained on over 10M+ hours of audio data across 80+ languages, the system combines reinforcement learning alignment with a dual-autoregressive architecture. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.
 ## Architecture
       archivePrefix={arXiv},
       primaryClass={cs.SD},
       url={https://arxiv.org/abs/2603.08823},
+}
 ```
 ## License
+This model is licensed under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio — contact business@fish.audio.