lengyue233 committed
Commit 0fd528f · verified · Parent: 7700840

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+overview.png filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE.md ADDED
@@ -0,0 +1,94 @@
# FISH AUDIO RESEARCH LICENSE AGREEMENT

**Last Updated: March 7, 2026**

## I. INTRODUCTION

This Agreement applies to any individual person or entity ("You", "Your" or "Licensee") that uses or distributes any portion or element of the Fish Audio Materials or Derivative Works thereof for any Research, Non-Commercial, or Commercial purpose. Capitalized terms not otherwise defined herein are defined in Section V below.

This Agreement is intended to allow research and non-commercial uses of the Materials free of charge. Any Commercial use of the Materials requires a separate license from Fish Audio.

By clicking "I Accept" or by using, distributing, or accessing any portion or element of the Fish Audio Materials or Derivative Works, You agree that You have read, understood and are bound by the terms of this Agreement. If You are acting on behalf of a company, organization or other entity, then "You" includes you and that entity, and You agree that You: (i) are an authorized representative of such entity with the authority to bind such entity to this Agreement, and (ii) You agree to the terms of this Agreement on that entity's behalf.

## II. RESEARCH & NON-COMMERCIAL USE LICENSE

Subject to the terms of this Agreement, Fish Audio grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free limited license under Fish Audio's intellectual property or other rights owned by Fish Audio embodied in the Fish Audio Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Fish Audio Materials for any Research or Non-Commercial Purpose.

"Research Purpose" means academic or scientific advancement, and in each case, is not primarily intended for commercial advantage or monetary compensation to You or others.

"Non-Commercial Purpose" means any purpose other than a Research Purpose that is not primarily intended for commercial advantage or monetary compensation to You or others, such as personal use (i.e., hobbyist) or evaluation and testing.

## III. COMMERCIAL USE

**Any use of the Fish Audio Materials or Derivative Works for a Commercial Purpose requires a separate written license agreement from Fish Audio.** No commercial rights are granted under this Agreement.

"Commercial Purpose" means any purpose other than a Research Purpose or Non-Commercial Purpose that is primarily intended for or directed toward commercial advantage or monetary compensation to You or others, including but not limited to: (i) creating, modifying, or distributing Your product or service, including via a hosted service or application programming interface, (ii) Your business's or organization's internal operations, and (iii) any use in connection with a product or service for which You charge a fee or generate revenue, whether directly or indirectly.

To obtain a commercial license, please contact Fish Audio at:

- **Website:** [https://fish.audio](https://fish.audio)
- **Email:** business@fish.audio

## IV. GENERAL TERMS

Your Research and Non-Commercial License under this Agreement is subject to the following terms.

### a. Distribution & Attribution

If You distribute or make available the Fish Audio Materials or a Derivative Work to a third party, or a product or service that uses any portion of them, You shall: (i) provide a copy of this Agreement to that third party, (ii) retain the following attribution notice within a "Notice" text file distributed as a part of such copies: "This model is licensed under the Fish Audio Research License, Copyright © 39 AI, INC. All Rights Reserved.", and (iii) prominently display "Built with Fish Audio" on a related website, user interface, blogpost, about page, or product documentation.

If You create a Derivative Work, You may add your own attribution notice(s) to the "Notice" text file included with that Derivative Work, provided that You clearly indicate which attributions apply to the Fish Audio Materials and state in the "Notice" text file that You changed the Fish Audio Materials and how they were modified.

### b. Use Restrictions

Your use of the Fish Audio Materials and Derivative Works, including any output or results of the Fish Audio Materials or Derivative Works, must comply with applicable laws and regulations (including Trade Control Laws and equivalent regulations) and adhere to Fish Audio's Acceptable Use Policy, which is hereby incorporated by reference.

Furthermore, You will not use the Fish Audio Materials or Derivative Works, or any output or results of the Fish Audio Materials or Derivative Works, to create or improve any foundational generative AI model (excluding the Models or Derivative Works).

### c. Intellectual Property

**(i) Trademark License.** No trademark licenses are granted under this Agreement, and in connection with the Fish Audio Materials or Derivative Works, You may not use any name or mark owned by or associated with Fish Audio or any of its Affiliates, except as required under Section IV(a) herein.

**(ii) Ownership of Derivative Works.** As between You and Fish Audio, You are the owner of Derivative Works You create, subject to Fish Audio's ownership of the Fish Audio Materials and any Derivative Works made by or for Fish Audio.

**(iii) Ownership of Outputs.** As between You and Fish Audio, You own any outputs generated from the Models or Derivative Works to the extent permitted by applicable law.

**(iv) Disputes.** If You or Your Affiliate(s) institute litigation or other proceedings against Fish Audio (including a cross-claim or counterclaim in a lawsuit) alleging that the Fish Audio Materials, Derivative Works or associated outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by You, then any licenses granted to You under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Fish Audio from and against any claim by any third party arising out of or related to Your use or distribution of the Fish Audio Materials or Derivative Works in violation of this Agreement.

**(v) Feedback.** From time to time, You may provide Fish Audio with verbal and/or written suggestions, comments or other feedback related to Fish Audio's existing or prospective technology, products or services (collectively, "Feedback"). You are not obligated to provide Fish Audio with Feedback, but to the extent that You do, You hereby grant Fish Audio a perpetual, irrevocable, royalty-free, fully-paid, sub-licensable, transferable, non-exclusive, worldwide right and license to exploit the Feedback in any manner without restriction. Your Feedback is provided "AS IS" and You make no warranties whatsoever about any Feedback.

### d. Disclaimer of Warranty

UNLESS REQUIRED BY APPLICABLE LAW, THE FISH AUDIO MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OR LAWFULNESS OF USING OR REDISTRIBUTING THE FISH AUDIO MATERIALS, DERIVATIVE WORKS OR ANY OUTPUT OR RESULTS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE FISH AUDIO MATERIALS, DERIVATIVE WORKS AND ANY OUTPUT AND RESULTS.

### e. Limitation of Liability

IN NO EVENT WILL FISH AUDIO OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF FISH AUDIO OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

### f. Term and Termination

The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Fish Audio Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Fish Audio may terminate this Agreement if You are in breach of any term or condition of this Agreement. Upon termination of this Agreement, You shall delete and cease use of any Fish Audio Materials or Derivative Works. Sections IV(d), (e), and (g) shall survive the termination of this Agreement.

### g. Governing Law

This Agreement will be governed by and construed in accordance with the laws of the United States and the State of California without regard to choice of law principles, and the UN Convention on Contracts for International Sale of Goods does not apply to this Agreement.

## V. DEFINITIONS

**"Affiliate(s)"** means any entity that directly or indirectly controls, is controlled by, or is under common control with the subject entity; for purposes of this definition, "control" means direct or indirect ownership or control of more than 50% of the voting interests of the subject entity.

**"Agreement"** means this Fish Audio Research License Agreement.

**"Derivative Work(s)"** means (a) any derivative work of the Fish Audio Materials as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model's output, including "fine tune" and "low-rank adaptation" models derived from a Model or a Model's output, but does not include the output of any Model.

**"Documentation"** means any specifications, manuals, documentation, and other written information provided by Fish Audio related to the Software or Models.

**"Fish Audio"** or **"we"** means 39 AI, INC. and its Affiliates.

**"Model(s)"** means, collectively, Fish Audio's proprietary models and algorithms, including machine-learning models, trained model weights and other elements of the foregoing.

**"Software"** means Fish Audio's proprietary software made available under this Agreement now or in the future.

**"Fish Audio Materials"** means, collectively, Fish Audio's proprietary Models, Software and Documentation (and any portion or combination thereof) made available under this Agreement.

**"Trade Control Laws"** means any applicable U.S. and non-U.S. export control and trade sanctions laws and regulations.
README.md ADDED
@@ -0,0 +1,152 @@
---
tags:
- text-to-speech
license: other
license_name: fish-audio-research-license
license_link: LICENSE.md
language:
- zh
- en
- ja
- ko
- es
- pt
- ar
- ru
- fr
- de
- sv
- it
- tr
- "no"
- nl
- cy
- eu
- ca
- da
- gl
- ta
- hu
- fi
- pl
- et
- hi
- la
- ur
- th
- vi
- jw
- bn
- yo
- sl
- cs
- sw
- nn
- he
- ms
- uk
- id
- kk
- bg
- lv
- my
- tl
- sk
- ne
- fa
- af
- el
- bo
- hr
- ro
- sn
- mi
- yi
- am
- be
- km
- is
- az
- sd
- br
- sq
- ps
- mn
- ht
- ml
- sr
- sa
- te
- ka
- bs
- pa
- lt
- kn
- si
- hy
- mr
- as
- gu
- fo
pipeline_tag: text-to-speech
inference: false
extra_gated_prompt: >-
  You agree to not use the model to generate contents that violate DMCA or local
  laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox
---

# Fish Audio S2 Pro

<img src="overview.png" alt="Fish Audio S2 Pro overview — fine-grained control, multi-speaker multi-turn generation, low-latency streaming, and long-context inference." width="100%">

**Fish Audio S2 Pro** is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. Trained on more than 10 million hours of audio across 80+ languages, the system combines reinforcement learning alignment with a dual-autoregressive architecture. The release includes model weights, fine-tuning code, and an SGLang-based streaming inference engine.

## Architecture

S2 Pro builds on a decoder-only transformer combined with an RVQ-based audio codec (10 codebooks, ~21 Hz frame rate) using a **Dual-Autoregressive (Dual-AR)** architecture:

- **Slow AR** (4B parameters): Operates along the time axis and predicts the primary semantic codebook.
- **Fast AR** (400M parameters): Generates the remaining 9 residual codebooks at each time step, reconstructing fine-grained acoustic detail.

This asymmetric design keeps inference efficient while preserving audio fidelity. Because the Dual-AR architecture is structurally isomorphic to standard autoregressive LLMs, it inherits all LLM-native serving optimizations from SGLang — including continuous batching, paged KV cache, CUDA graph replay, and RadixAttention-based prefix caching.
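The two nested loops described above can be sketched in a few lines. This is a toy illustration only: `slow_ar_step` and `fast_ar_step` are hypothetical stand-ins that return deterministic dummy token ids, not the real model API.

```python
# Toy sketch of the Dual-AR decode loop: the slow AR advances one ~21 Hz frame
# at a time, and the fast AR fills in the 9 residual codebooks for each frame.
NUM_CODEBOOKS = 10  # 1 semantic + 9 residual codebooks

def slow_ar_step(prefix):
    # Stand-in for the 4B slow AR: predicts this frame's semantic token.
    return (len(prefix) * 7) % 4096

def fast_ar_step(semantic_token, residuals):
    # Stand-in for the 400M fast AR: predicts the next residual codebook token.
    return (semantic_token + len(residuals) + 1) % 4096

def generate_frames(num_frames):
    frames, semantic_prefix = [], []
    for _ in range(num_frames):
        # Slow AR: one step along the time axis.
        semantic = slow_ar_step(semantic_prefix)
        semantic_prefix.append(semantic)
        # Fast AR: 9 inner steps per frame for the residual codebooks.
        residuals = []
        for _ in range(NUM_CODEBOOKS - 1):
            residuals.append(fast_ar_step(semantic, residuals))
        frames.append([semantic] + residuals)
    return frames

frames = generate_frames(21)  # roughly 1 second of audio at ~21 Hz
```

The asymmetry pays off because only the cheap fast AR runs 9 times per frame; the expensive slow AR runs once.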

## Fine-Grained Inline Control

S2 Pro enables localized control over speech generation by embedding natural-language instructions directly within the text using `[tag]` syntax. Rather than relying on a fixed set of predefined tags, S2 Pro accepts **free-form textual descriptions** — such as `[whisper in small voice]`, `[professional broadcast tone]`, or `[pitch up]` — allowing open-ended expression control at the word level.

**Common tags (15,000+ unique tags supported):**

`[pause]` `[emphasis]` `[laughing]` `[inhale]` `[chuckle]` `[tsk]` `[singing]` `[excited]` `[laughing tone]` `[interrupting]` `[chuckling]` `[excited tone]` `[volume up]` `[echo]` `[angry]` `[low volume]` `[sigh]` `[low voice]` `[whisper]` `[screaming]` `[shouting]` `[loud]` `[surprised]` `[short pause]` `[exhale]` `[delight]` `[panting]` `[audience laughter]` `[with strong accent]` `[volume down]` `[clearing throat]` `[sad]` `[moaning]` `[shocked]`
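To make the markup concrete, here is a minimal parser for the inline `[tag]` syntax. It is only an illustration of the prompt format, not part of the S2 Pro API: it splits a prompt into its free-form control tags and the plain text that will be spoken.

```python
import re

# Matches one inline control tag, e.g. [whisper in small voice] or [chuckle].
TAG_RE = re.compile(r"\[([^\[\]]+)\]")

def extract_tags(text):
    """Return (control tags in order, remaining plain text)."""
    tags = TAG_RE.findall(text)
    plain = TAG_RE.sub("", text)
    plain = re.sub(r"\s{2,}", " ", plain).strip()  # tidy leftover whitespace
    return tags, plain

tags, plain = extract_tags(
    "[whisper in small voice] I have a secret. [chuckle] Want to hear it?"
)
# tags  -> ['whisper in small voice', 'chuckle']
# plain -> 'I have a secret. Want to hear it?'
```

Because tags are free-form descriptions rather than a fixed enum, the parser deliberately accepts any bracketed text.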

## Supported Languages

S2 Pro supports 80+ languages.

**Tier 1:** Japanese (ja), English (en), Chinese (zh)

**Tier 2:** Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)

**Other supported languages:** sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.

## Production Streaming Performance

On a single NVIDIA H200 GPU:

- **Real-Time Factor (RTF):** 0.195
- **Time-to-first-audio:** ~100 ms
- **Throughput:** 3,000+ acoustic tokens/s while maintaining RTF below 0.5
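A back-of-envelope reading of these numbers, under two assumptions not stated explicitly above: the codec rates from the Architecture section (~21 Hz frames × 10 codebooks ≈ 210 acoustic tokens per second of audio), and that the throughput figure is an aggregate across concurrent streams.

```python
# Rough arithmetic on the streaming figures (assumptions noted in the text).
tokens_per_audio_second = 21 * 10   # ~210 acoustic tokens per 1 s of audio
throughput = 3000                   # aggregate acoustic tokens/s on one H200

# Audio-seconds synthesized per wall-clock second, summed over all streams:
audio_seconds_per_second = throughput / tokens_per_audio_second  # ~14.3

# Per stream, RTF 0.195 means 1 s of audio takes ~195 ms to generate:
rtf = 0.195
seconds_to_generate_10s_clip = 10 * rtf  # ~1.95 s
```

So at these rates a single GPU can, in aggregate, synthesize on the order of 14 seconds of audio per wall-clock second while each individual stream stays well ahead of real time.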

## Links

- [Fish Speech GitHub](https://github.com/fishaudio/fish-speech)
- [Fish Audio Playground](https://fish.audio)
- [Blog & Tech Report](https://fish.audio/blog/)

## License

This model is licensed under the [Fish Audio Research License](LICENSE.md). Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio — contact business@fish.audio.
chat_template.jinja ADDED
@@ -0,0 +1,85 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set content = message.content %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in message.content %}
{%- set content = message.content.split('</think>')[-1].lstrip('\n') %}
{%- set reasoning_content = message.content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
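The template above follows the ChatML convention (`<|im_start|>role\n...<|im_end|>`). The sketch below re-renders only its simplest path (no tools, no tool calls, no `<think>` blocks) to show the string it produces; in practice you would let `tokenizer.apply_chat_template(...)` from `transformers` render the real template.

```python
# Minimal ChatML rendering matching the template's no-tools, no-reasoning path.
def render_chatml(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render_chatml([
    {"role": "system", "content": "You are a TTS assistant."},
    {"role": "user", "content": "Say hello. [excited]"},
])
```

The full template additionally handles tool schemas, `<tool_call>`/`<tool_response>` wrapping, and stripping or re-inserting `<think>` reasoning blocks, which this sketch deliberately omits.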
config.json ADDED
@@ -0,0 +1,70 @@
{
  "audio_decoder_config": {
    "attention_o_bias": false,
    "attention_qk_norm": false,
    "attention_qkv_bias": false,
    "audio_hidden_dim": 5120,
    "dim": 2560,
    "dropout": 0.0,
    "head_dim": 128,
    "initializer_range": 0.01976423537605237,
    "intermediate_size": 9728,
    "max_seq_len": 11,
    "model_type": "fish_qwen3_audio_decoder",
    "moe_intermediate_size": 768,
    "n_head": 32,
    "n_layer": 4,
    "n_local_heads": 8,
    "norm_eps": 1e-06,
    "norm_topk_prob": true,
    "num_codebooks": 10,
    "num_experts": 1,
    "num_experts_per_tok": 1,
    "rope_base": 1000000,
    "router_gamma": 0.001,
    "text_dim": 2560,
    "tie_word_embeddings": false,
    "use_aux_loss_free": false,
    "use_bfloat16": false,
    "use_gradient_checkpointing": true,
    "use_moe": false,
    "vocab_size": 4096
  },
  "audio_pad_token_id": 151677,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "model_type": "fish_qwen3_omni",
  "pad_token_id": 151669,
  "semantic_end_token_id": 155773,
  "semantic_start_token_id": 151678,
  "text_config": {
    "attention_o_bias": false,
    "attention_qk_norm": true,
    "attention_qkv_bias": false,
    "audio_hidden_dim": 5120,
    "dim": 2560,
    "dropout": 0.0,
    "head_dim": 128,
    "initializer_range": 0.01976423537605237,
    "intermediate_size": 9728,
    "max_seq_len": 32768,
    "model_type": "fish_qwen3",
    "moe_intermediate_size": 768,
    "n_head": 32,
    "n_layer": 36,
    "n_local_heads": 8,
    "norm_eps": 1e-06,
    "norm_topk_prob": true,
    "num_experts": 1,
    "num_experts_per_tok": 1,
    "rope_base": 1000000,
    "router_gamma": 0.001,
    "tie_word_embeddings": true,
    "use_aux_loss_free": false,
    "use_bfloat16": false,
    "use_gradient_checkpointing": true,
    "use_moe": false,
    "vocab_size": 155776
  },
  "transformers_version": "4.57.1"
}
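A quick sketch of what the text config implies for the attention layers, assuming `n_head`/`n_local_heads` encode grouped-query attention (32 query heads sharing 8 KV heads) and that the `wqkv` entries in the weight map below are fused q/k/v projections; both are inferences from the config, not documented facts.

```python
# Derive attention projection widths from the text_config values above.
dim, n_head, n_kv, head_dim = 2560, 32, 8, 128

# Fused q/k/v projection: 32 query heads plus 8 key and 8 value heads.
wqkv_out = (n_head + 2 * n_kv) * head_dim   # 6144
# Output projection maps the concatenated query heads back toward dim.
wo_in = n_head * head_dim                   # 4096
```

Note that `n_head * head_dim` (4096) exceeds `dim` (2560), so the query width is decoupled from the residual width, as in recent Qwen3-style architectures.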
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c4218e8ac93be83b35eee30b4f94cb2e9b5ecff40f3e21611438d2f4f8804aad
size 4986872984
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76738d23465deaac431433232c0762908cc99a6eddc3d49f67307d92680827be
size 4136876104
model.safetensors.index.json ADDED
@@ -0,0 +1,366 @@
{
  "metadata": {
    "total_parameters": 4561852416,
    "total_size": 9123704832
  },
  "weight_map": {
    "audio_decoder.codebook_embeddings.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.embeddings.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.attention.wo.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.attention.wqkv.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.attention_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.0.ffn_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.attention.wo.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.attention.wqkv.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.attention_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.1.ffn_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.attention.wo.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.attention.wqkv.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.attention_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.2.ffn_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.attention.wo.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.attention.wqkv.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.attention_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.layers.3.ffn_norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.norm.weight": "model-00002-of-00002.safetensors",
    "audio_decoder.output.weight": "model-00002-of-00002.safetensors",
    "text_model.model.embeddings.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.0.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.1.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.10.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.11.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.12.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.13.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.14.ffn_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.attention.k_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.attention.q_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.attention.wo.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.attention_norm.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.15.ffn_norm.weight": "model-00001-of-00002.safetensors",
112
+ "text_model.model.layers.16.attention.k_norm.weight": "model-00001-of-00002.safetensors",
113
+ "text_model.model.layers.16.attention.q_norm.weight": "model-00001-of-00002.safetensors",
114
+ "text_model.model.layers.16.attention.wo.weight": "model-00001-of-00002.safetensors",
115
+ "text_model.model.layers.16.attention.wqkv.weight": "model-00001-of-00002.safetensors",
116
+ "text_model.model.layers.16.attention_norm.weight": "model-00001-of-00002.safetensors",
117
+ "text_model.model.layers.16.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
118
+ "text_model.model.layers.16.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
119
+ "text_model.model.layers.16.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
120
+ "text_model.model.layers.16.ffn_norm.weight": "model-00001-of-00002.safetensors",
121
+ "text_model.model.layers.17.attention.k_norm.weight": "model-00001-of-00002.safetensors",
122
+ "text_model.model.layers.17.attention.q_norm.weight": "model-00001-of-00002.safetensors",
123
+ "text_model.model.layers.17.attention.wo.weight": "model-00001-of-00002.safetensors",
124
+ "text_model.model.layers.17.attention.wqkv.weight": "model-00001-of-00002.safetensors",
125
+ "text_model.model.layers.17.attention_norm.weight": "model-00001-of-00002.safetensors",
126
+ "text_model.model.layers.17.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
127
+ "text_model.model.layers.17.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
128
+ "text_model.model.layers.17.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
129
+ "text_model.model.layers.17.ffn_norm.weight": "model-00001-of-00002.safetensors",
130
+ "text_model.model.layers.18.attention.k_norm.weight": "model-00001-of-00002.safetensors",
131
+ "text_model.model.layers.18.attention.q_norm.weight": "model-00001-of-00002.safetensors",
132
+ "text_model.model.layers.18.attention.wo.weight": "model-00001-of-00002.safetensors",
133
+ "text_model.model.layers.18.attention.wqkv.weight": "model-00001-of-00002.safetensors",
134
+ "text_model.model.layers.18.attention_norm.weight": "model-00001-of-00002.safetensors",
135
+ "text_model.model.layers.18.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
136
+ "text_model.model.layers.18.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
137
+ "text_model.model.layers.18.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
138
+ "text_model.model.layers.18.ffn_norm.weight": "model-00001-of-00002.safetensors",
139
+ "text_model.model.layers.19.attention.k_norm.weight": "model-00001-of-00002.safetensors",
140
+ "text_model.model.layers.19.attention.q_norm.weight": "model-00001-of-00002.safetensors",
141
+ "text_model.model.layers.19.attention.wo.weight": "model-00001-of-00002.safetensors",
142
+ "text_model.model.layers.19.attention.wqkv.weight": "model-00001-of-00002.safetensors",
143
+ "text_model.model.layers.19.attention_norm.weight": "model-00001-of-00002.safetensors",
144
+ "text_model.model.layers.19.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
145
+ "text_model.model.layers.19.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
146
+ "text_model.model.layers.19.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
147
+ "text_model.model.layers.19.ffn_norm.weight": "model-00001-of-00002.safetensors",
148
+ "text_model.model.layers.2.attention.k_norm.weight": "model-00001-of-00002.safetensors",
149
+ "text_model.model.layers.2.attention.q_norm.weight": "model-00001-of-00002.safetensors",
150
+ "text_model.model.layers.2.attention.wo.weight": "model-00001-of-00002.safetensors",
151
+ "text_model.model.layers.2.attention.wqkv.weight": "model-00001-of-00002.safetensors",
152
+ "text_model.model.layers.2.attention_norm.weight": "model-00001-of-00002.safetensors",
153
+ "text_model.model.layers.2.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
154
+ "text_model.model.layers.2.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
155
+ "text_model.model.layers.2.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
156
+ "text_model.model.layers.2.ffn_norm.weight": "model-00001-of-00002.safetensors",
157
+ "text_model.model.layers.20.attention.k_norm.weight": "model-00001-of-00002.safetensors",
158
+ "text_model.model.layers.20.attention.q_norm.weight": "model-00001-of-00002.safetensors",
159
+ "text_model.model.layers.20.attention.wo.weight": "model-00001-of-00002.safetensors",
160
+ "text_model.model.layers.20.attention.wqkv.weight": "model-00001-of-00002.safetensors",
161
+ "text_model.model.layers.20.attention_norm.weight": "model-00002-of-00002.safetensors",
162
+ "text_model.model.layers.20.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
163
+ "text_model.model.layers.20.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
164
+ "text_model.model.layers.20.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
165
+ "text_model.model.layers.20.ffn_norm.weight": "model-00002-of-00002.safetensors",
166
+ "text_model.model.layers.21.attention.k_norm.weight": "model-00002-of-00002.safetensors",
167
+ "text_model.model.layers.21.attention.q_norm.weight": "model-00002-of-00002.safetensors",
168
+ "text_model.model.layers.21.attention.wo.weight": "model-00002-of-00002.safetensors",
169
+ "text_model.model.layers.21.attention.wqkv.weight": "model-00002-of-00002.safetensors",
170
+ "text_model.model.layers.21.attention_norm.weight": "model-00002-of-00002.safetensors",
171
+ "text_model.model.layers.21.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
172
+ "text_model.model.layers.21.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
173
+ "text_model.model.layers.21.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
174
+ "text_model.model.layers.21.ffn_norm.weight": "model-00002-of-00002.safetensors",
175
+ "text_model.model.layers.22.attention.k_norm.weight": "model-00002-of-00002.safetensors",
176
+ "text_model.model.layers.22.attention.q_norm.weight": "model-00002-of-00002.safetensors",
177
+ "text_model.model.layers.22.attention.wo.weight": "model-00002-of-00002.safetensors",
178
+ "text_model.model.layers.22.attention.wqkv.weight": "model-00002-of-00002.safetensors",
179
+ "text_model.model.layers.22.attention_norm.weight": "model-00002-of-00002.safetensors",
180
+ "text_model.model.layers.22.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
181
+ "text_model.model.layers.22.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
182
+ "text_model.model.layers.22.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
183
+ "text_model.model.layers.22.ffn_norm.weight": "model-00002-of-00002.safetensors",
184
+ "text_model.model.layers.23.attention.k_norm.weight": "model-00002-of-00002.safetensors",
185
+ "text_model.model.layers.23.attention.q_norm.weight": "model-00002-of-00002.safetensors",
186
+ "text_model.model.layers.23.attention.wo.weight": "model-00002-of-00002.safetensors",
187
+ "text_model.model.layers.23.attention.wqkv.weight": "model-00002-of-00002.safetensors",
188
+ "text_model.model.layers.23.attention_norm.weight": "model-00002-of-00002.safetensors",
189
+ "text_model.model.layers.23.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
190
+ "text_model.model.layers.23.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
191
+ "text_model.model.layers.23.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
192
+ "text_model.model.layers.23.ffn_norm.weight": "model-00002-of-00002.safetensors",
193
+ "text_model.model.layers.24.attention.k_norm.weight": "model-00002-of-00002.safetensors",
194
+ "text_model.model.layers.24.attention.q_norm.weight": "model-00002-of-00002.safetensors",
195
+ "text_model.model.layers.24.attention.wo.weight": "model-00002-of-00002.safetensors",
196
+ "text_model.model.layers.24.attention.wqkv.weight": "model-00002-of-00002.safetensors",
197
+ "text_model.model.layers.24.attention_norm.weight": "model-00002-of-00002.safetensors",
198
+ "text_model.model.layers.24.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
199
+ "text_model.model.layers.24.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
200
+ "text_model.model.layers.24.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
201
+ "text_model.model.layers.24.ffn_norm.weight": "model-00002-of-00002.safetensors",
202
+ "text_model.model.layers.25.attention.k_norm.weight": "model-00002-of-00002.safetensors",
203
+ "text_model.model.layers.25.attention.q_norm.weight": "model-00002-of-00002.safetensors",
204
+ "text_model.model.layers.25.attention.wo.weight": "model-00002-of-00002.safetensors",
205
+ "text_model.model.layers.25.attention.wqkv.weight": "model-00002-of-00002.safetensors",
206
+ "text_model.model.layers.25.attention_norm.weight": "model-00002-of-00002.safetensors",
207
+ "text_model.model.layers.25.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
208
+ "text_model.model.layers.25.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
209
+ "text_model.model.layers.25.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
210
+ "text_model.model.layers.25.ffn_norm.weight": "model-00002-of-00002.safetensors",
211
+ "text_model.model.layers.26.attention.k_norm.weight": "model-00002-of-00002.safetensors",
212
+ "text_model.model.layers.26.attention.q_norm.weight": "model-00002-of-00002.safetensors",
213
+ "text_model.model.layers.26.attention.wo.weight": "model-00002-of-00002.safetensors",
214
+ "text_model.model.layers.26.attention.wqkv.weight": "model-00002-of-00002.safetensors",
215
+ "text_model.model.layers.26.attention_norm.weight": "model-00002-of-00002.safetensors",
216
+ "text_model.model.layers.26.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
217
+ "text_model.model.layers.26.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
218
+ "text_model.model.layers.26.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
219
+ "text_model.model.layers.26.ffn_norm.weight": "model-00002-of-00002.safetensors",
220
+ "text_model.model.layers.27.attention.k_norm.weight": "model-00002-of-00002.safetensors",
221
+ "text_model.model.layers.27.attention.q_norm.weight": "model-00002-of-00002.safetensors",
222
+ "text_model.model.layers.27.attention.wo.weight": "model-00002-of-00002.safetensors",
223
+ "text_model.model.layers.27.attention.wqkv.weight": "model-00002-of-00002.safetensors",
224
+ "text_model.model.layers.27.attention_norm.weight": "model-00002-of-00002.safetensors",
225
+ "text_model.model.layers.27.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
226
+ "text_model.model.layers.27.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
227
+ "text_model.model.layers.27.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
228
+ "text_model.model.layers.27.ffn_norm.weight": "model-00002-of-00002.safetensors",
229
+ "text_model.model.layers.28.attention.k_norm.weight": "model-00002-of-00002.safetensors",
230
+ "text_model.model.layers.28.attention.q_norm.weight": "model-00002-of-00002.safetensors",
231
+ "text_model.model.layers.28.attention.wo.weight": "model-00002-of-00002.safetensors",
232
+ "text_model.model.layers.28.attention.wqkv.weight": "model-00002-of-00002.safetensors",
233
+ "text_model.model.layers.28.attention_norm.weight": "model-00002-of-00002.safetensors",
234
+ "text_model.model.layers.28.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
235
+ "text_model.model.layers.28.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
236
+ "text_model.model.layers.28.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
237
+ "text_model.model.layers.28.ffn_norm.weight": "model-00002-of-00002.safetensors",
238
+ "text_model.model.layers.29.attention.k_norm.weight": "model-00002-of-00002.safetensors",
239
+ "text_model.model.layers.29.attention.q_norm.weight": "model-00002-of-00002.safetensors",
240
+ "text_model.model.layers.29.attention.wo.weight": "model-00002-of-00002.safetensors",
241
+ "text_model.model.layers.29.attention.wqkv.weight": "model-00002-of-00002.safetensors",
242
+ "text_model.model.layers.29.attention_norm.weight": "model-00002-of-00002.safetensors",
243
+ "text_model.model.layers.29.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
244
+ "text_model.model.layers.29.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
245
+ "text_model.model.layers.29.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
246
+ "text_model.model.layers.29.ffn_norm.weight": "model-00002-of-00002.safetensors",
247
+ "text_model.model.layers.3.attention.k_norm.weight": "model-00001-of-00002.safetensors",
248
+ "text_model.model.layers.3.attention.q_norm.weight": "model-00001-of-00002.safetensors",
249
+ "text_model.model.layers.3.attention.wo.weight": "model-00001-of-00002.safetensors",
250
+ "text_model.model.layers.3.attention.wqkv.weight": "model-00001-of-00002.safetensors",
251
+ "text_model.model.layers.3.attention_norm.weight": "model-00001-of-00002.safetensors",
252
+ "text_model.model.layers.3.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
253
+ "text_model.model.layers.3.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
254
+ "text_model.model.layers.3.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
255
+ "text_model.model.layers.3.ffn_norm.weight": "model-00001-of-00002.safetensors",
256
+ "text_model.model.layers.30.attention.k_norm.weight": "model-00002-of-00002.safetensors",
257
+ "text_model.model.layers.30.attention.q_norm.weight": "model-00002-of-00002.safetensors",
258
+ "text_model.model.layers.30.attention.wo.weight": "model-00002-of-00002.safetensors",
259
+ "text_model.model.layers.30.attention.wqkv.weight": "model-00002-of-00002.safetensors",
260
+ "text_model.model.layers.30.attention_norm.weight": "model-00002-of-00002.safetensors",
261
+ "text_model.model.layers.30.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
262
+ "text_model.model.layers.30.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
263
+ "text_model.model.layers.30.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
264
+ "text_model.model.layers.30.ffn_norm.weight": "model-00002-of-00002.safetensors",
265
+ "text_model.model.layers.31.attention.k_norm.weight": "model-00002-of-00002.safetensors",
266
+ "text_model.model.layers.31.attention.q_norm.weight": "model-00002-of-00002.safetensors",
267
+ "text_model.model.layers.31.attention.wo.weight": "model-00002-of-00002.safetensors",
268
+ "text_model.model.layers.31.attention.wqkv.weight": "model-00002-of-00002.safetensors",
269
+ "text_model.model.layers.31.attention_norm.weight": "model-00002-of-00002.safetensors",
270
+ "text_model.model.layers.31.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
271
+ "text_model.model.layers.31.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
272
+ "text_model.model.layers.31.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
273
+ "text_model.model.layers.31.ffn_norm.weight": "model-00002-of-00002.safetensors",
274
+ "text_model.model.layers.32.attention.k_norm.weight": "model-00002-of-00002.safetensors",
275
+ "text_model.model.layers.32.attention.q_norm.weight": "model-00002-of-00002.safetensors",
276
+ "text_model.model.layers.32.attention.wo.weight": "model-00002-of-00002.safetensors",
277
+ "text_model.model.layers.32.attention.wqkv.weight": "model-00002-of-00002.safetensors",
278
+ "text_model.model.layers.32.attention_norm.weight": "model-00002-of-00002.safetensors",
279
+ "text_model.model.layers.32.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
280
+ "text_model.model.layers.32.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
281
+ "text_model.model.layers.32.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
282
+ "text_model.model.layers.32.ffn_norm.weight": "model-00002-of-00002.safetensors",
283
+ "text_model.model.layers.33.attention.k_norm.weight": "model-00002-of-00002.safetensors",
284
+ "text_model.model.layers.33.attention.q_norm.weight": "model-00002-of-00002.safetensors",
285
+ "text_model.model.layers.33.attention.wo.weight": "model-00002-of-00002.safetensors",
286
+ "text_model.model.layers.33.attention.wqkv.weight": "model-00002-of-00002.safetensors",
287
+ "text_model.model.layers.33.attention_norm.weight": "model-00002-of-00002.safetensors",
288
+ "text_model.model.layers.33.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
289
+ "text_model.model.layers.33.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
290
+ "text_model.model.layers.33.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
291
+ "text_model.model.layers.33.ffn_norm.weight": "model-00002-of-00002.safetensors",
292
+ "text_model.model.layers.34.attention.k_norm.weight": "model-00002-of-00002.safetensors",
293
+ "text_model.model.layers.34.attention.q_norm.weight": "model-00002-of-00002.safetensors",
294
+ "text_model.model.layers.34.attention.wo.weight": "model-00002-of-00002.safetensors",
295
+ "text_model.model.layers.34.attention.wqkv.weight": "model-00002-of-00002.safetensors",
296
+ "text_model.model.layers.34.attention_norm.weight": "model-00002-of-00002.safetensors",
297
+ "text_model.model.layers.34.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
298
+ "text_model.model.layers.34.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
299
+ "text_model.model.layers.34.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
300
+ "text_model.model.layers.34.ffn_norm.weight": "model-00002-of-00002.safetensors",
301
+ "text_model.model.layers.35.attention.k_norm.weight": "model-00002-of-00002.safetensors",
302
+ "text_model.model.layers.35.attention.q_norm.weight": "model-00002-of-00002.safetensors",
303
+ "text_model.model.layers.35.attention.wo.weight": "model-00002-of-00002.safetensors",
304
+ "text_model.model.layers.35.attention.wqkv.weight": "model-00002-of-00002.safetensors",
305
+ "text_model.model.layers.35.attention_norm.weight": "model-00002-of-00002.safetensors",
306
+ "text_model.model.layers.35.feed_forward.w1.weight": "model-00002-of-00002.safetensors",
307
+ "text_model.model.layers.35.feed_forward.w2.weight": "model-00002-of-00002.safetensors",
308
+ "text_model.model.layers.35.feed_forward.w3.weight": "model-00002-of-00002.safetensors",
309
+ "text_model.model.layers.35.ffn_norm.weight": "model-00002-of-00002.safetensors",
310
+ "text_model.model.layers.4.attention.k_norm.weight": "model-00001-of-00002.safetensors",
311
+ "text_model.model.layers.4.attention.q_norm.weight": "model-00001-of-00002.safetensors",
312
+ "text_model.model.layers.4.attention.wo.weight": "model-00001-of-00002.safetensors",
313
+ "text_model.model.layers.4.attention.wqkv.weight": "model-00001-of-00002.safetensors",
314
+ "text_model.model.layers.4.attention_norm.weight": "model-00001-of-00002.safetensors",
315
+ "text_model.model.layers.4.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
316
+ "text_model.model.layers.4.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
317
+ "text_model.model.layers.4.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
318
+ "text_model.model.layers.4.ffn_norm.weight": "model-00001-of-00002.safetensors",
319
+ "text_model.model.layers.5.attention.k_norm.weight": "model-00001-of-00002.safetensors",
320
+ "text_model.model.layers.5.attention.q_norm.weight": "model-00001-of-00002.safetensors",
321
+ "text_model.model.layers.5.attention.wo.weight": "model-00001-of-00002.safetensors",
322
+ "text_model.model.layers.5.attention.wqkv.weight": "model-00001-of-00002.safetensors",
323
+ "text_model.model.layers.5.attention_norm.weight": "model-00001-of-00002.safetensors",
324
+ "text_model.model.layers.5.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
325
+ "text_model.model.layers.5.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
326
+ "text_model.model.layers.5.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
327
+ "text_model.model.layers.5.ffn_norm.weight": "model-00001-of-00002.safetensors",
328
+ "text_model.model.layers.6.attention.k_norm.weight": "model-00001-of-00002.safetensors",
329
+ "text_model.model.layers.6.attention.q_norm.weight": "model-00001-of-00002.safetensors",
330
+ "text_model.model.layers.6.attention.wo.weight": "model-00001-of-00002.safetensors",
331
+ "text_model.model.layers.6.attention.wqkv.weight": "model-00001-of-00002.safetensors",
332
+ "text_model.model.layers.6.attention_norm.weight": "model-00001-of-00002.safetensors",
333
+ "text_model.model.layers.6.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
334
+ "text_model.model.layers.6.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
335
+ "text_model.model.layers.6.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
336
+ "text_model.model.layers.6.ffn_norm.weight": "model-00001-of-00002.safetensors",
337
+ "text_model.model.layers.7.attention.k_norm.weight": "model-00001-of-00002.safetensors",
338
+ "text_model.model.layers.7.attention.q_norm.weight": "model-00001-of-00002.safetensors",
339
+ "text_model.model.layers.7.attention.wo.weight": "model-00001-of-00002.safetensors",
340
+ "text_model.model.layers.7.attention.wqkv.weight": "model-00001-of-00002.safetensors",
341
+ "text_model.model.layers.7.attention_norm.weight": "model-00001-of-00002.safetensors",
342
+ "text_model.model.layers.7.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
343
+ "text_model.model.layers.7.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
344
+ "text_model.model.layers.7.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
345
+ "text_model.model.layers.7.ffn_norm.weight": "model-00001-of-00002.safetensors",
346
+ "text_model.model.layers.8.attention.k_norm.weight": "model-00001-of-00002.safetensors",
347
+ "text_model.model.layers.8.attention.q_norm.weight": "model-00001-of-00002.safetensors",
348
+ "text_model.model.layers.8.attention.wo.weight": "model-00001-of-00002.safetensors",
349
+ "text_model.model.layers.8.attention.wqkv.weight": "model-00001-of-00002.safetensors",
350
+ "text_model.model.layers.8.attention_norm.weight": "model-00001-of-00002.safetensors",
351
+ "text_model.model.layers.8.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
352
+ "text_model.model.layers.8.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
353
+ "text_model.model.layers.8.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
354
+ "text_model.model.layers.8.ffn_norm.weight": "model-00001-of-00002.safetensors",
355
+ "text_model.model.layers.9.attention.k_norm.weight": "model-00001-of-00002.safetensors",
356
+ "text_model.model.layers.9.attention.q_norm.weight": "model-00001-of-00002.safetensors",
357
+ "text_model.model.layers.9.attention.wo.weight": "model-00001-of-00002.safetensors",
358
+ "text_model.model.layers.9.attention.wqkv.weight": "model-00001-of-00002.safetensors",
359
+ "text_model.model.layers.9.attention_norm.weight": "model-00001-of-00002.safetensors",
360
+ "text_model.model.layers.9.feed_forward.w1.weight": "model-00001-of-00002.safetensors",
361
+ "text_model.model.layers.9.feed_forward.w2.weight": "model-00001-of-00002.safetensors",
362
+ "text_model.model.layers.9.feed_forward.w3.weight": "model-00001-of-00002.safetensors",
363
+ "text_model.model.layers.9.ffn_norm.weight": "model-00001-of-00002.safetensors",
364
+ "text_model.model.norm.weight": "model-00002-of-00002.safetensors"
365
+ }
366
+ }
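The `weight_map` in the index above follows the standard sharded-safetensors layout: each tensor name maps to the shard file that stores it (note how layer 20 straddles the shard boundary, with its attention weights in shard 1 and its norms in shard 2). As a rough sketch of how such an index is consumed (standard library only; in practice loaders like `transformers` or `accelerate` do this resolution for you, and the excerpt dict below is a hypothetical subset of the real file):

```python
import json

# Hypothetical excerpt of model.safetensors.index.json, mirroring the
# entries added in this commit.
index_json = """
{
  "weight_map": {
    "text_model.model.layers.20.attention.wqkv.weight": "model-00001-of-00002.safetensors",
    "text_model.model.layers.20.ffn_norm.weight": "model-00002-of-00002.safetensors",
    "text_model.model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
"""

def shard_for(index: dict, tensor_name: str) -> str:
    """Return the shard file that holds `tensor_name`."""
    return index["weight_map"][tensor_name]

def tensors_by_shard(index: dict) -> dict:
    """Invert the weight_map: shard file -> list of tensor names,
    so each shard only needs to be opened once."""
    shards: dict = {}
    for name, shard in index["weight_map"].items():
        shards.setdefault(shard, []).append(name)
    return shards

index = json.loads(index_json)
print(shard_for(index, "text_model.model.norm.weight"))
# → model-00002-of-00002.safetensors
```

A loader would then open each shard once and read only the tensors listed for it, rather than loading both files in full.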
overview.png ADDED
Git LFS Details

  • SHA256: f77da18e7d3cf59182fb714fc6c1bc526e7877b538e6380d8f002e84e8c56ea9
  • Pointer size: 132 Bytes
  • Size of remote file: 3.54 MB
special_tokens_map.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f24e08099d45a8adf3f52f5f0b03276e433bb9d689bb15fcbcc48ce58744588b
+ size 12217872
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff