sujr committed on
Commit ca00dae · verified · 1 Parent(s): e698dae

Upload 1.html with huggingface_hub

Files changed (1)
  1. 1.html +285 -0
1.html ADDED
@@ -0,0 +1,285 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>Data Volume Overview: Pre-training and Post-training Stages</title>
+     <style>
+         :root {
+             --primary-color: #2563eb;
+             --secondary-color: #10b981;
+             --text-main: #1f2937;
+             --text-muted: #6b7280;
+             --bg-color: #ffffff;
+             --card-bg: #f9fafb;
+             --border-color: #e5e7eb;
+         }
+
+         body {
+             font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
+             background-color: var(--bg-color);
+             color: var(--text-main);
+             margin: 0;
+             padding: 40px 60px;
+             box-sizing: border-box;
+             display: flex;
+             flex-direction: column;
+             height: 100vh;
+             justify-content: center;
+         }
+
+         h1 {
+             text-align: center;
+             font-size: 2.5rem;
+             margin-bottom: 10px;
+             color: #111827;
+             font-weight: 800;
+             letter-spacing: 2px;
+         }
+
+         .subtitle {
+             text-align: center;
+             color: var(--text-muted);
+             font-size: 1.2rem;
+             margin-bottom: 40px;
+         }
+
+         .container {
+             display: flex;
+             gap: 40px;
+             width: 100%;
+             max-width: 1400px;
+             margin: 0 auto;
+             flex: 1;
+         }
+
+         .column {
+             flex: 1;
+             display: flex;
+             flex-direction: column;
+             gap: 20px;
+         }
+
+         .column-header {
+             font-size: 1.8rem;
+             font-weight: bold;
+             padding-bottom: 15px;
+             border-bottom: 3px solid;
+             display: flex;
+             align-items: center;
+             gap: 10px;
+         }
+
+         .pre-train .column-header {
+             color: var(--primary-color);
+             border-bottom-color: var(--primary-color);
+         }
+
+         .post-train .column-header {
+             color: var(--secondary-color);
+             border-bottom-color: var(--secondary-color);
+         }
+
+         .card {
+             background-color: var(--card-bg);
+             border: 1px solid var(--border-color);
+             border-radius: 12px;
+             padding: 24px;
+             box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.05), 0 2px 4px -1px rgba(0, 0, 0, 0.03);
+             transition: transform 0.2s ease;
+         }
+
+         .card:hover {
+             transform: translateY(-2px);
+             box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.05), 0 4px 6px -2px rgba(0, 0, 0, 0.025);
+         }
+
+         .card-header {
+             display: flex;
+             justify-content: space-between;
+             align-items: center;
+             margin-bottom: 16px;
+         }
+
+         .card-title {
+             font-size: 1.4rem;
+             font-weight: bold;
+             display: flex;
+             align-items: center;
+             gap: 8px;
+         }
+
+         .card-total {
+             font-size: 1.2rem;
+             font-weight: 900;
+             background: #e0f2fe;
+             color: #0369a1;
+             padding: 4px 12px;
+             border-radius: 20px;
+         }
+
+         .post-train .card-total {
+             background: #d1fae5;
+             color: #047857;
+         }
+
+         .data-list {
+             list-style: none;
+             padding: 0;
+             margin: 0;
+         }
+
+         .data-list li {
+             position: relative;
+             padding-left: 20px;
+             margin-bottom: 12px;
+             line-height: 1.6;
+             color: #374151;
+         }
+
+         .data-list li::before {
+             content: "•";
+             position: absolute;
+             left: 0;
+             color: #9ca3af;
+             font-size: 1.2rem;
+         }
+
+         .highlight {
+             font-weight: 600;
+             color: #111827;
+         }
+
+         .tag {
+             display: inline-block;
+             background-color: #f3f4f6;
+             border: 1px solid #d1d5db;
+             color: #4b5563;
+             font-size: 0.85rem;
+             padding: 2px 8px;
+             border-radius: 4px;
+             margin-left: 6px;
+             vertical-align: middle;
+         }
+
+         .method-box {
+             margin-top: 12px;
+             padding: 12px;
+             background-color: #f8fafc;
+             border-left: 4px solid var(--secondary-color);
+             border-radius: 0 8px 8px 0;
+             font-size: 0.95rem;
+         }
+
+         .arrow {
+             color: #9ca3af;
+             margin: 0 6px;
+         }
+     </style>
+ </head>
+ <body>
+
+     <h1>Multimodal Large Model Data Dashboard</h1>
+     <div class="subtitle">Data Volume Overview: Pre-training & Post-training</div>
+
+     <div class="container">
+         <div class="column pre-train">
+             <div class="column-header">
+                 <span>⚡ Pre-training Stage</span>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">🖼️ T2I (Text-to-Image)</div>
+                     <div class="card-total">~ 812M+</div>
+                 </div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Huashan on-disk data: 800M</span>
+                         <div style="margin-top: 4px; font-size: 0.95rem; color: var(--text-muted);">
+                             Includes 240M dense-caption samples re-labeled with Qwen3vl4b <span class="tag">avg. 387 tokens</span>; the rest carry short captions
+                         </div>
+                     </li>
+                     <li><span class="highlight">Banana distilled data: 2.2M</span> <span class="tag">Gemini-annotated</span></li>
+                     <li><span class="highlight">Self-collected open-source data: 10M</span> <span class="tag">high-quality aesthetic images</span></li>
+                 </ul>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">🎵 T2A (Text-to-Audio)</div>
+                     <div class="card-total">56,300 Hours+</div>
+                 </div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Audio data: 6,300 hours</span> (approx. 2.3M)</li>
+                     <li><span class="highlight">Speech data: 50,000 hours</span> (approx. 33M)</li>
+                     <li><span class="highlight">Self-collected open-source data</span> (supplementary source)</li>
+                 </ul>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">🎬 T2V (Text-to-Video)</div>
+                     <div class="card-total">~ 100M+</div>
+                 </div>
+                 <div style="margin-bottom: 10px;"><span class="tag">>480p</span> <span class="tag">duration 3-10s</span></div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Huashan data: 100M</span>
+                         <div style="margin-top: 4px; font-size: 0.95rem; color: var(--text-muted);">
+                             Includes 22M samples re-annotated with Gemini
+                         </div>
+                     </li>
+                     <li><span class="highlight">Seedance distillation: 350K</span></li>
+                 </ul>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">🎞️ T2AV (Text-to-Audio-Video)</div>
+                     <div class="card-total">2M</div>
+                 </div>
+             </div>
+         </div>
+
+         <div class="column post-train">
+             <div class="column-header">
+                 <span>🎯 Post-training Stage</span>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">✨ I2I (Image-to-Image)</div>
+                     <div class="card-total">5M</div>
+                 </div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Nano Banana distilled data: 5M</span> <span class="tag">1K resolution</span></li>
+                 </ul>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">📽️ I2V (Image-to-Video)</div>
+                     <div class="card-total">5M</div>
+                 </div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Curated image-to-video data: 5M</span></li>
+                 </ul>
+             </div>
+
+             <div class="card">
+                 <div class="card-header">
+                     <div class="card-title">🎮 Interactive Data</div>
+                     <div class="card-total">1M</div>
+                 </div>
+                 <ul class="data-list">
+                     <li><span class="highlight">Game interaction control data: 1M</span></li>
+                 </ul>
+                 <div class="method-box">
+                     <strong>🔧 Construction pipeline:</strong><br>
+                     YouTube gameplay videos <span class="arrow">➔</span> VGGt pose estimation <span class="arrow">➔</span> control-signal extraction
+                 </div>
+             </div>
+         </div>
+     </div>
+
+ </body>
+ </html>
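
The card totals in the added `1.html` can be cross-checked against the per-source figures listed under each card. The snippet below is a minimal sketch of that check and is not part of the commit: the `CARDS` dictionary and the `check_totals` helper are hypothetical names, the numbers are copied from the page, and sources listed without a size (such as the supplementary open-source audio data) are left out.

```python
# Figures copied from the cards in 1.html. Units are millions of samples,
# except for T2A, which the page reports in hours.
# CARDS and check_totals are illustrative names, not part of the commit.
CARDS = {
    "T2I": {"stated": 812.0, "parts": [800.0, 2.2, 10.0]},
    "T2V": {"stated": 100.0, "parts": [100.0, 0.35]},
    "T2A (hours)": {"stated": 56300.0, "parts": [6300.0, 50000.0]},
}


def check_totals(cards):
    """Return {card name: (sum of listed parts, stated total, surplus)}."""
    report = {}
    for name, card in cards.items():
        total = sum(card["parts"])
        report[name] = (total, card["stated"], total - card["stated"])
    return report


if __name__ == "__main__":
    for name, (total, stated, surplus) in check_totals(CARDS).items():
        print(f"{name}: parts sum to {total:g}, card states {stated:g}+ "
              f"(surplus {surplus:g})")
```

Under these assumptions, each stated total is at or just below the sum of its listed parts, which is consistent with the trailing "+" shown on the cards.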