88plug-bot commited on
Commit
970d6cf
Β·
verified Β·
1 Parent(s): 4042270

Upload index.html with huggingface_hub

Browse files
Files changed (1) hide show
  1. index.html +261 -18
index.html CHANGED
@@ -1,19 +1,262 @@
1
- <!doctype html>
2
- <html>
3
- <head>
4
- <meta charset="utf-8" />
5
- <meta name="viewport" content="width=device-width" />
6
- <title>My static Space</title>
7
- <link rel="stylesheet" href="style.css" />
8
- </head>
9
- <body>
10
- <div class="card">
11
- <h1>Welcome to your static Space!</h1>
12
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
13
- <p>
14
- Also don't forget to check the
15
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
16
- </p>
17
- </div>
18
- </body>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
  </html>
 
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>88plug AI Lab</title>
7
+ <style>
8
+ :root {
9
+ --bg: #0f1117;
10
+ --surface: #1a1d27;
11
+ --border: #2a2d3e;
12
+ --accent: #6366f1;
13
+ --accent2: #818cf8;
14
+ --text: #e2e8f0;
15
+ --muted: #94a3b8;
16
+ --code-bg: #0d1117;
17
+ --green: #22c55e;
18
+ }
19
+ * { box-sizing: border-box; margin: 0; padding: 0; }
20
+ body {
21
+ background: var(--bg);
22
+ color: var(--text);
23
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif;
24
+ font-size: 15px;
25
+ line-height: 1.7;
26
+ max-width: 900px;
27
+ margin: 0 auto;
28
+ padding: 40px 24px 80px;
29
+ }
30
+ h1 { font-size: 2rem; font-weight: 700; color: #fff; margin-bottom: 4px; }
31
+ h2 { font-size: 1.25rem; font-weight: 600; color: #fff; margin: 40px 0 12px; padding-bottom: 8px; border-bottom: 1px solid var(--border); }
32
+ h3 { font-size: 1rem; font-weight: 600; color: var(--accent2); margin: 28px 0 10px; }
33
+ p { margin-bottom: 14px; color: var(--text); }
34
+ a { color: var(--accent2); text-decoration: none; }
35
+ a:hover { text-decoration: underline; }
36
+ hr { border: none; border-top: 1px solid var(--border); margin: 32px 0; }
37
+ .header { margin-bottom: 32px; }
38
+ .subtitle { color: var(--muted); font-size: 0.95rem; margin-top: 6px; }
39
+ .badge {
40
+ display: inline-block;
41
+ background: rgba(99, 102, 241, 0.15);
42
+ color: var(--accent2);
43
+ border: 1px solid rgba(99, 102, 241, 0.3);
44
+ border-radius: 4px;
45
+ font-size: 0.75rem;
46
+ font-weight: 600;
47
+ padding: 2px 8px;
48
+ margin-right: 6px;
49
+ letter-spacing: 0.05em;
50
+ text-transform: uppercase;
51
+ }
52
+ table { width: 100%; border-collapse: collapse; margin: 14px 0 24px; font-size: 0.9rem; }
53
+ th { background: var(--surface); color: var(--muted); font-weight: 600; text-align: left; padding: 8px 12px; border-bottom: 1px solid var(--border); font-size: 0.8rem; letter-spacing: 0.04em; text-transform: uppercase; }
54
+ td { padding: 8px 12px; border-bottom: 1px solid var(--border); vertical-align: top; }
55
+ tr:last-child td { border-bottom: none; }
56
+ tr:hover td { background: rgba(255,255,255,0.02); }
57
+ code {
58
+ font-family: 'JetBrains Mono', 'Fira Code', 'Cascadia Code', monospace;
59
+ font-size: 0.85em;
60
+ background: var(--code-bg);
61
+ border: 1px solid var(--border);
62
+ border-radius: 4px;
63
+ padding: 1px 5px;
64
+ color: #e879f9;
65
+ }
66
+ pre {
67
+ background: var(--code-bg);
68
+ border: 1px solid var(--border);
69
+ border-radius: 8px;
70
+ padding: 16px 20px;
71
+ overflow-x: auto;
72
+ margin: 12px 0 20px;
73
+ font-size: 0.85rem;
74
+ line-height: 1.6;
75
+ }
76
+ pre code {
77
+ background: none;
78
+ border: none;
79
+ padding: 0;
80
+ color: #a5f3fc;
81
+ font-size: inherit;
82
+ }
83
+ .model-family {
84
+ background: var(--surface);
85
+ border: 1px solid var(--border);
86
+ border-radius: 10px;
87
+ padding: 20px 24px;
88
+ margin-bottom: 16px;
89
+ }
90
+ .model-family h3 { margin-top: 0; }
91
+ .quality-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 12px; margin: 16px 0 24px; }
92
+ .quality-card {
93
+ background: var(--surface);
94
+ border: 1px solid var(--border);
95
+ border-radius: 8px;
96
+ padding: 16px 20px;
97
+ }
98
+ .quality-card .tier { font-size: 1.1rem; font-weight: 700; color: #fff; margin-bottom: 4px; }
99
+ .quality-card .method { color: var(--muted); font-size: 0.85rem; margin-bottom: 8px; }
100
+ .quality-card .recovery { color: var(--green); font-weight: 600; font-size: 0.9rem; }
101
+ .contact { background: var(--surface); border: 1px solid var(--border); border-radius: 10px; padding: 20px 24px; }
102
+ ul { padding-left: 20px; margin-bottom: 14px; }
103
+ li { margin-bottom: 6px; }
104
+ @media (max-width: 600px) {
105
+ .quality-grid { grid-template-columns: 1fr; }
106
+ body { padding: 24px 16px 60px; }
107
+ h1 { font-size: 1.5rem; }
108
+ }
109
+ </style>
110
+ </head>
111
+ <body>
112
+
113
+ <div class="header">
114
+ <h1>πŸ”Œ 88plug AI Lab</h1>
115
+ <p class="subtitle">Production-grade compressed-tensors quantizations of frontier LLMs, VLMs, and omni models β€” engineered for native vLLM v0.9.0+ deployment.</p>
116
+ </div>
117
+
118
+ <h2>Why compressed-tensors</h2>
119
+ <p>Most quantization formats (AWQ, GPTQ, GGUF) target a single inference backend and ship a frozen weight layout that cannot be further composed or modified at load time. <code>compressed-tensors</code> is the format developed by Neural Magic and maintained as a first-class vLLM citizen.</p>
120
+ <ul>
121
+ <li><strong>Native vLLM integration.</strong> No format conversion, no plugin shims. vLLM reads compressed-tensors models directly via its built-in <code>CompressedTensorsWorker</code>. Full PagedAttention, continuous batching, and tensor parallelism work without modification.</li>
122
+ <li><strong>Composable precision.</strong> A single checkpoint can carry per-layer or per-group precision assignments. Mixed-precision MoE configurations are expressed in the same file.</li>
123
+ <li><strong>Reproducible calibration metadata.</strong> The quantization config, calibration scheme, and per-channel scales are stored inside the checkpoint.</li>
124
+ <li><strong>Forward compatibility.</strong> As vLLM adds new kernel support (FP8, INT8, sparse), compressed-tensors models gain that support without re-quantizing.</li>
125
+ </ul>
126
+ <p>AWQ and GPTQ remain fine for llama.cpp and older toolchains. If you are deploying on vLLM in production, compressed-tensors is the correct choice.</p>
127
+
128
+ <h2>Quality Standard</h2>
129
+ <div class="quality-grid">
130
+ <div class="quality-card">
131
+ <div class="tier">W8A16</div>
132
+ <div class="method">RTN / AutoRound iters=200</div>
133
+ <div class="recovery">&gt;99.5% MMLU recovery</div>
134
+ <p style="font-size:0.85rem;color:var(--muted);margin:8px 0 0">Ampere+ (A100, A6000, RTX 30xx+)</p>
135
+ </div>
136
+ <div class="quality-card">
137
+ <div class="tier">W4A16</div>
138
+ <div class="method">AutoRound iters=200 (SignSGD)</div>
139
+ <div class="recovery">β‰₯99% MMLU recovery</div>
140
+ <p style="font-size:0.85rem;color:var(--muted);margin:8px 0 0">Ampere+ (A100, A6000, RTX 30xx+)</p>
141
+ </div>
142
+ </div>
143
+ <p style="color:var(--muted);font-size:0.875rem">AutoRound at iters=200 runs sign-gradient optimization over a calibration set to minimize weight rounding error. At W4A16, this closes most of the gap between naive round-to-nearest and GPTQ/AWQ, while producing a checkpoint that vLLM can load natively.</p>
144
+
145
+ <h2>Model Catalog</h2>
146
+ <p style="color:var(--muted);font-size:0.875rem">All 16 models in compressed-tensors format, validated for vLLM v0.9.0+.</p>
147
+
148
+ <div class="model-family">
149
+ <h3>Qwen3.6-35B-A3B β€” Mixed-Precision MoE, 1M context</h3>
150
+ <table>
151
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
152
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3.6-35B-A3B-W8A16">88plug/Qwen3.6-35B-A3B-W8A16</a></td><td>MoE, 35B total / 3.6B active</td></tr>
153
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3.6-35B-A3B-W4A16">88plug/Qwen3.6-35B-A3B-W4A16</a></td><td>MoE, 35B total / 3.6B active</td></tr>
154
+ </table>
155
+ </div>
156
+
157
+ <div class="model-family">
158
+ <h3>Qwen3.6-27B β€” Dense Hybrid, 262k context</h3>
159
+ <table>
160
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
161
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3.6-27B-W8A16">88plug/Qwen3.6-27B-W8A16</a></td><td>Dense, 27B</td></tr>
162
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3.6-27B-W4A16">88plug/Qwen3.6-27B-W4A16</a></td><td>Dense, 27B</td></tr>
163
+ </table>
164
+ </div>
165
+
166
+ <div class="model-family">
167
+ <h3>Qwen3-Omni-30B-A3B β€” Audio + Vision + Speech</h3>
168
+ <table>
169
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
170
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3-Omni-30B-A3B-W8A16">88plug/Qwen3-Omni-30B-A3B-W8A16</a></td><td>Omni MoE, 30B / 3B active</td></tr>
171
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Qwen3-Omni-30B-W4A16">88plug/Qwen3-Omni-30B-W4A16</a></td><td>Omni MoE, 30B / 3B active</td></tr>
172
+ </table>
173
+ </div>
174
+
175
+ <div class="model-family">
176
+ <h3>Qwen2.5-Omni-7B β€” Efficient Omni</h3>
177
+ <table>
178
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
179
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Qwen2.5-Omni-7B-W8A16">88plug/Qwen2.5-Omni-7B-W8A16</a></td><td>Omni dense, 7B</td></tr>
180
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Qwen2.5-Omni-7B-W4A16">88plug/Qwen2.5-Omni-7B-W4A16</a></td><td>Omni dense, 7B</td></tr>
181
+ </table>
182
+ </div>
183
+
184
+ <div class="model-family">
185
+ <h3>Gemma4-E4B-it β€” Vision-Language Model</h3>
186
+ <table>
187
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
188
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Gemma4-E4B-it-W8A16">88plug/Gemma4-E4B-it-W8A16</a></td><td>VLM MoE, 4B active / 28B total</td></tr>
189
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Gemma4-E4B-it-W4A16">88plug/Gemma4-E4B-it-W4A16</a></td><td>VLM MoE, 4B active / 28B total</td></tr>
190
+ </table>
191
+ </div>
192
+
193
+ <div class="model-family">
194
+ <h3>Gemma4-E2B-it β€” Ultra-Efficient VLM</h3>
195
+ <table>
196
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
197
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Gemma4-E2B-it-W8A16">88plug/Gemma4-E2B-it-W8A16</a></td><td>VLM MoE, 2B active / 26B total</td></tr>
198
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Gemma4-E2B-it-W4A16">88plug/Gemma4-E2B-it-W4A16</a></td><td>VLM MoE, 2B active / 26B total</td></tr>
199
+ </table>
200
+ </div>
201
+
202
+ <div class="model-family">
203
+ <h3>MiniCPM-o-4.5 β€” Omni Model</h3>
204
+ <table>
205
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
206
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/MiniCPM-o-4.5-W8A16">88plug/MiniCPM-o-4.5-W8A16</a></td><td>Omni dense</td></tr>
207
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/MiniCPM-o-4.5-W4A16">88plug/MiniCPM-o-4.5-W4A16</a></td><td>Omni dense</td></tr>
208
+ </table>
209
+ </div>
210
+
211
+ <div class="model-family">
212
+ <h3>Nemotron-3-Nano-30B-A3B β€” Hybrid SSM/Attention</h3>
213
+ <table>
214
+ <tr><th>Precision</th><th>Repo</th><th>Architecture</th></tr>
215
+ <tr><td><span class="badge">W8A16</span></td><td><a href="https://huggingface.co/88plug/Nemotron-3-Nano-30B-A3B-W8A16">88plug/Nemotron-3-Nano-30B-A3B-W8A16</a></td><td>Hybrid Mamba2 SSM + Attention MoE</td></tr>
216
+ <tr><td><span class="badge">W4A16</span></td><td><a href="https://huggingface.co/88plug/Nemotron-3-Nano-30B-A3B-W4A16">88plug/Nemotron-3-Nano-30B-A3B-W4A16</a></td><td>Hybrid Mamba2 SSM + Attention MoE</td></tr>
217
+ </table>
218
+ </div>
219
+
220
+ <h2>Quickstart</h2>
221
+ <p>Requires vLLM v0.9.0+ and an Ampere-class GPU (A100, A6000, RTX 3090/4090, or equivalent).</p>
222
+
223
+ <h3>Install</h3>
224
+ <pre><code>pip install vllm>=0.9.0</code></pre>
225
+
226
+ <h3>Offline inference</h3>
227
+ <pre><code>from vllm import LLM, SamplingParams
228
+
229
+ llm = LLM(
230
+ model="88plug/Qwen3.6-35B-A3B-W4A16",
231
+ max_model_len=131072,
232
+ tensor_parallel_size=1,
233
+ )
234
+
235
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
236
+ outputs = llm.generate(["Explain W4A16 vs W8A16 tradeoffs."], sampling_params)
237
+ print(outputs[0].outputs[0].text)</code></pre>
238
+
239
+ <h3>OpenAI-compatible server</h3>
240
+ <pre><code>vllm serve 88plug/Qwen3.6-35B-A3B-W4A16 \
241
+ --max-model-len 131072 \
242
+ --port 8000</code></pre>
243
+
244
+ <h2>Hardware Requirements</h2>
245
+ <table>
246
+ <tr><th>Model Size</th><th>W8A16 VRAM</th><th>W4A16 VRAM</th><th>Recommended</th></tr>
247
+ <tr><td>2B–7B</td><td>8–16 GB</td><td>6–10 GB</td><td>Single A6000 / RTX 4090</td></tr>
248
+ <tr><td>27B–35B (dense)</td><td>32–40 GB</td><td>20–28 GB</td><td>Single A100 80G or 2Γ— A6000</td></tr>
249
+ <tr><td>30B–35B (MoE, 3B active)</td><td>28–36 GB</td><td>18–24 GB</td><td>Single A100 80G or 2Γ— A6000</td></tr>
250
+ </table>
251
+
252
+ <hr>
253
+
254
+ <div class="contact">
255
+ <strong>Contact</strong><br>
256
+ Developer: Andrew Mello &nbsp;Β·&nbsp; <a href="https://88plug.com">88plug.com</a><br>
257
+ Issues and model requests: open a Discussion on the relevant model repo.<br>
258
+ <span style="color:var(--muted);font-size:0.85rem">Uploads automated via <a href="https://huggingface.co/88plug-bot">88plug-bot</a>.</span>
259
+ </div>
260
+
261
+ </body>
262
  </html>