wscholl committed on
Commit d6a71ce · verified · 1 Parent(s): a9e62d5

feat: render squish-focused org card content

Files changed (1)
  1. index.html +149 -18
index.html CHANGED
@@ -1,19 +1,150 @@
- <!doctype html>
- <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width" />
- <title>My static Space</title>
- <link rel="stylesheet" href="style.css" />
- </head>
- <body>
- <div class="card">
- <h1>Welcome to your static Space!</h1>
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
- <p>
- Also don't forget to check the
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
- </p>
- </div>
- </body>
  </html>
 
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
+ <title>Konjo AI</title>
+ <style>
+ :root { color-scheme: light dark; }
+ * { box-sizing: border-box; }
+ body {
+ margin: 0;
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
+ Helvetica, Arial, sans-serif;
+ line-height: 1.6;
+ color: inherit;
+ background: transparent;
+ }
+ .wrap { max-width: 820px; margin: 0 auto; padding: 8px 4px 32px; }
+ h1 { font-size: 1.9rem; margin: 0 0 .25rem; }
+ h2 { font-size: 1.3rem; margin: 2rem 0 .5rem; }
+ h3 { font-size: 1.05rem; margin: 1.25rem 0 .4rem; }
+ p { margin: .5rem 0; }
+ a { color: #2563eb; text-decoration: none; }
+ a:hover { text-decoration: underline; }
+ ul { margin: .5rem 0; padding-left: 1.3rem; }
+ li { margin: .25rem 0; }
+ hr { border: none; border-top: 1px solid rgba(128,128,128,.3); margin: 1.5rem 0; }
+ code {
+ font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
+ font-size: .85em;
+ background: rgba(128,128,128,.15);
+ padding: .12em .35em;
+ border-radius: 4px;
+ }
+ pre {
+ background: rgba(128,128,128,.12);
+ border: 1px solid rgba(128,128,128,.2);
+ border-radius: 8px;
+ padding: .85rem 1rem;
+ overflow-x: auto;
+ }
+ pre code { background: none; padding: 0; }
+ table { border-collapse: collapse; width: 100%; margin: .75rem 0; font-size: .92rem; }
+ th, td { border: 1px solid rgba(128,128,128,.3); padding: .45rem .6rem; text-align: left; }
+ th { background: rgba(128,128,128,.12); }
+ .tagline { color: #6b7280; margin-top: 0; }
+ .links { font-size: .95rem; }
+ </style>
+ </head>
+ <body>
+ <div class="wrap">
+
+ <h1>🗜 Konjo AI</h1>
+ <p class="tagline">Local AI infrastructure for Apple Silicon. We make models
+ that already exist run faster on the hardware you already own.</p>
+ <p class="links">🌐 <a href="https://squish.run">squish.run</a> ·
+ 💻 <a href="https://github.com/konjoai">github.com/konjoai</a></p>
+
+ <hr />
+
+ <h2>squish &mdash; Local LLM inference for Apple Silicon</h2>
+ <p><a href="https://github.com/konjoai/squish">squish</a> is an MLX-based local
+ inference server with a block-level paged KV cache and INT3 quantization
+ support for the Qwen3 family. On a 16 GB M3 MacBook against Ollama:</p>
+ <ul>
+ <li><strong>5.4× faster</strong> end-to-end response at 4000-token prompts (12.78s vs 69.6s)</li>
+ <li><strong>1.5× faster</strong> end-to-end on 75-token prompts (5.50s vs 8.09s)</li>
+ <li><strong>33% less RAM</strong> during inference (3.36 GB vs ~5 GB)</li>
+ <li><strong>INT3 support</strong> for Qwen3 with no measurable accuracy loss (Ollama doesn't ship INT3)</li>
+ </ul>
+ <p>The honest tradeoff: Ollama still wins first-token latency on short prompts.
+ squish wins when you care about total response time on real workloads.</p>
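The headline ratios in the list above can be re-derived from the raw figures in parentheses. A minimal Python check, assuming the quoted timings and memory figures are the measured values and treating the approximate "~5 GB" as exactly 5.0 (the variable and function names are illustrative, not part of squish):

```python
# Figures copied from the card text; "~5 GB" is approximate in the source.
squish_long_s, ollama_long_s = 12.78, 69.6    # 4000-token prompts, seconds
squish_short_s, ollama_short_s = 5.50, 8.09   # 75-token prompts, seconds
squish_ram_gb, ollama_ram_gb = 3.36, 5.0      # peak RAM during inference


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


def reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage by which the candidate undercuts the baseline."""
    return 100.0 * (baseline - candidate) / baseline


long_speedup = speedup(ollama_long_s, squish_long_s)
short_speedup = speedup(ollama_short_s, squish_short_s)
ram_saving = reduction_pct(ollama_ram_gb, squish_ram_gb)

print(f"long prompts:  {long_speedup:.1f}x")   # 5.4x
print(f"short prompts: {short_speedup:.1f}x")  # 1.5x
print(f"RAM saving:    {ram_saving:.0f}%")     # 33%
```

The short-prompt ratio is about 1.47 before rounding, so the "1.5×" figure is a rounded value.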
+
+ <h3>Install</h3>
+ <pre><code>brew tap konjoai/squish &amp;&amp; brew install squish
+ # or
+ pip install squish-ai</code></pre>
+
+ <h3>Use</h3>
+ <pre><code>squish pull konjoai/Qwen3-8B-squished
+ squish run Qwen3-8B-squished</code></pre>
+
+ <p class="links">
+ <a href="https://github.com/konjoai/squish/blob/main/docs/RESULTS.md">Full benchmarks</a> ·
+ <a href="https://github.com/konjoai/squish">Repo</a> ·
+ <a href="https://github.com/konjoai/squish/issues">Issues</a>
+ </p>
+
+ <hr />
+
+ <h2>Pre-Compressed Models</h2>
+ <p>This org hosts models pre-compressed by squish. Pull once, load instantly
+ every time after.</p>
+ <table>
+ <thead>
+ <tr><th>Model</th><th>Squish ID</th><th>Quantization</th><th>Disk size</th><th>Context</th></tr>
+ </thead>
+ <tbody>
+ <tr><td colspan="5"><em>Available after first publish batch</em></td></tr>
+ </tbody>
+ </table>
+ <p>The format is <code>mlx_lm</code>-compatible &mdash; you can also use these models directly:</p>
+ <pre><code>from mlx_lm import load, generate
+
+ model, tokenizer = load("konjoai/Qwen2.5-7B-Instruct-squished")
+ response = generate(model, tokenizer, prompt="Hello", max_tokens=100)
+ print(response)</code></pre>
+
+ <hr />
+
+ <h2>How models are compressed</h2>
+ <p>squish uses a three-tier pipeline:</p>
+ <ul>
+ <li><strong>INT4/INT3 quantization</strong> via a Rust extension
+ (<code>squish_quant_rs</code>) with ARM NEON acceleration</li>
+ <li><strong>Block-level paged KV cache</strong> &mdash; KV state is chunked into
+ fixed-size blocks for prefix reuse across sessions</li>
+ <li><strong>Quantization safeguards</strong> &mdash; squish hard-blocks INT3 on
+ model families where it collapses (e.g. Gemma-3 loses ~15pp on common
+ benchmarks); INT3 ships only for families that hold accuracy (Qwen3
+ specifically)</li>
+ </ul>
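A rough illustration of the first and third bullets, assuming plain symmetric round-to-nearest quantization per weight group. The source does not describe the scheme used by `squish_quant_rs`, so the functions below, the allow-list name `INT3_ALLOWED_FAMILIES`, and the family strings are all hypothetical:

```python
# Hypothetical allow-list: the card says INT3 ships only for Qwen3.
INT3_ALLOWED_FAMILIES = {"qwen3"}


def check_quantization(family: str, bits: int) -> None:
    """Refuse INT3 for any family that is not explicitly allowed."""
    if bits == 3 and family.lower() not in INT3_ALLOWED_FAMILIES:
        raise ValueError(f"INT3 is blocked for model family {family!r}")


def quantize_group(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric quantization of one weight group to signed `bits`-bit codes."""
    qmax = 2 ** (bits - 1) - 1        # 7 for INT4, 3 for INT3
    qmin = -(2 ** (bits - 1))         # -8 for INT4, -4 for INT3
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.7, -0.28, 0.0, 0.14]
for bits in (4, 3):
    check_quantization("qwen3", bits)
    codes, scale = quantize_group(weights, bits)
    restored = dequantize_group(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(bits, codes, f"max error {worst:.4f}")
```

With fewer bits the step size `scale` grows, so the worst-case rounding error grows with it. That is the kind of loss a per-family gate like `check_quantization` is meant to guard against.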
+
+ <hr />
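The block-level paged KV cache bullet describes chunking KV state into fixed-size blocks so a shared prefix can be reused. One common way to get that behaviour is to key each block by a hash chained over everything before it. The sketch below assumes that approach, with a hypothetical `PagedPrefixCache` class, a toy block size of 4 tokens, and strings standing in for real key/value tensors. It is not squish's implementation:

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # toy value for illustration; real block sizes differ


def block_key(parent_key: str, tokens: tuple[int, ...]) -> str:
    """Hash a block together with its parent's key, so a key identifies the whole prefix."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(",".join(map(str, tokens)).encode())
    return h.hexdigest()


@dataclass
class PagedPrefixCache:
    # Maps chained block key -> cached KV state (a string placeholder here).
    blocks: dict[str, str] = field(default_factory=dict)

    def lookup_or_fill(self, token_ids: list[int]) -> tuple[int, int]:
        """Return (reused_tokens, computed_tokens) over the prompt's full blocks."""
        reused = computed = 0
        parent = ""
        for start in range(0, len(token_ids) - BLOCK_SIZE + 1, BLOCK_SIZE):
            chunk = tuple(token_ids[start:start + BLOCK_SIZE])
            key = block_key(parent, chunk)
            if key in self.blocks:
                reused += BLOCK_SIZE          # prefix block already cached
            else:
                self.blocks[key] = f"kv{chunk}"  # stand-in for computing K/V
                computed += BLOCK_SIZE
            parent = key
        return reused, computed


cache = PagedPrefixCache()
system_prompt = list(range(8))  # two full blocks shared by both requests
first = cache.lookup_or_fill(system_prompt + [100, 101, 102, 103])
second = cache.lookup_or_fill(system_prompt + [200, 201, 202, 203])
print(first)   # (0, 12)
print(second)  # (8, 4)
```

Because each key folds in its parent's key, a block can only hit if every block before it also matched, which is what makes the reuse a prefix reuse. Trailing tokens that do not fill a whole block are ignored in this sketch.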
+
+ <h2>Other projects</h2>
+ <p>We also build <a href="https://github.com/konjoai/squash">squash</a>, a
+ security and EU AI Act compliance scanner for HuggingFace models. Independent
+ codebase, related mission.</p>
+
+ <hr />
+
+ <h2>License</h2>
+ <p>squish is BUSL-1.1. Compressed models inherit their base model's license &mdash;
+ Qwen3 is Apache-2.0, Llama is the Llama Community License, etc. Check each
+ model's card for specifics.</p>
+
+ <hr />
+
+ <h2>Requirements</h2>
+ <ul>
+ <li>macOS 13.0 or later</li>
+ <li>Apple Silicon (M1 / M2 / M3 / M4 / M5)</li>
+ <li>Enough unified memory for the model (table above)</li>
+ </ul>
+ <p>Intel Macs and Linux are not supported.</p>
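A small pre-flight check for the requirements listed above, using Python's standard `platform` module. `is_supported` and `parse_major_minor` are hypothetical helpers, not part of squish, and the unified-memory requirement is not checked here:

```python
import platform


def parse_major_minor(version: str) -> tuple[int, int]:
    """Turn '14.5' or '13' into (14, 5) or (13, 0); empty input gives (0, 0)."""
    parts = [int(p) for p in version.split(".") if p.isdigit()]
    parts += [0] * (2 - len(parts))
    return parts[0], parts[1]


def is_supported(system: str, machine: str, macos_version: str) -> bool:
    """True for macOS 13.0+ on Apple Silicon, per the requirements list."""
    if system != "Darwin" or machine != "arm64":
        return False
    return parse_major_minor(macos_version) >= (13, 0)


# Check the machine this script runs on.
print(is_supported(platform.system(), platform.machine(), platform.mac_ver()[0]))
```

A Python interpreter running under Rosetta 2 typically reports `x86_64` as its machine type, so this check can be stricter than the hardware warrants in that case.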
+
+ </div>
+ </body>
  </html>