wscholl committed 3 days ago

Commit d6a71ce (verified) · 1 Parent(s): a9e62d5

feat: render squish-focused org card content

Files changed (1)

index.html +149 -18

@@ -1,19 +1,150 @@
-<!doctype html>
-<html>
-	<head>
-		<meta charset="utf-8" />
-		<meta name="viewport" content="width=device-width" />
-		<title>My static Space</title>
-		<link rel="stylesheet" href="style.css" />
-	</head>
-	<body>
-		<div class="card">
-			<h1>Welcome to your static Space!</h1>
-			<p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
-			<p>
-				Also don't forget to check the
-				<a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
-			</p>
-		</div>
-	</body>
 </html>

+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="utf-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<title>Konjo AI</title>
+<style>
+  :root { color-scheme: light dark; }
+  * { box-sizing: border-box; }
+  body {
+    margin: 0;
+    font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
+                 Helvetica, Arial, sans-serif;
+    line-height: 1.6;
+    color: inherit;
+    background: transparent;
+  }
+  .wrap { max-width: 820px; margin: 0 auto; padding: 8px 4px 32px; }
+  h1 { font-size: 1.9rem; margin: 0 0 .25rem; }
+  h2 { font-size: 1.3rem; margin: 2rem 0 .5rem; }
+  h3 { font-size: 1.05rem; margin: 1.25rem 0 .4rem; }
+  p { margin: .5rem 0; }
+  a { color: #2563eb; text-decoration: none; }
+  a:hover { text-decoration: underline; }
+  ul { margin: .5rem 0; padding-left: 1.3rem; }
+  li { margin: .25rem 0; }
+  hr { border: none; border-top: 1px solid rgba(128,128,128,.3); margin: 1.5rem 0; }
+  code {
+    font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
+    font-size: .85em;
+    background: rgba(128,128,128,.15);
+    padding: .12em .35em;
+    border-radius: 4px;
+  }
+  pre {
+    background: rgba(128,128,128,.12);
+    border: 1px solid rgba(128,128,128,.2);
+    border-radius: 8px;
+    padding: .85rem 1rem;
+    overflow-x: auto;
+  }
+  pre code { background: none; padding: 0; }
+  table { border-collapse: collapse; width: 100%; margin: .75rem 0; font-size: .92rem; }
+  th, td { border: 1px solid rgba(128,128,128,.3); padding: .45rem .6rem; text-align: left; }
+  th { background: rgba(128,128,128,.12); }
+  .tagline { color: #6b7280; margin-top: 0; }
+  .links { font-size: .95rem; }
+</style>
+</head>
+<body>
+<div class="wrap">
+  <h1>🗜 Konjo AI</h1>
+  <p class="tagline">Local AI infrastructure for Apple Silicon. We make models
+    that already exist run faster on the hardware you already own.</p>
+  <p class="links">🌐 <a href="https://squish.run">squish.run</a> ·
+    💻 <a href="https://github.com/konjoai">github.com/konjoai</a></p>
+  <hr />
+  <h2>squish — Local LLM inference for Apple Silicon</h2>
+  <p><a href="https://github.com/konjoai/squish">squish</a> is an MLX-based local
+    inference server with a block-level paged KV cache and INT3 quantization
+    support for the Qwen3 family. On a 16 GB M3 MacBook against Ollama:</p>
+  <ul>
+    <li><strong>5.4× faster</strong> end-to-end response at 4000-token prompts (12.78s vs 69.6s)</li>
+    <li><strong>1.5× faster</strong> end-to-end on 75-token prompts (5.50s vs 8.09s)</li>
+    <li><strong>33% less RAM</strong> during inference (3.36 GB vs ~5 GB)</li>
+    <li><strong>INT3 support</strong> for Qwen3 with no measurable accuracy loss (Ollama doesn't ship INT3)</li>
+  </ul>
+  <p>The honest tradeoff: Ollama still wins first-token latency on short prompts.
+    squish wins when you care about total response time on real workloads.</p>
+  <h3>Install</h3>
+  <pre><code>brew tap konjoai/squish &amp;&amp; brew install squish
+# or
+pip install squish-ai</code></pre>
+  <h3>Use</h3>
+  <pre><code>squish pull konjoai/Qwen3-8B-squished
+squish run Qwen3-8B-squished</code></pre>
+  <p class="links">
+    <a href="https://github.com/konjoai/squish/blob/main/docs/RESULTS.md">Full benchmarks</a> ·
+    <a href="https://github.com/konjoai/squish">Repo</a> ·
+    <a href="https://github.com/konjoai/squish/issues">Issues</a>
+  </p>
+  <hr />
+  <h2>Pre-Compressed Models</h2>
+  <p>This org hosts models pre-compressed by squish. Pull once, load instantly
+    every time after.</p>
+  <table>
+    <thead>
+      <tr><th>Model</th><th>Squish ID</th><th>Quantization</th><th>Disk size</th><th>Context</th></tr>
+    </thead>
+    <tbody>
+      <tr><td colspan="5"><em>Available after first publish batch</em></td></tr>
+    </tbody>
+  </table>
+  <p>The format is <code>mlx_lm</code>-compatible — you can also use these models directly:</p>
+  <pre><code>from mlx_lm import load, generate
+model, tokenizer = load("konjoai/Qwen2.5-7B-Instruct-squished")
+response = generate(model, tokenizer, prompt="Hello", max_tokens=100)
+print(response)</code></pre>
+  <hr />
+  <h2>How models are compressed</h2>
+  <p>squish uses a three-tier pipeline:</p>
+  <ul>
+    <li><strong>INT4/INT3 quantization</strong> via a Rust extension
+      (<code>squish_quant_rs</code>) with ARM NEON acceleration</li>
+    <li><strong>Block-level paged KV cache</strong> — KV state is chunked into
+      fixed-size blocks for prefix reuse across sessions</li>
+    <li><strong>Quantization safeguards</strong> — squish hard-blocks INT3 on
+      model families where it collapses (e.g. Gemma-3 loses ~15pp on common
+      benchmarks); INT3 ships only for families that hold accuracy (Qwen3
+      specifically)</li>
+  </ul>
+  <hr />
+  <h2>Other projects</h2>
+  <p>We also build <a href="https://github.com/konjoai/squash">squash</a>, a
+    security and EU AI Act compliance scanner for HuggingFace models. Independent
+    codebase, related mission.</p>
+  <hr />
+  <h2>License</h2>
+  <p>squish is BUSL-1.1. Compressed models inherit their base model's license —
+    Qwen3 is Apache-2.0, Llama is the Llama Community License, etc. Check each
+    model's card for specifics.</p>
+  <hr />
+  <h2>Requirements</h2>
+  <ul>
+    <li>macOS 13.0 or later</li>
+    <li>Apple Silicon (M1 / M2 / M3 / M4 / M5)</li>
+    <li>Enough unified memory for the model (table above)</li>
+  </ul>
+  <p>Intel Macs and Linux are not supported.</p>
+</div>
+</body>
 </html>
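
Review note on the "Block-level paged KV cache" bullet: the card says KV state is chunked into fixed-size blocks for prefix reuse across sessions. The following is a minimal sketch of that general idea, not squish's actual implementation. The names `BLOCK_SIZE`, `block_keys`, and `PrefixBlockCache` are hypothetical, the block size of 4 tokens is chosen only to keep the example small, and the stored payloads are placeholder strings rather than real key/value tensors.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tokens per block; hypothetical value for illustration only


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block of the token sequence.

    Each key depends on every token up to and including its block, so two
    prompts share a key only if they share that entire prefix.
    """
    keys: List[str] = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        running.update(",".join(map(str, block)).encode() + b";")
        keys.append(running.copy().hexdigest())
    return keys


class PrefixBlockCache:
    """Toy stand-in for a paged KV cache: block key -> opaque KV payload."""

    def __init__(self) -> None:
        self.blocks: Dict[str, str] = {}

    def lookup(self, tokens: List[int]) -> int:
        """Count the leading tokens whose blocks are already cached."""
        reused = 0
        for key in block_keys(tokens):
            if key not in self.blocks:
                break  # the first miss ends the reusable prefix
            reused += BLOCK_SIZE
        return reused

    def store(self, tokens: List[int]) -> None:
        """Record every full block of this prompt (placeholder payloads)."""
        for i, key in enumerate(block_keys(tokens)):
            self.blocks.setdefault(key, f"kv-block-{i}")


cache = PrefixBlockCache()
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]      # shared by both sessions
first = system_prompt + [9, 10, 11]           # session 1
second = system_prompt + [20, 21, 22, 23, 24]  # session 2

cold = cache.lookup(first)   # nothing cached yet
cache.store(first)
warm = cache.lookup(second)  # the shared 8-token prefix is found

print(cold, warm)
```

In this sketch the second session finds the two blocks covering the shared 8-token prefix already cached, so only its remaining tokens would need a fresh prefill. That is the kind of saving the long-prompt benchmark figures in the card would depend on, assuming the real cache behaves along these lines.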