---
title: TurboCPP Demo
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
license: mit
python_version: '3.12'
short_description: Live llama.cpp + Hadamard rotation demo (TurboQuant)
---
# turbocpp: llama.cpp + TurboQuant
Live demo of github.com/Ary5272/turbocpp.
Two tabs:

- **Run inference**: TinyLlama-1.1B-Chat (Q4_K_M), loaded via `llama-cpp-python` and run on this Space's CPU. Type a prompt, get tokens, see tok/s.
- **TurboQuant math viz**: interactive sliders showing how the Hadamard rotation Gaussianizes per-block weight distributions and reduces the per-block max-abs that drives Q4 / Q4_K rounding error.
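The effect the viz tab illustrates can be sketched in a few lines of NumPy. This is not the Space's actual code, just a minimal demonstration of the idea: an orthonormal Hadamard rotation spreads a weight outlier's energy across the block, shrinking the max-abs that sets the Q4 quantization step.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # normalize so that H @ H.T == I

rng = np.random.default_rng(0)
# A 64-weight block with one outlier: the case that hurts block quantization,
# since the block's scale is set by its largest-magnitude entry.
w = rng.normal(0.0, 0.02, 64)
w[7] = 0.5  # outlier inflates max-abs, hence the Q4 rounding step

H = hadamard(64)
w_rot = H @ w  # rotation spreads the outlier across all 64 entries

# Orthonormal, so the block's energy (L2 norm) is unchanged...
assert np.isclose(np.linalg.norm(w), np.linalg.norm(w_rot))
# ...but the peak that drives quantization error drops sharply.
print(f"max-abs before: {np.abs(w).max():.3f}, after: {np.abs(w_rot).max():.3f}")
```

Because the rotation is invertible, dequantized weights can be rotated back with `H.T`, so the smaller rounding step translates directly into lower reconstruction error.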
## Build details
- Gradio 5 + Python 3.12. Gradio 4 with the newer Starlette is broken in ways that version pins don't resolve cleanly (the `TemplateResponse` signature change, the pydantic schema change), so we upgrade instead.
- `llama-cpp-python` installed from a prebuilt wheel at `AIencoder/llama-cpp-wheels` (variant `0.3.16+basic_avx2_fma_f16c-cp312`). HF Spaces don't reliably build this from source, so we ship the binary.
- First `generate` cold-starts (~668 MB GGUF download). Subsequent calls are fast because the model stays loaded in memory.
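The cold-start behavior comes down to caching the model handle at module level. A minimal sketch of the pattern (the names and the stand-in loader are illustrative, not the Space's exact code; in `app.py` the expensive step would be constructing `llama_cpp.Llama` from the downloaded GGUF):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1)
def load_model():
    """Pay the expensive load exactly once; later calls return the cached handle."""
    # Stand-in for the real one-time cost: GGUF download + llama_cpp.Llama(...)
    time.sleep(0.1)
    return object()  # stand-in for the Llama instance

t0 = time.perf_counter(); load_model(); cold = time.perf_counter() - t0
t1 = time.perf_counter(); load_model(); warm = time.perf_counter() - t1
print(f"cold start: {cold * 1e3:.0f} ms, warm call: {warm * 1e3:.3f} ms")
```

`lru_cache(maxsize=1)` on a zero-argument loader is a simple way to get process-lifetime memoization; a module-level global guarded by `if _llm is None:` works equally well.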