---
title: TurboCPP Demo
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
license: mit
python_version: "3.12"
short_description: Live llama.cpp + Hadamard rotation demo (TurboQuant)
---
# turbocpp – llama.cpp + TurboQuant
Live demo of [github.com/Ary5272/turbocpp](https://github.com/Ary5272/turbocpp).

Two tabs:
1. **Run inference** – TinyLlama-1.1B-Chat (Q4_K_M) loaded via
   `llama-cpp-python` and run on this Space's CPU. Type a prompt, get
   tokens, see tok/s. (A sketch of the core path follows this list.)
2. **TurboQuant math viz** – interactive sliders showing how the
   Hadamard rotation Gaussianizes per-block weight distributions and
   reduces the per-block max-abs that drives Q4 / Q4_K rounding error.
   (A numeric sketch of that effect follows as well.)
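
A minimal sketch of the inference tab's core path, assuming the GGUF is
pulled from TheBloke's TinyLlama repo via `Llama.from_pretrained` (the repo
id, filename glob, and `generate` helper are assumptions, not the Space's
actual code; `huggingface_hub` must be installed for the download, and the
model is loaded eagerly here for brevity — the Space defers it to the first
call, see the last section):

```python
# Sketch only: load the Q4_K_M GGUF and time token generation.
import time
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",  # assumed source repo
    filename="*Q4_K_M.gguf",  # glob matching the Q4_K_M quant file
    n_ctx=2048,
    verbose=False,
)

def generate(prompt: str, max_tokens: int = 256) -> tuple[str, float]:
    """Return the completion text and the observed tok/s."""
    t0 = time.perf_counter()
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    dt = time.perf_counter() - t0
    text = out["choices"][0]["message"]["content"]
    return text, out["usage"]["completion_tokens"] / dt
```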
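
And a self-contained sketch of the effect the viz tab's sliders show, using
an orthonormal Hadamard matrix from SciPy. The block size, the planted
outlier, and the symmetric 4-bit grid are illustrative choices, not
TurboQuant's exact scheme:

```python
# Rotating a weight block by a normalized Hadamard matrix spreads an outlier
# across the whole block, shrinking the max-abs that sets the Q4 step size.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n = 64                          # block size (must be a power of two)
w = rng.normal(0, 0.02, n)      # mostly small Gaussian weights
w[3] = 0.5                      # one outlier dominates the block's max-abs

H = hadamard(n) / np.sqrt(n)    # orthonormal, so L2 error is preserved
w_rot = H @ w

for name, v in [("raw", w), ("rotated", w_rot)]:
    scale = np.abs(v).max() / 7                 # symmetric 4-bit grid
    err = v - np.round(v / scale) * scale       # round-to-nearest residual
    print(f"{name:8s} max|w|={np.abs(v).max():.3f}  "
          f"rms err={np.sqrt((err ** 2).mean()):.5f}")
```

Because the rotation is orthogonal, rounding error measured in the rotated
basis carries back to the original weights unchanged, so the smaller max-abs
translates directly into a finer quantization grid.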
## Build details
- **Gradio 5** + **Python 3.12** – Gradio 4 breaks against newer
  Starlette (the `TemplateResponse` signature changed) and pydantic
  (the schema API changed), and version pins don't resolve this cleanly,
  so we just upgrade.
- **llama-cpp-python** installed from a **prebuilt wheel** at
  [AIencoder/llama-cpp-wheels](https://huggingface.co/datasets/AIencoder/llama-cpp-wheels)
  (variant `0.3.16+basic_avx2_fma_f16c-cp312`). HF Spaces don't reliably
  build this from source, so we ship the binary (pin sketch below).
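
A sketch of the corresponding `requirements.txt` pin using pip's direct-URL
syntax; the wheel filename is a placeholder — copy the real one from the
dataset's file listing:

```
# requirements.txt (sketch): pin llama-cpp-python to the prebuilt wheel so
# pip never tries to compile it from source on the Space.
llama_cpp_python @ https://huggingface.co/datasets/AIencoder/llama-cpp-wheels/resolve/main/<wheel-filename>.whl
```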
- The first `generate` call cold-starts (~668 MB GGUF download). Subsequent
  calls are fast because the model stays loaded in memory (see the sketch
  below).
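
A minimal sketch of the lazy-load pattern behind this behavior (the helper
name and caching choice are assumptions; the repo may structure it
differently):

```python
# First call pays the download + load cost; later calls reuse the instance.
from functools import lru_cache

from llama_cpp import Llama

@lru_cache(maxsize=1)
def get_llm() -> Llama:
    # Downloads the ~668 MB GGUF on the first call, then serves the same
    # in-memory model for the lifetime of the process.
    return Llama.from_pretrained(
        repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",  # assumed repo
        filename="*Q4_K_M.gguf",
        verbose=False,
    )
```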