metadata
title: Tiny Army BLS Mini-Code ZeroGPU
emoji: πͺ
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
suggested_hardware: zero-a10g
Tiny Army: BLS Mini-Code 1.0 (ZeroGPU coding sidecar)
A ZeroGPU sidecar that serves CohereLabs/BLS-Mini-Code-1.0
(30B MoE coding model) to the Tiny Army app via the same Gradio API the Mellum2 / Tiny Aya
sidecars expose.
API contract (consumed by the main app's gradio_client)
- `POST /generate_stream`: args `(system, user, max_tokens: int, temperature: float)`; streams cumulative decoded text (the app diffs successive frames into deltas).
- `POST /generate`: same args; returns the final text in one shot.
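As a rough illustration of the contract above, the sketch below shows how a client might turn the cumulative frames from `/generate_stream` into deltas. The helper names (`frames_to_deltas`, `stream_completion`), the default `max_tokens` and `temperature` values, and the fallback of re-emitting a whole frame when it does not extend the previous one are assumptions for illustration, not the main app's actual code. It also assumes the endpoint returns a single text output per frame.

```python
from typing import Iterable, Iterator


def frames_to_deltas(frames: Iterable[str]) -> Iterator[str]:
    """Convert cumulative text frames into incremental deltas."""
    previous = ""
    for frame in frames:
        if frame.startswith(previous):
            # Normal case: the new frame extends the previous one.
            delta = frame[len(previous):]
        else:
            # Assumed fallback: the frame rewrote earlier text, so emit it whole.
            delta = frame
        previous = frame
        if delta:  # skip repeated, unchanged frames
            yield delta


def stream_completion(space_id: str, system: str, user: str,
                      max_tokens: int = 512, temperature: float = 0.2) -> Iterator[str]:
    """Hypothetical wrapper around the sidecar's streaming endpoint."""
    # Imported lazily so frames_to_deltas works without gradio_client installed.
    from gradio_client import Client

    client = Client(space_id)  # e.g. "<owner>/<space>"
    job = client.submit(system, user, max_tokens, temperature,
                        api_name="/generate_stream")
    # Iterating a Job yields each streamed output as it arrives.
    yield from frames_to_deltas(job)
```

For the one-shot endpoint, the equivalent call would be `client.predict(system, user, max_tokens, temperature, api_name="/generate")`.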
Config (Space → Settings → Variables)
| Var | Default | Notes |
|---|---|---|
| `TINY_BLS_MODEL` | `CohereLabs/BLS-Mini-Code-1.0` | source repo |
| `TINY_BLS_QUANT` | `4bit` | `4bit`, `8bit`, or `bf16` (~60GB, tight); no FP8 weight exists upstream, so we quantize at load |
| `TINY_BLS_GPU_DURATION` | `120` | ZeroGPU seconds per call |
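A minimal sketch of how these variables could be read and validated, assuming the defaults in the table. The function name `load_config` and the mapping from quant mode to loader keyword arguments are hypothetical; the real `app.py` may pass these settings to its model loader differently.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from TINY_BLS_QUANT to model-loading keyword arguments.
QUANT_KWARGS = {
    "4bit": {"load_in_4bit": True},
    "8bit": {"load_in_8bit": True},
    "bf16": {"torch_dtype": "bfloat16"},
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the sidecar settings, falling back to the documented defaults."""
    env = os.environ if env is None else env

    quant = env.get("TINY_BLS_QUANT", "4bit").strip().lower()
    if quant not in QUANT_KWARGS:
        raise ValueError(f"TINY_BLS_QUANT must be one of {sorted(QUANT_KWARGS)}")

    duration = int(env.get("TINY_BLS_GPU_DURATION", "120"))
    if duration <= 0:
        raise ValueError("TINY_BLS_GPU_DURATION must be a positive number of seconds")

    return {
        "model": env.get("TINY_BLS_MODEL", "CohereLabs/BLS-Mini-Code-1.0"),
        "quant": quant,
        "load_kwargs": dict(QUANT_KWARGS[quant]),
        "gpu_duration": duration,
    }
```

Failing fast on an unknown quant mode or a non-positive duration surfaces a misconfigured Space variable at startup rather than inside a GPU call.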
Hardware: set the Space to a ZeroGPU tier with enough VRAM. 30B at 4-bit fits an A10G/H200 ZeroGPU slice;
bf16/8bit need the larger H200 slice. Adjust the `hardware:` field above to the ZeroGPU flavor you provision.
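The sizing above follows from simple arithmetic on the weights. The sketch below is a back-of-the-envelope estimate only: it treats "30B" as exactly 30e9 parameters and counts weights alone, ignoring the KV cache, activations, and any quantization overhead, so real VRAM use will be higher. The helper name `weight_gb` is made up for this example.

```python
# Bits stored per parameter for each TINY_BLS_QUANT mode.
BITS_PER_PARAM = {"4bit": 4, "8bit": 8, "bf16": 16}


def weight_gb(num_params: float, quant: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BITS_PER_PARAM[quant] / 8 / 1e9


for mode in ("4bit", "8bit", "bf16"):
    print(f"{mode}: ~{weight_gb(30e9, mode):.0f} GB")
```

The bf16 figure matches the ~60GB noted in the config table, which is why that mode is described as tight.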
Wiring into the main app (later step)
Once this Space is live and the two endpoints respond, set TINY_BLS_CODE_SPACE=<owner>/<space>
in the main app and add the routing branch + web/codingModel.js entry (mirrors Mellum2).