
preview code

raw

history blame contribute delete

1.61 kB

A newer version of the Gradio SDK is available: 6.17.3

Upgrade

---
title: Tiny Army BLS Mini-Code ZeroGPU
emoji: 🪖
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
suggested_hardware: zero-a10g
---

# Tiny Army — BLS Mini-Code 1.0 (ZeroGPU coding sidecar)

A ZeroGPU sidecar that serves `CohereLabs/BLS-Mini-Code-1.0` (a 30B-parameter mixture-of-experts coding model) to the Tiny Army app through the same Gradio API that the Mellum2 and Tiny Aya sidecars expose.

## API contract (consumed by the main app's `gradio_client`)

- `POST /generate_stream`: takes `(system, user, max_tokens: int, temperature: float)` and streams the cumulative decoded text. Each frame holds everything generated so far, and the app diffs successive frames into deltas.
- `POST /generate`: takes the same arguments and returns the final text in one shot.
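A minimal sketch of the client side of this contract, assuming the main app uses `gradio_client.Client` and that iterating the `Job` returned by `Client.submit` yields each streamed frame (as recent `gradio_client` versions allow). The helper names `frames_to_deltas` and `stream_code`, and the default `max_tokens`/`temperature` values, are hypothetical and not taken from the actual app. The sketch also assumes the stream is append-only, so it raises instead of guessing when a frame does not extend the previous one.

```python
from typing import Any, Iterable, Iterator


def frames_to_deltas(frames: Iterable[str]) -> Iterator[str]:
    """Convert cumulative text frames into incremental deltas.

    Assumes an append-only stream: every frame starts with the previous one.
    """
    prev = ""
    for frame in frames:
        if not frame.startswith(prev):
            # Not an append-only update; this sketch refuses to guess a delta.
            raise ValueError("frame does not extend the previous frame")
        delta = frame[len(prev):]
        prev = frame
        if delta:  # skip repeated frames that add no new text
            yield delta


def stream_code(client: Any, system: str, user: str,
                max_tokens: int = 512, temperature: float = 0.2) -> Iterator[str]:
    """Call /generate_stream positionally, in the documented argument order.

    `client` is expected to behave like gradio_client.Client (hypothetical
    helper; defaults are illustrative, not the Space's own defaults).
    """
    job = client.submit(system, user, max_tokens, temperature,
                        api_name="/generate_stream")
    # Iterating the job yields each cumulative frame; diff them into deltas.
    yield from frames_to_deltas(job)
```

With a real client this would be driven as `stream_code(Client("<owner>/<space>"), system, user)`, printing each delta as it arrives. The one-shot `/generate` endpoint would instead be `client.predict(system, user, max_tokens, temperature, api_name="/generate")`.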

## Config (Space → Settings → Variables)

| Var | Default | Notes |
| --- | --- | --- |
| `TINY_BLS_MODEL` | `CohereLabs/BLS-Mini-Code-1.0` | Source repo |
| `TINY_BLS_QUANT` | `4bit` | `4bit` (~18 GB), `8bit` (~32 GB), or `bf16` (~60 GB, tight). No FP8 weights exist upstream, so the model is quantized at load time. |
| `TINY_BLS_GPU_DURATION` | `120` | ZeroGPU seconds per call |
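One way `app.py` might read these variables, shown as a sketch under stated assumptions: the names `BlsSettings` and `load_settings` are hypothetical, and the validation rules (accepted quant values, positive duration) are an illustration rather than the Space's documented behavior. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the Variables table; everything else is illustrative.
DEFAULT_MODEL = "CohereLabs/BLS-Mini-Code-1.0"
DEFAULT_QUANT = "4bit"
DEFAULT_GPU_DURATION = 120
VALID_QUANTS = ("4bit", "8bit", "bf16")


@dataclass(frozen=True)
class BlsSettings:
    model: str
    quant: str
    gpu_duration: int  # ZeroGPU seconds per call


def load_settings(env: Optional[Mapping[str, str]] = None) -> BlsSettings:
    """Read the TINY_BLS_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    model = env.get("TINY_BLS_MODEL", DEFAULT_MODEL)
    quant = env.get("TINY_BLS_QUANT", DEFAULT_QUANT).strip().lower()
    if quant not in VALID_QUANTS:
        raise ValueError(f"TINY_BLS_QUANT must be one of {VALID_QUANTS}, got {quant!r}")
    duration = int(env.get("TINY_BLS_GPU_DURATION", str(DEFAULT_GPU_DURATION)))
    if duration <= 0:
        raise ValueError("TINY_BLS_GPU_DURATION must be a positive number of seconds")
    return BlsSettings(model=model, quant=quant, gpu_duration=duration)
```

The `gpu_duration` value would presumably be passed to the ZeroGPU decorator, for example `@spaces.GPU(duration=settings.gpu_duration)`, so that each call's GPU allocation matches the configured budget.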

Hardware: set the Space to a ZeroGPU tier with enough VRAM. At 4-bit the 30B model fits an A10G or H200 ZeroGPU slice; `8bit` and `bf16` need the larger H200 slice. Adjust the `suggested_hardware:` field in the metadata above to match the ZeroGPU flavor you provision.
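A rough, weights-only estimate explains the sizes in the `TINY_BLS_QUANT` row. For a mixture-of-experts model, only some experts run per token, but all expert weights still have to be resident, so the total parameter count $N \approx 30 \times 10^{9}$ is what matters, with $b$ bits per weight:

```latex
M_{\text{weights}} \approx N \cdot \frac{b}{8}\ \text{bytes}, \qquad N \approx 30 \times 10^{9}

b = 4:\ M \approx 15\ \text{GB}, \qquad
b = 8:\ M \approx 30\ \text{GB}, \qquad
b = 16\ (\texttt{bf16}):\ M \approx 60\ \text{GB}
```

The gap between these figures and the table's ~18 GB and ~32 GB is presumably quantization metadata, layers kept in higher precision, and runtime buffers (an assumption, not a measurement). The KV cache for long prompts comes on top, which is why `bf16` is described as tight.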

## Wiring into the main app (later step)

Once this Space is live and both endpoints respond, set `TINY_BLS_CODE_SPACE=<owner>/<space>` in the main app, then add the routing branch and a `web/codingModel.js` entry, mirroring the Mellum2 wiring.
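To confirm that "both endpoints respond" before wiring, a smoke test along these lines could be run. It is a sketch that assumes a `gradio_client.Client`-like object (`predict` for `/generate`, an iterable job from `submit` for `/generate_stream`). The helper names `smoke_test` and `space_id_from_env`, and the prompt text, are hypothetical.

```python
import os
from typing import Any


def smoke_test(client: Any, max_tokens: int = 32, temperature: float = 0.0) -> str:
    """Check that /generate and /generate_stream both return text.

    `client` is expected to behave like gradio_client.Client. Returns the
    one-shot completion so the caller can inspect it.
    """
    args = ("You are a coding assistant.", "Write a Python hello world.",
            max_tokens, temperature)
    final = client.predict(*args, api_name="/generate")
    if not isinstance(final, str) or not final:
        raise RuntimeError("/generate returned no text")
    # The streaming endpoint emits cumulative frames; only the last one matters here.
    last_frame = ""
    for frame in client.submit(*args, api_name="/generate_stream"):
        last_frame = frame
    if not last_frame:
        raise RuntimeError("/generate_stream produced no frames")
    return final


def space_id_from_env() -> str:
    """Read TINY_BLS_CODE_SPACE, expected in the form '<owner>/<space>'."""
    space = os.environ.get("TINY_BLS_CODE_SPACE", "")
    if space.count("/") != 1:
        raise RuntimeError("set TINY_BLS_CODE_SPACE=<owner>/<space>")
    return space
```

Against the live Space this would be `smoke_test(Client(space_id_from_env()))` after `from gradio_client import Client`. Using `temperature=0.0` keeps the output as repeatable as the backend allows, which makes failures easier to compare between runs.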
