GGUF + pure-C++ runtime in CrispASR — OmniASR-LLM-1B
We've added OmniASR-LLM-1B to CrispASR's omniasr backend. Same src/omniasr.cpp runtime as the CTC variants — dispatched by GGUF metadata to the LLM decode path when the LLaMA decoder weights are present.
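The metadata dispatch can be sketched as follows. This is a minimal illustration, not the actual src/omniasr.cpp code: the metadata key name `omniasr.decoder.arch` and the `std::map` stand-in for the GGUF key/value header are assumptions.

```cpp
#include <map>
#include <string>

// Which decode path the runtime takes for a loaded GGUF.
enum class DecodePath { CTC, LLM };

// Toy stand-in for GGUF metadata: key/value pairs from the file header.
// Key name below is hypothetical; the real runtime reads the header via
// the gguf C API and checks for the LLaMA decoder weights/metadata.
inline DecodePath pick_decode_path(const std::map<std::string, std::string>& meta) {
    auto it = meta.find("omniasr.decoder.arch");
    // LLaMA decoder present -> autoregressive LLM path; otherwise CTC.
    return (it != meta.end() && it->second == "llama") ? DecodePath::LLM
                                                       : DecodePath::CTC;
}
```

The point is that one runtime serves both model families: the file itself, not a CLI flag, selects the decode path.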
Architecture: same 48-layer encoder (d=1280) as CTC-1B + a 12-layer LLaMA decoder (d=4096, SwiGLU, RoPE) + enc_proj projector. Autoregressive — KV-cached decode with flash attention, native punctuation/capitalisation from the LM (unlike the CTC variants which need --punc-model).
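The KV-cached autoregressive decode loop looks roughly like this. A toy sketch only: `forward_step` here is a dummy that stands in for the real LLaMA decoder conditioned on the projected encoder states, and the token ids are made up.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// KV cache grows by one position per decoded token, so each step only
// computes attention for the new token against the cached prefix.
struct KVCache { std::vector<float> k, v; };

// Dummy model: logits favour (prev_token + 1). Placeholder for the real
// decoder forward pass over enc_proj-projected encoder states.
static std::vector<float> forward_step(int32_t token, KVCache& cache) {
    cache.k.push_back(0.f); cache.v.push_back(0.f); // cache one more position
    std::vector<float> logits(8, 0.f);
    logits[(token + 1) % 8] = 1.f;
    return logits;
}

// Greedy decode: feed back the argmax token until EOS or max length.
std::vector<int32_t> greedy_decode(int32_t bos, int32_t eos, int max_len) {
    KVCache cache;
    std::vector<int32_t> out;
    int32_t tok = bos;
    for (int i = 0; i < max_len; ++i) {
        auto logits = forward_step(tok, cache); // prefix K/V reused from cache
        tok = (int32_t)(std::max_element(logits.begin(), logits.end())
                        - logits.begin());
        if (tok == eos) break;
        out.push_back(tok);
    }
    return out;
}
```

This is also where the native punctuation/capitalisation comes from: the LM emits punctuation tokens directly, whereas the CTC variants emit bare frame labels and need a separate --punc-model pass.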
Smoke test: the 8.5 GB .pt checkpoint converts to a 4.55 GB F16 GGUF (918 tensors). The JFK sample at Q4_K transcribes as:
"fellow americas ask not what your country can do for you"
(Cosmetic differences from the CTC reference are expected: autoregressive LMs pick different but valid punctuation/spelling.)
Pre-quantised GGUFs (Apache-2.0): cstr/omniasr-llm-1b-GGUF
./build/bin/crispasr --backend omniasr -m omniasr-llm-1b-q4_k.gguf -f audio.wav -osrt
CTC siblings (faster, no native punctuation): CTC-300M, CTC-1B. Smaller LLM variant: cstr/omniasr-llm-300m-v2-GGUF. Dynamic language selection (1693 FLORES-200 codes).
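Language selection amounts to mapping a FLORES-200 code to a language token before decoding. A hedged sketch: the table below is a tiny illustrative subset of the 1693 codes, and the token ids are placeholders, not the model's real vocabulary.

```cpp
#include <map>
#include <optional>
#include <string>

// Map a FLORES-200 code (e.g. "eng_Latn") to a language-token id.
// Subset for illustration only; ids are made up.
inline std::optional<int> lang_token(const std::string& flores_code) {
    static const std::map<std::string, int> table = {
        {"eng_Latn", 256001}, // English, Latin script
        {"deu_Latn", 256002}, // German, Latin script
        {"cmn_Hans", 256003}, // Mandarin, Simplified Han
    };
    auto it = table.find(flores_code);
    if (it == table.end()) return std::nullopt; // unknown code: caller errors out
    return it->second;
}
```

Returning `std::nullopt` for unknown codes lets the CLI reject a bad language argument up front instead of decoding with a garbage conditioning token.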