Kimi-Audio random / test fixture
Tiny random-init bundle of Kimi-Audio-7B-Instruct
for vLLM-Omni's L1/L2 core_model CI tests.
Verifies the full pipeline end-to-end without paying the ~42 GB checkpoint cost.
It follows the same on-disk schema as upstream, but every transformer-style component has shrunken dimensions and random weights:
| Component | File | Upstream | Random |
|---|---|---|---|
| LM (Qwen-2-style + MIMO) | model.safetensors | 16 GB (sharded) | 555 MB (single shard) |
| Whisper encoder | whisper-large-v3/model.safetensors | 3 GB | 17 MB (encoder only) |
| Audio detokenizer (FM DiT) | audio_detokenizer/model.pt | 19 GB | 35 MB |
Shrunk dims (token IDs / vocab sizes kept at upstream values):
- LM: hidden_size 3584 → 512, num_hidden_layers 28 → 4, num_attention_heads 28 → 8, intermediate_size 18944 → 1536, kimia_mimo_layers 6 → 2, kimia_mimo_transformer_from_layer_index 21 → 2, kimia_adaptor_input_dim 5120 → 1536
- Whisper: d_model 1280 → 384, encoder_layers 32 → 4, encoder_ffn_dim 5120 → 1536, encoder_attention_heads 20 → 6 (decoder weights dropped; vLLM only uses the encoder)
- FM DiT: hidden_size 2304 → 384, depth 16 → 4, num_heads 18 → 6, condition_input_dim 1280 → 384
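The LM overrides above can be sketched as plain config values. This is a minimal illustration, assuming the fixture stores them as flat config.json keys (the key names are copied from the list; the flat-JSON layout is an assumption):

```python
# Upstream LM dims, per the list above (token IDs / vocab sizes are
# deliberately NOT listed here, since the fixture keeps them unchanged).
UPSTREAM_LM = {
    "hidden_size": 3584,
    "num_hidden_layers": 28,
    "num_attention_heads": 28,
    "intermediate_size": 18944,
    "kimia_mimo_layers": 6,
    "kimia_mimo_transformer_from_layer_index": 21,
    "kimia_adaptor_input_dim": 5120,
}

# Shrunk values used by this random fixture.
SHRUNK_LM = {
    "hidden_size": 512,
    "num_hidden_layers": 4,
    "num_attention_heads": 8,
    "intermediate_size": 1536,
    "kimia_mimo_layers": 2,
    "kimia_mimo_transformer_from_layer_index": 2,
    "kimia_adaptor_input_dim": 1536,
}

# The fixture config is the upstream config with the shrunk dims applied.
fixture_config = {**UPSTREAM_LM, **SHRUNK_LM}
```

Keeping vocab sizes and special token IDs at upstream values is what lets upstream tokenizer output index the random embedding tables without out-of-range errors.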
The bundle does not ship a vocoder/ subdir; KimiBigVGAN loads from
zhangj1an/kimi-audio-bigvgan-hf at runtime.
modeling_moonshot_kimia.py was patched to stub flash_attn symbols (instead of raising)
so AutoModelForCausalLM.from_config(trust_remote_code=True) works in CI without
flash_attn installed; vLLM-Omni replaces the attention impl anyway.
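The stubbing pattern can be sketched as follows. This is illustrative, not the literal patch: it shows the general shape of replacing a hard import failure with None placeholders, so the module imports cleanly in CI where flash_attn is absent (symbol names are examples, not necessarily the ones the patch touches):

```python
# Sketch of the stub-instead-of-raise pattern described above.
# If flash_attn is installed the real symbols are used; otherwise they are
# bound to None and must never be called -- which is fine here, because
# vLLM-Omni substitutes its own attention implementation before any forward pass.
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func
except ImportError:
    flash_attn_func = None         # stub: import succeeds, symbol unusable
    flash_attn_varlen_func = None  # stub: import succeeds, symbol unusable
```

The alternative, raising ImportError at module import time, would break AutoModelForCausalLM.from_config(trust_remote_code=True) even for code paths that never touch flash attention.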
Do not use for actual generation: outputs are noise.