Kimi-Audio random / test fixture

A tiny, randomly initialized bundle of Kimi-Audio-7B-Instruct for vLLM-Omni's L1/L2 core_model CI tests. It exercises the full pipeline end-to-end without paying the ~42 GB checkpoint cost.

It follows the same on-disk schema as upstream, but every transformer-style component has shrunk dimensions and random weights:

| Component | File | Upstream | Random |
|---|---|---|---|
| LM (Qwen-2-style + MIMO) | model.safetensors | 16 GB, sharded | 555 MB (single shard) |
| Whisper encoder | whisper-large-v3/model.safetensors | 3 GB | 17 MB (encoder only) |
| Audio detokenizer (FM DiT) | audio_detokenizer/model.pt | 19 GB | 35 MB |

Shrunk dims (token IDs / vocab sizes kept at upstream values):

- LM: hidden_size 3584 → 512, num_hidden_layers 28 → 4, num_attention_heads 28 → 8, intermediate_size 18944 → 1536, kimia_mimo_layers 6 → 2, kimia_mimo_transformer_from_layer_index 21 → 2, kimia_adaptor_input_dim 5120 → 1536
- Whisper: d_model 1280 → 384, encoder_layers 32 → 4, encoder_ffn_dim 5120 → 1536, encoder_attention_heads 20 → 6 (decoder weights dropped; vLLM only uses the encoder)
- FM DiT: hidden_size 2304 → 384, depth 16 → 4, num_heads 18 → 6, condition_input_dim 1280 → 384
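A minimal sketch of how the shrink works: start from an upstream-style config dict and overwrite only the dimension keys listed above, leaving token IDs and vocab sizes untouched. The helper name and the example dict are illustrative, not the actual build script.

```python
# Illustrative only: apply the LM dimension overrides from the list above
# to an upstream-style config dict. Token-ID / vocab keys are left alone.

LM_OVERRIDES = {
    "hidden_size": 512,
    "num_hidden_layers": 4,
    "num_attention_heads": 8,
    "intermediate_size": 1536,
    "kimia_mimo_layers": 2,
    "kimia_mimo_transformer_from_layer_index": 2,
    "kimia_adaptor_input_dim": 1536,
}

def shrink_config(upstream: dict, overrides: dict) -> dict:
    """Return a copy of the config with shrunk dims; other keys pass through."""
    cfg = dict(upstream)
    cfg.update(overrides)
    return cfg

# Upstream values taken from the list above (other keys omitted here).
upstream = {
    "hidden_size": 3584,
    "num_hidden_layers": 28,
    "num_attention_heads": 28,
    "intermediate_size": 18944,
}
tiny = shrink_config(upstream, LM_OVERRIDES)
```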

The bundle does not ship a vocoder/ subdir; KimiBigVGAN is loaded from zhangj1an/kimi-audio-bigvgan-hf at runtime.

modeling_moonshot_kimia.py was patched to stub the flash_attn symbols (instead of raising an ImportError) so that AutoModelForCausalLM.from_config(trust_remote_code=True) works in CI without flash_attn installed; vLLM-Omni replaces the attention implementation anyway.
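The patch follows a common import-guard pattern, sketched below. The exact symbol list in modeling_moonshot_kimia.py may differ; the names here are illustrative.

```python
# Sketch of the flash_attn stubbing pattern: if the package is missing, bind
# placeholder callables instead of letting the import fail, so the model
# skeleton can still be built in CI. vLLM-Omni swaps in its own attention
# implementation, so the stubs should never actually be called.
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func  # real impl
except ImportError:
    def _stub(*args, **kwargs):
        # Only reached if something really invokes flash attention.
        raise RuntimeError("flash_attn is not installed (stubbed for CI)")

    flash_attn_func = flash_attn_varlen_func = _stub
```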

Do not use for actual generation: the outputs are noise.
