# AudioX random / test fixture

A tiny random-init bundle of vLLM-Omni's `AudioXPipeline`, used by the L1/L2 `core_model` CI tests (`tests/e2e/offline_inference/test_audiox_model.py`, `tests/e2e/online_serving/test_audiox_online.py`) so they can verify the full pipeline (load → forward → trim → return numpy WAV) end to end without paying the cost of the real ~11 GB checkpoint. A sketch of what those tests check follows the usage snippet below.

## How to use from the Diffusers library

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# Switch "cuda" to "mps" on Apple devices.
pipe = DiffusionPipeline.from_pretrained(
    "zhangj1an/audiox_random", torch_dtype=torch.bfloat16
).to("cuda")

# The weights are random-init, so any output is noise. The bundle targets
# vLLM-Omni's AudioXPipeline, so treat this generic Diffusers snippet as a
# loading smoke test only; the output field depends on the pipeline class.
out = pipe("a short test prompt")
```
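For orientation, here is a minimal sketch of what those CI tests verify. The import path, the `from_pretrained` loader, and the direct numpy return below are assumptions for illustration, not vLLM-Omni's documented API; the real tests live under `tests/e2e/` in that repo.

```python
import numpy as np

# Assumed import path, flagged as hypothetical; adjust to the real module.
from vllm_omni import AudioXPipeline


def test_audiox_random_end_to_end():
    # Load: tiny random-init weights instead of the real ~11 GB checkpoint.
    pipe = AudioXPipeline.from_pretrained("zhangj1an/audiox_random")
    # Forward + trim: per the card, the pipeline returns a numpy WAV buffer.
    wav = pipe("any text prompt")
    assert isinstance(wav, np.ndarray)
    assert wav.size > 0  # contents are noise by design; only plumbing is checked
```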

It follows the same `config.json` schema as zhangj1an/AudioX, but with much smaller transformer dimensions (assembled into a config sketch after the list):

- `embed_dim`: 1536 → 384
- `depth`: 24 → 4
- `num_heads`: 24 → 6
- `gate_type_config.num_experts_per_modality`: 64 → 16
- `gate_type_config.num_fusion_layers`: 8 → 2
- `sample_size`: 485100 → 483328 (still gives `latent_len = sample_size // 2048 = 236`, matching the transformer's RoPE precompute)
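Concretely, the reduced values assemble into a config like the sketch below. The flat keys come from the list above, the `gate_type_config` nesting is inferred from the dotted names, and every other field of the real schema is omitted. The asserts spell out why `sample_size` moved to an exact multiple of 2048.

```python
# Reduced dimensions only; the real config.json carries the full
# zhangj1an/AudioX schema. Nesting of gate_type_config is inferred.
small_config = {
    "embed_dim": 384,
    "depth": 4,
    "num_heads": 6,
    "gate_type_config": {
        "num_experts_per_modality": 16,
        "num_fusion_layers": 2,
    },
    "sample_size": 483328,
}

# latent_len = sample_size // 2048 must match the transformer's RoPE
# precompute length (236), which both sample sizes satisfy.
assert 485100 // 2048 == 236  # original size: floor division drops a 1772 remainder
assert 483328 // 2048 == 236  # fixture size: exactly 236 * 2048
```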

All weights are random fp16 tensors, generated by running `AudioXPipeline.__init__` with the small config and dumping its `state_dict()` under the bundle's legacy naming convention. Do not use this model for actual generation; the outputs are noise.
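The general recipe looks like the sketch below. Since `AudioXPipeline`'s constructor is internal to vLLM-Omni, a stand-in `nn.Module` is used here; the class, the key names, and the file name are illustrative only.

```python
import torch
import torch.nn as nn

# Stand-in for AudioXPipeline.__init__: any freshly constructed nn.Module
# already has random weights, which is all the fixture needs.
class TinyStandIn(nn.Module):
    def __init__(self, embed_dim: int = 384, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Linear(embed_dim, embed_dim) for _ in range(depth)
        )

model = TinyStandIn()
# Cast every tensor to fp16 and save; the real bundle additionally renames
# keys to match its legacy naming convention.
state = {k: v.half() for k, v in model.state_dict().items()}
torch.save(state, "pytorch_model.fp16.bin")  # file name is illustrative
```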
