---
license: apache-2.0
tags:
- vllm-omni
- audiox
- test-fixture
---

# AudioX random / test fixture

A tiny **random-init** bundle of [vLLM-Omni](https://github.com/vllm-project/vllm-omni)'s
`AudioXPipeline`. Used by the L1/L2 `core_model` CI tests
(`tests/e2e/offline_inference/test_audiox_model.py`,
`tests/e2e/online_serving/test_audiox_online.py`) so they can verify the full
pipeline (load → forward → trim → return numpy WAV) end-to-end without paying
the cost of the real ~11 GB checkpoint.

It follows the same `config.json` schema as
[`zhangj1an/AudioX`](https://huggingface.co/zhangj1an/AudioX), but with much
smaller transformer dimensions:

- `embed_dim`: 1536 → 384
- `depth`: 24 → 4
- `num_heads`: 24 → 6
- `gate_type_config.num_experts_per_modality`: 64 → 16
- `gate_type_config.num_fusion_layers`: 8 → 2
- `sample_size`: 485100 → 483328 (still gives `latent_len = sample_size // 2048 = 236`,
  matching the transformer's RoPE precompute)

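The arithmetic behind these values can be checked with a short sketch. The helper `latent_len` and the constant `DOWNSAMPLE` are illustrative names, not taken from the vLLM-Omni code; the numbers are the ones listed above. Treating `embed_dim / num_heads` as the per-head width is an assumption about how the transformer splits heads.

```python
# Downsampling factor from the formula above: latent_len = sample_size // 2048.
DOWNSAMPLE = 2048


def latent_len(sample_size: int) -> int:
    """Number of latent frames for a given audio sample count (illustrative helper)."""
    return sample_size // DOWNSAMPLE


full_len = latent_len(485100)     # sample_size of the full checkpoint
fixture_len = latent_len(483328)  # sample_size of this fixture
print(full_len, fixture_len)      # 236 236

# The fixture's sample_size is an exact multiple of the downsampling factor.
print(483328 % DOWNSAMPLE)        # 0

# Assumed per-head width: embed_dim / num_heads is the same in both configs.
full_head_dim = 1536 // 24
fixture_head_dim = 384 // 6
print(full_head_dim, fixture_head_dim)  # 64 64
```

Both sample sizes floor to the same latent length, so the smaller config should not need a different RoPE table length.
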
All weights are random fp16 values, generated by running `AudioXPipeline.__init__`
with the small config and dumping its `state_dict()` with the bundle's legacy
naming convention. **Do not use for actual generation**: outputs are noise.
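
The general recipe (construct a module with random initial weights, cast to fp16, dump its `state_dict()`) might look roughly like the sketch below. `TinyStandIn` is a hypothetical stand-in module, not the real `AudioXPipeline`, and the sketch does not attempt the legacy key renaming, whose details are not given here.

```python
import io

import torch
from torch import nn


class TinyStandIn(nn.Module):
    """Hypothetical stand-in for a small model; NOT the real AudioXPipeline."""

    def __init__(self, embed_dim: int = 384, depth: int = 4):
        super().__init__()
        # Default PyTorch initialisation gives random weights.
        self.blocks = nn.ModuleList(
            [nn.Linear(embed_dim, embed_dim) for _ in range(depth)]
        )


torch.manual_seed(0)            # make the random init reproducible
model = TinyStandIn().half()    # cast all parameters to fp16
state = model.state_dict()      # flat mapping: parameter name -> tensor

# Serialise to an in-memory buffer; a real bundle would write to a file instead.
buffer = io.BytesIO()
torch.save(state, buffer)

print(sorted(state)[:2])        # first two parameter names
```
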