Cosmos3-Nano: MLX 8-bit (Apple Silicon, quality tier)
An 8-bit MLX build of nvidia/Cosmos3-Nano that
runs on Apple Silicon. The custom Cosmos3 omni-MoT diffusion transformer was ported to MLX from
scratch (no mlx-vlm support exists), and every block was validated against the torch reference.
This is the quality tier: it is near-lossless and fixes the hand/anatomy wobble seen in the 4-bit build.
Derivative of
nvidia/Cosmos3-Nano. © NVIDIA. Distributed under OpenMDW-1.1 (license + NVIDIA copyright/origin notices retained). Not affiliated with, nor endorsed by, NVIDIA.
Highlights
- Transformer: 30.3 GB bf16 → 18.7 GB MLX-8bit (1.6× smaller; attn+MLP linears at 8-bit, group-64).
- Runs in ~19 GB, so it fits a 24 GB+ Mac. Near-lossless quality.
- Quality: hands and complex anatomy come out clean (compare samples/barista.png and samples/anime.png here with the 4-bit build). Use this build when quality matters; use the 4-bit build (~11 GB) for the smallest footprint.
- Validated: every module matches torch (primitives ~1e-6, full layer ~1e-3, packing bit-exact).
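To make "8-bit, group-64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It assumes one scale and one offset per group of 64 weights, which is how this kind of scheme is commonly described; it is not the code used to build this checkpoint, and all function names are hypothetical.

```python
import random

def quantize_group(weights, bits=8):
    """Affine-quantize one group: map [min, max] onto integers 0..2**bits - 1."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Reconstruct approximate weights from integer codes."""
    return [c * scale + lo for c in codes]

def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups and quantize each one independently."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]

def max_abs_error(row, groups, group_size=64):
    """Largest absolute difference between original and dequantized weights."""
    worst = 0.0
    for index, (codes, scale, lo) in enumerate(groups):
        original = row[index * group_size:(index + 1) * group_size]
        restored = dequantize_group(codes, scale, lo)
        worst = max(worst, max(abs(a - b) for a, b in zip(original, restored)))
    return worst

# Synthetic weights (assumption: small Gaussian values, for illustration only).
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
err8 = max_abs_error(row, quantize_row(row, bits=8))
err4 = max_abs_error(row, quantize_row(row, bits=4))
print(f"max abs error: 8-bit={err8:.2e}, 4-bit={err4:.2e}")
```

With 255 levels per group instead of 15, the 8-bit rounding step is about 17 times finer than the 4-bit one, which is the intuition behind the quality difference between the two builds.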
Usage
```python
import torch
from huggingface_hub import snapshot_download
from mlx_pipeline import MLXCosmos3Transformer  # included in this repo
from diffusers import Cosmos3OmniPipeline, AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.models.autoencoders.autoencoder_cosmos3_audio import Cosmos3AVAEAudioTokenizer
from transformers import AutoTokenizer

# Download the model files and get the local snapshot path.
repo = snapshot_download("Reza2kn/Cosmos3-Nano-MLX-8bit")

# Torch-side components (the transformer itself runs in MLX).
vae = AutoencoderKLWan.from_pretrained(repo, subfolder="vae", torch_dtype=torch.float32).eval()
sched = UniPCMultistepScheduler.from_pretrained(repo, subfolder="scheduler")
tok = AutoTokenizer.from_pretrained(repo, subfolder="text_tokenizer")
st = Cosmos3AVAEAudioTokenizer.from_pretrained(repo, subfolder="sound_tokenizer", torch_dtype=torch.float32).eval()

# Assemble the pipeline around the MLX transformer.
pipe = Cosmos3OmniPipeline(
    transformer=MLXCosmos3Transformer(repo + "/transformer"),
    text_tokenizer=tok, vae=vae, scheduler=sched, sound_tokenizer=st, enable_safety_checker=False)

# num_frames=1 produces a single image: take the first frame of the first result.
img = pipe("A red panda astronaut floating in a nebula", num_frames=1, height=384, width=384).video[0][0]
img.save("out.png")
```
Requires: mlx, diffusers (git main/≥0.39), transformers, torch (VAE/scheduler only).
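Before loading anything, it can help to confirm that these packages are installed. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the 0.39 floor for diffusers is taken from the requirement line above.

```python
from importlib import metadata

REQUIRED = ["mlx", "diffusers", "transformers", "torch"]
MIN_DIFFUSERS = (0, 39)  # from the requirement line: diffusers git main or >=0.39

def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)

def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name in REQUIRED:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if name == "diffusers" and version_tuple(installed) < MIN_DIFFUSERS:
            problems.append(f"diffusers: found {installed}, need >=0.39 or git main")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

A development build installed from git main typically reports a version such as `0.39.0.dev0`, which this check treats as satisfying the 0.39 floor.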
Status
- text2image: working (clean, see samples/).
- text2video: working (num_frames>1).
- image2video / audio: in progress (conditioning + sound paths).
The 8-bit runner reads bits/group_size from transformer/mlx_quant_config.json, so the same
mlx_cosmos3.py/mlx_pipeline.py code runs both the 4-bit and 8-bit builds.
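As an illustration of how one code path can serve both builds, here is a minimal sketch of reading the quantization settings from that config file. The file path and the `bits` and `group_size` names come from the text above; the flat JSON layout and the helper name `load_quant_config` are assumptions, so check the actual file in the repo.

```python
import json
from pathlib import Path

def load_quant_config(repo_dir):
    """Read (bits, group_size) from transformer/mlx_quant_config.json.

    Assumes a flat JSON object with "bits" and "group_size" keys.
    """
    path = Path(repo_dir) / "transformer" / "mlx_quant_config.json"
    with path.open() as handle:
        config = json.load(handle)
    bits = int(config["bits"])
    group_size = int(config["group_size"])
    if bits not in (4, 8):
        # Only the 4-bit and 8-bit builds are mentioned in this card.
        raise ValueError(f"unexpected bits value: {bits}")
    return bits, group_size
```

Calling `load_quant_config(repo)` with the snapshot path from the usage example would then return the pair that the quantized layers need, without hard-coding either build.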