Quasar-Preview-mlx-4bit

A 4-bit (≈4.5 bits/weight) MLX conversion of silx-ai/Quasar-Preview, runnable on Apple Silicon with mlx-lm.
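
The 4.5 bits/weight figure is consistent with 4-bit group quantization at the group size of 64 reported under "Conversion & verification", provided each group also stores a 16-bit scale and a 16-bit bias. That storage layout is an assumption here, not something this card states, and the helper name below is purely illustrative. A quick check of the arithmetic:

```python
def effective_bits_per_weight(q_bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average storage cost per weight for group-wise quantization.

    Assumes (hypothetically) one scale and one bias per group, each stored
    in scale_bits / bias_bits bits, on top of q_bits per quantized weight.
    """
    per_group_overhead = scale_bits + bias_bits
    return q_bits + per_group_overhead / group_size


bpw = effective_bits_per_weight(q_bits=4, group_size=64)
print(bpw)  # 4.5
```

Under the same assumption, a smaller group size would raise the overhead (for example, group size 32 would give 5.0 bits/weight).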

MLX support for this architecture is added in ml-explore/mlx-lm#1407. Until that PR is merged, install mlx-lm from the branch (see below).

Usage

# mlx-lm with the quasar_long model (until #1407 is merged):
pip install "mlx-lm @ git+https://github.com/SahilChachra/mlx-lm@add-quasar-long-model"

python -m mlx_lm.generate \
  --model sahilchachra/Quasar-Preview-mlx-4bit \
  --prompt "The capital of France is" \
  --max-tokens 60 --temp 0.0 --ignore-chat-template

Use --ignore-chat-template. This is a base / preview checkpoint, not instruction-tuned; applying the chat template produces degenerate output. Prompt it as a text-completion model.
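
The same completion-style prompting can probably be done from Python. The sketch below uses the `load` and `generate` functions from mlx-lm and passes the raw prompt string, so no chat template is applied. It assumes mlx-lm is installed from the branch above; the `complete` wrapper is a hypothetical helper, not part of mlx-lm, and decoding behaviour when no sampler is passed should be checked against the installed version.

```python
MODEL_ID = "sahilchachra/Quasar-Preview-mlx-4bit"


def complete(prompt: str, max_tokens: int = 60) -> str:
    """Return a plain text completion for `prompt` (no chat template)."""
    # Imported lazily so this module can be imported on machines without MLX.
    from mlx_lm import load, generate

    # Downloads the weights on first use, then loads model and tokenizer.
    model, tokenizer = load(MODEL_ID)
    # The raw string is used as-is; nothing wraps it in chat markup.
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Calling `complete("The capital of France is")` on Apple Silicon would be the rough equivalent of the command above. Loading the model on every call is wasteful; in practice one would load once and reuse the model and tokenizer.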

Example output:

The capital of France is Paris. The city is located in the northeastern part of
France, along the banks of the Seine River. Paris is known for its rich history,
art, culture, and fashion. It is also a ...

Architecture

Quasar-Long is a hybrid linear-attention MoE model. Every layer runs standard GQA softmax attention (partial RoPE + NoPE-after-512, QK-norm). Layers 4–19 additionally run one linear-attention branch, assigned per layer by hybrid_layerwise_cycle, whose gated output is added to the attention output. The MLP is a 256-expert DeepSeek-V3-style sparse MoE (sigmoid router, group top-k, shared expert + expert bias); layer 0 is dense.
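
To make "sigmoid router, group top-k, expert bias" concrete, here is a minimal pure-Python sketch of DeepSeek-V3-style routing for a single token. It is an illustration, not the model's code: the function name, the toy sizes (8 experts in 4 groups) and the group/top-k settings are assumptions, and the shared expert, which would simply be applied to every token in addition, is left out.

```python
import math


def route_token(logits, expert_bias, n_groups, topk_groups, top_k):
    """Choose top_k experts for one token and return {expert: weight}."""
    n_experts = len(logits)
    group_size = n_experts // n_groups

    # Sigmoid affinity per expert (instead of a softmax over experts).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # The bias only affects which experts are selected, not their weights.
    biased = [s + b for s, b in zip(scores, expert_bias)]

    # Rank groups by the sum of their two best biased scores.
    group_scores = []
    for g in range(n_groups):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    ranked = sorted(range(n_groups), key=lambda g: group_scores[g], reverse=True)
    kept_groups = set(ranked[:topk_groups])

    # Take the top_k experts from the kept groups only.
    candidates = [e for e in range(n_experts) if e // group_size in kept_groups]
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]

    # Mixing weights come from the unbiased scores, renormalized to sum to 1.
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}


# Toy example: expert 6 has the highest score, but its group is dropped.
logits = [2.0, 1.0, -1.0, -2.0, 0.5, 0.0, 3.0, -3.0]
weights = route_token(logits, expert_bias=[0.0] * 8,
                      n_groups=4, topk_groups=2, top_k=2)
print(sorted(weights))  # [0, 1]
```

The toy example shows the effect of the group stage: the single strongest expert can lose out when the rest of its group scores poorly.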

| Branch | Layers | Underlying op |
|---|---|---|
| GLA | 8, 13, 18 | gated linear attention (fla.ops.simple_gla) |
| Raven | 5, 10, 15 | gated slot attention (fla.ops.gsa), Mamba2 decay + top-k slot router |
| Quasar | 4, 6, 7, 9, 11, 12, 14, 16, 17, 19 | gated delta-rule (fla.ops.quasar) |
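
The assignments in the table repeat with period 5 starting at layer 4. The sketch below encodes that observed pattern and checks it against the table. The `CYCLE` list is inferred from the table only; the actual value and format of hybrid_layerwise_cycle in the model config may differ, and `branch_for_layer` is a hypothetical helper.

```python
# Branch assignments exactly as listed in the table.
TABLE = {
    "gla": [8, 13, 18],
    "raven": [5, 10, 15],
    "quasar": [4, 6, 7, 9, 11, 12, 14, 16, 17, 19],
}

# Inferred repeating pattern (an assumption, not read from the config).
CYCLE = ["quasar", "raven", "quasar", "quasar", "gla"]
FIRST_HYBRID, LAST_HYBRID = 4, 19


def branch_for_layer(layer: int):
    """Return the linear-attention branch for a layer, or None if it has none."""
    if not FIRST_HYBRID <= layer <= LAST_HYBRID:
        return None
    return CYCLE[(layer - FIRST_HYBRID) % len(CYCLE)]


# Invert the table and confirm the inferred cycle reproduces every row.
from_table = {layer: name for name, layers in TABLE.items() for layer in layers}
assert all(branch_for_layer(l) == from_table[l]
           for l in range(FIRST_HYBRID, LAST_HYBRID + 1))
```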

Conversion & verification

Converted with mlx_lm.convert -q --q-bits 4 --q-group-size 64. The MLX port's GLA and Raven recurrences were validated against the reference PyTorch fla naive ops (agreeing to within 1e-6 / 1e-7); all 580 checkpoint tensors map exactly; and the 4-bit model generates coherent text (see the example output above).
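
The kind of check described here, comparing a recurrent implementation against a naive reference and reporting the maximum absolute difference, can be illustrated in pure Python. This is not the validation script used for this conversion: it assumes a simple gated linear attention with one scalar decay per step, and whether fla.ops.simple_gla uses exactly this convention (for example log-space gates or output scaling) is not confirmed here.

```python
import random


def gla_recurrent(q, k, v, g):
    """o_t = q_t @ S_t, with state S_t = g_t * S_{t-1} + outer(k_t, v_t)."""
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, g):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = g_t * S[i][j] + k_t[i] * v_t[j]
        out.append([sum(q_t[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def gla_naive(q, k, v, g):
    """Same output via an explicit sum over past steps (quadratic in length)."""
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        o_t = [0.0] * d_v
        for s in range(t + 1):
            decay = 1.0
            for r in range(s + 1, t + 1):  # product of gates after step s
                decay *= g[r]
            score = decay * sum(a * b for a, b in zip(q[t], k[s]))
            for j in range(d_v):
                o_t[j] += score * v[s][j]
        out.append(o_t)
    return out


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
T, d_k, d_v = 6, 4, 3
q = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
k = [[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(T)]
v = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(T)]
g = [rng.uniform(0.5, 1.0) for _ in range(T)]

diff = max_abs_diff(gla_recurrent(q, k, v, g), gla_naive(q, k, v, g))
assert diff < 1e-9  # double precision; float32 would need a looser tolerance
```

In double precision the two forms agree almost exactly; tolerances around 1e-6 to 1e-7, as quoted above, are what one would typically expect when comparing float32 implementations across frameworks.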

Credits & license
