phi-3.5-mini-instruct-bf16 (MLX, CBA artifact)

MLX-format BF16 (uncompressed baseline) variant of microsoft/Phi-3.5-mini-instruct.

This is one of the 15 model artifacts from the paper:

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels Plawan Kumar Rath, Rahul Maliakkal. IEEE Cloud Summit 2026. Code: https://github.com/plawanrath/compression-bias-amplification

Quantization

This is the BF16 baseline used as the uncompressed reference in the paper. Weights have been re-serialized via mlx_lm.convert (no quantization) so this directory is loadable directly by MLX without an extra conversion step.

How this artifact was produced

python -m mlx_lm.convert \
    --hf-path microsoft/Phi-3.5-mini-instruct \
    --mlx-path ./phi-3.5-mini-instruct-bf16 \

This is the exact artifact used to produce the inference results in ยง4.3 of the paper (911,100 records over BBQ ambiguous, 5 seeds ร— 12,148 items ร— 15 configs).

Usage (MLX)

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("plawanrath/phi-3.5-mini-instruct-bf16-mlx-cba")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))

Or via CLI:

mlx_lm.generate --model plawanrath/phi-3.5-mini-instruct-bf16-mlx-cba --prompt "Hello!"

Paper findings relevant to this variant

The paper documents a dose-response relationship between quantization aggressiveness and emergent stereotypical behavior on BBQ ambiguous questions:

Variant % of BF16-unbiased items that became biased
Q8 0.1โ€“0.9%
Q6 0.3โ€“1.3%
Q4 2.2โ€“5.6%
Q3 6.0โ€“21.1%

These changes are largely invisible to perplexity (<0.5% shift at Q8, <3% at Q4 across all three families). Treat any deployment of compressed instruction-tuned models on fairness-sensitive tasks accordingly.

Model details

License

Inherited from the base model (mit). See the upstream model page for the full license text.

Citation

@inproceedings{rath2026quantization,
  title     = { Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels },
  author    = {Rath, Plawan Kumar and Maliakkal, Rahul},
  booktitle = { IEEE Cloud Summit 2026 },
  year      = {2026}
}
Downloads last month
50
Safetensors
Model size
4B params
Tensor type
BF16
ยท
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for plawanrath/phi-3.5-mini-instruct-bf16-mlx-cba

Finetuned
(274)
this model

Collection including plawanrath/phi-3.5-mini-instruct-bf16-mlx-cba