# PersonaPlex 7B – NF4 Distilled
Knowledge-distilled NF4 weights that recover 90% token match against the bf16 teacher (up from 75% for raw NF4 quantization).
## Training
- 5 epochs, 3000 samples, 73 min on A100
- Loss: KL divergence + hard-label cross-entropy
- Loss: 0.58 → 0.07 (88% reduction)
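The combined objective above can be sketched as below. This is a minimal illustration, not the card's actual training script; `alpha` (KL/CE mixing weight) and the softmax temperature `T` are assumed hyperparameters not stated here.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Soft-target KL divergence plus hard-label cross-entropy (a sketch)."""
    # KL between temperature-scaled teacher and student distributions;
    # the T**2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth token ids.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kl + (1 - alpha) * ce
```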
## Eval
| Model | Token Match | Output |
|---|---|---|
| bf16 teacher | 100% | Reference |
| NF4 raw | 75% | Coherent but divergent |
| NF4 distilled | 90% | Close match to teacher |
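"Token match" in the table can be read as the fraction of positions where the student's generated token equals the teacher's. A minimal sketch of that metric (the card's actual evaluation script is not shown):

```python
import torch

def token_match(student_ids: torch.Tensor, teacher_ids: torch.Tensor) -> float:
    """Fraction of positions where student and teacher emit the same token."""
    assert student_ids.shape == teacher_ids.shape
    return (student_ids == teacher_ids).float().mean().item()
```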
## Files
- `student_best.pt` – Distilled weights (16 GB, bf16 state dict)
- `training_log.json` – Training metrics
- `distill_v2.py` – Training script
## Usage
Load as a standard PersonaPlex model:
```python
import torch
from moshi.models.loaders import LMModel, _lm_kwargs

# Build the model config, overriding the depth-transformer codebook count.
lm_kwargs = dict(_lm_kwargs)
lm_kwargs["dep_q"] = 16

# Instantiate on CPU, then load the distilled state dict.
model = LMModel(device="cpu", dtype=torch.bfloat16, **lm_kwargs)
state = torch.load("student_best.pt", map_location="cpu", weights_only=True)

# Cast floating-point tensors to bf16 to match the model dtype.
state = {k: v.to(torch.bfloat16) if v.is_floating_point() else v
         for k, v in state.items()}
model.load_state_dict(state, strict=False)
model.to("cuda").eval()
```
## License
NVIDIA Open Model License.
Built by open-agents-ai.