PersonaPlex 7B – NF4 Distilled

Knowledge-distilled NF4 weights that reach 90% token match against the bf16 teacher, up from 75% for raw NF4 quantization.

Training

  • 5 epochs over 3,000 samples; 73 minutes on a single A100
  • KL divergence + hard-label cross-entropy loss
  • Loss: 0.58 → 0.07 (88% reduction)
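The combined objective named above can be sketched as a soft-target KL term against the teacher plus hard-label cross-entropy. This is an illustrative sketch only; the temperature `T` and mixing weight `alpha` are assumptions, not values taken from the actual training run.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled student and teacher
    # distributions, rescaled by T^2 so gradient magnitude is comparable
    # across temperatures (standard Hinton-style distillation).
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the reference tokens.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the cross-entropy term remains.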

Eval

| Model         | Token match | Output                  |
|---------------|-------------|-------------------------|
| bf16 teacher  | 100%        | Reference               |
| NF4 raw       | 75%         | Coherent but divergent  |
| NF4 distilled | 90%         | Close match to teacher  |
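The token-match numbers above can be read as positional agreement between a model's output tokens and the teacher's reference tokens. A minimal sketch of such a metric (the exact definition used for the table is an assumption):

```python
def token_match(reference, candidate):
    # Fraction of aligned positions where the candidate emits the same
    # token id as the reference; compared up to the shorter length.
    n = min(len(reference), len(candidate))
    if n == 0:
        return 0.0
    return sum(r == c for r, c in zip(reference, candidate)) / n

# 3 of 4 positions agree -> 0.75
print(token_match([5, 9, 2, 7], [5, 9, 4, 7]))  # 0.75
```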

Files

  • student_best.pt – distilled weights (16 GB, bf16 state dict)
  • training_log.json – training metrics
  • distill_v2.py – training script

Usage

Load as a standard PersonaPlex model:

```python
import torch
from moshi.models.loaders import LMModel, _lm_kwargs

# Build the model config; dep_q=16 matches the distilled checkpoint.
lm_kwargs = dict(_lm_kwargs)
lm_kwargs["dep_q"] = 16

model = LMModel(device="cpu", dtype=torch.bfloat16, **lm_kwargs)

# Load the distilled state dict and cast floating-point tensors to bf16.
state = torch.load("student_best.pt", map_location="cpu", weights_only=True)
state = {k: v.to(torch.bfloat16) if v.is_floating_point() else v for k, v in state.items()}
model.load_state_dict(state, strict=False)
model.to("cuda").eval()
```
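Because `strict=False` silently tolerates key mismatches, it can be worth inspecting the object `load_state_dict` returns to confirm the checkpoint actually covered the model. A self-contained sketch with a toy module standing in for the real model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model. With strict=False, load_state_dict
# returns an IncompatibleKeys result listing what it could not match.
model = nn.Linear(4, 4)
state = {"weight": torch.zeros(4, 4), "extra_key": torch.zeros(1)}  # "bias" absent

result = model.load_state_dict(state, strict=False)
print("missing:", result.missing_keys)        # ['bias']
print("unexpected:", result.unexpected_keys)  # ['extra_key']
```

If `missing_keys` is large after loading `student_best.pt`, the checkpoint and model config likely disagree.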

License

NVIDIA Open Model License.

Built by open-agents-ai.

