# PersonaPlex 7B – NF4 Distilled
Knowledge-distilled NF4 weights that recover 90% token match against the bf16 teacher (up from 75% for raw NF4 quantization).
## Training
- 5 epochs, 3000 samples, 73 min on A100
- Loss: KL divergence + hard-label cross-entropy
- Loss: 0.58 → 0.07 (88% reduction)
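The combined objective above can be sketched as below. This is a minimal illustration, not the card's actual training script; `alpha` (KL/CE mixing weight) and the softmax temperature `T` are assumed hyperparameters not stated here.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Soft-target KL divergence plus hard-label cross-entropy (a sketch)."""
    # KL between temperature-scaled teacher and student distributions;
    # the T**2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth token ids.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kl + (1 - alpha) * ce
```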
## Eval
| Model | Token Match | Output |
|---|---|---|
| bf16 teacher | 100% | Reference |
| NF4 raw | 75% | Coherent but divergent |
| NF4 distilled | 90% | Close match to teacher |
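"Token match" in the table can be read as the fraction of positions where the student's generated token equals the teacher's. A minimal sketch of that metric (the card's actual evaluation script is not shown):

```python
import torch

def token_match(student_ids: torch.Tensor, teacher_ids: torch.Tensor) -> float:
    """Fraction of positions where student and teacher emit the same token."""
    assert student_ids.shape == teacher_ids.shape
    return (student_ids == teacher_ids).float().mean().item()
```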
## Files
- `student_best.pt` – Distilled weights (16 GB, bf16 state dict)
- `training_log.json` – Training metrics
- `distill_v2.py` – Training script
## Usage
Load as a standard PersonaPlex model:
```python
import torch
from moshi.models.loaders import LMModel, _lm_kwargs

# Build the model config, overriding the depth-transformer codebook count.
lm_kwargs = dict(_lm_kwargs)
lm_kwargs["dep_q"] = 16

# Instantiate on CPU, then load the distilled state dict.
model = LMModel(device="cpu", dtype=torch.bfloat16, **lm_kwargs)
state = torch.load("student_best.pt", map_location="cpu", weights_only=True)

# Cast floating-point tensors to bf16 to match the model dtype.
state = {k: v.to(torch.bfloat16) if v.is_floating_point() else v
         for k, v in state.items()}
model.load_state_dict(state, strict=False)
model.to("cuda").eval()
```
## License
NVIDIA Open Model License.
Built by open-agents-ai.