# L6_compact_distilled

An ultra-compact multilingual sentence encoder (~71.2 MB) for intent classification: the bottom 6 transformer layers, a pruned ~20K-token vocabulary, and knowledge distillation from the teacher.

## Performance

| Model | Size | MassiveIntent | MassiveScenario | Average |
|---|---|---|---|---|
| Teacher (12L, full vocab) | ~480 MB | 55.52% | 61.01% | 58.27% |
| L6_bottom (38K vocab) | 98 MB | 54.70% | 59.39% | 57.05% |
| L6_compact_distilled | 71.2 MB | 51.21% | 58.33% | 54.77% |

## Model Details

| Property | Value |
|---|---|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Vocab | ~20,000 tokens (frequency-based pruning, 97.4% coverage) |
| Size | 71.2 MB |
| Distilled | Yes |

## Quick Start
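A minimal sketch of intent classification with this encoder: embed one prototype sentence per intent, then route a query to the most cosine-similar prototype. The model loading lines (repo path, `SentenceTransformer` usage) are assumptions, shown in comments; random vectors stand in for real embeddings so the routing logic below runs on its own.

```python
import numpy as np

# Loading the encoder (assumed path; requires the sentence-transformers library):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("path/to/L6_compact_distilled")
#   intent_embs = model.encode(intent_examples)        # (n_intents, dim)
#   query_emb   = model.encode(["dim the lights"])[0]  # (dim,)

# Stand-in embeddings so the routing logic is runnable without the model.
rng = np.random.default_rng(0)
intent_names = ["lights_on", "play_music", "set_alarm"]
intent_embs = rng.normal(size=(3, 384))                   # one prototype per intent
query_emb = intent_embs[1] + 0.01 * rng.normal(size=384)  # near "play_music"

def normalize(x):
    """L2-normalize along the last axis so dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

sims = normalize(intent_embs) @ normalize(query_emb)  # cosine similarity per intent
predicted = intent_names[int(np.argmax(sims))]
print(predicted)  # → play_music
```

In practice you would encode several labeled examples per intent and average (or k-NN over) their embeddings rather than using a single prototype.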

## Distillation Details

- Loss: `0.3 * MSE + 2.0 * (1 - CosineSimilarity)`
- Epochs: 10, learning rate: 5e-6, batch size: 64
- The cosine-dominant loss preserves the existing representation geometry
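The loss above can be sketched as follows. This is an illustration of the stated formula, not the training code; batch shapes and the per-batch averaging are assumptions.

```python
import numpy as np

def distill_loss(student, teacher):
    """Distillation objective from the card: 0.3 * MSE + 2.0 * (1 - cosine similarity).

    student, teacher: arrays of shape (batch, dim). MSE is averaged over all
    elements; cosine similarity is computed per row, then averaged (assumed).
    """
    mse = np.mean((student - teacher) ** 2)
    cos = np.sum(student * teacher, axis=-1) / (
        np.linalg.norm(student, axis=-1) * np.linalg.norm(teacher, axis=-1)
    )
    return 0.3 * mse + 2.0 * (1.0 - cos.mean())
```

With identical embeddings the loss is 0; the 2.0 weight on the cosine term makes directional agreement dominate over exact magnitude matching.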
**Model size:** 18.7M parameters, F32 (safetensors)