L4_uniform

Lightweight multilingual sentence encoder derived from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 by layer pruning and vocabulary pruning, with no additional training.

Model Details

| Property | Value |
|---|---|
| Teacher | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | MiniLM-L12 (pruned) |
| Hidden dim | 384 |
| Layers | 4 / 12 |
| Layer indices | [0, 4, 7, 11] |
| Strategy | 4 layers, evenly spaced (compact) |
| Parameters | 22,164,480 |
| Model size (FP32) | 84.6MB |
| Distilled | No |
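
The card does not state how the "evenly spaced" indices were computed. As a minimal sketch, one plausible rule is to spread the student layers linearly over the teacher's layer range and round to the nearest index; this reproduces the listed [0, 4, 7, 11]. The function name `evenly_spaced_layers` is hypothetical and not part of any library.

```python
def evenly_spaced_layers(n_teacher: int, n_student: int) -> list[int]:
    """Pick n_student layer indices spread evenly over 0 .. n_teacher - 1.

    Assumption: linear spacing followed by rounding to the nearest index.
    The first and last teacher layers are always kept.
    """
    if not 2 <= n_student <= n_teacher:
        raise ValueError("need 2 <= n_student <= n_teacher")
    last = n_teacher - 1
    return [round(i * last / (n_student - 1)) for i in range(n_student)]


layer_indices = evenly_spaced_layers(12, 4)
print(layer_indices)  # [0, 4, 7, 11]
```

Keeping both end points means the student still starts from the teacher's first block and ends on the block that fed the original pooling layer.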

Architecture

================================================================
  TEACHER: MiniLM-L12  →  STUDENT: 4L / 38,755 vocab
================================================================

            TEACHER                          STUDENT
  ───────────────────────────      ───────────────────────────

  ┌─────────────────────────┐      ┌─────────────────────────┐
  │   Input Tokens          │      │   Input Tokens          │
  └────────────┬────────────┘      └────────────┬────────────┘
               │                                │
  ┌────────────┴────────────┐      ┌────────────┴────────────┐
  │  Embeddings             │      │  Embeddings (pruned)    │
  │  vocab: 250,002         │      │  vocab:  38,755         │
  │  dim:  384              │      │  dim:  384              │
  └────────────┬────────────┘      └────────────┬────────────┘
               │                                │
  ┌─────────────────────────┐      ┌─────────────────────────┐
  │  Layer  0               │ ──►  │  Layer  0 ← L0          │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer  1               │  ╳   │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤      │                         │
  │  Layer  2               │  ╳   │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤      │                         │
  │  Layer  3               │  ╳   │                         │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer  4               │ ──►  │  Layer  1 ← L4          │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer  5               │  ╳   │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤      │                         │
  │  Layer  6               │  ╳   │                         │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer  7               │ ──►  │  Layer  2 ← L7          │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer  8               │  ╳   │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤      │                         │
  │  Layer  9               │  ╳   │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤      │                         │
  │  Layer 10               │  ╳   │                         │
  ├─────────────────────────┤      ├─────────────────────────┤
  │  Layer 11               │ ──►  │  Layer  3 ← L11         │
  └────────────┬────────────┘      └────────────┬────────────┘
               │                                │
  ┌────────────┴────────────┐      ┌────────────┴────────────┐
  │  Mean Pooling           │      │  Mean Pooling           │
  │  → 384d embedding       │      │  → 384d embedding       │
  └─────────────────────────┘      └─────────────────────────┘

  Size:      448.0MB (FP32)  →  84.6MB (FP32)
  Params:    117,451,392     →  22,164,480
  Reduction: 81.1%
================================================================
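
The size and reduction figures in the diagram follow directly from the parameter counts. The sketch below redoes that arithmetic, assuming 4 bytes per FP32 parameter and that "MB" in the card means 1024² bytes (the numbers only match under that reading). It counts weights only, not tokenizer or config files.

```python
BYTES_PER_FP32 = 4

teacher_params = 117_451_392
student_params = 22_164_480


def fp32_size_mib(n_params: int) -> float:
    """Size of the raw FP32 weights in MiB (1 MiB = 1024**2 bytes)."""
    return n_params * BYTES_PER_FP32 / 1024**2


teacher_mib = fp32_size_mib(teacher_params)
student_mib = fp32_size_mib(student_params)
reduction = 1 - student_params / teacher_params

# prints: 448.0 MiB -> 84.6 MiB, reduction 81.1%
print(f"{teacher_mib:.1f} MiB -> {student_mib:.1f} MiB, reduction {reduction:.1%}")
```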

Quick Start

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L4_uniform", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요",
    "Bonjour, comment allez-vous?",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)
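
A common next step is to compare the returned vectors with cosine similarity. The following is a minimal sketch using NumPy only; the random array is a stand-in for the `(3, 384)` output of `model.encode(sentences)` so that the block runs without downloading the model, and `cosine_similarity_matrix` is a hypothetical helper name.

```python
import numpy as np


def cosine_similarity_matrix(embeddings) -> np.ndarray:
    """Pairwise cosine similarity for an (n, d) array of embeddings."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Stand-in for model.encode(sentences); replace with real embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384))

sims = cosine_similarity_matrix(embeddings)
print(sims.shape)  # (3, 3)
```

With real embeddings, `sims[i, j]` close to 1 suggests that sentences `i` and `j` are near-paraphrases, which is the kind of similarity the STS tasks measure.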

MTEB Evaluation Results

Overall Average: 49.02%

| Task Group | Average |
|---|---|
| Classification | 56.87% |
| Clustering | 32.04% |
| STS | 57.15% |

Classification

| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 67.02% | en: 70.31%, en-ext: 68.1%, de: 65.73% |
| Banking77Classification | 69.18% | default: 69.18% |
| ImdbClassification | 59.38% | default: 59.38% |
| MTOPDomainClassification | 71.48% | en: 80.02%, es: 73.78%, hi: 71.07% |
| MassiveIntentClassification | 36.9% | en: 58.41%, zh-CN: 58.07%, ja: 56.73% |
| MassiveScenarioClassification | 39.51% | zh-CN: 63.96%, en: 62.71%, ja: 59.84% |
| ToxicConversationsClassification | 62.02% | default: 62.02% |
| TweetSentimentExtractionClassification | 49.43% | default: 49.43% |

Clustering

| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 49.93% | default: 49.93% |
| ArXivHierarchicalClusteringS2S | 46.08% | default: 46.08% |
| BiorxivClusteringP2P.v2 | 21.47% | default: 21.47% |
| MedrxivClusteringP2P.v2 | 26.05% | default: 26.05% |
| MedrxivClusteringS2S.v2 | 22.94% | default: 22.94% |
| StackExchangeClustering.v2 | 41.23% | default: 41.23% |
| StackExchangeClusteringP2P.v2 | 32.19% | default: 32.19% |
| TwentyNewsgroupsClustering.v2 | 16.43% | default: 16.43% |

STS

| Task | Average | Details |
|---|---|---|
| BIOSSES | 45.64% | default: 45.64% |
| SICK-R | 62.01% | default: 62.01% |
| STS12 | 57.85% | default: 57.85% |
| STS13 | 65.48% | default: 65.48% |
| STS14 | 60.39% | default: 60.39% |
| STS15 | 73.93% | default: 73.93% |
| STS17 | 46.29% | en-en: 76.54%, es-es: 75.88%, ko-ko: 62.72% |
| STS22.v2 | 37.34% | zh: 57.86%, es: 54.85%, fr: 51.41% |
| STSBenchmark | 65.38% | default: 65.38% |
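
The reported overall average appears to be the unweighted mean over all 25 tasks rather than the mean of the three group averages. The sketch below recomputes both from the per-task scores listed above; the variable names are illustrative only.

```python
from statistics import mean

# Per-task averages (%) copied from the tables above.
scores = {
    "Classification": [67.02, 69.18, 59.38, 71.48, 36.90, 39.51, 62.02, 49.43],
    "Clustering": [49.93, 46.08, 21.47, 26.05, 22.94, 41.23, 32.19, 16.43],
    "STS": [45.64, 62.01, 57.85, 65.48, 60.39, 73.93, 46.29, 37.34, 65.38],
}

group_avg = {group: mean(values) for group, values in scores.items()}
all_tasks = [score for values in scores.values() for score in values]

per_task_mean = mean(all_tasks)                  # mean over all 25 tasks
per_group_mean = mean(list(group_avg.values()))  # mean of the 3 group averages

print(f"tasks: {len(all_tasks)}")
print(f"mean over tasks:  {per_task_mean:.2f}")   # 49.02
print(f"mean over groups: {per_group_mean:.2f}")  # 48.68
```

The two figures differ because STS contributes nine tasks while the other groups contribute eight each, so STS carries slightly more weight in the per-task mean.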

Training

Created via layer pruning + vocabulary pruning (no additional training):

  1. Teacher: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (12 layers, 384d)
  2. Layer selection: [0, 4, 7, 11] - 4 layers, evenly spaced (compact)
  3. Vocab pruning: Corpus-based filtering for target languages
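
The card does not describe the vocabulary pruning procedure in detail. As a rough sketch of step 3, assume "corpus-based filtering" means keeping only the token IDs that occur in a target-language corpus, plus the special tokens, and then slicing the embedding matrix to those rows. The function `prune_vocab`, the `min_count` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def prune_vocab(embeddings, corpus_token_ids, special_ids, min_count=1):
    """Keep embedding rows for special tokens and tokens seen in the corpus.

    Returns (pruned_embeddings, old_to_new), where old_to_new maps each kept
    original token ID to its row index in the pruned matrix.
    """
    counts = Counter(tid for sentence in corpus_token_ids for tid in sentence)
    keep = set(special_ids) | {tid for tid, c in counts.items() if c >= min_count}
    kept_sorted = sorted(keep)  # preserve the original relative order
    old_to_new = {old: new for new, old in enumerate(kept_sorted)}
    pruned = [embeddings[old] for old in kept_sorted]
    return pruned, old_to_new


# Toy stand-in: an 8-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[float(i), float(i) * 0.5] for i in range(8)]
toy_corpus = [[0, 3, 5, 1], [0, 5, 5, 1]]  # tokenized sentences as ID lists

pruned, old_to_new = prune_vocab(toy_embeddings, toy_corpus, special_ids=[0, 1, 2])
print(len(pruned), old_to_new)
```

In a real pipeline the tokenizer vocabulary would have to be remapped with the same `old_to_new` table so that token IDs keep pointing at the right embedding rows.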

Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl
