Update README.md
README.md
CHANGED

```
@@ -21,7 +21,7 @@ A 68M parameter embedding model distilled from Granite-278M
 
 - **Model Type**: Sentence Embedding Model
 - **Architecture**: Transformer-based encoder with projection layer
-- **Parameters**: ~
+- **Parameters**: ~68 million
 - **Teacher Model**: IBM Granite-278M Multilingual Embedding
 - **Training Method**: Knowledge Distillation
 - **Output Dimensions**: 768
@@ -46,7 +46,7 @@ This model was trained using knowledge distillation from the [IBM Granite-278M](
 
 ### Using Transformers
 
-```
+```Python
 from transformers import AutoModel, AutoTokenizer
 import torch
 import torch.nn.functional as F
@@ -72,7 +72,7 @@ print(f"Similarity: {similarity.item():.4f}")
 
 ### Using Sentence-Transformers
 
-```
+```Python
 from sentence_transformers import SentenceTransformer
 from sentence_transformers.util import cos_sim
 
@@ -90,24 +90,6 @@ similarity = cos_sim(embeddings[0], embeddings[1])
 print(f"✅ Similarity: {similarity.item():.4f}")
 ```
 
-======================================================================
-COMPARING INFERENCE SPEED (Student vs Teacher)
-======================================================================
-Average inference time over 100 runs with 10 sentences (max_length=128):
-Teacher Model: 17.944 ms
-Student Model: 2.759 ms
-Student is 6.5x faster than Teacher.
-
-CPU speed comparision
-
-======================================================================
-COMPARING INFERENCE SPEED (Student vs Teacher)
-======================================================================
-Average inference time over 100 runs with 10 sentences (max_length=128):
-Teacher Model: 269.578 ms
-Student Model: 11.577 ms
-Student is 23.3x faster than Teacher.
-
 ## Performance
 
 ### Comparison with Teacher Model
@@ -145,7 +127,7 @@ The model was trained using PyTorch with knowledge distillation. Training code a
 title = {PawanEmbd: A Lightweight Embedding Model via Knowledge Distillation},
 year = {2025},
 publisher = {Hugging Face},
-howpublished = { \url{https://huggingface.co/dmedhi/
+howpublished = { \url{https://huggingface.co/dmedhi/PawanEmbd-68M} }
 }
 ```
 
```
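The diff's usage snippets cut off right after the imports (unified-diff context only shows a few lines around each change), so the similarity computation they build toward isn't visible here. A minimal, self-contained sketch of that final step, using random tensors in place of real model output — the 768-dimensional size matches the README's stated output dimension, but the actual pooling in the README may differ:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in the real snippet these would come from the
# model's pooled outputs (768-dim, per the README's "Output Dimensions").
embeddings = torch.randn(2, 768)

# L2-normalize so cosine similarity reduces to a dot product.
embeddings = F.normalize(embeddings, p=2, dim=1)
similarity = embeddings[0] @ embeddings[1]

print(f"Similarity: {similarity.item():.4f}")
```

This is the same quantity `sentence_transformers.util.cos_sim` computes in the second snippet.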
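The benchmark block deleted by this commit reports averages over 100 runs, but the diff never shows the benchmarking code. A hypothetical sketch of how such an average is typically measured, with dummy matmuls standing in for the real teacher and student forward passes (the model calls and the reported numbers are not reproduced here):

```python
import time
import torch

def avg_ms(fn, runs=100):
    """Average wall-clock time per call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Dummy workloads standing in for the real models: the larger matmul
# plays the 278M-parameter teacher, the smaller one the 68M student.
teacher_step = lambda: torch.randn(10, 1024) @ torch.randn(1024, 1024)
student_step = lambda: torch.randn(10, 256) @ torch.randn(256, 256)

t_ms, s_ms = avg_ms(teacher_step), avg_ms(student_step)
print(f"Teacher: {t_ms:.3f} ms | Student: {s_ms:.3f} ms | {t_ms / s_ms:.1f}x faster")
```

On a GPU one would also need to synchronize (`torch.cuda.synchronize()`) around the timed region for the numbers to be meaningful.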