Update README.md
README.md
CHANGED

```
@@ -21,7 +21,7 @@ A 68M parameter embedding model distilled from Granite-278M
 
 - **Model Type**: Sentence Embedding Model
 - **Architecture**: Transformer-based encoder with projection layer
-- **Parameters**: ~
+- **Parameters**: ~68 million
 - **Teacher Model**: IBM Granite-278M Multilingual Embedding
 - **Training Method**: Knowledge Distillation
 - **Output Dimensions**: 768
@@ -46,7 +46,7 @@ This model was trained using knowledge distillation from the [IBM Granite-278M](
 
 ### Using Transformers
 
-```
+```Python
 from transformers import AutoModel, AutoTokenizer
 import torch
 import torch.nn.functional as F
@@ -72,7 +72,7 @@ print(f"Similarity: {similarity.item():.4f}")
 
 ### Using Sentence-Transformers
 
-```
+```Python
 from sentence_transformers import SentenceTransformer
 from sentence_transformers.util import cos_sim
 
@@ -90,24 +90,6 @@ similarity = cos_sim(embeddings[0], embeddings[1])
 print(f"✅ Similarity: {similarity.item():.4f}")
 ```
 
-======================================================================
-COMPARING INFERENCE SPEED (Student vs Teacher)
-======================================================================
-Average inference time over 100 runs with 10 sentences (max_length=128):
-Teacher Model: 17.944 ms
-Student Model: 2.759 ms
-Student is 6.5x faster than Teacher.
-
-CPU speed comparision
-
-======================================================================
-COMPARING INFERENCE SPEED (Student vs Teacher)
-======================================================================
-Average inference time over 100 runs with 10 sentences (max_length=128):
-Teacher Model: 269.578 ms
-Student Model: 11.577 ms
-Student is 23.3x faster than Teacher.
-
 ## Performance
 
 ### Comparison with Teacher Model
@@ -145,7 +127,7 @@ The model was trained using PyTorch with knowledge distillation. Training code a
 title = {PawanEmbd: A Lightweight Embedding Model via Knowledge Distillation},
 year = {2025},
 publisher = {Hugging Face},
-howpublished = { \url{https://huggingface.co/dmedhi/
+howpublished = { \url{https://huggingface.co/dmedhi/PawanEmbd-68M} }
 }
 ```
 
```
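The diff's usage snippets cut off right after the imports (unified-diff context only shows a few lines around each change), so the similarity computation they build toward isn't visible here. A minimal, self-contained sketch of that final step, using random tensors in place of real model output — the 768-dimensional size matches the README's stated output dimension, but the actual pooling in the README may differ:

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in the real snippet these would come from the
# model's pooled outputs (768-dim, per the README's "Output Dimensions").
embeddings = torch.randn(2, 768)

# L2-normalize so cosine similarity reduces to a dot product.
embeddings = F.normalize(embeddings, p=2, dim=1)
similarity = embeddings[0] @ embeddings[1]

print(f"Similarity: {similarity.item():.4f}")
```

This is the same quantity `sentence_transformers.util.cos_sim` computes in the second snippet.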
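The benchmark block deleted by this commit reports averages over 100 runs, but the diff never shows the benchmarking code. A hypothetical sketch of how such an average is typically measured, with dummy matmuls standing in for the real teacher and student forward passes (the model calls and the reported numbers are not reproduced here):

```python
import time
import torch

def avg_ms(fn, runs=100):
    """Average wall-clock time per call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Dummy workloads standing in for the real models: the larger matmul
# plays the 278M-parameter teacher, the smaller one the 68M student.
teacher_step = lambda: torch.randn(10, 1024) @ torch.randn(1024, 1024)
student_step = lambda: torch.randn(10, 256) @ torch.randn(256, 256)

t_ms, s_ms = avg_ms(teacher_step), avg_ms(student_step)
print(f"Teacher: {t_ms:.3f} ms | Student: {s_ms:.3f} ms | {t_ms / s_ms:.1f}x faster")
```

On a GPU one would also need to synchronize (`torch.cuda.synchronize()`) around the timed region for the numbers to be meaningful.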