---
language: en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- embedding
- knowledge-distillation
datasets:
- sentence-transformers/all-nli
metrics:
- cosine_similarity
pipeline_tag: sentence-similarity
---

# PawanEmbd-68M

A 68M-parameter sentence embedding model distilled from the IBM Granite-278M multilingual embedding model.

## Model Details

- **Model Type**: Sentence Embedding Model
- **Architecture**: Transformer-based encoder with projection layer
- **Parameters**: ~68 million
- **Teacher Model**: IBM Granite-278M Multilingual Embedding
- **Training Method**: Knowledge Distillation
- **Output Dimensions**: 768
- **Max Sequence Length**: 512 tokens

## Training Details

This model was trained using knowledge distillation from the [IBM Granite-278M](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) teacher model on the All-NLI dataset (SNLI + MultiNLI).

### Training Hyperparameters

- **Dataset**: sentence-transformers/all-nli (100K samples)
- **Epochs**: 20
- **Batch Size**: 32
- **Learning Rate**: 5e-4 with OneCycleLR scheduler
- **Loss Function**: Combined MSE + Cosine Similarity (α=0.5, β=0.5)
- **Mixed Precision**: FP16 (AMP)
- **Hardware**: NVIDIA T4 GPU

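The combined loss listed above can be sketched as a weighted sum of two terms. This is a minimal illustration rather than the actual training code: it assumes the cosine term is expressed as `1 - cosine_similarity` so that both terms shrink as the student approaches the teacher, and the function name `distillation_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_emb, teacher_emb, alpha=0.5, beta=0.5):
    """Weighted sum of an MSE term and a cosine-distance term.

    Assumed form: alpha * MSE + beta * (1 - mean cosine similarity).
    """
    mse = F.mse_loss(student_emb, teacher_emb)
    cos = F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return alpha * mse + beta * (1.0 - cos)


# Toy check with random 768-dimensional vectors (not real model outputs).
torch.manual_seed(0)
teacher = F.normalize(torch.randn(4, 768), dim=-1)
student = F.normalize(teacher + 0.01 * torch.randn(4, 768), dim=-1)

loss_same = distillation_loss(teacher, teacher)
loss_noisy = distillation_loss(student, teacher)
print(f"identical: {loss_same.item():.6f}, noisy: {loss_noisy.item():.6f}")
```

With this form the loss is approximately zero when the student reproduces the teacher's embeddings exactly, and it grows as the two sets of embeddings diverge in either magnitude (MSE term) or direction (cosine term).
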
## Usage

### Using Transformers

```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

# Load model and tokenizer
model = AutoModel.from_pretrained("dmedhi/PawanEmbd-68M")
tokenizer = AutoTokenizer.from_pretrained("dmedhi/PawanEmbd-68M")

# Tokenize the sentences as one padded batch
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Get embeddings (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**encoded)
    embeddings = outputs.pooler_output  # Already normalized

# Compute similarity between the two sentences
similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2])
print(f"Similarity: {similarity.item():.4f}")
```

### Using Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the model
model = SentenceTransformer("dmedhi/PawanEmbd-68M")

# Encode sentences
sentences = ["This is an example sentence", "Each sentence is converted to a vector"]
embeddings = model.encode(sentences)

print(f"Embeddings shape: {embeddings.shape}")

# Compute similarity
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```

## Performance

### Comparison with Teacher Model

| Metric | Teacher (Granite-278M) | Student (PawanEmbd-68M) |
|--------|------------------------|-------------------------|
| Parameters | 278M | 68M (4.1x smaller) |
| Model Size | ~1.1 GB | ~258.7 MB |
| Inference Latency (CPU) | 269.57 ms | 11.57 ms (23.3x faster) |
| Inference Latency (GPU) | 17.94 ms | 2.75 ms (6.5x faster) |
| Cosine Similarity | 1.000 | 0.943 |

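The ratios quoted in the table can be reproduced from its raw figures. The short sketch below does that arithmetic, assuming the latencies are in milliseconds and reading the teacher's GPU latency as 17.94 ms; the dictionary keys and the `ratio` helper are illustrative names only.

```python
# Figures copied from the comparison table (latencies in milliseconds).
teacher = {"params_m": 278, "cpu_ms": 269.57, "gpu_ms": 17.94}
student = {"params_m": 68, "cpu_ms": 11.57, "gpu_ms": 2.75}


def ratio(key):
    """Teacher value divided by student value for the given metric."""
    return teacher[key] / student[key]


param_ratio = ratio("params_m")
cpu_speedup = ratio("cpu_ms")
gpu_speedup = ratio("gpu_ms")

print(f"Parameters: {param_ratio:.1f}x smaller")
print(f"CPU: {cpu_speedup:.1f}x faster")
print(f"GPU: {gpu_speedup:.1f}x faster")
```

The GPU speedup is noticeably smaller than the CPU speedup, so the size reduction matters most for CPU-only deployments.
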
## Intended Uses

This model is suitable for:

✅ **Semantic Search**: Find similar documents or passages \
✅ **Clustering**: Group similar texts together \
✅ **Duplicate Detection**: Identify near-duplicate content \
✅ **Recommendation Systems**: Find similar items \
✅ **Question Answering**: Retrieve relevant passages \
✅ **Sentence Similarity**: Measure semantic similarity between texts

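Several of these uses (semantic search, question answering, recommendation) reduce to ranking stored vectors by cosine similarity to a query vector. The sketch below shows that ranking step in plain Python on toy 3-dimensional vectors standing in for real 768-dimensional embeddings; the helper names `cosine` and `top_k` are hypothetical and not part of any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return (index, score) pairs for the k corpus vectors closest to query."""
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(corpus)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy vectors; in practice these would be embeddings produced by the model.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.05, 0.0]

hits = top_k(query, corpus, k=2)
for idx, score in hits:
    print(f"document {idx}: {score:.4f}")
```

For real use, the corpus and query vectors would come from `model.encode(...)` as in the Usage section, and a vectorized or indexed search would replace the Python loop for large collections.
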
## Training Code

The model was trained using PyTorch with knowledge distillation. Training code available at: TODO

## Citation

```
@misc{pawanembdmodel2025,
  author = {Dipankar Medhi},
  title = {PawanEmbd: A Lightweight Embedding Model via Knowledge Distillation},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dmedhi/PawanEmbd-68M}}
}
```

## Acknowledgments

- Teacher model: [IBM Granite-278M](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual)
- Training data: [Sentence-Transformers All-NLI](https://huggingface.co/datasets/sentence-transformers/all-nli)
- Framework: Hugging Face Transformers & PyTorch

## License

Apache 2.0

## Contact

For questions or feedback, please open an issue on GitHub.