shaipro/avi-m3

diaslmb committed on Aug 26

Commit 449adaa (verified) · 1 Parent(s): 4ddc8b1

Update README.md

Files changed (1)

README.md CHANGED

@@ -49,6 +49,13 @@ We evaluated both the **base BGE-M3** and the **fine-tuned AVI-M3** on a held-ou
 ---
 ## 📊 Metrics Explained
 - **Accuracy:** proportion of queries where the top-1 retrieved document is correct.
@@ -57,22 +64,30 @@ We evaluated both the **base BGE-M3** and the **fine-tuned AVI-M3** on a held-ou
 ---
-## 🛠️ Training
-- **Training dataset:** Custom domain-specific dataset (1088 train, 273 eval)
-- **Evaluation dataset:** 273 examples (~20 % held-out split)
-- **Hardware:** 1× NVIDIA A40 48 GB
-- **Batch size:** (depends on your config)
-- **Optimizer:** AdamW
-- **Loss:** Contrastive / InfoNCE
-- **Framework:** FlagEmbedding
 ---
-## 💡 Usage Example (Python)
-```python
-from FlagEmbedding import FlagModel
-model = FlagModel("shaipro/avi-m3", query_max_len=128, doc_max_len=512, use_fp16=True)
-embeddings = model.encode(["your query"], normalize_embeddings=True)

 ---
+### 📚 Dataset
+- **Training set:** 1088 examples
+- **Evaluation set:** 273 examples (~20% held-out split)
+- **Task:** Query → Positive passage retrieval
+---
 ## 📊 Metrics Explained
 - **Accuracy:** proportion of queries where the top-1 retrieved document is correct.
 ---
+### 💻 Hardware
+- **GPU:** 1× NVIDIA A40 (48 GB VRAM)
+- **Precision:** FP16 with gradient checkpointing
+- **Effective batch size:** 32 (8 × grad accumulation 4)
 ---
+## 🛠️ Training
+- **Evaluation dataset:** 273 examples (~20 % held-out split)
+- **Epochs:** 10
+- **Learning rate:** 2e-5
+- **Per-device batch size:** 8
+- **Gradient accumulation:** 4
+- **Pooling method:** `cls`
+- **Temperature:** 0.02
+- **Loss:** `m3_kd_loss` (knowledge distillation + contrastive)
+- **Knowledge distillation:** Enabled
+- **Self-distillation:** Enabled
+- **Unified fine-tuning:** Enabled
+- **Encoder freezing:** Disabled
+- **Optimizer:** AdamW
+- **Scheduler:** Linear with 10% warmup
+---
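
Note on the added hyperparameters: the effective batch size and the warmup length follow from the listed values by simple arithmetic. The sketch below works that out. It assumes a single GPU, that the last partial batch is kept, and that one optimizer step is taken every 4 micro-batches; the constant names and the `linear_lr` helper are illustrative and do not come from the FlagEmbedding code base, so the step counts of the actual training run may differ slightly.

```python
import math

# Values copied from the README diff above.
TRAIN_EXAMPLES = 1088
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_GPUS = 1          # "1x NVIDIA A40"
EPOCHS = 10
PEAK_LR = 2e-5
WARMUP_PERCENT = 10   # "Linear with 10% warmup"

# Effective batch size = per-device batch * accumulation steps * devices.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS

# Assumption: the last partial micro-batch is kept (no drop_last), and an
# optimizer step happens once every GRAD_ACCUM micro-batches.
micro_batches_per_epoch = math.ceil(TRAIN_EXAMPLES / (PER_DEVICE_BATCH * NUM_GPUS))
steps_per_epoch = max(micro_batches_per_epoch // GRAD_ACCUM, 1)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_PERCENT / 100)


def linear_lr(step: int) -> float:
    """Hypothetical helper: linear warmup to PEAK_LR, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(warmup_steps, 1)
    remaining = max(total_steps - step, 0)
    return PEAK_LR * remaining / max(total_steps - warmup_steps, 1)


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    print("optimizer steps per epoch:", steps_per_epoch)
    print("total optimizer steps:", total_steps)
    print("warmup steps:", warmup_steps)
```

Under these assumptions the learning rate rises linearly over the first tenth of training and then decays linearly to zero, which is one common reading of "Linear with 10% warmup".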