Update README.md
README.md
@@ -125,14 +125,14 @@ print(scores.tolist())
 - **Warmup Steps**: 200
 - **Max Sequence Length**: 512 tokens
 - **FP16 Training**: Enabled
-- **Gradient Clipping**:
+- **Gradient Clipping**: 1.0
 
 ### Optimization Configuration
 - **Framework**: DeepSpeed ZeRO Stage 3
 - **Optimizer**: AdamW with auto learning rate and weight decay
 - **Mixed Precision**: bfloat16 (`bf16`) enabled
 - **ZeRO Optimization**: Stage 3 with:
-- No parameter offloading
+- No parameter offloading
 - Overlap communication enabled
 - Contiguous gradients enabled
 - Auto-tuned reduce and prefetch bucket sizes
@@ -149,7 +149,7 @@ print(scores.tolist())
 
 ## Model Details
 - **Model Name**: ArmEmbed
-- **Model Type**: Text Embeddings for Armenian Language
+- **Model Type**: Text Embeddings for the Armenian Language
 - **Version**: 1.0.0
 - **License**: Apache 2.0
 - **Last Updated**: May 2025
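For readers who want to see how the Optimization Configuration bullets in this diff could map onto an actual DeepSpeed config, here is a minimal sketch. It is not the configuration file used for ArmEmbed, which the README does not include. The function name `build_zero3_config` is hypothetical. The `"auto"` placeholders assume the Hugging Face Transformers DeepSpeed integration, which fills them in at launch time. Reading "auto-tuned" bucket sizes as `"auto"` is likewise an assumption. The README lists both FP16 training and `bf16`; this sketch follows the `bf16` line only.

```python
import json


def build_zero3_config(gradient_clipping: float = 1.0) -> dict:
    """Return a DeepSpeed-style config dict mirroring the README bullets.

    Hypothetical helper: the key names follow the DeepSpeed config schema,
    but the values are inferred from the README, not copied from the
    project's real config file.
    """
    return {
        # "Mixed Precision: bfloat16 (bf16) enabled"
        "bf16": {"enabled": True},
        # "AdamW with auto learning rate and weight decay"
        # ("auto" is resolved by the Hugging Face Trainer; assumption)
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": "auto", "weight_decay": "auto"},
        },
        "zero_optimization": {
            "stage": 3,
            # "No parameter offloading": no "offload_param" section is set
            "overlap_comm": True,          # "Overlap communication enabled"
            "contiguous_gradients": True,  # "Contiguous gradients enabled"
            # "Auto-tuned reduce and prefetch bucket sizes" (assumed "auto")
            "reduce_bucket_size": "auto",
            "stage3_prefetch_bucket_size": "auto",
        },
        # "Gradient Clipping: 1.0", the value added by this commit
        "gradient_clipping": gradient_clipping,
    }


if __name__ == "__main__":
    # Print the config as JSON, the format DeepSpeed config files use.
    print(json.dumps(build_zero3_config(), indent=2))
```

Saved as JSON, a dict like this is the kind of file typically passed to a DeepSpeed-enabled training run. Any field the README does not mention is deliberately left out rather than guessed.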