Update README.md
README.md
CHANGED
@@ -63,7 +63,7 @@ Distillation process lasts for 120 hours on 4 Nvidia V100.
 
 See all logs at [WandB](https://wandb.ai/d0rj/distill-ruroberta/runs/lehtr3bk/workspace).
 
-## Configuration
+## Configuration changes
 
 - Activation GELU -> GELUFast
 - Attention heads 16 -> 8