Raphael Scheible committed on
Commit bfd02fd · verified · 1 Parent(s): 3a8c121

Update README.md

Files changed (1)
  1. README.md +9 -6
README.md CHANGED
@@ -30,12 +30,15 @@ The dataset amounts to **approximately 1.3T tokens**, shuffled for improved vari
  - **Gradient accumulation** was used for **Longformer**, requiring **more VRAM** compared to Nyströmformer and RoBERTa, which fit on a single RTX 3090.

  ### Hyperparameters
- - Training steps: **100k**
- - Learning rate: **2e-4**
- - Warmup steps: **10k**
- - Batch sizes: **48 / 64 (using gradient accumulation for Longformer)**
- - Optimizer: **AdamW**
- - Weight Initialization: **GottBERT**
+ | Parameter | Value |
+ |--------------------|------------------------|
+ | **Model Architecture** | RoBERTa (Base) |
+ | **Batch Size** | 8,000 |
+ | **Training Steps** | 100k |
+ | **Weight Initialization** | [GottBERT filtered base](https://huggingface.co/TUM/GottBERT_filtered_base_best) |
+ | **Warmup Iterations** | 10k |
+ | **Peak Learning Rate** | 0.0007 |
+ | **Learning Rate Decay** | Polynomial to zero |

  ## Performance
  GeistBERT achieves **SOTA results** on multiple tasks:
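
The gradient-accumulation note in the context line above is the standard trick for reaching a large effective batch on limited VRAM. A minimal, self-contained PyTorch sketch of the technique; the model, data, and accumulation factor below are illustrative, not taken from the README:

```python
import torch
from torch import nn

# Gradient accumulation: run `accum` micro-batches per optimizer step, so the
# effective batch is accum x micro-batch size at the memory cost of a single
# micro-batch. All names and sizes here are illustrative.
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(32)]
accum = 8  # hypothetical accumulation factor

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum  # scale so accumulated grads average, not sum
    loss.backward()                      # grads add up in .grad across micro-batches
    if (i + 1) % accum == 0:
        optimizer.step()
        optimizer.zero_grad()
```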
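The schedule in the new table (10k warmup iterations up to the 0.0007 peak, then polynomial decay to zero over the 100k training steps) can be written out directly. A sketch assuming linear warmup and decay power 1.0, which the table does not specify:

```python
def lr_at_step(step: int,
               peak_lr: float = 7e-4,   # Peak Learning Rate from the table
               warmup: int = 10_000,    # Warmup Iterations from the table
               total: int = 100_000,    # Training Steps from the table
               power: float = 1.0) -> float:  # assumed; table only says "polynomial"
    """Linear warmup to peak_lr, then polynomial decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup)) ** power

assert abs(lr_at_step(10_000) - 7e-4) < 1e-12  # peak reached at end of warmup
assert lr_at_step(100_000) == 0.0              # fully decayed at the final step
```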
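Weight initialization from the linked GottBERT checkpoint means continued pretraining starts from those weights rather than from a randomly initialized RoBERTa. A sketch assuming the linked checkpoint is loadable with the transformers auto classes; the project's actual training pipeline may differ:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Start from the GottBERT filtered base checkpoint linked in the table,
# instead of random RoBERTa init. Loading via transformers is an assumption.
ckpt = "TUM/GottBERT_filtered_base_best"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForMaskedLM.from_pretrained(ckpt)
```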