Raphael Scheible-Schmitt committed: Update README.md

README.md CHANGED
@@ -7,12 +7,7 @@ base_model:
 ---
 
 # GeistBERT
-GeistBERT is a **German language model** trained on a **largely deduplicated corpus** including **OSCAR23, OPUS, and MC4**. It builds on **GottBERT** while introducing **Whole Word Masking (WWM)** to improve contextual language representation.
-
-GeistBERT comes in **three versions**:
-- GeistBERT (Standard, this repo)
-- [GeistBERT-Nyströmformer](https://huggingface.co/GeistBERT/GeistBERT_base_nystromformer) (Efficient self-attention)
-- [GeistBERT-Longformer](https://huggingface.co/GeistBERT/GeistBERT_base_longformer) (Extended context length)
+GeistBERT is a **German language model** trained on a **largely deduplicated corpus** including **OSCAR23, OPUS, and MC4**. It builds on **GottBERT** while introducing **Whole Word Masking (WWM)** to improve contextual language representation. It achieves state-of-the-art results among base models and performs competitively with larger models on several German NLP benchmarks.
 
 ## Training Data
 GeistBERT was trained on a **diverse German corpus** combining:
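For orientation, a minimal usage sketch of the model described above: it assumes the checkpoint is RoBERTa-compatible (as its GottBERT lineage suggests) and loads through the `transformers` fill-mask pipeline under the Hub ID `GeistBERT/GeistBERT_base`; the German example sentence is illustrative only.

```python
# Minimal usage sketch (assumption: the checkpoint is RoBERTa-compatible and
# published on the Hugging Face Hub as "GeistBERT/GeistBERT_base").
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="GeistBERT/GeistBERT_base")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Die Hauptstadt von Deutschland ist <mask>."):
    print(prediction["token_str"], round(prediction["score"], 4))
```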
@@ -27,7 +22,6 @@ The dataset amounts to **approximately 1.3T tokens**, shuffled for improved variability.
 ## Training Procedure
 ### Hardware
 - Training was conducted on **multiple GPUs**, including **NVIDIA RTX 3090 (24GB VRAM)**.
-- **Gradient accumulation** was used for **Longformer**, which requires **more VRAM** than Nyströmformer and RoBERTa; those fit on a single RTX 3090.
 
 ### Hyperparameters
 | Parameter | Value |
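The gradient accumulation mentioned in the removed hardware bullet can be sketched generically as below (plain PyTorch, not the actual fairseq training code; `accumulation_steps`, the model, and the loss are placeholders): gradients from several micro-batches are summed before a single optimizer step, trading wall-clock time for VRAM.

```python
# Generic illustration of gradient accumulation in PyTorch (not the actual
# fairseq training code): gradients from several micro-batches are summed
# before one optimizer step, so a large effective batch fits in limited VRAM.
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, accumulation_steps=8):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), labels)
        (loss / accumulation_steps).backward()  # scale so the accumulated sum matches one large batch
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()       # one parameter update per effective batch
            optimizer.zero_grad()
```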
@@ -56,9 +50,7 @@ Details:
 
 | Model | Accuracy NLI | GermEval\_14 F1 | CoNLL F1 | Coarse F1 | Fine F1 | 10kGNAD F1 |
 |-------------------------------------|--------------|----------------|----------|-----------|---------|------------|
-| [GeistBERT](https://huggingface.co/GeistBERT/GeistBERT_base) | **82.67** | **88.47** |
-| [GeistBERT-Nyströmformer](https://huggingface.co/GeistBERT/GeistBERT_base_nystromformer) | 82.50 | 88.23 | 85.76 | 79.17 | **78.57** | 90.33 |
-| [GeistBERT-Longformer](https://huggingface.co/GeistBERT/GeistBERT_base_longformer) | _82.51_ | _88.45_ | **86.71** | **80.56** | _66.76_ | 90.32 |
+| [GeistBERT](https://huggingface.co/GeistBERT/GeistBERT_base) | **82.67** | **88.47** | **86.17** | **79.67** | **66.42** | **90.89** |
 | [GottBERT_base_best](https://huggingface.co/TUM/GottBERT_base_best) | 80.82 | 87.55 | 85.93 | 78.17 | 53.30 | 89.64 |
 | [GottBERT_base_last](https://huggingface.co/TUM/GottBERT_base_last) | 81.04 | 87.48 | 85.61 | 78.18 | 53.92 | 90.27 |
 | [GottBERT_filtered_base_best](https://huggingface.co/TUM/GottBERT_filtered_base_best) | 80.56 | 87.57 | 86.14 | 78.65 | 52.82 | 89.79 |
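The scores above come from standard downstream fine-tuning. A hedged sketch of such a setup with the `transformers` Trainer is shown below; the tiny in-memory dataset, label count, and hyperparameters are placeholders and do not reproduce the benchmark configuration.

```python
# Hedged fine-tuning sketch with the transformers Trainer; the toy data,
# label count, and hyperparameters are placeholders, not the benchmark setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "GeistBERT/GeistBERT_base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy data standing in for a German classification corpus such as 10kGNAD.
data = Dataset.from_dict({
    "text": ["Der FC Bayern gewinnt das Spiel.", "Die Inflation steigt weiter."],
    "label": [0, 1],
})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

args = TrainingArguments(output_dir="geistbert-clf", num_train_epochs=1,
                         per_device_train_batch_size=2, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=data).train()
```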
@@ -86,7 +78,6 @@ This model is designed for **German NLP tasks**, including:
 
 ## Limitations
 - Trained on **unfiltered data**, meaning some **redundant or lower-quality samples** may be present.
-- Longformer **requires more VRAM**, making it less accessible for smaller GPU setups.
 - While deduplication was applied to **specific subcorpora**, the full corpus **was not manually curated**.
 
 ## Fairseq Checkpoints
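For the checkpoints referenced in the Fairseq Checkpoints section, loading would presumably go through fairseq's RoBERTa hub interface, sketched below; the directory and file names are placeholders, and the exact BPE/dictionary setup depends on how the checkpoints are packaged.

```python
# Hedged sketch: loading a GeistBERT fairseq checkpoint via fairseq's RoBERTa
# hub interface. Paths are placeholders; the checkpoint directory is assumed
# to contain the .pt file plus the matching dict.txt / BPE files.
from fairseq.models.roberta import RobertaModel

geistbert = RobertaModel.from_pretrained(
    "path/to/geistbert_fairseq",   # placeholder checkpoint directory
    checkpoint_file="model.pt",    # placeholder checkpoint file name
)
geistbert.eval()  # disable dropout for deterministic inference

# Masked-token prediction through the hub interface (RoBERTa-style <mask>).
print(geistbert.fill_mask("Die Hauptstadt von Deutschland ist <mask>.", topk=3))
```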