przvl/PopEuroBERT-binary-210m

Text Classification

political-speech

przvl committed on Mar 19, 2025

Commit 0125fef · verified · 1 Parent(s): febf31d

Update README.md

Files changed (1)

README.md +5 -4

@@ -86,9 +86,10 @@ Predicted class: Populist (Confidence: 0.90)
 ## Training Data
-- **Dataset:** PopBERT (annotated German Bundestag speeches).
+- **Dataset:** [PopBERT](https://github.com/luerhard/PopBERT)
+  - Sentence-level annotated German Bundestag speeches
+  - `train/test: 7017/1758`
 - **Preprocessing:**
-  - Removed duplicates.
   - Converted labels to binary format (`populist = 1`, `neutral = 0`).
   - Tokenized using **EuroBERT tokenizer** with a max length of `256` tokens.
@@ -108,9 +109,9 @@ Predicted class: Populist (Confidence: 0.90)
 | Weight Decay          | `0.0`   |
 | Gradient Accumulation | `2`     |
 | Warmup Ratio          | `0.1`   |
-| Epochs                | `5`     |
+| Epochs                | `2`     |
 | Batch Size            | `16`    |
-| Max Length            | `512`   |
+| Max Length            | `256`   |
 - **Mixed Precision (fp16):** Used for efficiency on GPU.
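
The updated values (7017 training sentences, batch size `16`, gradient accumulation `2`, `2` epochs, warmup ratio `0.1`) imply a rough training schedule. The sketch below works through that arithmetic. It assumes a single device, that the final partial batch is kept, and that a trailing partial accumulation window still triggers an optimizer step; none of these are stated in the README, and the exact counts can differ by a step or two depending on the trainer implementation.

```python
import math

# Values taken from the updated README.
train_examples = 7017
batch_size = 16
grad_accum = 2
epochs = 2
warmup_ratio = 0.1

# Assumption: single device, so "Batch Size" is the per-step batch.
effective_batch = batch_size * grad_accum

# Assumption: the last partial batch is kept, and a trailing partial
# accumulation window still results in an optimizer step.
batches_per_epoch = math.ceil(train_examples / batch_size)
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_updates = updates_per_epoch * epochs

# Warmup length as a fraction of all optimizer steps.
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {updates_per_epoch}")
print(f"total optimizer steps: {total_updates}")
print(f"warmup steps: {warmup_updates}")
```

Under these assumptions, reducing the epochs from `5` to `2` cuts the total number of optimizer steps proportionally, which also shortens the warmup phase because it is defined as a ratio of the total.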