base_model:
- answerdotai/ModernBERT-base
---

# GEM-ModernBERT HQ Legal: A Greek Legal Language Model with Advanced Optimization

## Model Description
**GEM-ModernBERT HQ Legal** is a ModernBERT-base model pre-trained from scratch on a strategically curated 21GB corpus of Greek legal, parliamentary, and governmental text. The model leverages ModernBERT's architectural and training innovations, including **Flash Attention 2**, the **StableAdamW optimizer**, a **1024-token context length**, and **advanced memory optimization** techniques, to deliver strong performance on Greek legal document understanding tasks.
Building upon our proven **quality-based data repetition strategy**, this model incorporates ModernBERT's state-of-the-art training methodology with **30% masking probability**, **trapezoidal learning rate scheduling**, and **optimized batch sizing** for enhanced convergence and performance. The model is specifically designed to handle longer legal documents with its extended 1024-token context window while maintaining computational efficiency through advanced optimization techniques.
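The 30% masking rate mentioned above can be illustrated with a minimal, self-contained sketch of BERT-style token corruption for masked language modeling. This is a generic illustration, not this model's actual data collator: the 80/10/10 mask/random/keep split is the standard BERT recipe and is assumed here, and the `mask_token_id` and `vocab_size` values are placeholders.

```python
import random

def mask_tokens(token_ids, mask_token_id, vocab_size, mlm_probability=0.3, seed=0):
    """BERT-style MLM corruption: select ~mlm_probability of positions;
    of those, 80% become [MASK], 10% become a random token, 10% are kept."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the MLM loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_probability:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_token_id        # replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # replace with random token
            # else: keep the original token (but still predict it)
    return inputs, labels

# Hypothetical token ids; mask_token_id=4 and vocab_size=50000 are placeholders.
ids = list(range(100, 132))
corrupted, labels = mask_tokens(ids, mask_token_id=4, vocab_size=50000)
```

Only positions with a label other than -100 contribute to the loss; all other positions pass through unchanged.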
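The trapezoidal (warmup-stable-decay) learning rate schedule mentioned in the description can be sketched as a plain function of the training step: a linear warmup to the peak rate, a long constant plateau, then a linear decay to zero. The peak learning rate and the 10% warmup/decay fractions below are illustrative assumptions, not values reported for this model.

```python
def trapezoidal_lr(step, total_steps, peak_lr=8e-4, warmup_frac=0.1, decay_frac=0.1):
    """Trapezoidal schedule: linear warmup, constant plateau, linear decay.
    peak_lr and the warmup/decay fractions are placeholder hyperparameters."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)      # linear warmup
    if step < decay_start:
        return peak_lr                                    # constant plateau
    return peak_lr * max(0, total_steps - step) / max(1, decay_steps)  # linear decay
```

Compared with a cosine or linear schedule, the plateau keeps the model at the peak rate for most of training, which is the property the trapezoidal shape is chosen for.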