novelcore/gem-longformer

alexaapo committed on Jul 23, 2025

Commit cdf101f (verified) · 1 Parent(s): b5b7353

Update README.md

Files changed (1)

README.md +4 -1

@@ -127,9 +127,12 @@ The key hyperparameters used were:
 ### Training Results
-The model achieved stable convergence with the following characteristics:
+The model achieved the following performance metrics:
+- **Final Training Loss**: 1.2823
+- **Final Evaluation Loss**: 1.1720
 - **Training Infrastructure**: 8x NVIDIA A100 40GB GPUs
+- **Training Duration**: 262:24:39 hours
 - **Total Training Steps**: 120,000
 - **Distributed Training**: NCCL backend with enhanced stability settings
 - **Memory Optimization**: BFloat16 precision with gradient accumulation
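
The metrics added in this commit can be put into more familiar terms with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the README does not state: that `262:24:39` is wall-clock time in `hh:mm:ss`, and that the reported losses are mean cross-entropy values in nats, so that `exp(loss)` can be read as a perplexity. The helper `parse_hms` and all variable names are hypothetical.

```python
import math


def parse_hms(text: str) -> int:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the README diff.
TOTAL_STEPS = 120_000
NUM_GPUS = 8
FINAL_TRAIN_LOSS = 1.2823
FINAL_EVAL_LOSS = 1.1720

# Assumption: the duration is wall-clock time in hh:mm:ss.
duration_s = parse_hms("262:24:39")

seconds_per_step = duration_s / TOTAL_STEPS
gpu_hours = NUM_GPUS * duration_s / 3600

# Assumption: losses are mean cross-entropy in nats.
train_perplexity = math.exp(FINAL_TRAIN_LOSS)
eval_perplexity = math.exp(FINAL_EVAL_LOSS)

print(f"wall-clock seconds: {duration_s}")
print(f"seconds per step:   {seconds_per_step:.3f}")
print(f"GPU-hours (8 GPUs): {gpu_hours:.1f}")
print(f"train perplexity:   {train_perplexity:.2f}")
print(f"eval perplexity:    {eval_perplexity:.2f}")
```

If the losses were instead logged in bits, the perplexity would be `2 ** loss` rather than `math.exp(loss)`, so the last two figures should be treated as conditional on the unit.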