Felldude committed on
Commit ea1e135 · verified · 1 Parent(s): 589247e

Update README.md

Files changed (1)
  1. README.md +2 -5
@@ -22,11 +22,10 @@ Fully trained in native FP32 precision.
  Optimization performed using standard AdamW.
  No Adam8bit, quantized optimizer states, or reduced-precision optimizer approximations were used during training.
  Intended to preserve numerical stability and high-fidelity gradient accumulation throughout all training phases.
+
  DIT Ernie Model
  Uses a Monte Carlo estimation approach to approximate FP32 behavior.
- The model does not operate as a strict full FP32 pipeline.
- Instead, stochastic estimation techniques are applied to emulate FP32 statistical characteristics while reducing computational overhead.
- This approach trades exact deterministic FP32 arithmetic for probabilistic approximation efficiency.
+
  Training Details
  Mistral LLM
  Precision
@@ -86,8 +85,6 @@ optimizer precision analysis
  numerical stability benchmarking
  transformer architecture experimentation
  Limitations
- Full FP32 training incurs substantial VRAM and compute costs.
- Monte Carlo FP32 approximation may not exactly reproduce deterministic FP32 outputs.
  Results can vary depending on:
  sampling strategy
  hardware backend