Update README.md
README.md
CHANGED
|
@@ -133,7 +133,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
|
|
| 133 |
|
| 134 |
## Training Method
|
| 135 |
|
| 136 |
-
L_NFT(θ) = r[-log(π_θ⁺(a|q) / π(a|q))] + (1-r)[-log((1 - r_q * (π_θ⁺(a|q) / π(a|q))) / (1-r_q))]
|
| 148 |
```
|
| 149 |
|
| 150 |
-
Notably, NFT-32B performs similarly to DAPO (59.2% vs 59.9%) while using a simpler supervised learning approach.
|
| 194 |
|
| 195 |
-

|
| 137 |
|
| 138 |
The NFT training pipeline consists of three main components:
|
| 139 |
|
|
|
|
| 147 |
L_NFT(θ) = r[-log(π_θ⁺(a|q) / π(a|q))] + (1-r)[-log((1 - r_q * (π_θ⁺(a|q) / π(a|q))) / (1-r_q))]
|
| 148 |
```
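As a rough illustration of the loss above, the following Python sketch evaluates it for a single sampled answer. It assumes that `r` is the 0/1 correctness reward of the answer, that `r_q` is the fraction of correct answers sampled for the question, and that both policies are available as sequence log-probabilities. These readings, the function name `nft_loss`, and the `eps` clamp are assumptions made for illustration and are not taken from the README.

```python
import math


def nft_loss(logp_new: float, logp_old: float, r: float, r_q: float,
             eps: float = 1e-6) -> float:
    """Per-answer NFT loss (hypothetical helper, scalar sketch).

    logp_new: log pi_theta^+(a|q), log-probability under the trained policy
    logp_old: log pi(a|q), log-probability under the sampling policy
    r:        reward of this answer, assumed to lie in [0, 1]
    r_q:      assumed per-question correctness rate in [0, 1]
    eps:      assumed lower clamp that keeps the log argument positive
    """
    log_ratio = logp_new - logp_old
    ratio = math.exp(log_ratio)
    loss = 0.0

    if r > 0.0:
        # Positive branch: r * [-log(ratio)]
        loss += r * -log_ratio

    if r < 1.0:
        if r_q >= 1.0:
            raise ValueError("r_q must be below 1 when the answer is not fully correct")
        # Negative branch: (1 - r) * [-log((1 - r_q * ratio) / (1 - r_q))]
        inner = (1.0 - r_q * ratio) / (1.0 - r_q)
        loss += (1.0 - r) * -math.log(max(inner, eps))

    return loss
```

When the two log-probabilities are equal, the ratio is 1 and both bracketed terms vanish, so the loss is 0 regardless of `r`.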
|
| 149 |
|
| 150 |
+

|
| 151 |
|
| 152 |
## Training Datasets
|
| 153 |
|
|
|
|
| 176 |
|
| 177 |
## Performance
|
| 178 |
|
| 179 |
+

|
| 180 |
|
| 181 |
NFT-32B achieves state-of-the-art performance among supervised learning methods for mathematical reasoning:
|
| 182 |
|
|
|
|
| 192 |
|
| 193 |
Notably, NFT-32B performs similarly to DAPO (59.2% vs 59.9%) while using a simpler supervised learning approach.
|
| 194 |
|
| 195 |
+

|
| 196 |
|
| 197 |
## Usage
|
| 198 |
|