Update README.md
README.md
CHANGED
@@ -28,6 +28,7 @@ We are **not** making bad (or we try not to!) models and we try to fully open so
 - Supra Mini **v3** 0.5M: the third version of the Supra Mini series.
 - Supra Mini **v4** 2M: the fourth version of the Supra Mini series. Improved. More powerful. With context understanding.
 - Supra Mini **v5** 8M: the fifth version of the Supra Mini series. A huge token-eater monster compared to its siblings.
+- Supra Mini **v6** 1M: the sixth version of the Supra Mini series. Again a smaller one. Beating v2, v3 and v4 of the Supra Mini series
 - MicroSupra 1k: Trained on GTX 750 Ti 4GB, a scaling laws experiment.
 - StorySupra-10M: Trained on RTX 5060 Ti 16GB for 10 minutes, coherent.
 - DistillSupra-0.2M: Trained on GTX 750 Ti 4GB for 30 minutes, still incoherent, but the first step for distillation research.