---
tags:
- vae
- pytorch
---

# Microd v1.0 by MICRO DISTILLERY

This model was made with the Micro Distillery app, available at:
webxos.netlify.app/MICROD
- Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
- Policy Experimentation: Test group sizes, KL penalties, and cache reuse for RLHF-like training.
- Export & Deployment: Generate deployable models for inference in various frameworks.
- Offline Usage: The PWA supports offline training simulation and exports.
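The group-relative policy optimization the app simulates can be sketched roughly as follows. This is a minimal illustration of the general GRPO recipe, not Micro Distillery's actual code: the function names, the clip range (0.8-1.2), and the KL coefficient are all assumptions made for the example.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its sampling group (last dimension)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new, logp_old, advantages, kl_to_ref, kl_coef=0.04):
    """Clipped policy-gradient loss plus a KL penalty toward a frozen
    reference model (the RLHF-style penalty the app lets you tune)."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 0.8, 1.2)
    pg = -torch.min(ratio * advantages, clipped * advantages).mean()
    return pg + kl_coef * kl_to_ref.mean()
```

Larger group sizes give a lower-variance baseline for the advantage estimate at the cost of more sampled completions per prompt, which is exactly the trade-off the "group sizes" setting exposes.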
## Model Description
This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
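VAE filtering can be illustrated roughly like this: a small VAE scores each training sample by reconstruction error, and the worst-reconstructed (most out-of-distribution) samples are dropped before distillation. `TinyVAE` and `filter_by_reconstruction` below are hypothetical sketches under that assumption, not the app's implementation.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE over fixed-size embeddings; a stand-in for whatever
    encoder the app actually trains."""
    def __init__(self, dim=64, latent=8):
        super().__init__()
        self.enc = nn.Linear(dim, latent * 2)  # outputs mu and logvar
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def filter_by_reconstruction(vae, embeddings, keep_fraction=0.8):
    """Keep the fraction of samples the VAE reconstructs best
    (low error ~ in-distribution)."""
    with torch.no_grad():
        recon, _, _ = vae(embeddings)
        err = ((recon - embeddings) ** 2).mean(dim=-1)
    k = max(1, int(keep_fraction * len(embeddings)))
    return embeddings[err.argsort()[:k]]
```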