Update README.md
Browse files

README.md CHANGED

@@ -13,6 +13,19 @@ tags:
 base_model:
 - openai-community/gpt2
 ---
+
+# MICROD v1.0 (micro-distill-grpo-vae)
+This model was made with the Micro Distillery app, available at:
+
+webxos.netlify.app/MICROD
+
+- Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
+- Policy Experimentation: Test group sizes, KL penalties, and cache reuse for RLHF-like training.
+- VAE Filtering: Apply latent-space compression to improve distillation quality.
+- Sandbox Testing: Execute safe Python code with feedback masking.
+- Export & Deployment: Generate deployable models for inference in various frameworks.
+- Offline Usage: PWA supports offline training simulation and exports.
+
 <div id="app">
 <!-- TOP BAR -->
 <div class="top-bar">
@@ -27,19 +40,6 @@ base_model:
 
 
 
-# MICROD v1.0 (micro-distill-grpo-vae)
-This model was made with the Micro Distillery app, available at:
-
-webxos.netlify.app/MICROD
-
-- Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
-- Policy Experimentation: Test group sizes, KL penalties, and cache reuse for RLHF-like training.
-- VAE Filtering: Apply latent-space compression to improve distillation quality.
-- Sandbox Testing: Execute safe Python code with feedback masking.
-- Export & Deployment: Generate deployable models for inference in various frameworks.
-- Offline Usage: PWA supports offline training simulation and exports.
-```
-
 ## Model Description
 This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
 **MICROD v1.0 (micro-distill-grpo-vae)** is a small template model designed to be built upon for custom ground-up builds. It is distilled into a
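The GRPO-style training the model card describes can be illustrated with a toy sketch. This is not the Micro Distillery implementation; the reward values, group size, and KL-penalty weight below are hypothetical, chosen only to show how a group-relative advantage and a KL-style penalty are computed.

```python
import math

def group_relative_advantages(rewards):
    """Normalize each reward against its own group's mean and std (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for uniform groups
    return [(r - mean) / std for r in rewards]

def kl_penalty(logp_policy, logp_ref, beta=0.1):
    """Mean per-token (log p - log q) difference scaled by beta, a common
    KL-penalty approximation; beta=0.1 is an illustrative default."""
    diffs = [p - q for p, q in zip(logp_policy, logp_ref)]
    return beta * sum(diffs) / len(diffs)

# One group of 4 sampled completions with hypothetical scalar rewards.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
print(advantages)  # centered on 0: above-mean samples get positive advantage

penalty = kl_penalty([-1.0, -0.5], [-1.2, -0.4])
print(penalty)
```

Because advantages are computed relative to the group, larger group sizes (one of the knobs the feature list mentions) give a lower-variance baseline at the cost of more sampled completions per prompt.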
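The "VAE filtering" step can likewise be sketched as reconstruction-error-based sample selection. The fixed down-projection below is a stand-in for a trained VAE encoder/decoder, and the latent size `k`, threshold, and sample vectors are illustrative only: samples that survive a round trip through the compressed latent space with low error are kept for distillation.

```python
def encode(v, k=2):
    """Stand-in 'encoder': keep only the first k latent dimensions."""
    return v[:k]

def decode(z, dim):
    """Stand-in 'decoder': pad the latent code back to full dimensionality."""
    return z + [0.0] * (dim - len(z))

def reconstruction_error(v, k=2):
    recon = decode(encode(v, k), len(v))
    return sum((a - b) ** 2 for a, b in zip(v, recon))

def vae_filter(samples, k=2, threshold=0.5):
    """Keep samples whose round trip through the latent space loses little information."""
    return [v for v in samples if reconstruction_error(v, k) <= threshold]

samples = [
    [1.0, 2.0, 0.1, 0.0],  # most mass in the first two dims -> survives
    [0.0, 0.0, 3.0, 3.0],  # mass outside the latent dims -> filtered out
]
print(len(vae_filter(samples)))  # 1
```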
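The "Sandbox Testing" feature can be sketched as running user snippets with a restricted builtin namespace. This is a simulation aid in the spirit of the feature list, not the app's actual sandbox, and restricting `__builtins__` is not a real security boundary (a production sandbox needs process-level isolation); the allowed-function set below is hypothetical.

```python
def run_sandboxed(code, allowed=None):
    """Execute a snippet with only an explicit allow-list of builtins visible."""
    if allowed is None:
        allowed = {"abs": abs, "min": min, "max": max, "len": len}
    env = {"__builtins__": dict(allowed)}
    exec(code, env)  # names outside the allow-list raise NameError
    return {k: v for k, v in env.items() if k != "__builtins__"}

result = run_sandboxed("x = max(1, 2) + len('abc')")
print(result["x"])  # 5
```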