Update README.md
README.md

This model was made with the Micro Distillery app available at:

webxos.netlify.app/MICROD

- **Model type**: micro-distill-grpo-vae
- **License**: Apache 2.0

## Model Description
This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
…making it runnable on modest hardware like CPUs or even browsers via TensorFlow.

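The core GRPO step described above can be sketched in a few lines. This is a hedged illustration only, not the app's actual code: the function name, array shapes, and reward values are assumptions. The key idea is that each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized against its own group's mean and standard deviation, so no learned value function is needed.

```python
# Hypothetical sketch of the group-relative advantage used in GRPO.
# Names and shapes are assumptions, not the Micro Distillery app's code.
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (num_prompts, group_size), one scalar reward per
    sampled completion. Each completion is scored relative to its own
    group's statistics."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-8  # avoid divide-by-zero
    return (rewards - mean) / std

# Two prompts, three completions each (illustrative rewards).
adv = group_relative_advantages(np.array([[1.0, 0.0, 2.0],
                                          [0.5, 0.5, 0.5]]))
```

Note that a group with identical rewards (the second row) produces all-zero advantages: when every sample in the group does equally well, there is no relative signal to learn from.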
## Usage
- **Model Distillation Training**: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
- **Policy Experimentation**: Test group sizes, KL penalties, and cache reuse for RLHF-like training.
- **VAE Filtering**: Apply latent-space compression to improve distillation quality.
- **Sandbox Testing**: Execute safe Python code with feedback masking.
- **Export & Deployment**: Generate deployable models for inference in various frameworks.
- **Offline Usage**: The PWA supports offline training simulation and exports.

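The KL penalty mentioned in the list above is typically estimated per sampled token from the policy's and a frozen reference model's log-probabilities. A minimal sketch follows; the particular estimator (the always-non-negative "k3" form common in RLHF codebases) is an assumption here, not necessarily what the app uses.

```python
# Illustrative per-token KL penalty for RLHF-like training.
# The estimator choice is an assumption, not the app's actual code.
import math

def kl_penalty(logp_policy: float, logp_ref: float) -> float:
    """Single-sample estimate of KL(policy || reference); always >= 0,
    and exactly 0 when the two log-probabilities agree."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0
```

Subtracting a multiple of this penalty from the reward keeps the distilled policy from drifting too far from the reference during optimization; the group-size and penalty-weight settings exposed by the app would trade off exploration against that constraint.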
## Citation