webxos committed · Commit 20570d0 · verified · 1 Parent(s): 975648e

Update README.md

Files changed (1):
  1. README.md +13 -13
README.md CHANGED
@@ -13,6 +13,19 @@ tags:
base_model:
- openai-community/gpt2
---
+
+ # MICROD v1.0 (micro-distill-grpo-vae)
+ This model was made with the Micro Distillery app available at:
+
+ webxos.netlify.app/MICROD
+
+ -Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
+ -Policy Experimentation: Test group sizes, KL penalties, cache reuse for RLHF-like training.
+ -VAE Filtering: Apply latent space compression to improve distillation quality.
+ -Sandbox Testing: Execute safe Python code with feedback masking.
+ -Export & Deployment: Generate deployable models for inference in various frameworks.
+ -Offline Usage: PWA supports offline training simulation and exports.
+
<div id="app">
<!-- TOP BAR -->
<div class="top-bar">
@@ -27,19 +40,6 @@ base_model:



- # MICROD v1.0 (micro-distill-grpo-vae)
- This model was made with the Micro Distillery app available at:
-
- webxos.netlify.app/MICROD
-
- -Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
- -Policy Experimentation: Test group sizes, KL penalties, cache reuse for RLHF-like training.
- -VAE Filtering: Apply latent space compression to improve distillation quality.
- -Sandbox Testing: Execute safe Python code with feedback masking.
- -Export & Deployment: Generate deployable models for inference in various frameworks.
- -Offline Usage: PWA supports offline training simulation and exports.
- ```
-
## Model Description
This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
**MICROD v1.0 (micro-distill-grpo-vae)** is a small template model designed to be built upon for custom ground-up builds. It is distilled into a
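The README block moved in this diff describes GRPO training with configurable group sizes and KL penalties. As a reading aid only, here is a minimal, self-contained sketch of the group-relative advantage and KL-penalized objective those knobs control. It is a simplified, clip-free variant under my own naming (`group_relative_advantages`, `grpo_objective`, the `beta` default), not the Micro Distillery implementation.

```python
# Sketch of the group-relative core of GRPO, assuming the app's "group size"
# and "KL penalty" settings map onto the group length G and beta below.
# All names and values here are illustrative, not the Micro Distillery source.
import math
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

def grpo_objective(logprobs, ref_logprobs, rewards, beta=0.04):
    """Per-group objective: advantage-weighted log-prob minus a KL penalty.

    logprobs     -- policy log-probs of each sampled completion in the group
    ref_logprobs -- reference-model log-probs of the same completions
    beta         -- KL penalty coefficient (a stand-in for the app's setting)
    """
    advs = group_relative_advantages(rewards)
    # k3-style estimator of KL(policy || reference) from log-probs
    kls = [math.exp(rl - l) - (rl - l) - 1
           for l, rl in zip(logprobs, ref_logprobs)]
    terms = [a * l - beta * k for a, l, k in zip(advs, logprobs, kls)]
    return sum(terms) / len(terms)  # maximize this (or minimize its negation)

# One group of G=4 completions for a single prompt:
print(grpo_objective(
    logprobs=[-12.1, -9.8, -11.4, -10.2],
    ref_logprobs=[-11.9, -10.1, -11.0, -10.5],
    rewards=[0.2, 0.9, 0.4, 0.7],
))
```

A larger `beta` penalizes drift from the reference model more strongly, which is presumably the trade-off the "KL penalties" setting exposes.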
 
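The diff never shows how "VAE Filtering: Apply latent space compression" is wired into training. One plausible reading, sketched here purely as an assumption, is to score candidate distillation samples by reconstruction error under a trained VAE and keep only the well-modeled ones; the linear encoder/decoder stand-ins, `vae_filter`, and `keep_fraction` below are all hypothetical.

```python
# Hypothetical sketch of "VAE filtering" as sample selection: rank candidate
# distillation examples by how well a trained VAE reconstructs them, and keep
# the best-modeled fraction. Nothing here is from the Micro Distillery app.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained VAE's encoder mean and decoder (linear for brevity).
W_enc = rng.normal(size=(16, 4)) * 0.5   # 16-dim features -> 4-dim latent
W_dec = np.linalg.pinv(W_enc)            # crude decode back to feature space

def reconstruction_error(x):
    z = x @ W_enc                 # compress into the latent space
    x_hat = z @ W_dec             # reconstruct from the latent code
    return float(np.mean((x - x_hat) ** 2))

def vae_filter(samples, keep_fraction=0.5):
    """Keep the keep_fraction of samples the VAE reconstructs best."""
    scored = sorted(samples, key=reconstruction_error)
    return scored[: max(1, int(len(scored) * keep_fraction))]

candidates = [rng.normal(size=16) for _ in range(8)]
kept = vae_filter(candidates)
print(f"kept {len(kept)} of {len(candidates)} candidate samples")
```

Filtering this way biases the student's training data toward regions the VAE models well, which is one way latent-space compression could "improve distillation quality" as the bullet claims.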