webxos committed
Commit ab96051 · verified · 1 Parent(s): fe8e7b3

Update README.md

Files changed (1):
  1. README.md +11 -9
README.md CHANGED
@@ -33,13 +33,6 @@ This model was made with the Micro Distillery app available at:
 
 webxos.netlify.app/MICROD
 
- -Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
- -Policy Experimentation: Test group sizes, KL penalties, cache reuse for RLHF-like training.
- -VAE Filtering: Apply latent space compression to improve distillation quality.
- -Sandbox Testing: Execute safe Python code with feedback masking.
- -Export & Deployment: Generate deployable models for inference in various frameworks.
- -Offline Usage: PWA supports offline training simulation and exports.
-
 <div id="app">
 <!-- TOP BAR -->
 <div class="top-bar">
@@ -51,8 +44,6 @@ webxos.netlify.app/MICROD
 <div class="pill">- **Model type**: micro-distill-grpo-vae</div>
 <button id="invertBtn" class="btn-ghost">- **License**: Apache 2.0</button>
 </div>
-
-
 
 ## Model Description
 This is a distilled language model trained using Group Relative Policy Optimization (GRPO) with VAE filtering.
@@ -85,6 +76,17 @@ making it runnable on modest hardware like CPUs or even browsers via TensorFlow.
 
 ## Usage
 
+ -Model Distillation Training: Simulate GRPO optimization with VAE filtering for small LLMs (42M-345M params).
+
+ -Policy Experimentation: Test group sizes, KL penalties, cache reuse for RLHF-like training.
+
+ -VAE Filtering: Apply latent space compression to improve distillation quality.
+
+ -Sandbox Testing: Execute safe Python code with feedback masking.
+
+ -Export & Deployment: Generate deployable models for inference in various frameworks.
+
+ -Offline Usage: PWA supports offline training simulation and exports.
 
 ## Citation
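
The bullets this commit moves under "## Usage" are the file's technical core. As a reading aid, here is a minimal sketch of the GRPO-style simulation they describe: sample a group of completions per prompt, normalize rewards into group-relative advantages, and add a KL penalty toward a frozen reference policy. Every name here (grpo_advantages, grpo_loss, kl_coeff, the toy numbers) is a hypothetical illustration, not the Micro Distillery app's actual API.

```python
# Hedged sketch of a GRPO-style update, per the feature list above.
# All names and values are illustrative assumptions, not the app's API.
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: score each completion against its group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logp_policy, logp_ref, rewards, kl_coeff=0.1):
    """Advantage-weighted policy loss plus a KL penalty vs. the reference."""
    adv = grpo_advantages(rewards)
    pg = -(adv * logp_policy).mean()       # push up log-probs of above-average completions
    kl = (logp_policy - logp_ref).mean()   # crude KL estimate against the reference policy
    return pg + kl_coeff * kl

# One simulated group of 4 completions for a single prompt
rewards = np.array([0.2, 0.9, 0.4, 0.7])
logp_policy = np.array([-1.2, -0.8, -1.0, -0.9])
logp_ref = np.array([-1.1, -0.9, -1.0, -1.0])
print(grpo_loss(logp_policy, logp_ref, rewards))
```

The group size and kl_coeff here map onto the "group sizes, KL penalties" knobs named in the Policy Experimentation bullet.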
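
The "VAE Filtering" bullet mentions latent space compression to improve distillation quality. One plausible reading, sketched under that assumption with a stand-in linear encoder/decoder (not the app's actual VAE), is to keep only training samples the autoencoder reconstructs well, on the theory that well-reconstructed samples are in-distribution:

```python
# Hedged sketch of VAE-style filtering for distillation data.
# The encoder/decoder weights are random stand-ins, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(16, 4))   # 16-dim features -> 4-dim latent code
W_dec = rng.normal(size=(4, 16))   # latent code -> reconstructed features

def reconstruction_error(x: np.ndarray) -> float:
    z = np.tanh(x @ W_enc)         # compress the sample into the latent space
    x_hat = z @ W_dec              # decode back to feature space
    return float(np.mean((x - x_hat) ** 2))

samples = rng.normal(size=(8, 16))
errors = [reconstruction_error(s) for s in samples]
threshold = float(np.median(errors))
kept = [s for s, e in zip(samples, errors) if e <= threshold]
print(f"kept {len(kept)} of {len(samples)} candidate samples")
```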