webxos committed
Commit fe8e7b3 · verified · 1 Parent(s): 597edad

Update README.md

Files changed (1): README.md +5 -0
README.md CHANGED
@@ -60,6 +60,11 @@ This is a distilled language model trained using Group Relative Policy Optimization
  small set of files the user can use to template their own agents. Designed for educational learning and micro scaling.
  Use **MICROD V1.0 (micro-distill-grpo-vae)** in your own custom projects and train it from the ground up.
 
+ The model's architecture details further underscore its educational bent: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
+ and a max sequence length of 1024. It supports KV-cache reuse with a 512-entry cache, enabling faster generation for sequential thoughts, though this feature
+ is noted as inactive in some interfaces. Licensed under Apache 2.0, it is openly available for modification, and its small footprint allows quantization,
+ making it runnable on modest hardware like CPUs or even browsers via TensorFlow.js integration.
+
  ## Model Details
  - **Model type**: micro-distill-grpo-vae
  - **Model size**: 42M parameters
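
To make the added architecture description concrete, here is a minimal sketch that maps the listed hyperparameters onto a GPT-2-style configuration with Hugging Face `transformers`. The GPT-2 mapping, the `GPT2Config`/`GPT2LMHeadModel` classes, and the quantization step are illustrative assumptions, not MICROD's actual code, which is also why the rough parameter count below will not land exactly on the card's 42M.

```python
# Illustrative sketch only: the README's hyperparameters mapped onto a
# GPT-2-style configuration. MICROD's actual implementation may differ.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_embd=512,        # hidden size 512
    n_layer=8,         # 8 transformer layers
    n_head=8,          # 8 attention heads
    vocab_size=50257,  # 50,257-token vocabulary
    n_positions=1024,  # max sequence length 1024
    use_cache=True,    # reuse the KV cache during generation
)
model = GPT2LMHeadModel(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
# A GPT-2-style layout yields roughly 50M parameters here; the card's
# 42M figure suggests the real architecture diverges from this sketch.

# Dynamic int8 quantization of the linear layers is one standard way to
# shrink such a model for CPU inference, in line with the card's claim
# that its small footprint allows quantization.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```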