webxos committed on
Commit 970442c · verified · 1 Parent(s): e98e499

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -51,7 +51,7 @@ This is a distilled language model trained using Group Relative Policy Optimizat
  small set of files the user can use to template their own agents. Designed for educational learning and micro scalling.
  Use **MICROD V1.0 (micro-distill-grpo-vae)** in your own custom projects and train it from the ground up.

- The model's architecture details further underscore its educational niche: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
+ The model's architecture details further underscore an educational niche: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
  and a max sequence length of 1024. It supports KV-cache reuse with a 512 cache size, enabling faster generation for sequential thoughts, though this feature
  is noted as inactive in some interfaces. Licensed under Apache 2.0, it's openly available for modification, and its small footprint allows quantization,
  making it runnable on modest hardware like CPUs or even browsers via TensorFlow.js integration.
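The architecture figures quoted in the diff (hidden size 512, 8 layers, 8 heads, 50,257-token vocabulary, 1024 max sequence length, 512 KV-cache size) can be collected into a small config sketch. This is illustrative only, assuming a plain dataclass; the class and field names below are hypothetical and not the model's actual API.

```python
from dataclasses import dataclass


@dataclass
class MicroDConfig:
    # Hyperparameters as stated in the README diff above.
    # Names are illustrative, not taken from the model's code.
    hidden_size: int = 512
    num_layers: int = 8
    num_attention_heads: int = 8
    vocab_size: int = 50257
    max_seq_len: int = 1024
    kv_cache_size: int = 512  # KV-cache reuse for faster sequential generation

    @property
    def head_dim(self) -> int:
        # Per-head dimension: hidden size split evenly across heads.
        return self.hidden_size // self.num_attention_heads


config = MicroDConfig()
print(config.head_dim)  # 512 // 8 = 64
```

With these numbers, each attention head works in a 64-dimensional subspace, consistent with the small, education-oriented footprint the README describes.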