Update README.md
README.md
@@ -60,6 +60,11 @@ This is a distilled language model trained using Group Relative Policy Optimization
 small set of files the user can use to template their own agents. Designed for educational learning and micro scaling.
 Use **MICROD V1.0 (micro-distill-grpo-vae)** in your own custom projects and train it from the ground up.

+ The model's architecture details further underscore its educational bent: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
+ and a max sequence length of 1024. It supports KV-cache reuse with a cache size of 512, enabling faster generation for sequential thoughts, though this feature
+ is noted as inactive in some interfaces. Licensed under Apache 2.0, it's openly available for modification, and its small footprint allows quantization,
+ making it runnable on modest hardware like CPUs or even browsers via TensorFlow.js integration.
+
 ## Model Details
 - **Model type**: micro-distill-grpo-vae
 - **Model size**: 42M parameters
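For readers who want to reproduce the setup described in the added paragraph, the sketch below collects the quoted hyperparameters into a single configuration object. It is illustrative only: the `MicrodConfig` class and its field names (`hidden_size`, `kv_cache_size`, etc.) are assumptions for the sake of the example, not names taken from the MICROD repository.

```python
from dataclasses import dataclass


@dataclass
class MicrodConfig:
    """Illustrative configuration mirroring the values quoted in the README.

    All field names here are hypothetical; the actual MICROD V1.0
    (micro-distill-grpo-vae) code may use different names or a different
    config format entirely.
    """
    hidden_size: int = 512          # model width
    num_layers: int = 8             # transformer blocks
    num_attention_heads: int = 8    # 512 / 8 = 64-dimensional heads
    vocab_size: int = 50_257        # matches the GPT-2 BPE vocabulary size
    max_seq_len: int = 1024         # maximum context length
    kv_cache_size: int = 512        # reusable KV-cache slots (inactive in some interfaces)


cfg = MicrodConfig()
assert cfg.hidden_size % cfg.num_attention_heads == 0  # width must split evenly across heads
print(cfg)
```

For scale, the 50,257 × 512 token-embedding matrix alone is roughly 25.7M weights, well over half of the 42M total listed under Model Details, which is typical for models this small.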