Update README.md

README.md CHANGED

@@ -51,7 +51,7 @@ This is a distilled language model trained using Group Relative Policy Optimization
 small set of files the user can use to template their own agents. Designed for educational learning and micro scaling.
 Use **MICROD V1.0 (micro-distill-grpo-vae)** in your own custom projects and train it from the ground up.
 
-The model's architecture details further underscore
+The model's architecture details further underscore an educational niche: a hidden size of 512, 8 layers, 8 attention heads, a vocabulary of 50,257 tokens,
 and a max sequence length of 1024. It supports KV-cache reuse with a 512 cache size, enabling faster generation for sequential thoughts, though this feature
 is noted as inactive in some interfaces. Licensed under Apache 2.0, it's openly available for modification, and its small footprint allows quantization,
 making it runnable on modest hardware like CPUs or even browsers via TensorFlow.js integration.
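The architecture figures added in this change (hidden size 512, 8 layers, 8 heads, 50,257-token vocabulary, 1024 max sequence length, 512 KV-cache) can be collected into a plain config object. This is a minimal sketch for readers templating their own agents; the `MicrodConfig` class and its field names are illustrative, not part of the MICROD codebase.

```python
from dataclasses import dataclass

@dataclass
class MicrodConfig:
    """Illustrative config mirroring the README's architecture summary."""
    hidden_size: int = 512    # model width
    num_layers: int = 8       # transformer blocks
    num_heads: int = 8        # attention heads per block
    vocab_size: int = 50257   # GPT-2-style vocabulary
    max_seq_len: int = 1024   # maximum context length
    kv_cache_size: int = 512  # KV-cache reuse window

    @property
    def head_dim(self) -> int:
        # The width is split evenly across heads: 512 / 8 = 64 dims per head.
        return self.hidden_size // self.num_heads

cfg = MicrodConfig()
print(cfg.head_dim)  # 64
```

Deriving the per-head dimension from the stated width and head count is a quick sanity check that the numbers are internally consistent (512 must be divisible by 8).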