| | --- |
| | license: apache-2.0 |
| | tags: |
| | - krystalmind |
| | - qwen3 |
| | - routing |
| | - pharma-crm |
| | - json-planner |
| | base_model: Qwen/Qwen3-0.6B-Instruct |
| | --- |
| | |
| | # KrystalMind Cortex (0.6B JSON Planner) |
| |
|
| | Structured JSON action planner for the L4 intelligence pipeline. Routes divergence signals to appropriate data sources. |
| |
|
| | ## Architecture |
| | - **Base**: Qwen3-0.6B-Instruct (28 layers, d_model=1024, 16Q/8KV heads) |
| | - **LoRA SFT**: rank=16, scale=1.0, 1000 iters on routing_sft_v1 (14,836 examples) |
| | - **Output**: Structured JSON plan (db_queries, ecs_metrics, ssm_context, simulation) |
| | - **Decoding**: Greedy (T=0.0) with JSON-aware brace-depth stop |
| |
|
| | ## Files |
| | - `krystalball_cortex.kb` β Baked production model (571 MB, FP32) |
| | - `fused/model.safetensors` β Full fused Qwen3-0.6B + LoRA (1.1 GB) |
| | - `adapters/adapters.safetensors` β LoRA adapters only (10 MB) |
| | - `tokenizer.json` β Qwen3 tokenizer |
| |
|
| | ## Performance |
| | - **INT8**: ~86 tok/s, **BF16**: ~53 tok/s |
| | - **Typical plan**: 150 tokens greedy @ ~1.75s |
| | - **3-tier parse robustness**: Direct JSON β repair β deterministic fallback |
| |
|