---
license: apache-2.0
language:
- en
- es
- zh
tags:
- mlx
- tool-calling
- function-calling
- intent-classification
- osirisbrain
- apple-silicon
- qwen3
base_model: Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: mlx
---
# OsirisTalon-v3-0.6B-MLX
**The Talon** — Osiris's ultra-fast tool classifier brain. Runs alongside the main Cortex (9B) on Apple Silicon unified memory via MLX.
## Purpose
Pre-classifies user intent in **under 100 ms**, selecting the optimal tool and arguments _before_ the main Cortex model processes the request. This skips an entire ReAct inference cycle, cutting total response time from roughly 60-134 s to about 25 s.
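The classifier's role can be sketched as a tiny dispatch step ahead of the Cortex call. The tool names and the JSON verdict schema below are illustrative assumptions, not the model's published contract:

```python
import json

# Hypothetical tool registry; the real tool names are defined by the Osiris runtime.
TOOLS = {"disk_usage", "web_search", "none"}

def route(classifier_output: str) -> dict:
    """Parse the Talon's JSON verdict; on anything odd, defer to the Cortex."""
    try:
        verdict = json.loads(classifier_output)
    except json.JSONDecodeError:
        return {"tool": "none", "reason": "unparseable, defer to Cortex"}
    if verdict.get("tool") in TOOLS:
        return verdict
    return {"tool": "none", "reason": "unknown tool, defer to Cortex"}

# A well-formed verdict passes straight through to tool execution...
print(route('{"tool": "disk_usage", "args": {}}'))
# ...while malformed output falls back to the main model instead of crashing.
print(route("not json"))
```

Failing open (defer to the 9B Cortex) rather than erroring keeps a misfire by the small model from costing more than the ReAct cycle it was meant to save.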
## Architecture
- **Base Model:** Qwen3-0.6B (600M parameters)
- **Format:** MLX 4-bit quantized (Apple Silicon native)
- **Size:** ~335MB
- **Speed:** ~200+ tokens/sec on M2 Pro (MLX Metal)
- **Purpose:** Tool selection, intent classification, complexity rating
## Usage
```python
from mlx_lm import load, generate

# Download (on first run) and load the 4-bit MLX weights.
model, tokenizer = load("osirisbrain/OsirisTalon-v3-0.6B-MLX")

# Spanish example prompt: "how much disk space do I have"
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "cuanto espacio tengo en disco"}],
    add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```
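For classification use, the chat messages are typically framed by a system prompt that pins down the output format. A minimal, hypothetical template (the actual Osiris prompt is not published, and the tool list is an assumption):

```python
# Assumed system prompt; adjust the tool list and schema to your runtime.
SYSTEM = (
    "You are a tool classifier. Reply with JSON only: "
    '{"tool": <name>, "args": <object>, "complexity": <1-5>}. '
    "Available tools: disk_usage, web_search, none."
)

def build_messages(user_text: str) -> list[dict]:
    """Wrap a user request in the classification chat format."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_text},
    ]

# This list is what gets passed to tokenizer.apply_chat_template(...).
messages = build_messages("cuanto espacio tengo en disco")
```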
## Integration
Runs as a dedicated MLX inference server on port 8086, coexisting with llama-server (Cortex 9B) on port 8085. Both share Apple Silicon unified memory without conflict.
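If the Talon is served with `mlx_lm.server`, it exposes an OpenAI-compatible `/v1/chat/completions` route, so clients can query port 8086 over plain HTTP. A standard-library sketch; the URL follows the port layout above, and the response parsing assumes the OpenAI-style schema:

```python
import json
import urllib.request

TALON_URL = "http://localhost:8086/v1/chat/completions"

def build_payload(user_text: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 100,
    }

def classify(user_text: str, timeout: float = 2.0) -> str:
    """POST the request to the Talon server and return the verdict text."""
    req = urllib.request.Request(
        TALON_URL,
        data=json.dumps(build_payload(user_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The short `timeout` matters: if the Talon stalls, the caller should fall back to a plain Cortex ReAct cycle rather than block the whole request.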
## Credits
Rebranded from [mlx-community/Qwen3-0.6B-4bit](https://huggingface.co/mlx-community/Qwen3-0.6B-4bit) for the OsirisBrain sovereign AGI ecosystem.
Original model: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) by Alibaba.