---
license: apache-2.0
language:
  - en
  - es
  - zh
tags:
  - mlx
  - tool-calling
  - function-calling
  - intent-classification
  - osirisbrain
  - apple-silicon
  - qwen3
base_model: Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: mlx
---

# OsirisTalon-v3-0.6B-MLX

The Talon is Osiris's ultra-fast tool-classification model. It runs alongside the main Cortex (9B) in Apple Silicon unified memory via MLX.

## Purpose

Pre-classifies user intent in under 100 ms, selecting the optimal tool and its arguments before the main Cortex model processes the request. This eliminates a full ReAct inference cycle, cutting total response time from roughly 60-134 s to about 25 s.
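
To make the routing step concrete, here is a minimal sketch of what consuming such a pre-classification could look like. The JSON schema (`tool`, `args`, `complexity` fields) and the tool names are hypothetical illustrations, not the model's published output format:

```python
import json

# Hypothetical example of the structured decision the Talon might emit;
# field names and tool set are assumptions, not a documented schema.
raw = '{"tool": "disk_usage", "args": {"path": "/"}, "complexity": "low"}'
decision = json.loads(raw)

# The orchestrator can execute the chosen tool immediately, so the Cortex
# sees the tool result on its first (and only) pass.
print(f"tool={decision['tool']} args={decision['args']} complexity={decision['complexity']}")
```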

## Architecture

- Base Model: Qwen3-0.6B (600M parameters)
- Format: MLX 4-bit quantized, Apple Silicon native (see the conversion sketch below)
- Size: ~335MB
- Speed: 200+ tokens/sec on an M2 Pro (MLX Metal backend)
- Purpose: tool selection, intent classification, complexity rating
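
For reference, a 4-bit MLX build like this one can be produced with mlx_lm's `convert` utility. This is a sketch under that assumption; the output directory name is arbitrary:

```python
from mlx_lm import convert

# Quantize the base model to 4-bit MLX weights.
convert(
    "Qwen/Qwen3-0.6B",
    mlx_path="OsirisTalon-v3-0.6B-MLX",  # arbitrary local output directory
    quantize=True,
    q_bits=4,         # 4-bit weights -> ~335MB on disk
    q_group_size=64,  # mlx_lm's default quantization group size
)
```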

## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("osirisbrain/OsirisTalon-v3-0.6B-MLX")

# Spanish example input: "how much disk space do I have"
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "cuanto espacio tengo en disco"}],
    add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```
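
Note that depending on your mlx_lm version, `apply_chat_template` returns token IDs rather than a string; recent releases of `generate` accept either, and you can pass `tokenize=False` to get a plain-text prompt if needed.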

## Integration

Runs as a dedicated MLX inference server on port 8086, coexisting with llama-server (Cortex 9B) on port 8085. Both share Apple Silicon unified memory without conflict.
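
A minimal client-side sketch of that two-stage flow, assuming both servers are launched with OpenAI-compatible chat endpoints (both mlx_lm.server and llama-server can expose `/v1/chat/completions`; the exact route and message layout here are assumptions):

```python
import requests

# Ports match this card; the /v1/chat/completions route is assumed.
TALON_URL = "http://localhost:8086/v1/chat/completions"
CORTEX_URL = "http://localhost:8085/v1/chat/completions"

user_msg = {"role": "user", "content": "how much disk space do I have?"}

# Stage 1: ask the Talon for a fast tool decision (small token budget).
plan = requests.post(
    TALON_URL, json={"messages": [user_msg], "max_tokens": 100}, timeout=10
).json()
tool_decision = plan["choices"][0]["message"]["content"]

# Stage 2: hand the decision (and, in practice, the executed tool's
# output) to the Cortex for the final answer.
answer = requests.post(
    CORTEX_URL,
    json={"messages": [
        {"role": "system", "content": f"Tool plan from classifier: {tool_decision}"},
        user_msg,
    ]},
    timeout=180,
).json()
print(answer["choices"][0]["message"]["content"])
```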

## Credits

Rebranded from mlx-community/Qwen3-0.6B-4bit for the OsirisBrain sovereign AGI ecosystem. Original model: Qwen/Qwen3-0.6B by Alibaba.