---
license: apache-2.0
language:
- en
- es
- zh
tags:
- mlx
- tool-calling
- function-calling
- intent-classification
- osirisbrain
- apple-silicon
- qwen3
base_model: Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: mlx
---

# OsirisTalon-v3-0.6B-MLX

**The Talon** is Osiris's ultra-fast tool-classifier brain. It runs alongside the main Cortex (9B) model on Apple Silicon unified memory via MLX.

## Purpose

Pre-classifies user intent in **under 100 ms**, selecting the optimal tool and arguments _before_ the main Cortex model processes the request. This eliminates an entire ReAct inference cycle, cutting total response time from roughly 60-134 s to about 25 s.

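As a minimal sketch of how this pre-classification might be consumed by a router: the Talon's output is parsed, and low-complexity intents are dispatched straight to a tool, bypassing the Cortex. The JSON schema (`tool`, `args`, `complexity`) and the `route` helper below are illustrative assumptions, not the model's documented output format.

```python
import json

# Hypothetical Talon output; this schema is an assumption for
# illustration, not the model's documented format.
raw = '{"tool": "disk_usage", "args": {"path": "/"}, "complexity": "low"}'

def route(classification: str) -> str:
    """Return the tool to dispatch to, or "cortex" for complex requests."""
    c = json.loads(classification)
    # Low-complexity intents skip the Cortex's ReAct loop entirely.
    return c["tool"] if c.get("complexity") == "low" else "cortex"

print(route(raw))  # dispatches straight to the disk_usage tool
```
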
## Architecture

- **Base model:** Qwen3-0.6B (600M parameters)
- **Format:** MLX 4-bit quantized (Apple Silicon native)
- **Size:** ~335 MB
- **Speed:** 200+ tokens/sec on an M2 Pro (MLX Metal backend)
- **Purpose:** Tool selection, intent classification, complexity rating

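As a sanity check on the size listed above, 4-bit weights over 600M parameters work out to roughly 300 MB; quantization scales and metadata account for the remainder of the ~335 MB file:

```python
# Back-of-envelope size estimate for 4-bit quantized weights.
params = 0.6e9          # 600M parameters
bits_per_weight = 4     # 4-bit quantization
size_mb = params * bits_per_weight / 8 / 1e6
print(f"{size_mb:.0f} MB")  # 300 MB of raw weights; overhead brings the file to ~335 MB
```
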
## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("osirisbrain/OsirisTalon-v3-0.6B-MLX")

# Spanish prompt: "how much disk space do I have?"
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "cuanto espacio tengo en disco"}],
    add_generation_prompt=True,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```

## Integration

Runs as a dedicated MLX inference server on port 8086, coexisting with llama-server (the 9B Cortex) on port 8085. Both share Apple Silicon unified memory without conflict.

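A minimal launch sketch for this two-server setup, assuming `mlx_lm` is installed and using a hypothetical path for the Cortex GGUF; `mlx_lm.server` exposes an OpenAI-compatible HTTP API:

```shell
# Talon: MLX inference server on port 8086
python -m mlx_lm.server --model osirisbrain/OsirisTalon-v3-0.6B-MLX --port 8086 &

# Cortex: llama.cpp server on port 8085 (model path is hypothetical)
llama-server -m ./models/cortex-9b.gguf --port 8085 &
```
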
## Credits

Rebranded from [mlx-community/Qwen3-0.6B-4bit](https://huggingface.co/mlx-community/Qwen3-0.6B-4bit) for the OsirisBrain sovereign AGI ecosystem.
Original model: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) by Alibaba.