Update README.md

aebcafa verified 25 days ago

4.41 kB

	---
	datasets:
	- microsoft/rStar-Coder
	- patrickfleith/instruction-freak-reasoning
	- nvidia/OpenCodeReasoning
	- open-r1/codeforces-cots
	base_model:
	- Qwen/Qwen3-0.6B
	---
	Qwen3-Desert.Coder.MoE-8X0.6B

	📌 Model Overview

	Model Name: WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
	Organization: Within Us AI
	Model Type: Mixture-of-Experts (MoE) Code LLM
	Architecture: Qwen 3 (MoE)
	Expert Configuration: 8 × 0.6B experts
	Active Parameters (per token): ~0.6B–1.2B (estimated routing)
	Total Parameters: ~2B–4B class (sparse MoE structure)
	Primary Focus: Efficient agentic coding + sparse reasoning

	This model is a Mixture-of-Experts coding system, designed to deliver high capability at low compute cost by activating only a subset of its network per token.

	It’s part of the Within Us AI push toward:

	“Sparse intelligence: bigger thinking, smaller runtime.”

	The model appears in the WithinUsAI lineup as a MoE-based coding variant alongside dense and nano models.

	⸻

	🧬 Architecture & Lineage

	Base Foundation

	* Built on Qwen 3 architecture, a strong open LLM family known for multilingual understanding and coding capability
	* Qwen models are widely used for efficient, high-performance reasoning and coding systems

	MoE Design (8×0.6B)

	This model uses a Mixture-of-Experts (MoE) structure:

	* 8 specialized expert subnetworks (~0.6B each)
	* A router dynamically selects which experts activate per token
	* Only a subset runs → reducing compute cost

	Why MoE Matters

	Instead of one monolithic brain 🧠
	this model is more like a team of specialists:

	* One expert for syntax
	* One for logic
	* One for debugging
	* One for reasoning patterns

	Only the needed “experts” wake up per task.

	⸻

	🧠 Core Design Philosophy

	Don’t make one model smarter… make many small ones collaborate.

	Design Goals:

	* High coding performance per FLOP
	* Sparse activation for efficiency
	* Agent-compatible reasoning
	* Local + scalable deployment

	⸻

	⚙️ Key Capabilities

	💻 Coding

	* Multi-language support (Python, JS, C++, etc.)
	* Function generation and debugging
	* Algorithm reasoning

	🤖 Agentic Behavior

	* Task decomposition
	* Tool-use compatibility
	* Structured outputs (JSON, steps)

	🧠 Sparse Reasoning

	* Expert specialization improves efficiency
	* Handles diverse coding tasks with targeted computation

	⸻

	📦 Deployment Characteristics

	Runtime Behavior

	* Activates only part of the network → lower compute cost
	* Faster inference than dense models of similar total size
	* Scales well across CPU and GPU environments

	Supported Environments

	* Hugging Face Transformers
	* vLLM (if MoE supported)
	* Custom inference pipelines
	* GGUF possible if converted

	⸻

	🚀 Intended Use

	✅ Ideal Use Cases

	* Coding agents (multi-step workflows)
	* Efficient local deployments
	* Multi-agent systems (many small models)
	* Research into MoE architectures
	* Cost-sensitive AI systems

	⚠️ Limitations

	* MoE routing can be unstable in edge cases
	* Requires proper inference support (not all runtimes handle MoE well)
	* Smaller active parameter size limits deep reasoning vs large dense models

	⸻

	🧪 Training & Methodology

	Within Us AI pipeline includes:

	* Code-focused instruction tuning
	* Agentic workflow datasets
	* Reasoning trace integration
	* Evaluation-driven refinement

	Data Sources

	* Proprietary Within Us AI datasets
	* Third-party datasets (no ownership claimed)
	* Focus on:
	* Coding tasks
	* Debugging workflows
	* Structured reasoning

	⸻

	📊 Expected Performance Profile

	Capability Strength
	Coding High
	Efficiency Very High
	Reasoning depth Moderate
	Scalability High
	Agent readiness High

	⸻

	📜 License

	License Type: Inherits from Qwen / base model ecosystem

	Attribution Notes:

	* Base architecture: Qwen (Alibaba ecosystem)
	* MoE + training methodology: Within Us AI
	* Third-party datasets used without ownership claims
	* Credit belongs to original creators

	⸻

	🙏 Acknowledgements

	* Alibaba Qwen team
	* Open-source MoE research community
	* Hugging Face ecosystem
	* Dataset contributors

	⸻

	🔗 Links

	* Model: https://huggingface.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
	* Organization: https://huggingface.co/WithinUsAI

	⸻

	🧩 Closing Note

	This model feels like a desert outpost of specialists 🏜️

	Quiet. Efficient.
	Each expert waiting…

	…and when the problem arrives,
	only the right minds step forward.