GODsStrongestSoldier's picture
Update README.md
aebcafa verified
---
datasets:
- microsoft/rStar-Coder
- patrickfleith/instruction-freak-reasoning
- nvidia/OpenCodeReasoning
- open-r1/codeforces-cots
base_model:
- Qwen/Qwen3-0.6B
---
Qwen3-Desert.Coder.MoE-8X0.6B
📌 Model Overview
Model Name: WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
Organization: Within Us AI
Model Type: Mixture-of-Experts (MoE) Code LLM
Architecture: Qwen 3 (MoE)
Expert Configuration: 8 × 0.6B experts
Active Parameters (per token): ~0.6B–1.2B (estimated routing)
Total Parameters: ~2B–4B class (sparse MoE structure)
Primary Focus: Efficient agentic coding + sparse reasoning
This model is a Mixture-of-Experts coding system, designed to deliver high capability at low compute cost by activating only a subset of its network per token.
It’s part of the Within Us AI push toward:
“Sparse intelligence: bigger thinking, smaller runtime.”
The model appears in the WithinUsAI lineup as a MoE-based coding variant alongside dense and nano models. 
🧬 Architecture & Lineage
Base Foundation
* Built on Qwen 3 architecture, a strong open LLM family known for multilingual understanding and coding capability
* Qwen models are widely used for efficient, high-performance reasoning and coding systems 
MoE Design (8×0.6B)
This model uses a Mixture-of-Experts (MoE) structure:
* 8 specialized expert subnetworks (~0.6B each)
* A router dynamically selects which experts activate per token
* Only a subset runs → reducing compute cost
Why MoE Matters
Instead of one monolithic brain 🧠
this model is more like a team of specialists:
* One expert for syntax
* One for logic
* One for debugging
* One for reasoning patterns
Only the needed “experts” wake up per task.
🧠 Core Design Philosophy
Don’t make one model smarter… make many small ones collaborate.
Design Goals:
* High coding performance per FLOP
* Sparse activation for efficiency
* Agent-compatible reasoning
* Local + scalable deployment
⚙️ Key Capabilities
💻 Coding
* Multi-language support (Python, JS, C++, etc.)
* Function generation and debugging
* Algorithm reasoning
🤖 Agentic Behavior
* Task decomposition
* Tool-use compatibility
* Structured outputs (JSON, steps)
🧠 Sparse Reasoning
* Expert specialization improves efficiency
* Handles diverse coding tasks with targeted computation
📦 Deployment Characteristics
Runtime Behavior
* Activates only part of the network → lower compute cost
* Faster inference than dense models of similar total size
* Scales well across CPU and GPU environments
Supported Environments
* Hugging Face Transformers
* vLLM (if MoE supported)
* Custom inference pipelines
* GGUF possible if converted
🚀 Intended Use
✅ Ideal Use Cases
* Coding agents (multi-step workflows)
* Efficient local deployments
* Multi-agent systems (many small models)
* Research into MoE architectures
* Cost-sensitive AI systems
⚠️ Limitations
* MoE routing can be unstable in edge cases
* Requires proper inference support (not all runtimes handle MoE well)
* Smaller active parameter size limits deep reasoning vs large dense models
🧪 Training & Methodology
Within Us AI pipeline includes:
* Code-focused instruction tuning
* Agentic workflow datasets
* Reasoning trace integration
* Evaluation-driven refinement
Data Sources
* Proprietary Within Us AI datasets
* Third-party datasets (no ownership claimed)
* Focus on:
* Coding tasks
* Debugging workflows
* Structured reasoning
📊 Expected Performance Profile
Capability Strength
Coding High
Efficiency Very High
Reasoning depth Moderate
Scalability High
Agent readiness High
📜 License
License Type: Inherits from Qwen / base model ecosystem
Attribution Notes:
* Base architecture: Qwen (Alibaba ecosystem)
* MoE + training methodology: Within Us AI
* Third-party datasets used without ownership claims
* Credit belongs to original creators
🙏 Acknowledgements
* Alibaba Qwen team
* Open-source MoE research community
* Hugging Face ecosystem
* Dataset contributors
🔗 Links
* Model: https://huggingface.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
* Organization: https://huggingface.co/WithinUsAI
🧩 Closing Note
This model feels like a desert outpost of specialists 🏜️
Quiet. Efficient.
Each expert waiting…
…and when the problem arrives,
only the right minds step forward.