| --- |
| datasets: |
| - microsoft/rStar-Coder |
| - patrickfleith/instruction-freak-reasoning |
| - nvidia/OpenCodeReasoning |
| - open-r1/codeforces-cots |
| base_model: |
| - Qwen/Qwen3-0.6B |
| --- |
| Qwen3-Desert.Coder.MoE-8X0.6B |
|
|
| 📌 Model Overview |
|
|
| Model Name: WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B |
| Organization: Within Us AI |
| Model Type: Mixture-of-Experts (MoE) Code LLM |
| Architecture: Qwen 3 (MoE) |
| Expert Configuration: 8 × 0.6B experts |
| Active Parameters (per token): ~0.6B–1.2B (estimated routing) |
| Total Parameters: ~2B–4B class (sparse MoE structure) |
| Primary Focus: Efficient agentic coding + sparse reasoning |
|
|
| This model is a Mixture-of-Experts coding system, designed to deliver high capability at low compute cost by activating only a subset of its network per token. |
|
|
| It’s part of the Within Us AI push toward: |
|
|
| “Sparse intelligence: bigger thinking, smaller runtime.” |
|
|
| The model appears in the WithinUsAI lineup as a MoE-based coding variant alongside dense and nano models.  |
|
|
| ⸻ |
|
|
| 🧬 Architecture & Lineage |
|
|
| Base Foundation |
|
|
| * Built on Qwen 3 architecture, a strong open LLM family known for multilingual understanding and coding capability |
| * Qwen models are widely used for efficient, high-performance reasoning and coding systems  |
|
|
| MoE Design (8×0.6B) |
|
|
| This model uses a Mixture-of-Experts (MoE) structure: |
|
|
| * 8 specialized expert subnetworks (~0.6B each) |
| * A router dynamically selects which experts activate per token |
| * Only a subset runs → reducing compute cost |
|
|
| Why MoE Matters |
|
|
| Instead of one monolithic brain 🧠 |
| this model is more like a team of specialists: |
|
|
| * One expert for syntax |
| * One for logic |
| * One for debugging |
| * One for reasoning patterns |
|
|
| Only the needed “experts” wake up per task. |
|
|
| ⸻ |
|
|
| 🧠 Core Design Philosophy |
|
|
| Don’t make one model smarter… make many small ones collaborate. |
|
|
| Design Goals: |
|
|
| * High coding performance per FLOP |
| * Sparse activation for efficiency |
| * Agent-compatible reasoning |
| * Local + scalable deployment |
|
|
| ⸻ |
|
|
| ⚙️ Key Capabilities |
|
|
| 💻 Coding |
|
|
| * Multi-language support (Python, JS, C++, etc.) |
| * Function generation and debugging |
| * Algorithm reasoning |
|
|
| 🤖 Agentic Behavior |
|
|
| * Task decomposition |
| * Tool-use compatibility |
| * Structured outputs (JSON, steps) |
|
|
| 🧠 Sparse Reasoning |
|
|
| * Expert specialization improves efficiency |
| * Handles diverse coding tasks with targeted computation |
|
|
| ⸻ |
|
|
| 📦 Deployment Characteristics |
|
|
| Runtime Behavior |
|
|
| * Activates only part of the network → lower compute cost |
| * Faster inference than dense models of similar total size |
| * Scales well across CPU and GPU environments |
|
|
| Supported Environments |
|
|
| * Hugging Face Transformers |
| * vLLM (if MoE supported) |
| * Custom inference pipelines |
| * GGUF possible if converted |
|
|
| ⸻ |
|
|
| 🚀 Intended Use |
|
|
| ✅ Ideal Use Cases |
|
|
| * Coding agents (multi-step workflows) |
| * Efficient local deployments |
| * Multi-agent systems (many small models) |
| * Research into MoE architectures |
| * Cost-sensitive AI systems |
|
|
| ⚠️ Limitations |
|
|
| * MoE routing can be unstable in edge cases |
| * Requires proper inference support (not all runtimes handle MoE well) |
| * Smaller active parameter size limits deep reasoning vs large dense models |
|
|
| ⸻ |
|
|
| 🧪 Training & Methodology |
|
|
| Within Us AI pipeline includes: |
|
|
| * Code-focused instruction tuning |
| * Agentic workflow datasets |
| * Reasoning trace integration |
| * Evaluation-driven refinement |
|
|
| Data Sources |
|
|
| * Proprietary Within Us AI datasets |
| * Third-party datasets (no ownership claimed) |
| * Focus on: |
| * Coding tasks |
| * Debugging workflows |
| * Structured reasoning |
|
|
| ⸻ |
|
|
| 📊 Expected Performance Profile |
|
|
| Capability Strength |
| Coding High |
| Efficiency Very High |
| Reasoning depth Moderate |
| Scalability High |
| Agent readiness High |
|
|
| ⸻ |
|
|
| 📜 License |
|
|
| License Type: Inherits from Qwen / base model ecosystem |
|
|
| Attribution Notes: |
|
|
| * Base architecture: Qwen (Alibaba ecosystem) |
| * MoE + training methodology: Within Us AI |
| * Third-party datasets used without ownership claims |
| * Credit belongs to original creators |
|
|
| ⸻ |
|
|
| 🙏 Acknowledgements |
|
|
| * Alibaba Qwen team |
| * Open-source MoE research community |
| * Hugging Face ecosystem |
| * Dataset contributors |
|
|
| ⸻ |
|
|
| 🔗 Links |
|
|
| * Model: https://huggingface.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B |
| * Organization: https://huggingface.co/WithinUsAI |
|
|
| ⸻ |
|
|
| 🧩 Closing Note |
|
|
| This model feels like a desert outpost of specialists 🏜️ |
|
|
| Quiet. Efficient. |
| Each expert waiting… |
|
|
| …and when the problem arrives, |
| only the right minds step forward. |