---
datasets:
- microsoft/rStar-Coder
- patrickfleith/instruction-freak-reasoning
- nvidia/OpenCodeReasoning
- open-r1/codeforces-cots
base_model:
- Qwen/Qwen3-0.6B
---

# Qwen3-Desert.Coder.MoE-8X0.6B

## 📌 Model Overview

* Model Name: WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
* Organization: Within Us AI
* Model Type: Mixture-of-Experts (MoE) Code LLM
* Architecture: Qwen 3 (MoE)
* Expert Configuration: 8 × 0.6B experts
* Active Parameters (per token): ~0.6B–1.2B (estimated; depends on routing)
* Total Parameters: ~2B–4B class (sparse MoE structure)
* Primary Focus: Efficient agentic coding + sparse reasoning

This model is a Mixture-of-Experts coding system designed to deliver high capability at low compute cost by activating only a subset of its network for each token. It is part of the Within Us AI push toward:

> “Sparse intelligence: bigger thinking, smaller runtime.”

In the WithinUsAI lineup, it is the MoE-based coding variant, alongside dense and nano models.

⸻

## 🧬 Architecture & Lineage

### Base Foundation

* Built on the Qwen 3 architecture, a strong open LLM family known for multilingual understanding and coding capability
* Qwen models are widely used for efficient, high-performance reasoning and coding systems

### MoE Design (8×0.6B)

This model uses a Mixture-of-Experts (MoE) structure:

* 8 specialized expert subnetworks (~0.6B each)
* A router dynamically selects which experts activate for each token
* Only the selected subset runs, which reduces compute cost

### Why MoE Matters

Instead of one monolithic brain 🧠, this model works more like a team of specialists:

* One expert for syntax
* One for logic
* One for debugging
* One for reasoning patterns

Only the “experts” the router selects wake up for each token. (The division of labor above is an analogy: expert specializations are learned during training and need not map cleanly onto categories like these.)

⸻

## 🧠 Core Design Philosophy

> Don’t make one model smarter… make many small ones collaborate.

### Design Goals

* High coding performance per FLOP
* Sparse activation for efficiency
* Agent-compatible reasoning
* Local + scalable deployment

⸻

## ⚙️ Key Capabilities

### 💻 Coding

* Multi-language support (Python, JS, C++, etc.)
* Function generation and debugging
* Algorithm reasoning

### 🤖 Agentic Behavior

* Task decomposition
* Tool-use compatibility
* Structured outputs (JSON, step-by-step lists)

### 🧠 Sparse Reasoning

* Expert specialization improves efficiency
* Handles diverse coding tasks with targeted computation

⸻

## 📦 Deployment Characteristics

### Runtime Behavior

* Activates only part of the network, lowering compute cost per token
* Faster inference than dense models of similar total size
* Scales well across CPU and GPU environments

### Supported Environments

* Hugging Face Transformers
* vLLM (if MoE is supported)
* Custom inference pipelines
* GGUF (possible after conversion)

⸻

## 🚀 Intended Use

### ✅ Ideal Use Cases

* Coding agents (multi-step workflows)
* Efficient local deployments
* Multi-agent systems (many small models)
* Research into MoE architectures
* Cost-sensitive AI systems

### ⚠️ Limitations

* MoE routing can be unstable in edge cases
* Requires proper inference support (not all runtimes handle MoE well)
* The small active parameter count limits deep reasoning compared with large dense models

⸻

## 🧪 Training & Methodology

The Within Us AI pipeline includes:

* Code-focused instruction tuning
* Agentic workflow datasets
* Reasoning trace integration
* Evaluation-driven refinement

### Data Sources

* Proprietary Within Us AI datasets
* Third-party datasets (no ownership claimed)
* Focus on:
  * Coding tasks
  * Debugging workflows
  * Structured reasoning

⸻

## 📊 Expected Performance Profile

These are qualitative expectations, not benchmark scores.

| Capability | Strength |
|---|---|
| Coding | High |
| Efficiency | Very High |
| Reasoning depth | Moderate |
| Scalability | High |
| Agent readiness | High |

⸻

## 📜 License

* License Type: Inherits from the Qwen / base model ecosystem

Attribution Notes:

* Base architecture: Qwen (Alibaba ecosystem)
* MoE + training methodology: Within Us AI
* Third-party datasets used without ownership claims
* Credit belongs to the original creators

⸻

## 🙏 Acknowledgements

* Alibaba Qwen team
* Open-source MoE research community
* Hugging Face ecosystem
* Dataset contributors

⸻

## 🔗 Links

* Model:
https://huggingface.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
* Organization: https://huggingface.co/WithinUsAI

⸻

## 🧩 Closing Note

This model feels like a desert outpost of specialists 🏜️

Quiet. Efficient. Each expert waiting…

…and when the problem arrives, only the right minds step forward.
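⸻

As a rough illustration of the router described under MoE Design (“only the right minds step forward”), the sketch below implements generic top-k softmax gating for a single token in plain Python. It is a hypothetical toy, not this model's routing code: the card does not state how many experts are active per token, so `top_k=2`, the renormalized gate weights, the helper names `route_token` and `moe_forward`, and the toy experts are all assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types: an "expert" maps a vector to a vector of the same size.
Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: Sequence[float], top_k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) pairs for the top_k experts.

    Assumption: softmax over all experts, keep the top_k highest, then
    renormalize so the kept weights sum to 1.
    """
    if not 1 <= top_k <= len(router_logits):
        raise ValueError("top_k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_forward(x: Vector, experts: Sequence[Expert],
                router_logits: Sequence[float], top_k: int = 2) -> Vector:
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    for idx, weight in route_token(router_logits, top_k):
        y = experts[idx](x)  # unselected experts are never called
        out = [o + weight * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    # Eight toy "experts": expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up router scores
    print(route_token(logits, top_k=2))  # experts 4 and 1, weights ~0.73 / ~0.27
    print(moe_forward([1.0, 1.0], experts, logits, top_k=2))
```

With these made-up logits, only 2 of the 8 toy experts are evaluated. Renormalizing the kept gate weights makes the output a weighted average of the selected experts' outputs, whatever `top_k` is set to.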