Instructions to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF",
	filename="Qwen3-Desert.Coder.MoE-8X0.6B.Q4_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with Ollama:
```
ollama run hf.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
```

Unsloth Studio

How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with Docker Model Runner:
```
docker model run hf.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M
```

Lemonade

How to use WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3-Desert.Coder.MoE-8X0.6B-GGUF-Q4_K_M

List all available models

lemonade list

Qwen3-Desert.Coder.MoE-8X0.6B

📌 Model Overview

Model Name: WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B Organization: Within Us AI Model Type: Mixture-of-Experts (MoE) Code LLM Architecture: Qwen 3 (MoE) Expert Configuration: 8 × 0.6B experts Active Parameters (per token): ~0.6B–1.2B (estimated routing) Total Parameters: ~2B–4B class (sparse MoE structure) Primary Focus: Efficient agentic coding + sparse reasoning

This model is a Mixture-of-Experts coding system, designed to deliver high capability at low compute cost by activating only a subset of its network per token.

It’s part of the Within Us AI push toward:

“Sparse intelligence: bigger thinking, smaller runtime.”

The model appears in the WithinUsAI lineup as a MoE-based coding variant alongside dense and nano models.

⸻

🧬 Architecture & Lineage

Base Foundation

Built on Qwen 3 architecture, a strong open LLM family known for multilingual understanding and coding capability
Qwen models are widely used for efficient, high-performance reasoning and coding systems

MoE Design (8×0.6B)

This model uses a Mixture-of-Experts (MoE) structure:

8 specialized expert subnetworks (~0.6B each)
A router dynamically selects which experts activate per token
Only a subset runs → reducing compute cost

Why MoE Matters

Instead of one monolithic brain 🧠 this model is more like a team of specialists:

One expert for syntax
One for logic
One for debugging
One for reasoning patterns

Only the needed “experts” wake up per task.

⸻

🧠 Core Design Philosophy

Don’t make one model smarter… make many small ones collaborate.

Design Goals:

High coding performance per FLOP
Sparse activation for efficiency
Agent-compatible reasoning
Local + scalable deployment

⸻

⚙️ Key Capabilities

💻 Coding

Multi-language support (Python, JS, C++, etc.)
Function generation and debugging
Algorithm reasoning

🤖 Agentic Behavior

Task decomposition
Tool-use compatibility
Structured outputs (JSON, steps)

🧠 Sparse Reasoning

Expert specialization improves efficiency
Handles diverse coding tasks with targeted computation

⸻

📦 Deployment Characteristics

Runtime Behavior

Activates only part of the network → lower compute cost
Faster inference than dense models of similar total size
Scales well across CPU and GPU environments

Supported Environments

Hugging Face Transformers
vLLM (if MoE supported)
Custom inference pipelines
GGUF possible if converted

⸻

🚀 Intended Use

✅ Ideal Use Cases

Coding agents (multi-step workflows)
Efficient local deployments
Multi-agent systems (many small models)
Research into MoE architectures
Cost-sensitive AI systems

⚠️ Limitations

MoE routing can be unstable in edge cases
Requires proper inference support (not all runtimes handle MoE well)
Smaller active parameter size limits deep reasoning vs large dense models

⸻

🧪 Training & Methodology

Within Us AI pipeline includes:

Code-focused instruction tuning
Agentic workflow datasets
Reasoning trace integration
Evaluation-driven refinement

Data Sources

Proprietary Within Us AI datasets
Third-party datasets (no ownership claimed)
Focus on:
- Coding tasks
- Debugging workflows
- Structured reasoning

⸻

📊 Expected Performance Profile

Capability Strength Coding High Efficiency Very High Reasoning depth Moderate Scalability High Agent readiness High

⸻

📜 License

License Type: Inherits from Qwen / base model ecosystem

Attribution Notes:

Base architecture: Qwen (Alibaba ecosystem)
MoE + training methodology: Within Us AI
Third-party datasets used without ownership claims
Credit belongs to original creators

⸻

🙏 Acknowledgements

Alibaba Qwen team
Open-source MoE research community
Hugging Face ecosystem
Dataset contributors

⸻

🔗 Links

Model: https://huggingface.co/WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B
Organization: https://huggingface.co/WithinUsAI

⸻

🧩 Closing Note

This model feels like a desert outpost of specialists 🏜️

Quiet. Efficient. Each expert waiting…

…and when the problem arrives, only the right minds step forward.

Downloads last month: 333

GGUF

Model size

2B params

Architecture

qwen3moe

Hardware compatibility

4-bit

5-bit

6-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF

Base model

Qwen/Qwen3-0.6B-Base

Finetuned

Qwen/Qwen3-0.6B

Quantized

(351)

this model

Datasets used to train WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF

Collections including WithinUsAI/Qwen3-Desert.Coder.MoE-8X0.6B-GGUF

WithIn US AI (((GGUF MODELS)))

Collection

LLM MODELS TRAINED, FINE-TUNED, MERGED and Refusal Removal BY (WITHIN US AI) • 24 items • Updated about 9 hours ago • 7

“Qwen 3”

Collection

All models are “Alibaba Qwen 3” at core fine-tuned, merged & trained by (WithIn Us AI) • 10 items • Updated about 9 hours ago