Qwen Models Guide
Overview
ATLES primarily uses Qwen models from Alibaba Cloud for its AI capabilities. These language models handle conversation, reasoning, and code generation tasks.
Qwen Models in ATLES
Primary Models
1. Qwen2.5:7b - Main Conversational Model
- Size: ~4.7 GB
- Purpose: Primary model for general conversations, reasoning, and question answering
- Capabilities:
- Natural language understanding and generation
- Complex reasoning and problem-solving
- General knowledge and question answering
- Mathematical calculations
- Multi-turn conversations with context retention
- Performance: 95% task success rate
- Resource Usage: High (60-80% GPU, ~6 GB RAM)
- When to Use: Default for all standard interactions, general queries, reasoning tasks
2. Qwen2.5-Coder:latest - Specialized Coding Model
- Size: ~4.7 GB
- Purpose: Specialized model for programming and technical tasks
- Capabilities:
- Code generation in multiple languages
- Code debugging and optimization
- Technical documentation
- Algorithm explanation
- Software architecture analysis
- Performance: 98% confidence for coding tasks
- Resource Usage: High
- When to Use: Programming help, code review, debugging, technical analysis
3. Qwen2:7b - Alternative Generative Model
- Size: ~4.4 GB
- Purpose: Previous generation Qwen model, backup option
- Capabilities:
- General conversation
- Text generation
- Basic reasoning
- When to Use: Fallback when newer models are unavailable
Embedding Model
EmbeddingGemma:300m
- Size: ~300 MB (lightweight!)
- Purpose: Generate embeddings for semantic search and document analysis
- Capabilities:
- Text embedding generation
- Semantic similarity analysis
- Document clustering
- Search and retrieval
- Content analysis
- Performance: 90% effectiveness for embedding tasks
- Resource Usage: Low (25-50% GPU)
- When to Use: Finding similar documents, semantic search, document analysis
Backup Models
Llama3.2:3b
- Size: ~2.0 GB
- Purpose: Lightweight backup model for simple tasks
- Capabilities:
- Basic conversation
- Simple math
- Lightweight queries
- Resource Usage: Low to medium
- When to Use: Only as a backup when the main models are unavailable, or for very simple tasks
Gemma3:4b
- Size: ~3.3 GB
- Purpose: Alternative lightweight model
- Capabilities: General conversation, basic reasoning
- When to Use: Alternative backup option
Intelligent Model Router
ATLES includes an Intelligent Model Router that automatically selects the best model for each task:
Automatic Task Detection
# Example routing decisions:
"Find similar documents" → EmbeddingGemma (95% confidence)
"What is quantum computing?" → Qwen2.5:7b (90% confidence)
"Write a Python function" → Qwen2.5-Coder (98% confidence)
"Analyze this document" → EmbeddingGemma (90% confidence)
Model Selection Strategy
- Pattern-based detection - Analyzes request keywords and structure
- Performance-based selection - Chooses model with best success rate
- Confidence scoring - Provides transparency in routing decisions
- Fallback chains - Ensures reliability if primary model unavailable
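The strategy above can be illustrated with a minimal sketch of pattern-based detection plus confidence scoring. This is not the actual ATLES router: the keyword patterns, the `ROUTES` table, and the `Decision` class are hypothetical, and only the model names and confidence figures come from this guide.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword patterns; the real ones live in intelligent_model_router.py.
# Confidence values mirror the routing examples shown in this guide.
ROUTES = [
    (re.compile(r"\b(similar|embed|cluster|search|retriev)\w*", re.I),
     "embeddinggemma:300m", 0.95),
    (re.compile(r"\b(code|function|debug|python|program)\w*", re.I),
     "qwen2.5-coder:latest", 0.98),
]
# Anything that matches no pattern goes to the general conversational model.
DEFAULT_ROUTE = ("qwen2.5:7b", 0.90)


@dataclass
class Decision:
    selected_model: str
    confidence: float


def route(request: str) -> Decision:
    """Return the first route whose pattern matches, else the default."""
    for pattern, model, confidence in ROUTES:
        if pattern.search(request):
            return Decision(model, confidence)
    return Decision(*DEFAULT_ROUTE)
```

Calling `route("Write a Python function")` in this sketch selects `qwen2.5-coder:latest`, while a request with no matching keyword falls through to `qwen2.5:7b`.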
Task Type Routing
| Task Type | Primary Model | Confidence |
|---|---|---|
| Embedding generation | EmbeddingGemma:300m | 95% |
| Similarity analysis | EmbeddingGemma:300m | 95% |
| Document clustering | EmbeddingGemma:300m | 90% |
| Search/retrieval | EmbeddingGemma:300m | 90% |
| Conversation | Qwen2.5:7b | 90% |
| Reasoning | Qwen2.5:7b | 90% |
| Question answering | Qwen2.5:7b | 90% |
| Code generation | Qwen2.5-Coder | 98% |
| Debugging | Qwen2.5-Coder | 95% |
| Technical analysis | Qwen2.5-Coder | 95% |
Model Hierarchy
Priority Order
1. Qwen2.5:7b (PRIMARY)
   Best for: General conversations, reasoning, questions
2. Qwen2.5-Coder:latest (SPECIALIST)
   Best for: Code, programming, technical tasks
3. Llama3.2:3b (BACKUP)
   Best for: Simple tasks, low resource situations
4. Gemma3:4b (ALTERNATIVE)
   Best for: Alternative backup option
Fallback Chain
If the primary model fails or is unavailable:
Qwen2.5:7b → Qwen2.5-Coder:latest → Llama3.2:3b → Gemma3:4b
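A fallback chain like this one amounts to picking the first model in priority order that is actually available. The sketch below shows that idea only; the function name `first_available` and the way availability is passed in are assumptions, not the ATLES implementation.

```python
# Priority order taken from the fallback chain in this guide.
FALLBACK_CHAIN = [
    "qwen2.5:7b",
    "qwen2.5-coder:latest",
    "llama3.2:3b",
    "gemma3:4b",
]


def first_available(installed, chain=FALLBACK_CHAIN):
    """Return the first model in the chain that is installed, or None."""
    installed = set(installed)
    for model in chain:
        if model in installed:
            return model
    return None
```

For example, if only `llama3.2:3b` and `gemma3:4b` are installed, this sketch returns `llama3.2:3b`.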
Installation & Setup
1. Install Ollama
ATLES uses Ollama to manage models locally.
# Windows
winget install Ollama.Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
2. Pull Qwen Models
# Primary conversational model
ollama pull qwen2.5:7b
# Specialized coding model
ollama pull qwen2.5-coder:latest
# Embedding model
ollama pull embeddinggemma:300m
# Backup models (optional)
ollama pull llama3.2:3b
ollama pull gemma3:4b
3. Verify Installation
# List installed models
ollama list
# Should show:
# qwen2.5:7b 4.7 GB
# qwen2.5-coder:latest 4.7 GB
# embeddinggemma:300m 300 MB
# llama3.2:3b 2.0 GB
# gemma3:4b 3.3 GB
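The verification step can also be scripted. The sketch below assumes that `ollama list` prints a header row followed by one model per line with the model name in the first column; the helper names and the `REQUIRED` list are illustrative choices, not part of ATLES.

```python
import subprocess

# Models this guide treats as required; backup models are left out on purpose.
REQUIRED = ["qwen2.5:7b", "qwen2.5-coder:latest", "embeddinggemma:300m"]


def parse_model_names(list_output: str) -> set[str]:
    """Collect the first column of each non-header line of `ollama list` output."""
    names = set()
    for line in list_output.splitlines():
        fields = line.split()
        if not fields or fields[0].upper() == "NAME":
            continue  # skip blank lines and the header row
        names.add(fields[0])
    return names


def missing_models(list_output: str, required=REQUIRED) -> list[str]:
    """Return the required models that do not appear in the listing."""
    installed = parse_model_names(list_output)
    return [model for model in required if model not in installed]


def check_local_install() -> list[str]:
    """Run `ollama list` locally and report missing models (needs Ollama installed)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return missing_models(result.stdout)
```

An empty list from `check_local_install()` would mean every required model is present.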
4. Start Ollama Server
# The server should auto-start, but if needed:
ollama serve
Custom ATLES Models
ATLES can create custom enhanced versions of Qwen models with:
- Direct model weight modifications
- Constitutional reasoning enhancements
- Truth-seeking capabilities
- Manipulation detection
Create Custom Model
# Create Modelfile.atles
FROM qwen2.5:7b
# ATLES Enhanced Configuration
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
SYSTEM """You are ATLES (Autonomous Truth-seeking Learning Enhancement System),
an advanced AI with enhanced constitutional reasoning, truth-seeking capabilities,
and manipulation detection."""
# Build custom model
ollama create atles-qwen2.5:7b-enhanced -f Modelfile.atles
See CUSTOM_MODEL_SETUP_INSTRUCTIONS.md for detailed setup.
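If several custom variants are needed, the Modelfile text can be generated rather than written by hand. The following is a small sketch under that assumption; `render_modelfile` and `write_modelfile` are hypothetical helpers, and they only emit the `FROM`, `PARAMETER`, and `SYSTEM` instructions used in the example above.

```python
from pathlib import Path


def render_modelfile(base_model: str, parameters: dict, system_prompt: str) -> str:
    """Render Modelfile text with FROM, PARAMETER, and SYSTEM instructions."""
    lines = [f"FROM {base_model}"]
    lines += [f"PARAMETER {name} {value}" for name, value in parameters.items()]
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


def write_modelfile(path: str, text: str) -> None:
    """Write the rendered text to disk so `ollama create -f` can read it."""
    Path(path).write_text(text, encoding="utf-8")
```

The resulting file can then be passed to `ollama create` exactly as in the build command above.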
Model Performance
Resource Usage
| Model | GPU Usage | CPU Usage | RAM Usage |
|---|---|---|---|
| Qwen2.5:7b | 60-80% | 30-40% | ~6 GB |
| Qwen2.5-Coder | 60-80% | 30-40% | ~6 GB |
| EmbeddingGemma:300m | 25-50% | 15-25% | ~1 GB |
| Llama3.2:3b | 40-60% | 20-30% | ~3 GB |
| Gemma3:4b | 50-70% | 25-35% | ~4 GB |
Speed Benchmarks
| Model | Tokens/Second | Response Time |
|---|---|---|
| Qwen2.5:7b | 25-35 | Fast |
| Qwen2.5-Coder | 25-35 | Fast |
| EmbeddingGemma:300m | 50-100 | Very Fast |
| Llama3.2:3b | 40-50 | Very Fast |
| Gemma3:4b | 30-40 | Fast |
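To turn the tokens-per-second ranges into a rough wait time, divide the expected response length by the rate. The sketch below does only that arithmetic; it ignores prompt processing and model load time, so treat the result as a lower-bound estimate rather than a measurement.

```python
def estimate_seconds(response_tokens: int, low_tps: float, high_tps: float):
    """Return (best_case, worst_case) generation time in seconds."""
    best_case = response_tokens / high_tps   # fastest rate gives the shortest wait
    worst_case = response_tokens / low_tps   # slowest rate gives the longest wait
    return best_case, worst_case


# Qwen2.5:7b at 25-35 tokens/second, generating a 500-token answer.
best, worst = estimate_seconds(500, low_tps=25, high_tps=35)
print(f"{best:.1f}s to {worst:.1f}s")
```

Under these assumptions a 500-token answer from Qwen2.5:7b takes roughly 14 to 20 seconds of generation time.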
Quality Ratings
| Model | Accuracy | Reasoning | Creativity | Code Quality |
|---|---|---|---|---|
| Qwen2.5:7b | ★★★★★ | ★★★★★ | ★★★★ | ★★★★ |
| Qwen2.5-Coder | ★★★★ | ★★★★ | ★★★ | ★★★★★ |
| Llama3.2:3b | ★★★ | ★★★ | ★★★ | ★★ |
| Gemma3:4b | ★★★★ | ★★★ | ★★★ | ★★★ |
Configuration
Model Selection in Code
from atles.intelligent_model_router import IntelligentModelRouter
router = IntelligentModelRouter()
# Automatic routing
decision = router.route_request("What is quantum computing?")
print(f"Using: {decision.selected_model}") # qwen2.5:7b
decision = router.route_request("Write a Python function")
print(f"Using: {decision.selected_model}") # qwen2.5-coder:latest
Manual Model Selection
# In Desktop App
selected_model = "qwen2.5:7b" # Change in UI dropdown
# In configuration files
{
"preferred_models": [
"qwen2.5:7b",
"qwen2.5-coder:latest",
"llama3.2:3b"
]
}
Model Capabilities Comparison
Qwen2.5:7b vs Qwen2.5-Coder
| Feature | Qwen2.5:7b | Qwen2.5-Coder |
|---|---|---|
| General Conversation | ★★★★★ | ★★★★ |
| Code Generation | ★★★★ | ★★★★★ |
| Mathematical Reasoning | ★★★★★ | ★★★★ |
| Creative Writing | ★★★★ | ★★★ |
| Technical Documentation | ★★★★ | ★★★★★ |
| Debugging | ★★★ | ★★★★★ |
| Algorithm Design | ★★★★ | ★★★★★ |
| Natural Language | ★★★★★ | ★★★★ |
Troubleshooting
Model Not Found (404 Error)
# Check if model is installed
ollama list
# If missing, pull it
ollama pull qwen2.5:7b
# Verify Ollama is running
curl http://localhost:11434
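The same reachability check can be done from Python using only the standard library. This sketch assumes the default Ollama address used in the curl command above; the function name `ollama_is_running` is illustrative.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the Ollama base URL answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

A `False` result suggests starting the server with `ollama serve` before retrying the request.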
Memory Issues
Error: model requires more system memory (4.3 GiB) than is available
Solution: Use lighter models or close other applications
# Use lighter model
ollama pull llama3.2:3b # Only 2GB
# Or increase system memory/swap
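Choosing a lighter model can be reduced to a lookup against the RAM Usage column of the Resource Usage table. The sketch below uses those approximate figures; actual requirements depend on quantization and context size, and `models_that_fit` is a hypothetical helper.

```python
# Approximate RAM usage in GB, copied from the Resource Usage table in this guide.
RAM_GB = {
    "qwen2.5:7b": 6,
    "qwen2.5-coder:latest": 6,
    "gemma3:4b": 4,
    "llama3.2:3b": 3,
}


def models_that_fit(available_gb: float, ram_gb=RAM_GB) -> list[str]:
    """Return generative models whose approximate RAM usage fits, largest first."""
    fitting = [model for model, need in ram_gb.items() if need <= available_gb]
    return sorted(fitting, key=lambda model: ram_gb[model], reverse=True)
```

With about 4 GB free, this sketch suggests `gemma3:4b` first and `llama3.2:3b` second.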
Slow Performance
- Check GPU usage: Qwen models perform best with GPU acceleration
- Verify CUDA: For NVIDIA GPUs, ensure CUDA is properly installed
- Reduce concurrent models: Only run one large model at a time
- Use appropriate model: Use Llama3.2:3b for simple tasks
Model Selection Issues
If the router selects the wrong model:
- Check task patterns in intelligent_model_router.py
- Manually specify the model in the UI dropdown
- Review router logs for confidence scores
Related Documentation
- CUSTOM_MODEL_SETUP_INSTRUCTIONS.md - Create enhanced ATLES models
- OLLAMA_INTEGRATION_GUIDE.md - Deep dive into Ollama integration
- DEVELOPER_GUIDE.md - Development with ATLES models
- CORRECT_MODEL_HIERARCHY_SUMMARY.md - Model priority details
Best Practices
- Use Qwen2.5:7b as default for all general interactions
- Switch to Qwen2.5-Coder when working with code
- Let the router decide for optimal automatic selection
- Keep models updated with ollama pull model:tag
- Monitor resource usage to ensure optimal performance
- Create custom models for specialized use cases
- Use EmbeddingGemma for all semantic search tasks
FAQ
Q: Why Qwen instead of other models?
A: Qwen models offer the best balance of performance, speed, and capability for ATLES. They excel at reasoning, coding, and conversation.
Q: Can I use other models?
A: Yes! ATLES supports any Ollama-compatible model. Just add it to the router configuration.
Q: Which model is fastest?
A: EmbeddingGemma:300m is fastest for its tasks. For generation, Llama3.2:3b is fastest but Qwen2.5:7b offers better quality.
Q: How much disk space do I need?
A: Minimum 10 GB for Qwen2.5:7b + Qwen2.5-Coder. Recommended 15 GB to include all models.
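The disk figures follow from adding up the model sizes listed in this guide. A short worked sum, using those approximate sizes:

```python
# Approximate download sizes in GB, as listed in this guide.
SIZES_GB = {
    "qwen2.5:7b": 4.7,
    "qwen2.5-coder:latest": 4.7,
    "embeddinggemma:300m": 0.3,
    "llama3.2:3b": 2.0,
    "gemma3:4b": 3.3,
}

minimum = round(SIZES_GB["qwen2.5:7b"] + SIZES_GB["qwen2.5-coder:latest"], 1)
everything = round(sum(SIZES_GB.values()), 1)

print(f"Two primary models: {minimum} GB")    # 9.4 GB, hence the 10 GB minimum
print(f"All five models:    {everything} GB")  # 15.0 GB
```

Installing Qwen2:7b (~4.4 GB) as well is not included in this sum and would need additional space.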
Q: Do I need a GPU?
A: No, but highly recommended. Qwen models work on CPU but are much faster with GPU acceleration.
Q: Can I run multiple models simultaneously?
A: Yes, but it is resource-intensive. The router handles model switching automatically.
Last Updated: November 2025
ATLES Version: v6.0+
For questions or issues, see OLLAMA_TROUBLESHOOTING.md