Spaces:

mnoorchenar
/

langgraph-support-agent

Sleeping

App Files Files Community

langgraph-support-agent / MODELS.md

mnoorchenar

Update 2026-03-22 16:47:51

4c66cda 2 months ago

preview code

raw

history blame contribute delete

5.23 kB

	# 🤖 Available Models - Feature Guide

	## What Changed?

	We added 4 different free HuggingFace models that you can switch between in real-time. Each has different trade-offs.

	---

	## Available Models

	### 1. ⚡ Qwen 2.5 7B (Default)
	- Speed: ⚡⚡⚡ Very Fast
	- Quality: ⭐⭐⭐⭐⭐ Excellent
	- Model: `Qwen/Qwen2.5-7B-Instruct`
	- Best for: Production use, balanced quality & speed
	- Size: 7B parameters
	- Why it's first: Most reliable on free-tier HuggingFace

	### 2. 💎 Gemma 2 9B
	- Speed: ⚡⚡⚡ Fast
	- Quality: ⭐⭐⭐⭐⭐ Excellent
	- Model: `google/gemma-2-9b-it`
	- Best for: High-quality responses
	- Size: 9B parameters
	- Note: Google model, very reliable

	### 3. 🌀 Mistral 7B
	- Speed: ⚡⚡⚡ Fast
	- Quality: ⭐⭐⭐⭐ Very Good
	- Model: `mistralai/Mistral-7B-Instruct-v0.2`
	- Best for: Balanced option, newer version
	- Size: 7B parameters
	- Note: Good alternative to Qwen

	### 4. ⚙️ TinyLlama 1.1B
	- Speed: ⚡⚡⚡⚡⚡ Ultra Fast
	- Quality: ⭐⭐⭐ Good
	- Model: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
	- Best for: Testing, learning, super fast responses
	- Size: 1.1B parameters (tiny!)
	- Note: Much smaller, faster, but less capable

	---

	## How Switching Works

	### User Selects Model:
	```
	1. User picks "Gemma 2 9B" from dropdown
	2. Frontend sends: {"model": "google/gemma-2-9b-it"}
	3. Backend receives request
	4. LangGraph agent uses Gemma instead of Qwen
	```

	### Smart Fallback (NEW!):
	If the selected model fails, the agent automatically tries fallback models:

	```
	User selected: Qwen
	↓
	Qwen fails
	↓
	Try: Gemma 2 9B
	↓
	Gemma works!
	↓
	Returns result using Gemma
	(User sees: "Model switched to Gemma")
	```

	Fallback order:
	1. Gemma 2 9B
	2. Mistral 7B
	3. TinyLlama 1.1B

	---

	## When to Use Each Model

	\| Scenario \| Model \| Why \|
	\|----------\|-------\|-----\|
	\| Production \| Qwen 2.5 \| Reliable + fast + free-tier \|
	\| Quality matters \| Gemma 2 \| Highest quality responses \|
	\| Testing \| TinyLlama \| Instant responses \|
	\| Unsure \| Qwen 2.5 \| Default is always safe \|

	---

	## Code Changes Made

	### 1. `app.py` - Added 4 Models
	```python
	AVAILABLE_MODELS = [
	{"id":"Qwen/Qwen2.5-7B-Instruct","name":"Qwen 2.5 7B","badge":"⚡ Fast & Reliable"},
	{"id":"google/gemma-2-9b-it","name":"Gemma 2 9B","badge":"💎 Quality"},
	{"id":"mistralai/Mistral-7B-Instruct-v0.2","name":"Mistral 7B","badge":"🌀 Balanced"},
	{"id":"TinyLlama/TinyLlama-1.1B-Chat-v1.0","name":"TinyLlama 1.1B","badge":"⚙️ Lightweight"},
	]

	FALLBACK_MODEL = "Qwen/Qwen2.5-7B-Instruct"
	```

	### 2. `agent/llm.py` - Added Fallback Logic
	```python
	def call_llm_with_fallback(client, primary_model, fallback_models, messages, emit_token):
	"""Try primary model, then fallback models if it fails"""
	# Tries models in order until one works
	# Returns (result, model_used)
	```

	### 3. `agent/nodes.py` - Using Fallback
	```python
	full_text, used_model = call_llm_with_fallback(
	client, state["model_name"], FALLBACK_MODELS, messages, emit_token
	)

	# Logs if model was switched
	if used_model != state["model_name"]:
	ev.emit(sid, {"type":"model_switch","from":state["model_name"],"to":used_model})
	```

	### 4. `templates/index.html` - Already Works!
	The dropdown automatically loads all 4 models from backend.

	---

	## Testing Models

	Try each model with the same question to see differences:

	Test Question: "What is your return policy?"

	1. Qwen 2.5 - Fast, concise answer
	2. Gemma 2 - Detailed, thorough answer
	3. Mistral 7B - Clear, structured answer
	4. TinyLlama - Shorter but good answer

	---

	## Next Steps You Can Do

	1. Add more models from HuggingFace:
	```python
	{"id":"meta-llama/Llama-2-7b-chat-hf","name":"Llama 2","badge":"🦙"},
	```

	2. Compare models on your own use cases

	3. Use TinyLlama for testing (instant responses)

	4. Set Gemma as default if you prefer quality:
	```python
	AVAILABLE_MODELS = [Gemma, ...] # Move Gemma to top
	```

	---

	## Important Notes

	✅ All models are free - HuggingFace Inference API free tier

	✅ No API keys needed - Uses your HF_TOKEN (already set)

	✅ Auto-fallback - Never get stuck without response

	✅ Easy to switch - Just select from dropdown

	❌ Not real APIs - All are HuggingFace inference endpoints

	❌ Free tier limits - May have rate limits (but generous)

	---

	## How Free-Tier HF Works

	```
	Your Space
	↓
	HuggingFace Inference API (Free)
	↓
	├─ Qwen 2.5 7B ✓
	├─ Gemma 2 9B ✓
	├─ Mistral 7B ✓
	└─ TinyLlama ✓

	All free, no payment needed!
	```

	Your HF_TOKEN gives you access to all these models.

	---

	## Performance Tips

	1. Use TinyLlama for testing (1.1B = super fast)
	2. Use Qwen for production (best balance)
	3. Use Gemma when you need better quality
	4. Use Mistral as middle ground

	Speed comparison:
	- TinyLlama: ~500ms
	- Qwen: ~1-2s
	- Gemma: ~2-3s
	- Mistral: ~2-3s

	---

	Enjoy your multi-model support! 🚀