Data-Science-Agent / DYNAMIC_PROMPTS.md

Pulastya B

docs: Add comprehensive guide for dynamic prompt system

72a3bd7 4 days ago

	# Dynamic Prompts for Small Context Windows

	## Problem

	Production systems often face context window constraints:

	| Model | Context Window | Your Full Prompt | Fits? |
	|-------|---------------|------------------|-------|
	| Groq Llama 3.3 70B | 8K tokens | ~20K tokens | ❌ Overflow |
	| Gemini 2.5 Flash | 1M tokens | ~20K tokens | ✅ No problem |
	| GPT-4 Turbo | 128K tokens | ~20K tokens | ✅ OK |
	| Claude 3.5 Sonnet | 200K tokens | ~20K tokens | ✅ OK |

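	Your system prompt with 82+ tools is ~20,000 tokens, which is too large for the 8K budget on Groq! The 8K figure is the working per-request budget this guide assumes for Groq; check the current context window and token rate limits for your model and account tier, because they may differ.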

	## Solution: Dynamic Tool Loading

	Instead of loading all 82 tools, detect user intent and load only relevant tools:

	```
	User: "Generate plots for magnitude"
	→ Detects: visualization intent
	→ Loads: 9 visualization tools + 4 core tools
	→ Result: ~2,000 tokens (90% reduction!) ✅
	```

	## How It Works

	### 1. Intent Detection (Keyword-Based)

	```python
	INTENT_KEYWORDS = {
	    "visualization": ["plot", "chart", "graph", "visualize", "dashboard"],
	    "model_training": ["train", "model", "predict", "classify"],
	    "data_quality": ["clean", "missing", "outlier", "quality"],
	    "eda": ["profile", "describe", "summary", "statistics"],
	    # ... more categories
	}
	```
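A minimal sketch of how the keyword map above could drive intent detection. The body of `detect_intent`, the prefix matching, and the `DEFAULT_INTENT` constant are assumptions for illustration; the actual implementation in `src/dynamic_prompts.py` may differ.

```python
import re

# Illustrative subset of the keyword map shown above.
INTENT_KEYWORDS = {
    "visualization": ["plot", "chart", "graph", "visualize", "dashboard"],
    "model_training": ["train", "model", "predict", "classify"],
    "data_quality": ["clean", "missing", "outlier", "quality"],
    "eda": ["profile", "describe", "summary", "statistics"],
}

# Assumed fallback: the guide states that "eda" is loaded when nothing matches.
DEFAULT_INTENT = "eda"


def detect_intent(user_query: str) -> set[str]:
    """Return every category with at least one keyword matching a query word."""
    words = re.findall(r"[a-z]+", user_query.lower())
    intents = {
        category
        for category, keywords in INTENT_KEYWORDS.items()
        # Prefix match so that "plots" still matches the keyword "plot".
        if any(word.startswith(keyword) for word in words for keyword in keywords)
    }
    return intents or {DEFAULT_INTENT}


print(detect_intent("Generate plots for magnitude"))
```

Prefix matching lets "plots" hit the keyword "plot", at the cost of occasional false positives (for example, "charter" would match "chart").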

	### 2. Tool Categories

	```python
	TOOL_CATEGORIES = {
	    "visualization": [
	        "generate_plotly_dashboard",
	        "generate_interactive_scatter",
	        "generate_interactive_histogram",
	        # ... 6 more visualization tools
	    ],
	    "model_training": [
	        "train_baseline_models",
	        "hyperparameter_tuning",
	        "perform_cross_validation",
	        # ... 3 more ML tools
	    ],
	    # ... other categories
	}
	```
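One way the category map could be turned into a tool list is sketched below. The guide says 4 core tools are always included but does not name them, so the `CORE_TOOLS` entries are placeholders, and the body of `get_relevant_tools` is an assumption rather than the project's actual code.

```python
# Placeholder names: the guide does not list the 4 core tools.
CORE_TOOLS = ["core_tool_1", "core_tool_2", "core_tool_3", "core_tool_4"]

# Illustrative subset of the category map shown above.
TOOL_CATEGORIES = {
    "visualization": [
        "generate_plotly_dashboard",
        "generate_interactive_scatter",
        "generate_interactive_histogram",
    ],
    "model_training": [
        "train_baseline_models",
        "hyperparameter_tuning",
        "perform_cross_validation",
    ],
}


def get_relevant_tools(intents: set[str]) -> list[str]:
    """Core tools first, then each detected category's tools, without duplicates."""
    selected = list(CORE_TOOLS)
    for intent in sorted(intents):  # sorted so the result order is deterministic
        for tool in TOOL_CATEGORIES.get(intent, []):
            if tool not in selected:
                selected.append(tool)
    return selected


print(get_relevant_tools({"visualization"}))
```

Unknown categories are ignored instead of raising, so a typo in an intent name degrades to "core tools only" rather than crashing the agent.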

	### 3. Dynamic Prompt Generation

	```python
	def build_compact_system_prompt(user_query: str) -> str:
	    # Detect user intent
	    intents = detect_intent(user_query)  # e.g. {"visualization"}

	    # Get relevant tools
	    tools = get_relevant_tools(intents)  # e.g. 13 tools instead of 82

	    # Build compact prompt with only these tools.
	    # render_compact_prompt is a hypothetical helper that formats the tools as prompt text.
	    compact_prompt = render_compact_prompt(tools)
	    return compact_prompt  # ~2K tokens instead of ~20K
	```
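The prompt-assembly step is not shown in this guide. A rough sketch of what it might look like follows; `render_compact_prompt`, `estimate_tokens`, the system preamble, and the tool descriptions are all hypothetical.

```python
# Hypothetical one-line descriptions; the real tool registry is not shown in this guide.
TOOL_DESCRIPTIONS = {
    "generate_plotly_dashboard": "Build an interactive dashboard from a dataset.",
    "generate_interactive_scatter": "Create an interactive scatter plot.",
    "train_baseline_models": "Train and compare baseline models.",
}


def render_compact_prompt(tools: list[str]) -> str:
    """Format only the selected tools into a short system prompt."""
    lines = ["You are a data science assistant.", "", "Available tools:"]
    for name in tools:
        description = TOOL_DESCRIPTIONS.get(name, "No description available.")
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4


prompt = render_compact_prompt(["generate_plotly_dashboard", "generate_interactive_scatter"])
print(prompt)
print("Estimated tokens:", estimate_tokens(prompt))
```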

	## Production Patterns

	### Pattern 1: Router + Specialists (LangChain/CrewAI)

	```
	┌─────────────────┐
	│  Router Agent   │ ← Small prompt: "What specialist is needed?"
	│   (2K tokens)   │ → Routes to Data Cleaning Agent
	└────────┬────────┘
	         │
	    ┌────▼────────────────────┐
	    │ Data Cleaning Specialist│ ← Focused prompt: only cleaning tools
	    │       (3K tokens)       │
	    └─────────────────────────┘
	```
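The router pattern can be sketched without committing to a framework. In the sketch below the router LLM call is replaced by an injected function so the example runs offline; `route`, `SPECIALIST_TOOLS`, and `keyword_router` are illustrative names, not LangChain or CrewAI APIs.

```python
from typing import Callable

# Each specialist sees only its own tools (tool names taken from this guide).
SPECIALIST_TOOLS = {
    "data_cleaning": ["clean_missing_values", "handle_outliers"],
    "visualization": ["generate_plotly_dashboard"],
}

ROUTER_PROMPT = (
    "What specialist is needed? Answer with exactly one of: "
    + ", ".join(sorted(SPECIALIST_TOOLS))
)


def route(
    query: str,
    ask_router: Callable[[str, str], str],
    fallback: str = "data_cleaning",
) -> tuple[str, list[str]]:
    """Ask the router which specialist to use; fall back on unexpected answers."""
    answer = ask_router(ROUTER_PROMPT, query).strip().lower()
    specialist = answer if answer in SPECIALIST_TOOLS else fallback
    return specialist, SPECIALIST_TOOLS[specialist]


def keyword_router(system_prompt: str, query: str) -> str:
    """Stand-in for the router LLM call so the sketch runs without an API key."""
    return "visualization" if "plot" in query.lower() else "data_cleaning"


print(route("Plot the magnitude column", keyword_router))
```

Validating the router's answer against the known specialists matters in practice, because an LLM router can return text that is not one of the expected labels.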

	### Pattern 2: RAG for Tools (Vector Retrieval)

	```python
	# Embed all 82 tool descriptions in vector DB
	tool_embeddings = embed_tools(all_tools)

	# User query → Retrieve top-5 most relevant
	query = "I need to handle missing values"
	relevant_tools = vector_db.similarity_search(query, k=5)
	# Returns: clean_missing_values, handle_outliers, detect_data_quality_issues, ...

	# Only pass these 5 tools to LLM
	prompt = build_prompt_with_tools(relevant_tools) # Much smaller!
	```
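The snippet above assumes an embedding model and a vector store. To show the retrieval idea without either, here is a self-contained sketch that swaps real embeddings for bag-of-words vectors and cosine similarity; the tool descriptions and the `retrieve_tools` function are made up for illustration.

```python
import math
import re
from collections import Counter

# Made-up descriptions for a few tools named in this guide.
TOOL_DESCRIPTIONS = {
    "clean_missing_values": "fill or drop missing values in a dataset",
    "handle_outliers": "detect and cap outlier values in numeric columns",
    "generate_interactive_scatter": "draw an interactive scatter plot of two columns",
    "train_baseline_models": "train baseline machine learning models",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Rank tools by similarity between the query and each description."""
    query_vector = embed(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(query_vector, embed(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]


print(retrieve_tools("I need to handle missing values"))
```

Word-count vectors only match exact words, which is the limitation that real embeddings remove: they can also match paraphrases such as "null entries" for "missing values".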

	### Pattern 3: Intent-Based Dynamic Loading (Your New System)

	```
	User: "Train a model"
	↓
	Intent Detector → "model_training" + "data_quality"
	↓
	Load Tools: 4 core + 5 data_quality + 6 model_training = 15 tools
	↓
	Compact Prompt: ~3K tokens ✅
	```
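In this flow, "Train a model" yields `data_quality` even though the query contains no data-quality keyword. The guide does not say how that happens; one plausible mechanism, shown here purely as an assumption, is a dependency map that adds prerequisite categories after keyword detection.

```python
# Assumed dependency map: training a model usually needs clean data first.
INTENT_DEPENDENCIES = {
    "model_training": {"data_quality"},
}


def expand_intents(intents: set[str]) -> set[str]:
    """Add prerequisite categories for each detected intent (one level deep)."""
    expanded = set(intents)
    for intent in intents:
        expanded |= INTENT_DEPENDENCIES.get(intent, set())
    return expanded


print(sorted(expand_intents({"model_training"})))
```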

	## Token Comparison

	### Full Prompt (All 82 Tools)
	```
	System Instructions: 10K tokens
	Tool Descriptions: 8K tokens
	Workflow Rules: 2K tokens
	────────────────────────────────
	TOTAL: ~20K tokens
	```

	### Compact Prompt (15 Relevant Tools)
	```
	System Instructions: 1K tokens (condensed)
	Tool Descriptions: 1K tokens (only 15 tools)
	Workflow Rules: 500 tokens (simplified)
	────────────────────────────────
	TOTAL: ~2.5K tokens (87.5% reduction!)
	```
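The reduction figure follows directly from the two totals above. A quick check of the arithmetic, where `reduction_percent` is just an illustrative helper:

```python
def reduction_percent(full_tokens: int, compact_tokens: int) -> float:
    """Percentage of tokens saved by the compact prompt."""
    return 100 * (full_tokens - compact_tokens) / full_tokens


full_prompt = 10_000 + 8_000 + 2_000   # instructions + tool descriptions + workflow rules
compact_prompt = 1_000 + 1_000 + 500   # the same three parts, condensed

print(full_prompt, compact_prompt, reduction_percent(full_prompt, compact_prompt))
```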

	## Usage

	### Automatic (Recommended)

	```python
	# Auto-enables for Groq, disabled for Gemini
	agent = DataScienceCopilot(
	    provider="groq"  # Compact prompts automatically enabled
	)
	```

	### Manual Control

	```python
	# Force compact prompts even with Gemini
	agent = DataScienceCopilot(
	    provider="gemini",
	    use_compact_prompts=True  # Override
	)
	```

	### Environment Variable

	```bash
	# Enable compact prompts globally
	export USE_COMPACT_PROMPTS=true
	```
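The guide does not spell out how the three switches interact. The sketch below assumes one reasonable precedence (explicit argument, then environment variable, then provider default); `resolve_compact_prompts` and `SMALL_CONTEXT_PROVIDERS` are hypothetical names, so check the `DataScienceCopilot` source for the real behavior.

```python
import os
from typing import Mapping, Optional

SMALL_CONTEXT_PROVIDERS = {"groq"}  # providers that default to compact prompts
TRUE_VALUES = {"1", "true", "yes", "on"}


def resolve_compact_prompts(
    provider: str,
    use_compact_prompts: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Assumed precedence: explicit argument > USE_COMPACT_PROMPTS > provider default."""
    env = os.environ if env is None else env
    if use_compact_prompts is not None:
        return use_compact_prompts
    env_value = env.get("USE_COMPACT_PROMPTS")
    if env_value is not None:
        return env_value.strip().lower() in TRUE_VALUES
    return provider.lower() in SMALL_CONTEXT_PROVIDERS


print(resolve_compact_prompts("groq", env={}))
print(resolve_compact_prompts("gemini", env={"USE_COMPACT_PROMPTS": "true"}))
```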

	## Intent Categories

	| Category | Keywords | Tools Loaded | Use Case |
	|----------|----------|--------------|----------|
	| visualization | plot, chart, graph, visualize, dashboard | 9 tools | User wants plots only |
	| model_training | train, model, predict, classify, forecast | 6 tools | ML pipeline |
	| data_quality | clean, missing, outlier, quality, duplicates | 5 tools | Data cleaning |
	| feature_engineering | feature, encode, transform, scale, normalize | 8 tools | Feature creation |
	| eda | profile, describe, summary, statistics, distribution | 5 tools | Exploratory analysis |
	| time_series | time, date, datetime, temporal, trend, seasonality | 4 tools | Temporal data |
	| optimization | tune, optimize, hyperparameter, improve | 3 tools | Model tuning |
	| code_execution | execute, run code, calculate, custom, python | 2 tools | Custom Python code |

	Default: if no keywords are detected, the "eda" category is loaded.

	## Real-World Example

	### Before (Full Prompt)

	```
	User: "Generate plots for magnitude and latitude"

	Prompt includes:
	✅ 9 visualization tools (needed)
	❌ 6 ML training tools (not needed)
	❌ 5 data quality tools (not needed)
	❌ 8 feature engineering tools (not needed)
	❌ 54 other tools (not needed)
	────────────────────────────────────
	TOTAL: 82 tools, ~20K tokens → OVERFLOW on Groq ❌
	```

	### After (Dynamic Prompt)

	```
	User: "Generate plots for magnitude and latitude"

	Intent detected: "visualization"

	Prompt includes:
	✅ 9 visualization tools (needed)
	✅ 4 core tools (always included)
	────────────────────────────────────
	TOTAL: 13 tools, ~2K tokens → Fits Groq perfectly ✅
	```

	## Advanced: Multi-Intent Detection

	Some queries need multiple categories:

	```python
	# Query with multiple intents
	query = "Clean the data, encode categories, and train a model"

	intents = detect_intent(query)
	# Returns: {"data_quality", "feature_engineering", "model_training"}

	tools = get_relevant_tools(intents)
	# Loads: 4 core + 5 data_quality + 8 feature_engineering + 6 model_training
	# = 23 tools (~4K tokens) - still fits in 8K context!
	```
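The tool count for a multi-intent query can be checked against the per-category sizes from the Intent Categories table. This sketch assumes the categories share no tools, and `count_loaded_tools` and `remaining_context` are illustrative helpers.

```python
CORE_TOOL_COUNT = 4

# Per-category tool counts from the "Intent Categories" table.
TOOLS_PER_CATEGORY = {
    "visualization": 9,
    "model_training": 6,
    "data_quality": 5,
    "feature_engineering": 8,
    "eda": 5,
    "time_series": 4,
    "optimization": 3,
    "code_execution": 2,
}


def count_loaded_tools(intents: set[str]) -> int:
    """Core tools plus every detected category (assumes categories do not overlap)."""
    return CORE_TOOL_COUNT + sum(TOOLS_PER_CATEGORY[intent] for intent in intents)


def remaining_context(prompt_tokens: int, context_window: int = 8_000) -> int:
    """Tokens left for the conversation and the model's reply."""
    return context_window - prompt_tokens


intents = {"data_quality", "feature_engineering", "model_training"}
print(count_loaded_tools(intents), remaining_context(4_000))
```

A ~4K-token prompt fits an 8K budget, but it leaves only about half the window for the conversation history, tool results, and the response.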

	## Performance Impact

	### Token Savings

	| Query Type | Full Prompt | Compact Prompt | Reduction |
	|------------|-------------|----------------|-----------|
	| Visualization only | 20K tokens | 2K tokens | 90% |
	| Data profiling | 20K tokens | 2.5K tokens | 87.5% |
	| Full ML pipeline | 20K tokens | 5K tokens | 75% |

	### Latency Impact

	- Negligible added latency - Keyword-based intent detection takes under 10 ms
	- Faster LLM inference - Smaller prompts mean fewer input tokens to process
	- Comparable accuracy - The LLM only needs the relevant tools for the task, provided the intent is detected correctly
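The latency claim is easy to measure on your own machine. Here is a small benchmark sketch, where `detect` is a simplified substring-based stand-in for the real intent detector and the reported number will vary by hardware:

```python
import time

KEYWORDS = {
    "visualization": ["plot", "chart", "graph", "visualize", "dashboard"],
    "model_training": ["train", "model", "predict", "classify"],
    "data_quality": ["clean", "missing", "outlier", "quality"],
    "eda": ["profile", "describe", "summary", "statistics"],
}


def detect(query: str) -> set[str]:
    """Simplified stand-in: a category matches if any keyword is a substring."""
    text = query.lower()
    return {category for category, words in KEYWORDS.items() if any(w in text for w in words)}


def average_latency_ms(query: str, runs: int = 1_000) -> float:
    """Average wall-clock time per detection call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        detect(query)
    return (time.perf_counter() - start) * 1_000 / runs


latency_ms = average_latency_ms("Clean the data and train a model")
print(f"Average detection latency: {latency_ms:.4f} ms")
```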

	## Comparison: Other Approaches

	### 1. Prompt Compression (Microsoft LLMLingua)

	- ❌ Can lose semantic information
	- ❌ Hard to debug
	- ❌ Requires running a separate compressor model
	- ✅ 80% compression possible

	### 2. Tool RAG (Vector Retrieval)

	- ✅ Very accurate tool selection
	- ✅ Scales to 1000+ tools
	- ❌ Requires vector DB setup
	- ❌ Embedding costs
	- ❌ Latency overhead (100-200ms)

	### 3. Dynamic Loading (Your System)

	- ✅ Simple keyword matching - no ML needed
	- ✅ Negligible latency - intent detection takes under 10 ms
	- ✅ Deterministic - same query = same tools
	- ✅ Debuggable - easy to see which tools loaded
	- ✅ 90% token reduction for single-intent queries
	- ⚠️ May load unnecessary tools for vague queries

	## When to Use Each Approach

	| Scenario | Best Approach | Why |
	|----------|---------------|-----|
	| < 20 tools | Full prompt | No optimization needed |
	| 20-100 tools | Dynamic loading (your system) | Simple, fast, effective |
	| 100-500 tools | Tool RAG | Better precision at scale |
	| 500+ tools | Hierarchical agents | Separate specialists |
	| Groq/Small models | Dynamic loading ✅ | Perfect for 8K context |
	| Gemini/Large models | Full prompt | Context window not an issue |

	## Testing

	Test the system with different queries:

	```bash
	# Run demo (shows token savings)
	python src/dynamic_prompts.py

	# Output:
	# 📊 Example 1: 'Generate interactive plots'
	# Detected intents: {'visualization'}
	# Tools loaded: 13
	# Prompt stats: 2,134 tokens, 89 lines
	#
	# 🤖 Example 2: 'Train a model'
	# Detected intents: {'model_training', 'data_quality'}
	# Tools loaded: 15
	# Prompt stats: 3,567 tokens, 112 lines
	```

	## Monitoring

	Add logging to track prompt sizes:

	```python
	if self.use_compact_prompts:
	    intents = detect_intent(task_description)
	    logger.info(f"Detected intents: {intents}")
	    logger.info(f"Tools loaded: {len(get_relevant_tools(intents))}")
	    logger.info(f"Estimated tokens: {len(system_prompt) // 4}")
	```
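A standalone version of this logging, with a warning when the estimate exceeds the budget, might look like the sketch below. `log_prompt_stats`, the logger name, and the 8K default budget are assumptions; the 4-characters-per-token ratio is the same rough heuristic used above and is not an exact tokenizer count.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dynamic_prompts")


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4


def log_prompt_stats(
    system_prompt: str,
    intents: set[str],
    tools: list[str],
    budget_tokens: int = 8_000,
) -> int:
    """Log prompt statistics and warn when the estimate exceeds the budget."""
    tokens = estimate_tokens(system_prompt)
    logger.info("Detected intents: %s", sorted(intents))
    logger.info("Tools loaded: %d", len(tools))
    logger.info("Estimated tokens: %d", tokens)
    if tokens > budget_tokens:
        logger.warning("Prompt estimate %d exceeds budget %d", tokens, budget_tokens)
    return tokens


log_prompt_stats("x" * 8_000, {"visualization"}, ["generate_plotly_dashboard"])
```

For exact counts, replace `estimate_tokens` with the tokenizer that matches your model, since character-based estimates can be off noticeably for code or non-English text.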

	## Future Improvements

	1. LLM-based intent detection - More accurate than keywords
	2. Tool usage analytics - Learn which tools are actually used together
	3. Hybrid RAG + dynamic - Combine both approaches
	4. Adaptive thresholds - Adjust tool loading based on remaining context
	5. Tool clustering - Group similar tools automatically
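As an illustration of item 4, adaptive loading could be as simple as adding tools in priority order until a token budget is used up. This is one possible reading of that idea, not a planned design; `select_within_budget` and the per-tool token costs are hypothetical.

```python
def select_within_budget(
    candidates: list[tuple[str, int]],
    budget_tokens: int,
) -> list[str]:
    """Greedily keep tools, in priority order, while they fit the token budget.

    candidates: (tool_name, estimated_tokens) pairs, highest priority first.
    """
    selected: list[str] = []
    used = 0
    for name, cost in candidates:
        if used + cost <= budget_tokens:
            selected.append(name)
            used += cost
    return selected


# Hypothetical per-tool token costs, highest priority first.
candidates = [
    ("train_baseline_models", 100),
    ("hyperparameter_tuning", 300),
    ("perform_cross_validation", 150),
]
print(select_within_budget(candidates, budget_tokens=300))
```

The greedy pass skips a tool that does not fit and keeps trying smaller ones, so lower-priority tools can still be loaded when a large tool is dropped.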

	## Conclusion

	Your dynamic prompt system solves the Groq context window problem by:

	- ✅ 90% token reduction for focused queries
	- ✅ Negligible latency overhead (keyword matching takes under 10 ms)
	- ✅ Simple implementation (no ML, no vector DBs)
	- ✅ Automatic for Groq (manual override available)
	- ✅ Production-ready (deterministic, debuggable)

	This follows the same principle as the router and specialist patterns commonly built with frameworks such as LangChain and CrewAI: give each LLM call only the tools it needs. 🚀

	---

	Now you can use Groq with 82+ tools without context overflow! 🎉