Veeru-c committed on
Commit fc9883e · 1 Parent(s): 4b59886

initial commit

docs/HOW_TO_RUN.md CHANGED
@@ -1,229 +1,165 @@
- # How to Run the Fine-Tuning Pipeline
-
- This guide walks you through the complete pipeline from data generation to model deployment.
-
  ---
-
- ## 📊 Dataset Generation Results
-
- ### Final Statistics
- - **Training Samples**: 201,651
- - **Validation Samples**: 22,407
- - **Total Dataset**: 224,058 high-quality QA pairs
- - **Improvement**: 150x more data than previous approach
-
- ### Batch Performance
- | Batch | Files | Data Points | Status |
- |-------|-------|-------------|--------|
- | 1 | 1,000 | 100,611 | ✅ Excellent |
- | 2 | 1,000 | 39,960 | ✅ Good |
- | 3 | 1,000 | 0 | ⚠️ Complex files |
- | 4 | 1,000 | 600 | ⚠️ Runner issue |
- | 5 | 1,000 | 54,627 | ✅ Excellent |
- | 6 | 1,000 | 5,400 | ✅ Good |
- | 7 | 888 | 22,860 | ✅ Good |

  ---

- ## 🚀 Step-by-Step Instructions

- ### Step 1: Fine-Tune the Model
-
- Run the fine-tuning job on Modal with H200 GPU:

  ```bash
- cd /Users/veeru/agents/mcp-hack
-
- # Start fine-tuning in detached mode
- ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

- **What happens:**
- - Loads 201,651 training samples from `finetune-dataset` volume
- - Trains Phi-3-mini-4k-instruct with LoRA on H200 GPU
- - Runs for ~90-120 minutes
- - Saves model to `model-checkpoints` volume
-
- **Monitor progress:**
  ```bash
- # View live logs
- modal app logs mcp-hack::finetune-phi3-modal
  ```

- ---
-
- ### Step 2: Evaluate the Model
-
- After training completes, test the model:
-
  ```bash
- ./venv/bin/modal run src/finetune/eval_finetuned.py
- ```
-
- This will run sample questions and show the model's answers.
-
- ---
-
- ### Step 3: Deploy Inference API

- You have three options for deployment. For production use with low latency (<3s), **Option B** is recommended.
-
- **Option A: Standard GPU Endpoint (A10G)**
- Good for testing; uses the standard Transformers library.
- ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint.py
  ```

- **Option B: High-Performance vLLM Endpoint (Recommended)**
- Uses vLLM for <3s latency. Requires model merging first.
-
- 1. **Merge Model**: Convert LoRA adapter to full model
- ```bash
- ./venv/bin/modal run src/finetune/merge_model.py
- ```
-
- 2. **Deploy vLLM Endpoint**:
- ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint_vllm.py
- ```
-
- **Option C: CPU Endpoint**
- Slowest, but cheapest. Good for debugging without GPU quota.
  ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint_cpu.py
  ```

- **Get the endpoint URL:**
- ```bash
- modal app list
  ```

- ---
-
- ### Step 4: Test the API
-
  ```bash
- # Example API call
- curl -X POST https://YOUR-MODAL-URL/ask \
-   -H "Content-Type: application/json" \
-   -d '{
-     "question": "What is the population of Tokyo?",
-     "context": "Japan Census data"
-   }'
  ```

  ---

- ## 📁 Key Files
-
- ### Data Processing
- - `src/finetune/prepare_finetune_data.py` - Generates dataset from CSV files
- - `docs/clean_sample.py` - Local testing script for data cleaning

- ### Model Training
- - `src/finetune/finetune_modal.py` - Fine-tuning script (H200 GPU)
- - `src/finetune/eval_finetuned.py` - Evaluation script

- ### API Deployment
- - `src/finetune/api_endpoint.py` - GPU inference endpoint (A10G)
- - `src/finetune/api_endpoint_cpu.py` - CPU inference endpoint (when created)

- ### Documentation
- - `diagrams/finetuning.svg` - Visual pipeline diagram
- - `finetune/04-evaluation.md` - Evaluation results

  ---

- ## 🔧 Modal Volumes
-
- The pipeline uses these Modal volumes:
-
- | Volume | Purpose | Size |
- |--------|---------|------|
- | `census-data` | Raw census CSV files | 6,838 files |
- | `economy-labor-data` | Raw economy CSV files | 50 files |
- | `finetune-dataset` | Generated JSONL training data | 224K samples |
- | `model-checkpoints` | Fine-tuned model weights | ~7GB |

- ---
-
- ## 💡 Tips

- ### If Training Fails
  ```bash
- # Check logs for errors
- modal app logs mcp-hack::finetune-phi3-modal
-
- # Restart training
- ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

- ### If You Need to Regenerate Data
  ```bash
- # Regenerate with new logic
- ./venv/bin/modal run --detach src/finetune/prepare_finetune_data.py
  ```

- ### View Volume Contents
  ```bash
- # List files in a volume
- modal volume ls finetune-dataset

- # Download a file
- modal volume get finetune-dataset train.jsonl finetune/train.jsonl
  ```

  ---

- ## 📈 Expected Timeline

- | Step | Duration | Notes |
- |------|----------|-------|
- | Data Generation | ✅ Complete | 224K samples ready |
- | Fine-Tuning | ~90-120 min | H200 GPU |
- | Evaluation | ~5 min | Quick tests |
- | API Deployment | ~2 min | Instant after deploy |

  ---

- ## 🎯 Next Steps

- 1. **Run fine-tuning** (see Step 1 above)
- 2. **Wait for completion** (~2 hours)
- 3. **Evaluate results** (see Step 2)
- 4. **Deploy API** (see Step 3)
- 5. **Test with real queries** (see Step 4)

- ---
-
- ## 📞 Troubleshooting

- **Issue**: "Volume not found"
  ```bash
- # List all volumes
- modal volume list
  ```

- **Issue**: "Out of memory during training"
- - Reduce `per_device_train_batch_size` in `src/finetune/finetune_modal.py`
- - Current: 2 (already optimized for H200)
-
- **Issue**: "Model not loading in API"
- - Ensure fine-tuning completed successfully
- - Check `model-checkpoints` volume has files
-
- ---

- ## ✅ Success Criteria

- After completing all steps, you should have:
- - ✅ Fine-tuned Phi-3-mini model
- - ✅ Deployed API endpoint
- - ✅ Model answering questions about Japanese census/economy data
- - ✅ Improved accuracy over base model

- ---

- **Ready to start?** Run the fine-tuning command from Step 1!
 
+ # 🚀 How to Run the AI Development Agent

+ This guide provides sequential instructions to set up and run all components of the AI Development Agent: the **MCP Server** (Backend/Integration Hub) and the **Web Dashboard** (Frontend).

  ---

+ ## 📋 Prerequisites

+ - **Python 3.10+** (Recommended: 3.11 or 3.12)
+   - *Note: Python 3.13 requires a specific fix for Gradio (included in instructions).*
+ - **JIRA Account** (for real integration)
+ - **Git**

  ---

+ ## 🛠️ Step 1: Set Up & Run the MCP Server

+ The MCP Server is the core "brain" that handles RAG, fine-tuning queries, and JIRA integration.

+ ### 1. Navigate to the directory
  ```bash
+ cd mcp
  ```

+ ### 2. Create a Virtual Environment
  ```bash
+ python3 -m venv venv
+ source venv/bin/activate
  ```

+ ### 3. Install Dependencies
  ```bash
+ pip install -r requirements.txt

+ # ⚠️ Python 3.13 fix: if you are using Python 3.13, run this extra command:
+ pip install audioop-lts
  ```

+ ### 4. Configure the Environment
+ Create a `.env` file in the `mcp/` directory:
  ```bash
+ touch .env
  ```

+ Add your credentials to `.env`:
+ ```env
+ # JIRA Configuration
+ JIRA_URL="https://your-domain.atlassian.net"
+ JIRA_EMAIL="your-email@example.com"
+ JIRA_API_TOKEN="your-api-token"
+ JIRA_PROJECT_KEY="PROJ"
+
+ # RAG Configuration
+ RAG_ENABLED="true"
+ # URL from Step 1.5 below
+ RAG_API_URL="https://your-modal-url.modal.run"
+ ```
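
+ For reference, a minimal sketch of how the server's `config` module might load these values (an illustration only, assuming the `python-dotenv` package; the real `mcp/config.py` may differ):
+ ```python
+ # Hypothetical sketch of mcp/config.py - adjust to the actual module
+ import os
+ from dotenv import load_dotenv  # assumes python-dotenv is installed
+
+ load_dotenv()  # read mcp/.env into the process environment
+
+ JIRA_URL = os.getenv("JIRA_URL", "")
+ JIRA_EMAIL = os.getenv("JIRA_EMAIL", "")
+ JIRA_API_TOKEN = os.getenv("JIRA_API_TOKEN", "")
+ JIRA_PROJECT_KEY = os.getenv("JIRA_PROJECT_KEY", "PROJ")
+
+ # mcp_server.py checks these two before calling the remote RAG endpoint
+ RAG_ENABLED = os.getenv("RAG_ENABLED", "false").lower() == "true"
+ RAG_API_URL = os.getenv("RAG_API_URL", "")
+ ```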

+ ### 5. Start the Server
  ```bash
+ python mcp_server.py
  ```
+ ✅ **Success**: You should see `Running on local URL: http://0.0.0.0:7860`

  ---

+ ## 🚀 Step 1.5: Deploy RAG System (Optional)

+ To enable real RAG capabilities instead of mock data, deploy the RAG system on Modal.

+ ### 1. Deploy the RAG App
+ ```bash
+ cd .. # Go back to root if in mcp/
+ ./venv/bin/modal deploy src/rag/modal-rag-product-design.py
+ ```

+ ### 2. Get the URL
+ After deployment, you will see a URL ending in `...-api-query.modal.run`.
+ Copy this URL and add it to your `mcp/.env` file as `RAG_API_URL`.
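
+ To sanity-check the deployed endpoint before wiring it in, a minimal Python sketch (the payload mirrors what `mcp_server.py` sends; the URL is a placeholder for your own):
+ ```python
+ import requests
+
+ # Replace with the URL printed by `modal deploy` (ends in ...-api-query.modal.run)
+ RAG_API_URL = "https://your-modal-url.modal.run"
+
+ resp = requests.post(
+     RAG_API_URL,
+     json={"question": "What are the three product tiers?", "top_k": 5},
+     timeout=60,
+ )
+ resp.raise_for_status()
+ print(resp.json())  # mcp_server.py reads "answer" and "sources" from this response
+ ```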

  ---

+ ## 🖥️ Step 2: Set Up & Run the Dashboard

+ The Dashboard is the user interface where you interact with the agent.

+ ### 1. Open a new terminal and navigate
  ```bash
+ cd dashboard
  ```

+ ### 2. Create a Virtual Environment
  ```bash
+ python3 -m venv venv
+ source venv/bin/activate
  ```

+ ### 3. Install Dependencies
  ```bash
+ pip install -r requirements.txt
+ ```

+ ### 4. Start the Dashboard
+ ```bash
+ python server.py
  ```
+ ✅ **Success**: You should see `Uvicorn running on http://0.0.0.0:8000`

  ---

+ ## 🌐 Step 3: Access the Application

+ 1. Open your browser to **http://localhost:8000**
+ 2. Enter a requirement (e.g., "Create a login page with 2FA")
+ 3. Watch the agent analyze, query RAG, and create JIRA epics/stories!

  ---

+ ## 🧠 Advanced: Fine-Tuning Pipeline

+ If you want to train your own domain-specific model, follow these steps.

+ ### Dataset Generation Results (Reference)
+ - **Training Samples**: 201,651
+ - **Validation Samples**: 22,407
+ - **Total Dataset**: 224,058 high-quality QA pairs

+ ### Step 1: Fine-Tune the Model
+ Run the fine-tuning job on Modal with an H200 GPU:
  ```bash
+ cd /Users/veeru/agents/mcp-hack
+ ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

+ ### Step 2: Evaluate the Model
+ After training completes, test the model:
+ ```bash
+ ./venv/bin/modal run src/finetune/eval_finetuned.py
+ ```

+ ### Step 3: Deploy Inference API
+ **Option B: High-Performance vLLM Endpoint (Recommended)**
+ 1. **Merge Model**:
+ ```bash
+ ./venv/bin/modal run src/finetune/merge_model.py
+ ```
+ 2. **Deploy vLLM Endpoint**:
+ ```bash
+ ./venv/bin/modal deploy src/finetune/api_endpoint_vllm.py
+ ```

+ ### Step 4: Test the API
+ ```bash
+ curl -X POST https://YOUR-MODAL-URL/ask \
+   -H "Content-Type: application/json" \
+   -d '{
+     "question": "What is the population of Tokyo?",
+     "context": "Japan Census data"
+   }'
+ ```
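
+ The same call from Python, as a sketch (replace `YOUR-MODAL-URL` with your deployed endpoint):
+ ```python
+ import requests
+
+ resp = requests.post(
+     "https://YOUR-MODAL-URL/ask",
+     json={
+         "question": "What is the population of Tokyo?",
+         "context": "Japan Census data",
+     },
+     timeout=60,
+ )
+ print(resp.json())
+ ```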

+ ### Troubleshooting Fine-Tuning
+ - **Logs**: `modal app logs mcp-hack::finetune-phi3-modal`
+ - **Volumes**: `modal volume list`
 
mcp/mcp_server.py CHANGED
@@ -69,62 +69,93 @@ def use_real_jira() -> bool:
  # ===== RAG Functions =====
  def query_rag(requirement: str) -> Dict:
      """
-     Query RAG system for relevant context and generate product specification.
-
-     Args:
-         requirement: User's requirement text
-
-     Returns:
-         Dict with specification, context, and recommendations
      """
-     print(f"[RAG] Querying with requirement: {requirement[:100]}...")

-     if config.RAG_ENABLED:
-         # TODO: Implement real RAG query with ChromaDB/Pinecone
-         # from langchain.vectorstores import Chroma
-         # vectordb = Chroma(persist_directory=config.VECTOR_DB_PATH)
-         # results = vectordb.similarity_search(requirement, k=5)
-         pass

-     # Mock RAG response
-     specification = {
-         "title": "Generated Product Specification",
-         "summary": f"Product specification for: {requirement[:100]}",
          "features": [
-             "Core functionality implementation",
-             "User interface components",
-             "API endpoints and integration",
-             "Database schema design",
-             "Security and authentication"
          ],
          "technical_requirements": [
-             "Backend: Python/FastAPI or Node.js/Express",
-             "Frontend: React or Vue.js",
-             "Database: PostgreSQL or MongoDB",
-             "Authentication: JWT tokens",
-             "Deployment: Docker containers"
          ],
          "acceptance_criteria": [
-             "All core features implemented and tested",
-             "API documentation complete",
-             "Unit test coverage > 80%",
-             "Security audit passed",
-             "Performance benchmarks met"
-         ],
-         "dependencies": [
-             "User authentication system",
-             "Database migration tools",
-             "CI/CD pipeline setup"
          ],
-         "estimated_effort": "2-3 sprints",
-         "context_retrieved": 5,
-         "confidence_score": 0.85
      }

      return {
          "status": "success",
-         "specification": specification,
-         "source": "mock_rag" if not config.RAG_ENABLED else "real_rag",
          "timestamp": datetime.now().isoformat()
      }

  # ===== RAG Functions =====
  def query_rag(requirement: str) -> Dict:
      """
+     Query the RAG system for product specifications based on the requirement.
      """
+     print(f"[RAG] Querying with requirement: {requirement[:50]}...")

+     if config.RAG_ENABLED and config.RAG_API_URL:
+         try:
+             import requests
+             print(f"[RAG] Calling remote endpoint: {config.RAG_API_URL}")
+
+             response = requests.post(
+                 config.RAG_API_URL,
+                 json={"question": requirement, "top_k": 5},
+                 headers={"Content-Type": "application/json"},
+                 timeout=60
+             )
+
+             if response.ok:
+                 result = response.json()
+                 answer = result.get("answer", "")
+                 sources = result.get("sources", [])
+
+                 # Parse the answer to extract structured fields if possible
+                 # For now, we'll wrap the answer in our standard structure
+                 return {
+                     "status": "success",
+                     "specification": {
+                         "title": "Product Specification (RAG Generated)",
+                         "summary": answer[:200] + "...",
+                         "features": [line.strip('- ') for line in answer.split('\n') if line.strip().startswith('-')],
+                         "technical_requirements": ["Derived from product design docs"],
+                         "acceptance_criteria": ["See detailed RAG answer"],
+                         "estimated_effort": "TBD",
+                         "full_answer": answer,
+                         "context_retrieved": len(sources)
+                     },
+                     "source": "real_rag",
+                     "timestamp": datetime.now().isoformat()
+                 }
+             else:
+                 print(f"[RAG] Error: {response.status_code} - {response.text}")
+         except Exception as e:
+             print(f"[RAG] Exception: {e}")
+
+     # Mock response fallback
+     print("[RAG] Using mock response")

+     # Simulate processing time
+     # time.sleep(1)
+
+     # Simple keyword matching for mock data
+     req_lower = requirement.lower()
+
+     spec = {
+         "title": "Auto Insurance Product Spec",
+         "summary": "Specification based on Tokyo market requirements.",
          "features": [
+             "User registration and login",
+             "Policy selection interface",
+             "Premium calculation engine"
          ],
          "technical_requirements": [
+             "Secure database for user data",
+             "Integration with payment gateway",
+             "Responsive web design"
          ],
          "acceptance_criteria": [
+             "User can create an account",
+             "User can view policy details",
+             "Premium is calculated correctly"
          ],
+         "estimated_effort": "2 weeks"
      }

+     if "mobile" in req_lower or "app" in req_lower:
+         spec["title"] = "Mobile App Specification"
+         spec["features"].append("Push notifications")
+         spec["technical_requirements"].append("iOS and Android support")
+
+     if "ai" in req_lower or "agent" in req_lower:
+         spec["title"] = "AI Agent Integration Spec"
+         spec["features"].append("Chat interface")
+         spec["technical_requirements"].append("LLM integration")
+
      return {
          "status": "success",
+         "specification": spec,
+         "source": "mock_rag",
          "timestamp": datetime.now().isoformat()
      }
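+
+     # Example (mock mode): query_rag("Build a mobile insurance app") returns
+     # {"status": "success", "specification": {"title": "Mobile App Specification", ...},
+     #  "source": "mock_rag", ...} because the "mobile"/"app" keyword branch above fires.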
 
src/rag/modal-rag-product-design.py CHANGED
@@ -528,3 +528,17 @@ def query_product_design(question: str = "What are the three product tiers and t
          print(f"\n{i}. {source['metadata'].get('source', 'Unknown')}")
          print(f"   {source['content'][:200]}...")

+ # Define data model for API
+ from pydantic import BaseModel
+
+ class RAGQuery(BaseModel):
+     question: str
+     top_k: int = 5
+
+ @app.function(image=image)
+ @modal.web_endpoint(method="POST")
+ def api_query(item: RAGQuery):
+     """Expose RAG query as a web endpoint"""
+     model = ProductDesignRAG()
+     return model.query.remote(item.question, item.top_k)
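+
+ # Example request (see docs/HOW_TO_RUN.md, Step 1.5):
+ #   POST https://<app>-api-query.modal.run with body {"question": "...", "top_k": 5}
+ # mcp_server.py parses the JSON response for "answer" and "sources".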