Veeru-c committed on
Commit fc9883e · 1 Parent(s): 4b59886

initial commit

docs/HOW_TO_RUN.md CHANGED
@@ -1,229 +1,165 @@
- # How to Run the Fine-Tuning Pipeline
-
- This guide walks you through the complete pipeline from data generation to model deployment.
-
  ---
-
- ## 📊 Dataset Generation Results
-
- ### Final Statistics
- - **Training Samples**: 201,651
- - **Validation Samples**: 22,407
- - **Total Dataset**: 224,058 high-quality QA pairs
- - **Improvement**: 150x more data than previous approach
-
- ### Batch Performance
- | Batch | Files | Data Points | Status |
- |-------|-------|-------------|--------|
- | 1 | 1,000 | 100,611 | ✅ Excellent |
- | 2 | 1,000 | 39,960 | ✅ Good |
- | 3 | 1,000 | 0 | ⚠️ Complex files |
- | 4 | 1,000 | 600 | ⚠️ Runner issue |
- | 5 | 1,000 | 54,627 | ✅ Excellent |
- | 6 | 1,000 | 5,400 | ✅ Good |
- | 7 | 888 | 22,860 | ✅ Good |

  ---

- ## 🚀 Step-by-Step Instructions

- ### Step 1: Fine-Tune the Model
-
- Run the fine-tuning job on Modal with H200 GPU:

  ```bash
- cd /Users/veeru/agents/mcp-hack
-
- # Start fine-tuning in detached mode
- ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

- **What happens:**
- - Loads 201,651 training samples from `finetune-dataset` volume
- - Trains Phi-3-mini-4k-instruct with LoRA on H200 GPU
- - Runs for ~90-120 minutes
- - Saves model to `model-checkpoints` volume
-
- **Monitor progress:**
  ```bash
- # View live logs
- modal app logs mcp-hack::finetune-phi3-modal
  ```

- ---
-
- ### Step 2: Evaluate the Model
-
- After training completes, test the model:
-
  ```bash
- ./venv/bin/modal run src/finetune/eval_finetuned.py
- ```
-
- This will run sample questions and show the model's answers.
-
- ---
-
- ### Step 3: Deploy Inference API

- You have three options for deployment. For production use with low latency (<3s), **Option B** is recommended.
-
- **Option A: Standard GPU Endpoint (A10G)**
- Good for testing; uses the standard Transformers library.
- ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint.py
  ```

- **Option B: High-Performance vLLM Endpoint (Recommended)**
- Uses vLLM for <3s latency. Requires model merging first.
-
- 1. **Merge Model**: Convert LoRA adapter to full model
- ```bash
- ./venv/bin/modal run src/finetune/merge_model.py
- ```
-
- 2. **Deploy vLLM Endpoint**:
- ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint_vllm.py
- ```
-
- **Option C: CPU Endpoint**
- Slowest, but cheapest. Good for debugging without GPU quota.
  ```bash
- ./venv/bin/modal deploy src/finetune/api_endpoint_cpu.py
  ```

- **Get the endpoint URL:**
- ```bash
- modal app list
  ```

- ---
-
- ### Step 4: Test the API
-
  ```bash
- # Example API call
- curl -X POST https://YOUR-MODAL-URL/ask \
-   -H "Content-Type: application/json" \
-   -d '{
-     "question": "What is the population of Tokyo?",
-     "context": "Japan Census data"
-   }'
  ```

  ---

- ## 📁 Key Files
-
- ### Data Processing
- - `src/finetune/prepare_finetune_data.py` - Generates dataset from CSV files
- - `docs/clean_sample.py` - Local testing script for data cleaning

- ### Model Training
- - `src/finetune/finetune_modal.py` - Fine-tuning script (H200 GPU)
- - `src/finetune/eval_finetuned.py` - Evaluation script

- ### API Deployment
- - `src/finetune/api_endpoint.py` - GPU inference endpoint (A10G)
- - `src/finetune/api_endpoint_cpu.py` - CPU inference endpoint (when created)

- ### Documentation
- - `diagrams/finetuning.svg` - Visual pipeline diagram
- - `finetune/04-evaluation.md` - Evaluation results

  ---

- ## 🔧 Modal Volumes
-
- The pipeline uses these Modal volumes:
-
- | Volume | Purpose | Size |
- |--------|---------|------|
- | `census-data` | Raw census CSV files | 6,838 files |
- | `economy-labor-data` | Raw economy CSV files | 50 files |
- | `finetune-dataset` | Generated JSONL training data | 224K samples |
- | `model-checkpoints` | Fine-tuned model weights | ~7GB |

- ---
-
- ## 💡 Tips

- ### If Training Fails
  ```bash
- # Check logs for errors
- modal app logs mcp-hack::finetune-phi3-modal
-
- # Restart training
- ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

- ### If You Need to Regenerate Data
  ```bash
- # Regenerate with new logic
- ./venv/bin/modal run --detach src/finetune/prepare_finetune_data.py
  ```

- ### View Volume Contents
  ```bash
- # List files in a volume
- modal volume ls finetune-dataset

- # Download a file
- modal volume get finetune-dataset train.jsonl finetune/train.jsonl
  ```

  ---

- ## 📈 Expected Timeline

- | Step | Duration | Notes |
- |------|----------|-------|
- | Data Generation | ✅ Complete | 224K samples ready |
- | Fine-Tuning | ~90-120 min | H200 GPU |
- | Evaluation | ~5 min | Quick tests |
- | API Deployment | ~2 min | Instant after deploy |

  ---

- ## 🎯 Next Steps

- 1. **Run fine-tuning** (see Step 1 above)
- 2. **Wait for completion** (~2 hours)
- 3. **Evaluate results** (see Step 2)
- 4. **Deploy API** (see Step 3)
- 5. **Test with real queries** (see Step 4)

- ---
-
- ## 📞 Troubleshooting

- **Issue**: "Volume not found"
  ```bash
- # List all volumes
- modal volume list
  ```

- **Issue**: "Out of memory during training"
- - Reduce `per_device_train_batch_size` in `src/finetune/finetune_modal.py`
- - Current: 2 (already optimized for H200)
-
- **Issue**: "Model not loading in API"
- - Ensure fine-tuning completed successfully
- - Check `model-checkpoints` volume has files
-
- ---

- ## ✅ Success Criteria

- After completing all steps, you should have:
- - ✅ Fine-tuned Phi-3-mini model
- - ✅ Deployed API endpoint
- - ✅ Model answering questions about Japanese census/economy data
- - ✅ Improved accuracy over base model

- ---

- **Ready to start?** Run the fine-tuning command from Step 1!
 
+ # 🚀 How to Run the AI Development Agent

+ This guide provides sequential instructions to set up and run all components of the AI Development Agent: the **MCP Server** (Backend/Integration Hub) and the **Web Dashboard** (Frontend).

  ---

+ ## 📋 Prerequisites

+ - **Python 3.10+** (Recommended: 3.11 or 3.12)
+   - *Note: Python 3.13 requires a specific fix for Gradio (included in instructions).*
+ - **JIRA Account** (for real integration)
+ - **Git**

  ---

+ ## 🛠️ Step 1: Set Up & Run the MCP Server

+ The MCP Server is the core "brain" that handles RAG, fine-tuning queries, and JIRA integration.

+ ### 1. Navigate to the directory
  ```bash
+ cd mcp
  ```

+ ### 2. Create a Virtual Environment
  ```bash
+ python3 -m venv venv
+ source venv/bin/activate
  ```

+ ### 3. Install Dependencies
  ```bash
+ pip install -r requirements.txt

+ # ⚠️ Python 3.13 fix: if you are using Python 3.13, run this extra command:
+ pip install audioop-lts
  ```

+ ### 4. Configure the Environment
+ Create a `.env` file in the `mcp/` directory:
  ```bash
+ touch .env
  ```

+ Add your credentials to `.env`:
+ ```env
+ # JIRA Configuration
+ JIRA_URL="https://your-domain.atlassian.net"
+ JIRA_EMAIL="your-email@example.com"
+ JIRA_API_TOKEN="your-api-token"
+ JIRA_PROJECT_KEY="PROJ"
+
+ # RAG Configuration
+ RAG_ENABLED="true"
+ # URL from Step 1.5 below
+ RAG_API_URL="https://your-modal-url.modal.run"
+ ```
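
+ For reference, a minimal sketch of how the server's `config` module might load these values (an illustration only, assuming the `python-dotenv` package; the real `mcp/config.py` may differ):
+ ```python
+ # Hypothetical sketch of mcp/config.py - adjust to the actual module
+ import os
+ from dotenv import load_dotenv  # assumes python-dotenv is installed
+
+ load_dotenv()  # read mcp/.env into the process environment
+
+ JIRA_URL = os.getenv("JIRA_URL", "")
+ JIRA_EMAIL = os.getenv("JIRA_EMAIL", "")
+ JIRA_API_TOKEN = os.getenv("JIRA_API_TOKEN", "")
+ JIRA_PROJECT_KEY = os.getenv("JIRA_PROJECT_KEY", "PROJ")
+
+ # mcp_server.py checks these two before calling the remote RAG endpoint
+ RAG_ENABLED = os.getenv("RAG_ENABLED", "false").lower() == "true"
+ RAG_API_URL = os.getenv("RAG_API_URL", "")
+ ```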

+ ### 5. Start the Server
  ```bash
+ python mcp_server.py
  ```
+ ✅ **Success**: You should see `Running on local URL: http://0.0.0.0:7860`

  ---

+ ## 🚀 Step 1.5: Deploy RAG System (Optional)

+ To enable real RAG capabilities instead of mock data, deploy the RAG system on Modal.

+ ### 1. Deploy the RAG App
+ ```bash
+ cd .. # Go back to root if in mcp/
+ ./venv/bin/modal deploy src/rag/modal-rag-product-design.py
+ ```

+ ### 2. Get the URL
+ After deployment, you will see a URL ending in `...-api-query.modal.run`.
+ Copy this URL and add it to your `mcp/.env` file as `RAG_API_URL`.
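
+ To sanity-check the deployed endpoint before wiring it in, a minimal Python sketch (the payload mirrors what `mcp_server.py` sends; the URL is a placeholder for your own):
+ ```python
+ import requests
+
+ # Replace with the URL printed by `modal deploy` (ends in ...-api-query.modal.run)
+ RAG_API_URL = "https://your-modal-url.modal.run"
+
+ resp = requests.post(
+     RAG_API_URL,
+     json={"question": "What are the three product tiers?", "top_k": 5},
+     timeout=60,
+ )
+ resp.raise_for_status()
+ print(resp.json())  # mcp_server.py reads "answer" and "sources" from this response
+ ```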

  ---

+ ## 🖥️ Step 2: Set Up & Run the Dashboard

+ The Dashboard is the user interface where you interact with the agent.

+ ### 1. Open a new terminal and navigate
  ```bash
+ cd dashboard
  ```

+ ### 2. Create a Virtual Environment
  ```bash
+ python3 -m venv venv
+ source venv/bin/activate
  ```

+ ### 3. Install Dependencies
  ```bash
+ pip install -r requirements.txt
+ ```

+ ### 4. Start the Dashboard
+ ```bash
+ python server.py
  ```
+ ✅ **Success**: You should see `Uvicorn running on http://0.0.0.0:8000`

  ---

+ ## 🌐 Step 3: Access the Application

+ 1. Open your browser to **http://localhost:8000**
+ 2. Enter a requirement (e.g., "Create a login page with 2FA")
+ 3. Watch the agent analyze, query RAG, and create JIRA epics/stories!

  ---

+ ## 🧠 Advanced: Fine-Tuning Pipeline

+ If you want to train your own domain-specific model, follow these steps.

+ ### Dataset Generation Results (Reference)
+ - **Training Samples**: 201,651
+ - **Validation Samples**: 22,407
+ - **Total Dataset**: 224,058 high-quality QA pairs

+ ### Step 1: Fine-Tune the Model
+ Run the fine-tuning job on Modal with an H200 GPU:
  ```bash
+ cd /Users/veeru/agents/mcp-hack
+ ./venv/bin/modal run --detach src/finetune/finetune_modal.py
  ```

+ ### Step 2: Evaluate the Model
+ After training completes, test the model:
+ ```bash
+ ./venv/bin/modal run src/finetune/eval_finetuned.py
+ ```

+ ### Step 3: Deploy Inference API
+ **Option B: High-Performance vLLM Endpoint (Recommended)**
+ 1. **Merge Model**:
+ ```bash
+ ./venv/bin/modal run src/finetune/merge_model.py
+ ```
+ 2. **Deploy vLLM Endpoint**:
+ ```bash
+ ./venv/bin/modal deploy src/finetune/api_endpoint_vllm.py
+ ```

+ ### Step 4: Test the API
+ ```bash
+ curl -X POST https://YOUR-MODAL-URL/ask \
+   -H "Content-Type: application/json" \
+   -d '{
+     "question": "What is the population of Tokyo?",
+     "context": "Japan Census data"
+   }'
+ ```
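
+ The same call from Python, as a sketch (replace `YOUR-MODAL-URL` with your deployed endpoint):
+ ```python
+ import requests
+
+ resp = requests.post(
+     "https://YOUR-MODAL-URL/ask",
+     json={
+         "question": "What is the population of Tokyo?",
+         "context": "Japan Census data",
+     },
+     timeout=60,
+ )
+ print(resp.json())
+ ```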

+ ### Troubleshooting Fine-Tuning
+ - **Logs**: `modal app logs mcp-hack::finetune-phi3-modal`
+ - **Volumes**: `modal volume list`
 
mcp/mcp_server.py CHANGED
@@ -69,62 +69,93 @@ def use_real_jira() -> bool:
  # ===== RAG Functions =====
  def query_rag(requirement: str) -> Dict:
      """
-     Query RAG system for relevant context and generate product specification.
-
-     Args:
-         requirement: User's requirement text
-
-     Returns:
-         Dict with specification, context, and recommendations
      """
-     print(f"[RAG] Querying with requirement: {requirement[:100]}...")

-     if config.RAG_ENABLED:
-         # TODO: Implement real RAG query with ChromaDB/Pinecone
-         # from langchain.vectorstores import Chroma
-         # vectordb = Chroma(persist_directory=config.VECTOR_DB_PATH)
-         # results = vectordb.similarity_search(requirement, k=5)
-         pass

-     # Mock RAG response
-     specification = {
-         "title": "Generated Product Specification",
-         "summary": f"Product specification for: {requirement[:100]}",
          "features": [
-             "Core functionality implementation",
-             "User interface components",
-             "API endpoints and integration",
-             "Database schema design",
-             "Security and authentication"
          ],
          "technical_requirements": [
-             "Backend: Python/FastAPI or Node.js/Express",
-             "Frontend: React or Vue.js",
-             "Database: PostgreSQL or MongoDB",
-             "Authentication: JWT tokens",
-             "Deployment: Docker containers"
          ],
          "acceptance_criteria": [
-             "All core features implemented and tested",
-             "API documentation complete",
-             "Unit test coverage > 80%",
-             "Security audit passed",
-             "Performance benchmarks met"
-         ],
-         "dependencies": [
-             "User authentication system",
-             "Database migration tools",
-             "CI/CD pipeline setup"
          ],
-         "estimated_effort": "2-3 sprints",
-         "context_retrieved": 5,
-         "confidence_score": 0.85
      }

      return {
          "status": "success",
-         "specification": specification,
-         "source": "mock_rag" if not config.RAG_ENABLED else "real_rag",
          "timestamp": datetime.now().isoformat()
      }

  # ===== RAG Functions =====
  def query_rag(requirement: str) -> Dict:
      """
+     Query the RAG system for product specifications based on the requirement.
      """
+     print(f"[RAG] Querying with requirement: {requirement[:50]}...")

+     if config.RAG_ENABLED and config.RAG_API_URL:
+         try:
+             import requests
+             print(f"[RAG] Calling remote endpoint: {config.RAG_API_URL}")
+
+             response = requests.post(
+                 config.RAG_API_URL,
+                 json={"question": requirement, "top_k": 5},
+                 headers={"Content-Type": "application/json"},
+                 timeout=60
+             )
+
+             if response.ok:
+                 result = response.json()
+                 answer = result.get("answer", "")
+                 sources = result.get("sources", [])
+
+                 # Parse the answer to extract structured fields if possible
+                 # For now, we'll wrap the answer in our standard structure
+                 return {
+                     "status": "success",
+                     "specification": {
+                         "title": "Product Specification (RAG Generated)",
+                         "summary": answer[:200] + "...",
+                         "features": [line.strip('- ') for line in answer.split('\n') if line.strip().startswith('-')],
+                         "technical_requirements": ["Derived from product design docs"],
+                         "acceptance_criteria": ["See detailed RAG answer"],
+                         "estimated_effort": "TBD",
+                         "full_answer": answer,
+                         "context_retrieved": len(sources)
+                     },
+                     "source": "real_rag",
+                     "timestamp": datetime.now().isoformat()
+                 }
+             else:
+                 print(f"[RAG] Error: {response.status_code} - {response.text}")
+         except Exception as e:
+             print(f"[RAG] Exception: {e}")
+
+     # Mock response fallback
+     print("[RAG] Using mock response")

+     # Simulate processing time
+     # time.sleep(1)
+
+     # Simple keyword matching for mock data
+     req_lower = requirement.lower()
+
+     spec = {
+         "title": "Auto Insurance Product Spec",
+         "summary": "Specification based on Tokyo market requirements.",
          "features": [
+             "User registration and login",
+             "Policy selection interface",
+             "Premium calculation engine"
          ],
          "technical_requirements": [
+             "Secure database for user data",
+             "Integration with payment gateway",
+             "Responsive web design"
          ],
          "acceptance_criteria": [
+             "User can create an account",
+             "User can view policy details",
+             "Premium is calculated correctly"
          ],
+         "estimated_effort": "2 weeks"
      }

+     if "mobile" in req_lower or "app" in req_lower:
+         spec["title"] = "Mobile App Specification"
+         spec["features"].append("Push notifications")
+         spec["technical_requirements"].append("iOS and Android support")
+
+     if "ai" in req_lower or "agent" in req_lower:
+         spec["title"] = "AI Agent Integration Spec"
+         spec["features"].append("Chat interface")
+         spec["technical_requirements"].append("LLM integration")
+
      return {
          "status": "success",
+         "specification": spec,
+         "source": "mock_rag",
          "timestamp": datetime.now().isoformat()
      }
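+
+     # Example (mock mode): query_rag("Build a mobile insurance app") returns
+     # {"status": "success", "specification": {"title": "Mobile App Specification", ...},
+     #  "source": "mock_rag", ...} because the "mobile"/"app" keyword branch above fires.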
 
src/rag/modal-rag-product-design.py CHANGED
@@ -528,3 +528,17 @@ def query_product_design(question: str = "What are the three product tiers and t
          print(f"\n{i}. {source['metadata'].get('source', 'Unknown')}")
          print(f"   {source['content'][:200]}...")

+ # Define data model for API
+ from pydantic import BaseModel
+
+ class RAGQuery(BaseModel):
+     question: str
+     top_k: int = 5
+
+ @app.function(image=image)
+ @modal.web_endpoint(method="POST")
+ def api_query(item: RAGQuery):
+     """Expose RAG query as a web endpoint"""
+     model = ProductDesignRAG()
+     return model.query.remote(item.question, item.top_k)
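+
+ # Example request (see docs/HOW_TO_RUN.md, Step 1.5):
+ #   POST https://<app>-api-query.modal.run with body {"question": "...", "top_k": 5}
+ # mcp_server.py parses the JSON response for "answer" and "sources".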