a-zamfir committed on
Commit f26de06 · 1 Parent(s): d3bbc38

initial atlas commit
.env ADDED
@@ -0,0 +1,5 @@
+ # Auto-select best provider
+ LLM_PROVIDER=huggingface
+ AUDIO_PROVIDER=huggingface
+
+ # Hugging Face and Nebius setup: add your keys here.
.gitignore ADDED
@@ -0,0 +1,6 @@
+ __pycache__
+ *.pyc
+ temp
+ .venv
+ auth_api
+ auth_api.py
README.md CHANGED
@@ -8,7 +8,138 @@ sdk_version: 6.0.1
  app_file: app.py
  pinned: false
  license: apache-2.0
- short_description: Atlas is a general usage assistant
+ short_description: ATLAS - Gradio x HuggingFace Hackathon
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ATLAS
+
+ ## Important
+ 1. **Watch** ATLAS' video overview here: [YouTube](https://youtu.be/-nn9mkU5jqk)
+ 2. **ATLAS works entirely through mock MCP tools** - no external dependencies required. Just clone and run.
+
+ ## Overview
+
+ ATLAS is a multimodal AI work companion built for the Gradio x MCP Hackathon. It demonstrates how a voice-driven assistant can augment knowledge work by:
+
+ - **Listening** to your requests through voice (STT)
+ - **Speaking** responses and updates (TTS)
+ - **Seeing** your screen to understand context (vision)
+ - **Acting** on your behalf through MCP tool integrations
+
+ The goal is to showcase how modern LLMs can be integrated into daily workflows to handle context retrieval, document analysis, and environment automation, all through natural conversation.
+
+ ## Key Goals
+
+ 1. **Multimodal Work Companion**
+    - Voice: hands-free interaction during calls/meetings
+    - Vision: screen analysis for real-time context
+    - Text: conversational interface with persistent context
+
+ 2. **Practical Automation**
+    - Email context absorption
+    - Customer data retrieval
+    - Document lookup and analysis
+    - Environment automation (API permissions, integrations)
+
+ 3. **Proof-of-Concept (POC)**
+    - Simple RAG without database infrastructure
+    - Mock MCP tools for easy setup
+    - Adaptable to any office workflow
+
+ ## Functionalities & Offerings
+
+ ### 1. Audio Service
+ - **STT**: Converts voice input to text for hands-free operation
+ - **TTS**: Speaks AI responses for a natural conversation flow
+
+ ### 2. Text (LLM) Service
+ - Built on modern LLM APIs
+ - Handles multi-turn conversation with context retention
+ - Tool-calling orchestration for MCP integration
+ - Dynamic prompt engineering for context-aware responses
+
+ ### 3. Vision Service
+ - Screen capture analysis for understanding user context
+ - Document reading and interpretation
+ - Visual feedback integration into the conversation flow
+
+ ### 4. MCP Integration
+ - **Customer Data Tools**: Retrieve CRM information on demand
+ - **Document Retrieval**: Simple RAG implementation without a database
+ - **Environment Automation**: API permission management, integration testing
+ - **Email Processing**: Context absorption and response generation
+
+ ## Demo Scenario
+
+ The hackathon demo showcases a realistic CSM/sales rep workflow:
+
+ 1. **Email arrives** → ATLAS reads and absorbs context using vision
+ 2. **Customer data needed** → Retrieves it from the mock CRM
+ 3. **Documents requested** → Pulls relevant customer files
+ 4. **API call fails (401)** → User encounters an auth error in Postman
+ 5. **ATLAS fixes it** → Updates access permissions automatically
+ 6. **Verification** → The API call succeeds
+ 7. **Response draft** → Generates an email reply based on the full context
+
+ All through natural voice conversation.
+
+ ## Tech Stack
+
+ | Component        | Technology                               |
+ |------------------|------------------------------------------|
+ | UI Framework     | Gradio 6                                 |
+ | LLM              | HuggingFace/Nebius APIs                  |
+ | STT              | Speech-to-text model: Whisper            |
+ | TTS              | Text-to-speech model: Kokoro             |
+ | Vision           | Vision language model: Gemma             |
+ | Tool Integration | MCP (Model Context Protocol)             |
+ | RAG              | Simple document retrieval (no vector DB) |
+
+ ## Quickstart
+
+ 1. **Install dependencies**:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 2. **Configure** `.env` with your API keys.
+
+ 3. **Launch** the Gradio app:
+    ```bash
+    python app.py
+    ```
+
+ 4. **Interact** by voice or text:
+    - Click "Record" to begin voice interaction
+    - Ask ATLAS to retrieve customer data or pull documents
+    - Share your screen for visual context
+    - Request environment automations (API permissions, etc.)
+
+ ## Adaptability
+
+ While built for CSM/sales rep workflows, ATLAS adapts to any office role:
+
+ - **Support Engineers**: Ticket context + documentation retrieval + environment automation
+ - **Account Managers**: Client data + document analysis + meeting prep
+ - **Project Managers**: Task context + resource lookup + status updates
+ - **Developers**: API testing + documentation + environment management
+
+ Simply swap the MCP tools to match your workflow.
+
+ ## Architecture
+
+ ATLAS uses a simple but effective architecture:
+
+ 1. **Gradio UI** → User interaction layer (voice/text/vision)
+ 2. **LLM Core** → Reasoning and orchestration
+ 3. **MCP Tools** → Lightweight integrations (no heavy infra)
+ 4. **Simple RAG** → Document retrieval without vector databases
+
+ The focus is on clarity and practical value over architectural complexity.
+
+ ## Contact
+
+ <a.zamfir@hotmail.com>
+ LinkedIn: Andrei Zamfir <https://www.linkedin.com/in/andrei-d-zamfir/>
+
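For reference, step 2 of the Quickstart ("Configure `.env`") might look like the sketch below. `LLM_PROVIDER`/`AUDIO_PROVIDER` come from the committed `.env`, and `HF_TOKEN`/`NEBIUS_API_KEY` are the variables `app.py` reads; the values shown are placeholders, not real keys:

```
# .env - provider selection plus API keys (placeholder values)
LLM_PROVIDER=huggingface
AUDIO_PROVIDER=huggingface
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
NEBIUS_API_KEY=your-nebius-api-key
```

Both keys can also be pasted into the app's Setup panel at runtime, which persists them back into `.env`.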
app.py ADDED
@@ -0,0 +1,667 @@
+ """
+ Atlas - Minimal VAD version based on Gradio's official pattern
+ """
+
+ import gradio as gr
+ import asyncio
+ import logging
+ import tempfile
+ import numpy as np
+ import wave
+ import io
+ import time
+ import re
+ import ast
+ import json
+ import os
+ import sys
+ import atexit
+ import subprocess
+
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Optional, List, Dict, Tuple
+
+ from services.mcp_client import MCPClient
+ from services.audio_service import AudioService
+ from services.llm_service import LLMService
+ from services.screen_service import get_screen_service
+ from config.settings import Settings
+ from config.prompts import get_generic_prompt
+
+ from openai import OpenAI
+
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+ logger = logging.getLogger(__name__)
+
+
+ # ============================================
+ # App State (like Gradio's official example)
+ # ============================================
+
+ @dataclass
+ class AppState:
+     stream: Optional[np.ndarray] = None
+     sampling_rate: int = 0
+     pause_detected: bool = False
+     started_talking: bool = False
+     stopped: bool = False
+     conversation: List[Dict] = field(default_factory=list)
+
+
+ # ============================================
+ # VAD Helper
+ # ============================================
+
+ def detect_pause(audio: np.ndarray, sr: int, state: AppState) -> bool:
+     """Simple energy-based pause detection."""
+     if audio is None or len(audio) < sr * 0.3:
+         return False
+
+     # Look at the last 0.5 seconds
+     window = int(sr * 0.5)
+     recent = audio[-window:] if len(audio) >= window else audio
+
+     # Energy (RMS), normalized for int16 input
+     recent_float = recent.astype(np.float32)
+     if recent.dtype == np.int16:
+         recent_float = recent_float / 32768.0
+     energy = float(np.sqrt(np.mean(recent_float ** 2)))
+
+     SILENCE_THRESHOLD = 0.01
+
+     # If the earlier audio was loud and the recent window is quiet, treat it as a pause
+     if len(audio) > window * 2:
+         earlier = audio[:-window]
+         earlier_float = earlier.astype(np.float32)
+         if earlier.dtype == np.int16:
+             earlier_float = earlier_float / 32768.0
+         earlier_energy = float(np.sqrt(np.mean(earlier_float ** 2)))
+
+         if earlier_energy > SILENCE_THRESHOLD * 2 and energy < SILENCE_THRESHOLD:
+             logger.info(f"Pause: earlier={earlier_energy:.4f}, now={energy:.4f}")
+             return True
+
+     return False
+
+
+ def audio_to_wav_file(audio: np.ndarray, sr: int) -> str:
+     """Save audio to a temp WAV file."""
+     audio_float = audio.astype(np.float32)
+     max_val = np.max(np.abs(audio_float))
+     if max_val > 0:
+         audio_float = audio_float / max_val
+     audio_int = (audio_float * 32767).astype(np.int16)
+
+     tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
+     with wave.open(tmp.name, 'wb') as w:
+         w.setnchannels(1)
+         w.setsampwidth(2)
+         w.setframerate(sr)
+         w.writeframes(audio_int.tobytes())
+     return tmp.name
+
+ # ============================================
+ # MCP
+ # ============================================
+
+ def start_mcp_server():
+     """
+     Start the local CRM MCP server (crm_mcp_server.py) in a background process.
+
+     Controlled by Settings.mcp_auto_start (MCP_AUTO_START env var).
+     """
+     settings = Settings()
+     if not getattr(settings, "mcp_auto_start", True):
+         logger.info("MCP auto-start disabled via settings.")
+         return None
+
+     script_path = os.path.join(os.path.dirname(__file__), "crm_mcp_server.py")
+     cmd = [sys.executable, script_path]
+
+     try:
+         proc = subprocess.Popen(
+             cmd,
+             stdout=subprocess.PIPE,
+             stderr=subprocess.PIPE,
+         )
+         logger.info(f"Started CRM MCP server (PID={proc.pid}) using: {cmd}")
+     except Exception as e:
+         logger.error(f"Failed to start CRM MCP server: {e}")
+         return None
+
+     # Ensure the child process is cleaned up when the app exits
+     def _cleanup():
+         if proc.poll() is None:
+             logger.info("Stopping CRM MCP server...")
+             try:
+                 proc.terminate()
+             except Exception:
+                 pass
+
+     atexit.register(_cleanup)
+     return proc
+
+
+ # ============================================
+ # Chatbot
+ # ============================================
+
+
+ TOOL_CALL_RE = re.compile(
+     r'^\s*([a-zA-Z_][\w]*)\s*\((.*)\)\s*$', re.DOTALL
+ )
+
+
+ def parse_tool_call(text: str):
+     """
+     Extract tool_name and kwargs from something like:
+         tool_name(a=1, b="x")
+     Works even if surrounded by chatter or code fences.
+     """
+     # Remove code fences
+     cleaned = text.strip()
+     if "```" in cleaned:
+         parts = cleaned.split("```")
+         if len(parts) >= 2:
+             cleaned = parts[1]
+
+     # Find the last candidate line
+     pattern = re.compile(r'^([a-zA-Z_]\w*)\s*\((.*)\)\s*$')
+     for line in reversed(cleaned.splitlines()):
+         line = line.strip()
+         m = pattern.match(line)
+         if not m:
+             continue
+
+         logger.info(f"Tool call: {line}")
+
+         name, args_src = m.groups()
+         args_src = args_src.strip()
+
+         # No args
+         if not args_src:
+             return name, {}
+
+         try:
+             func_src = f"def _f({args_src}): pass"
+             module = ast.parse(func_src)
+             func_def = module.body[0]  # ast.FunctionDef
+             args = func_def.args
+
+             # Defaults align with the *last* len(defaults) positional args
+             offset = len(args.args) - len(args.defaults)
+             kwargs = {}
+             for arg, default in zip(args.args[offset:], args.defaults):
+                 kwargs[arg.arg] = ast.literal_eval(default)
+
+             return name, kwargs
+
+         except Exception as e:
+             logger.warning(f"Argument parse error: {e}")
+             return None
+
+     return None
+
+ class Chatbot:
+     def __init__(self):
+         self.settings = Settings()
+         self.audio_service = AudioService(
+             api_key=self.settings.hf_token,
+             stt_provider="fal-ai",
+             stt_model=self.settings.stt_model,
+             tts_model=self.settings.tts_model,
+         )
+         self.llm_service = LLMService(
+             api_key=self.settings.llm_api_key,
+             model_name=self.settings.effective_model_name,
+         )
+         self.vision_client = OpenAI(
+             base_url=self.settings.NEBIUS_BASE_URL,
+             api_key=self.settings.NEBIUS_API_KEY
+         )
+         self.vision_model = self.settings.NEBIUS_MODEL
+         self.screen_service = get_screen_service()
+         self.history: list[dict] = []
+
+         self.mcp = MCPClient()
+         try:
+             self.tools = self.mcp.list_tools()
+         except Exception as e:
+             # Fail gracefully; tools just won't be used
+             logging.exception("Failed to load tools from MCP server: %s", e)
+             self.tools = []
+
+         self.tools_description = self._build_tools_description()
+
+     def _build_tools_description(self) -> str:
+         """Build a human-readable list of tools for the system prompt."""
+         if not getattr(self, "tools", None):
+             return "No tools are currently available."
+
+         lines = []
+         for t in self.tools:
+             name = t.get("name", "unknown_tool")
+             desc = t.get("description", "")
+             props = t.get("inputSchema", {}).get("properties", {})
+             args = ", ".join(
+                 f'{k}: {v.get("type", "string")}'
+                 for k, v in props.items()
+             )
+             lines.append(f"- {name}({args}) — {desc}")
+         return "\n".join(lines)
+
+     async def process(self, text: str, tts_enabled: bool = True) -> Tuple[str, Optional[str]]:
+         if not text.strip():
+             return "", None
+
+         # ---------- Phase 1: ask the model what to do ----------
+         messages = self.llm_service.build_messages_with_tools(
+             system_prompt=get_generic_prompt(),
+             user_input=text,
+             tools_description=self.tools_description,
+             conversation_history=self.history,
+         )
+
+         first_reply = await self.llm_service.get_chat_completion(messages)
+
+         # Try to parse a tool call from the reply
+         tool_call = parse_tool_call(first_reply)
+         tool_result_str = None
+
+         if tool_call:
+             tool_name, tool_args = tool_call
+             try:
+                 result = self.mcp.call_tool(tool_name, tool_args)
+                 tool_result_str = (
+                     f"Tool {tool_name} succeeded with arguments {tool_args}.\n"
+                     f"Result (JSON):\n{json.dumps(result, indent=2)}"
+                 )
+             except Exception as e:
+                 tool_result_str = f"Tool {tool_name} failed: {e}"
+
+             # ---------- Phase 2: give the tool result back to the model ----------
+             messages = self.llm_service.build_messages_with_tools(
+                 system_prompt=get_generic_prompt(),
+                 user_input=text,
+                 tools_description=self.tools_description,
+                 conversation_history=self.history,
+                 tool_results=tool_result_str,
+             )
+             reply = await self.llm_service.get_chat_completion(messages)
+         else:
+             # No tool call – just treat the initial text as the final answer
+             reply = first_reply
+
+         # Save the final user + assistant messages in conversation history
+         self.history.append({"role": "user", "content": text})
+         self.history.append({"role": "assistant", "content": reply})
+
+         # ---------- Optional: TTS ----------
+         audio_path = None
+         if tts_enabled:
+             audio_bytes = await self.audio_service.text_to_speech(reply)
+             if audio_bytes:
+                 tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
+                 tmp.write(audio_bytes)
+                 tmp.close()
+                 audio_path = tmp.name
+
+         return reply, audio_path
+
+
+     async def transcribe(self, audio_path: str) -> str:
+         return await self.audio_service.speech_to_text(audio_path)
+
+     async def capture_screen(self, state: AppState, tts_enabled: bool) -> Tuple[List[Dict], Optional[str], AppState, str]:
+         """Capture the screen and send it to the vision model."""
+         # Capture screenshot
+         capture = self.screen_service.capture()
+         if not capture:
+             return state.conversation, None, state, "❌ Capture failed"
+
+         logger.info(f"Screenshot captured: {capture.width}x{capture.height}")
+
+         # Send to vision model
+         try:
+             response = await asyncio.get_event_loop().run_in_executor(
+                 None,
+                 lambda: self.vision_client.chat.completions.create(
+                     model=self.vision_model,
+                     messages=[{
+                         "role": "user",
+                         "content": [
+                             {"type": "text", "text": "Ignore the ATLAS interface on the left-most side of the screen. Describe the content of the main window."},
+                             {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{capture.image_b64}"}}
+                         ]
+                     }]
+                 )
+             )
+             analysis = response.choices[0].message.content
+         except Exception as e:
+             logger.error(f"Vision error: {e}")
+             return state.conversation, None, state, f"❌ {str(e)}"
+
+         # Process through the main LLM
+         reply, audio_path = await self.process(f"[Screen analysis]: {analysis}", tts_enabled)
+
+         state.conversation.append({"role": "user", "content": "[Screen captured]"})
+         state.conversation.append({"role": "assistant", "content": f"Vision output: {reply}. Incorporate the data in your context. Do not call any tool yet; await further instructions."})
+
+         return state.conversation, audio_path, state, "📸 Done"
+
+ # Start the MCP server as soon as this module is imported
+ mcp_proc = start_mcp_server()
+
+ # Give it a brief moment to bind to the port
+ time.sleep(1.0)
+
+ bot = None  # type: ignore
+
+
+ def tokens_present() -> bool:
+     """Check whether the required env vars are already available."""
+     return bool(os.getenv("HF_TOKEN")) and bool(os.getenv("NEBIUS_API_KEY"))
+
+
+ ENV_PATH = Path(__file__).parent / ".env"
+
+ def upsert_env_var(key: str, value: str):
+     """
+     Update or append an env var in the .env file so it persists across runs.
+     Simple key=value per line, no fancy parsing.
+     """
+     if not value:
+         return
+
+     lines = []
+     if ENV_PATH.exists():
+         lines = ENV_PATH.read_text(encoding="utf-8").splitlines()
+
+     found = False
+     for i, line in enumerate(lines):
+         if line.startswith(f"{key}="):
+             lines[i] = f"{key}={value}"
+             found = True
+             break
+
+     if not found:
+         lines.append(f"{key}={value}")
+
+     ENV_PATH.write_text("\n".join(lines) + "\n", encoding="utf-8")
+
+
+ def ensure_bot_initialized() -> Optional[str]:
+     """
+     Initialize the global Chatbot if tokens are present.
+     Returns an error message if tokens are missing, otherwise None.
+     """
+     global bot
+
+     if bot is not None:
+         return None
+
+     hf_token = os.getenv("HF_TOKEN", "")
+     if not hf_token or len(hf_token) <= 10:
+         return "⚠️ HF_TOKEN missing or invalid. Please fill it in the Setup section."
+
+     # Optional debug: see what we are about to use
+     settings = Settings()
+     logger.info(
+         f"Initializing Chatbot with HF token prefix={settings.hf_token[:4]}..., len={len(settings.hf_token)}"
+     )
+
+     bot = Chatbot()
+     return None
+
+
+ def save_tokens(hf_token: str, nebius_api_key: str) -> str:
+     # Basic sanity check
+     if hf_token and not hf_token.strip().startswith("hf_"):
+         return "❌ HF_TOKEN does not look like a Hugging Face token (should start with 'hf_')."
+
+     if hf_token:
+         os.environ["HF_TOKEN"] = hf_token.strip()
+         upsert_env_var("HF_TOKEN", hf_token.strip())
+
+     if nebius_api_key:
+         os.environ["NEBIUS_API_KEY"] = nebius_api_key.strip()
+         upsert_env_var("NEBIUS_API_KEY", nebius_api_key.strip())
+
+     # NOW build the Chatbot + LLMService with the *current* env
+     err = ensure_bot_initialized()
+     if err:
+         return err
+     return "✅ Tokens saved and assistant initialized. You can now use Atlas."
+
+ def check_tokens_on_load():
+     if tokens_present():
+         # Env already has HF_TOKEN/NEBIUS_API_KEY: build the Chatbot immediately
+         err = ensure_bot_initialized()
+         msg = "✅ Tokens loaded from .env. Atlas is ready." if not err else err
+
+         return (
+             gr.update(visible=False),  # hf_token_box
+             gr.update(visible=False),  # nebius_key_box
+             msg,
+         )
+     else:
+         return (
+             gr.update(visible=True),
+             gr.update(visible=True),
+             "⚠️ Please paste your HF_TOKEN and NEBIUS_API_KEY to start.",
+         )
+
+
+ # ============================================
+ # Gradio Handlers
+ # ============================================
+
+ def process_audio(audio: tuple, state: AppState):
+     """Process an audio chunk. Return gr.Audio(recording=False) to stop."""
+     if audio is None:
+         return None, state
+
+     sr, data = audio
+
+     # Mono
+     if data.ndim > 1:
+         data = data.mean(axis=1)
+
+     # Accumulate
+     if state.stream is None:
+         state.stream = data
+         state.sampling_rate = sr
+     else:
+         state.stream = np.concatenate((state.stream, data))
+
+     # Energy check
+     data_float = data.astype(np.float32)
+     if data.dtype == np.int16:
+         data_float = data_float / 32768.0
+     energy = float(np.sqrt(np.mean(data_float ** 2)))
+
+     if energy > 0.015:
+         state.started_talking = True
+         logger.debug(f"Talking: energy={energy:.4f}")
+
+     # Pause check
+     state.pause_detected = detect_pause(state.stream, state.sampling_rate, state)
+
+     if state.pause_detected and state.started_talking:
+         logger.info("Pause detected - stopping recording")
+         return gr.Audio(recording=False), state
+
+     return None, state
+
+
+ async def respond(state: AppState, tts_enabled: bool):
+     """Transcribe and respond when recording stops."""
+     if bot is None:
+         msg = "⚠️ Configure HF_TOKEN and NEBIUS_API_KEY in the Setup section before using voice."
+         state.conversation.append({"role": "assistant", "content": msg})
+         return None, AppState(conversation=state.conversation), state.conversation
+
+     if state.stream is None or len(state.stream) < 1000:
+         logger.info("No audio")
+         return None, AppState(conversation=state.conversation), state.conversation
+
+     logger.info(f"Processing {len(state.stream)} samples...")
+
+     wav_path = audio_to_wav_file(state.stream, state.sampling_rate)
+     transcript = await bot.transcribe(wav_path)
+     logger.info(f"Transcript: {transcript}")
+
+     if not transcript.strip():
+         return None, AppState(conversation=state.conversation), state.conversation
+
+     reply, audio_path = await bot.process(transcript, tts_enabled)
+
+     state.conversation.append({"role": "user", "content": transcript})
+     state.conversation.append({"role": "assistant", "content": reply})
+
+     return audio_path, AppState(conversation=state.conversation), state.conversation
+
+
+ def start_recording(state: AppState):
+     """Restart recording."""
+     if not state.stopped:
+         return gr.Audio(recording=True)
+     return gr.Audio(recording=False)
+
+
+ async def send_text(text: str, state: AppState, tts_enabled: bool):
+     if not text.strip():
+         return state.conversation, None, state, ""
+
+     if bot is None:
+         msg = "⚠️ Configure HF_TOKEN and NEBIUS_API_KEY in the Setup section before chatting."
+         state.conversation.append({"role": "assistant", "content": msg})
+         return state.conversation, None, state, ""
+
+     reply, audio_path = await bot.process(text, tts_enabled)
+     state.conversation.append({"role": "user", "content": text})
+     state.conversation.append({"role": "assistant", "content": reply})
+
+     return state.conversation, audio_path, state, ""
+
+
+ async def capture_screen_handler(state: AppState, tts_enabled: bool):
+     if bot is None:
+         msg = "⚠️ Configure HF_TOKEN and NEBIUS_API_KEY in the Setup section before using screen capture."
+         return state.conversation, None, state, msg
+
+     return await bot.capture_screen(state, tts_enabled)
+
+
+ # ============================================
+ # UI
+ # ============================================
+
+ with gr.Blocks(title="ATLAS", theme=gr.themes.Default()) as demo:
+     gr.Markdown("### Atlas - CRM Voice Assistant")
+
+     state = gr.State(value=AppState())
+
+     with gr.Row():
+         with gr.Column(scale=2):
+             chatbot = gr.Chatbot(label="Conversation", height=400)
+
+             with gr.Row():
+                 txt = gr.Textbox(placeholder="Type your message here...", label="Input", scale=4)
+                 send_btn = gr.Button("Send", scale=1)
+
+         with gr.Column(scale=1):
+             # 🔐 Setup section
+             gr.Markdown("### Setup (API keys)")
+             hf_token_box = gr.Textbox(
+                 placeholder="Paste your HuggingFace token (HF_TOKEN)",
+                 label="HF_TOKEN",
+                 type="password"
+             )
+             nebius_key_box = gr.Textbox(
+                 placeholder="Paste your Nebius API key (NEBIUS_API_KEY)",
+                 label="NEBIUS_API_KEY",
+                 type="password"
+             )
+             save_keys_btn = gr.Button("Save keys & initialize Atlas")
+             setup_status = gr.Markdown("")
+
+             gr.Markdown("---")
+             gr.Markdown("### Speech module")
+
+             mic = gr.Audio(
+                 sources=["microphone"],
+                 type="numpy",
+                 label="Microphone",
+                 streaming=True,
+             )
+
+             audio_out = gr.Audio(label="Response", autoplay=True, streaming=True)
+             tts_toggle = gr.Checkbox(label="🔊 TTS Enabled", value=True)
+             stop_btn = gr.Button("🛑 Stop", variant="stop")
+
+             gr.Markdown("---")
+             gr.Markdown("### 🖥️ Screen")
+             capture_btn = gr.Button("📸 Capture Screen")
+             screen_status = gr.Textbox(label="Status", value="Ready", interactive=False)
+
+
+     # Stream -> detect pause -> stop
+     mic.stream(
+         process_audio,
+         inputs=[mic, state],
+         outputs=[mic, state],
+         stream_every=0.5,
+         time_limit=60,
+     )
+
+     # Stop -> transcribe -> respond -> restart
+     mic.stop_recording(
+         respond,
+         inputs=[state, tts_toggle],
+         outputs=[audio_out, state, chatbot],
+     ).then(
+         start_recording,
+         inputs=[state],
+         outputs=[mic],
+     )
+
+     stop_btn.click(
+         lambda: (AppState(stopped=True), gr.Audio(recording=False)),
+         outputs=[state, mic],
+     )
+
+     send_btn.click(send_text, inputs=[txt, state, tts_toggle], outputs=[chatbot, audio_out, state, txt])
+     txt.submit(send_text, inputs=[txt, state, tts_toggle], outputs=[chatbot, audio_out, state, txt])
+
+     # Screen capture
+     capture_btn.click(
+         capture_screen_handler,
+         inputs=[state, tts_toggle],
+         outputs=[chatbot, audio_out, state, screen_status]
+     )
+
+     # When the app loads, show/hide token inputs based on env
+     demo.load(
+         fn=check_tokens_on_load,
+         inputs=None,
+         outputs=[hf_token_box, nebius_key_box, setup_status],
+     )
+
+     # When the user clicks "Save keys"
+     save_keys_btn.click(
+         fn=save_tokens,
+         inputs=[hf_token_box, nebius_key_box],
+         outputs=[setup_status],
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+     )
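To see the tool-call contract in isolation, here is a trimmed, self-contained sketch of the `parse_tool_call` approach used in `app.py` above (same one-line `name(kwargs)` format, same `ast` trick of parsing the arguments as defaults of a dummy function; the code-fence stripping and logging are omitted for brevity):

```python
import ast
import re


def parse_tool_call(text: str):
    """Extract (tool_name, kwargs) from a reply like: get_customer(customer_id="Walnut")."""
    pattern = re.compile(r'^([a-zA-Z_]\w*)\s*\((.*)\)\s*$')
    # Scan from the last line, since the model's call usually ends the reply
    for line in reversed(text.strip().splitlines()):
        m = pattern.match(line.strip())
        if not m:
            continue
        name, args_src = m.groups()
        if not args_src.strip():
            return name, {}
        # Wrap the "k=v" pairs as defaults of a dummy function so ast does the parsing
        module = ast.parse(f"def _f({args_src}): pass")
        args = module.body[0].args
        offset = len(args.args) - len(args.defaults)  # defaults align to the last args
        return name, {
            a.arg: ast.literal_eval(d)
            for a, d in zip(args.args[offset:], args.defaults)
        }
    return None


print(parse_tool_call('get_customer(customer_id="Walnut")'))
# → ('get_customer', {'customer_id': 'Walnut'})
```

Using `ast.literal_eval` on the defaults means only literal values (strings, numbers, lists, etc.) are accepted, so a reply can never smuggle executable code through the argument list.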
config/prompts.py ADDED
@@ -0,0 +1,198 @@
1
+ """System prompts for the ATLAS assistant."""
2
+
3
+ def get_generic_prompt() -> str:
4
+ """Return the main system prompt for the LLM."""
5
+ return """# ATLAS — Tool Use with High-Reliability Calling
6
+
7
+ You are **ATLAS**, an intelligent assistant with access to tools. Your priority is **accurate, schema-correct tool calls**. Treat tools as **verifiable data sources** that complement your reasoning.
8
+
9
+ ---
10
+
11
+ ## Capabilities (context)
12
+ 1) **CRM Access** — search/retrieve customers, deals, documents.
13
+ 2) **Screen Analysis** — reason over shared visuals. Utilize vision data to enhance your context, do not call tools based on the response.
14
+ 3) **Voice Interaction** — natural, concise speech.
15
+
16
+ > Use tools to fetch facts needed for the user's request.
17
+
18
+ ---
19
+
20
+ ## Tool Calling Protocol (STRICT)
21
+
22
+ ### A. Availability & Name Match
23
+ - Call **only** tools listed in `TOOLS` and **exactly** by their key (e.g., `get_customer`, **not** `getCustomer`).
24
+ - If a needed capability isn't in `TOOLS`, **do not invent** a call. Explain the limitation or ask for alternatives.
25
+
26
+ ### B. One Tool Per Message
27
+ - Each assistant turn either (1) calls **exactly one** tool, or (2) asks a **targeted question** if parameters are missing/ambiguous, or (3) answers from known information.
28
+ - After a tool call, wait for the tool's result before any further calls.
29
+
30
+ ### C. Function-Call Format (no extra text)
31
+ When you need to call a tool, reply **only** with:
32
+
33
+ name_of_the_tool(arg1="value1", arg2="value2")
34
+
35
+ - No prefix/suffix text appended to the tool i.e. tool_name=get_customer(customer_id="Walnut"). This should be get_customer(customer_id="Walnut")
36
+ - No Markdown fences.
37
+ - String values must be quoted. Numbers unquoted.
38
+ - Include **only** schema fields; no extras.
39
+
40
+ ### D. Parameter Sourcing & Validation (pre-call checklist)
41
+ Before **every** call, resolve and validate each parameter:
42
+
43
+ 1) **Source Map** each param:
44
+ - `user` (explicitly provided by the user)
45
+ - `context` (already mentioned in conversation)
46
+ - `prior_tool` (value returned by an earlier tool)
47
+ - `agent_generated` (only safe defaults allowed by schema; never guess IDs/names)
48
+
49
+ 2) **Schema Conformance**:
50
+ - Required fields present.
51
+ - Correct **types**, **casing**, and allowed values (e.g., `stage` matches provided enum).
52
+ - Respect defaults (e.g., `limit` defaults to 50; omit if not needed).
53
+
54
+ 3) **Disambiguation**:
55
+ - If a required parameter is **uncertain**, **ask a short clarifying question** instead of calling the tool.
56
+
57
+ ---
58
+
59
+ ## Response Guidelines (around tool calls)
60
+
61
+ - **Post-result:** cite **concrete fields** (names, IDs, amounts, timestamps). Avoid generic claims like “found an error”; quote the actual value or message.
62
+ - **Screen/Voice context:** reference what you see/hear but **do not** call tools unless needed to satisfy the request.
63
+ - **Uncertainty:** say “I'm not sure” and propose a data-gathering step (with a proper tool call) instead of guessing.
64
+
65
+ ---
66
+
67
+ ## Reliability Heuristics
68
+
69
+ 1) **Plan → Execute**
70
+ - First, clarify the goal and required parameters.
71
+ - Then choose the **single** best tool.
72
+ - Execute exactly one call.
73
+
74
+ 2) **Parameter Echo (mentally, not in the tool call)**
75
+ - Ensure each param is justified by a source map before calling.
76
+ - Example mental check: `customer_id ← prior_tool:get_customers[...]` or `user provided`.
77
+
78
+ 3) **No Hallucinated Entities**
79
+ - Do **not** invent customer IDs, deal IDs, document names, or fields. If you only have “Walnut” as a **name** and the schema requires `customer_id`, pass the name as the `customer_id` value.
80
+
81
+ 4) **Branching Discipline**
82
+ - If a result contradicts your assumption (e.g., access is disabled), choose the next **logical** tool (e.g., `get_access` → possibly `set_access` **only with consent**) or report options; don't chain speculative calls.
83
+
84
+ 5) **Minimal Surface**
85
+ - Prefer precise queries (filters like `status`, `industry`, `stage`, `limit`) to reduce noise.
86
+
87
+ ---
88
+
89
+ ## Tool Catalog Reminders (schema edges)
90
+
91
+ - `get_customer` requires **`customer_id`**.
92
+ - `get_deals` supports filters: `stage`, `customer_id`, `owner`, `min_value`, `limit`. Ensure `stage` matches the allowed set.
93
+ - `read_document` requires **`name`** (exact or partial). Do not make up document content; call the tool to retrieve it.
94
+ - `get_pipeline_summary` and `get_documents` take **no params** (empty object).
95
+
96
+ ---
97
+
98
+ ## Failure Handling & Recovery
99
+
100
+ - On failure, explain succinctly:
101
+ - **What** failed (`tool`, error text),
102
+ - **Why** (schema mismatch, missing param, backend error),
103
+ - **Next**: propose a compliant retry or an alternative tool.
104
+ - Do **not** retry automatically unless you fixed the cause (e.g., added required param).
105
+
106
+ ---
107
+
108
+ > When you are ready to call a tool, send **only** the function call in the required format. Otherwise, ask a clarifying question or summarize findings with cited fields.
109
+
110
+ """
111
+
112
+
113
+ def get_vision_prompt() -> str:
114
+ """Return the prompt for the vision/screen analysis model."""
115
+ return """You are a visual analysis assistant helping a user with their screen content.
116
+
117
+ ## Mission
118
+ Analyze the current screen and **prioritize the actionable email** in the foreground. Extract the problem, requests, blockers, and any embedded evidence (e.g., error snippets), then output a **structured to-do object** the user can act on immediately.
119
+
120
+ ## Guidelines
121
+ 1) **Be Specific:** Cite on-screen text (subjects, status codes, error messages, labels).
122
+ 2) **Be Relevant:** Focus on the active email/thread and its required actions.
123
+ 3) **Be Concise:** Short bullet summaries + a tight to-do list.
124
+ 4) **Note Changes (if follow-up):** Briefly mention what’s new versus prior view.
125
+ 5) **PII Caution:** Redact personal names/emails (use roles like “Technical Account Manager”). Company names/products are fine.
126
+ 6) **Evidence First:** Prefer exact on-screen snippets for errors/status (e.g., `401 Unauthorized`, `"Access not authorized for this company."`).
127
+
128
+ ## What to Extract (Email-Focused)
129
+ - Active app/window (e.g., “Outlook/Email”)
130
+ - Subject / thread topic
131
+ - Sender role (redact personal name), company (if shown)
132
+ - Key asks/requirements (bullets)
133
+ - Pain points/blockers (bullets)
134
+ - Evidence snippet(s) (codes/messages/log lines visible)
135
+ - Attachments or embedded artifacts (if any)
136
+ - Implied deadlines/urgency (if stated)
137
+ - Suggested immediate next steps
138
+
139
+ ## Output (JSON only)
140
+ Return **only** a JSON object with this shape:
141
+
142
+ {
143
+ "context": {
144
+ "active_app": "<string>",
145
+ "subject": "<string>",
146
+ "sender_role": "<string|null>",
147
+ "sender_company": "<string|null>",
148
+ "received_time": "<string|null>"
149
+ },
150
+ "summary": "<1-2 sentence plain-language recap>",
151
+ "requirements": ["<ask 1>", "<ask 2>", "..."],
152
+ "blockers": ["<blocker 1>", "..."],
153
+ "evidence": [
154
+ {"type": "status", "value": "<e.g., 401 Unauthorized>"},
155
+ {"type": "message", "value": "<exact error text>"}
156
+ ],
157
+ "todos": [
158
+ {"title": "<actionable task>", "owner": "<me|team>", "priority": "<high|med|low>", "due": "<ISO8601|null>"},
159
+ {"title": "...", "owner": "...", "priority": "...", "due": "..."}
160
+ ],
161
+ "suggested_reply": "<concise draft reply to sender without PII>"
162
+ }
163
+
164
+ If a field is unknown, use null. Keep values brief and factual.
165
+
166
+ ## Ignore
167
+ - System tray and background windows unless directly relevant.
168
+ - Personal info (names/emails) — redact or omit.
169
+
170
+ Respond with the JSON object only.
171
+
172
+ """
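The JSON contract above can be sanity-checked on the consumer side with a minimal validator sketch. The sample payload and the helper name below are illustrative assumptions, not part of the app:

```python
import json

# Hypothetical sample mimicking the vision model's expected reply.
sample = json.dumps({
    "context": {"active_app": "Outlook", "subject": "API access issue",
                "sender_role": "Technical Account Manager",
                "sender_company": "Walnut", "received_time": None},
    "summary": "Customer reports 401 errors when calling the API.",
    "requirements": ["Restore endpoint access"],
    "blockers": ["401 Unauthorized"],
    "evidence": [{"type": "status", "value": "401 Unauthorized"}],
    "todos": [{"title": "Check access flag", "owner": "me",
               "priority": "high", "due": None}],
    "suggested_reply": "Thanks for flagging this; we are investigating.",
})

# Top-level keys required by the prompt's output shape.
REQUIRED_KEYS = {"context", "summary", "requirements", "blockers",
                 "evidence", "todos", "suggested_reply"}

def validate_vision_output(raw: str) -> dict:
    """Parse the model reply and check the top-level contract."""
    obj = json.loads(raw)
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return obj

print(validate_vision_output(sample)["context"]["active_app"])  # → Outlook
```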
173
+
174
+
175
+ def get_tool_execution_prompt() -> str:
176
+ """Return the prompt for tool execution context."""
177
+ return """Based on the tool execution results, provide a helpful response to the user.
178
+
179
+ If the tool succeeded:
180
+ - Summarize the key information returned
181
+ - Highlight what's most relevant to the user's query
182
+ - Suggest follow-up actions if appropriate
183
+
184
+ If the tool failed:
185
+ - Explain what went wrong in simple terms
186
+ - Suggest alternative approaches
187
+ - Offer to try a different tool if available
188
+ """
189
+
190
+
191
+ def get_vad_context_prompt() -> str:
192
+ """Return the prompt for voice activity detection context."""
193
+ return """The user is speaking to you via voice. Keep your response:
194
+ - Conversational and natural
195
+ - Concise (suitable for text-to-speech)
196
+ - Clear and easy to understand when heard
197
+ - Free of complex formatting (no bullet points, tables, etc.)
198
+ """
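The strict single-line call format defined in `get_system_prompt` (e.g. `get_customer(customer_id="Walnut")`) can be recovered from a model reply with a small regex sketch. The parser below is an assumption for illustration, not part of the app's actual dispatch code:

```python
import re

# Bare call: name(args), nothing before or after, per the protocol.
CALL_RE = re.compile(r'^(?P<name>[a-z_][a-z0-9_]*)\((?P<args>.*)\)$')
# key="string" or key=number, comma-separated.
ARG_RE = re.compile(r'(\w+)\s*=\s*("([^"]*)"|[-+]?\d+(?:\.\d+)?)')

def parse_tool_call(text: str):
    """Return (tool_name, kwargs) or None if text is not a bare call."""
    m = CALL_RE.match(text.strip())
    if not m:
        return None
    kwargs = {}
    for key, raw, quoted in ARG_RE.findall(m.group("args")):
        if raw.startswith('"'):
            kwargs[key] = quoted          # quoted values stay strings
        else:
            kwargs[key] = float(raw) if "." in raw else int(raw)
    return m.group("name"), kwargs

print(parse_tool_call('get_customer(customer_id="Walnut")'))
# → ('get_customer', {'customer_id': 'Walnut'})
```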
config/settings.py ADDED
@@ -0,0 +1,154 @@
1
+ """Application-wide configuration settings."""
2
+
3
+ import os
4
+ from dataclasses import dataclass
5
+ from typing import Optional
6
+ from pathlib import Path
7
+ from dotenv import load_dotenv
8
+
9
+ load_dotenv()
10
+
11
+
12
+ @dataclass
13
+ class Settings:
14
+ """Application-wide configuration settings."""
15
+
16
+ # ============================================
17
+ # LLM Provider Settings
18
+ # ============================================
19
+ llm_provider: str = os.getenv("LLM_PROVIDER", "auto")
20
+
21
+ # Hugging Face settings
22
+ hf_token: str = os.getenv("HF_TOKEN", "")
23
+ hf_chat_model: str = os.getenv("HF_CHAT_MODEL", "Qwen/Qwen2.5-7B-Instruct")
24
+ hf_temperature: float = float(os.getenv("HF_TEMPERATURE", "0.001"))
25
+ hf_max_new_tokens: int = int(os.getenv("HF_MAX_NEW_TOKENS", "512"))
26
+
27
+ # Model settings
28
+ model_name: str = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-7B-Instruct")
29
+
30
+ # ============================================
31
+ # Audio Provider Settings
32
+ # ============================================
33
+ audio_provider: str = os.getenv("AUDIO_PROVIDER", "auto")
34
+ tts_model: str = os.getenv("TTS_MODEL", "hexgrad/Kokoro-82M")
35
+ stt_model: str = os.getenv("STT_MODEL", "openai/whisper-large-v3")
36
+
37
+ # ============================================
38
+ # VAD (Voice Activity Detection) Settings
39
+ # ============================================
40
+ vad_enabled: bool = os.getenv("VAD_ENABLED", "true").lower() == "true"
41
+ vad_sample_rate: int = int(os.getenv("VAD_SAMPLE_RATE", "16000"))
42
+ vad_frame_duration_ms: int = int(os.getenv("VAD_FRAME_DURATION_MS", "30"))
43
+ vad_aggressiveness: int = int(os.getenv("VAD_AGGRESSIVENESS", "2"))
44
+ vad_speech_threshold: float = float(os.getenv("VAD_SPEECH_THRESHOLD", "0.5"))
45
+ vad_silence_threshold: float = float(os.getenv("VAD_SILENCE_THRESHOLD", "0.3"))
46
+ vad_min_speech_ms: int = int(os.getenv("VAD_MIN_SPEECH_MS", "300"))
47
+ vad_max_speech_s: float = float(os.getenv("VAD_MAX_SPEECH_S", "30.0"))
48
+ vad_post_speech_silence_ms: int = int(os.getenv("VAD_POST_SPEECH_SILENCE_MS", "800"))
49
+
50
+ # ============================================
51
+ # Screen/Vision Settings
52
+ # ============================================
53
+ screen_capture_interval: float = float(os.getenv("SCREEN_CAPTURE_INTERVAL", "1.0"))
54
+ screen_compression_quality: int = int(os.getenv("SCREEN_COMPRESSION_QUALITY", "50"))
55
+ max_width: int = int(os.getenv("SCREEN_MAX_WIDTH", "3440"))
56
+ max_height: int = int(os.getenv("SCREEN_MAX_HEIGHT", "1440"))
57
+
58
+ # Vision model (Nebius)
59
+ NEBIUS_MODEL: str = os.getenv("NEBIUS_MODEL", "google/gemma-3-27b-it-fast")
60
+ NEBIUS_API_KEY: str = os.getenv("NEBIUS_API_KEY", "")
61
+ NEBIUS_BASE_URL: str = os.getenv("NEBIUS_BASE_URL", "https://api.studio.nebius.com/v1/")
62
+
63
+ # Auto-enable vision when screen context is needed
64
+ vision_auto_enabled: bool = os.getenv("VISION_AUTO_ENABLED", "true").lower() == "true"
65
+ vision_fps: float = float(os.getenv("VISION_FPS", "0.05")) # Frames per second
66
+
67
+ # ============================================
68
+ # MCP Server Settings
69
+ # ============================================
70
+ mcp_server_url: str = os.getenv("MCP_SERVER_URL", "http://localhost:8000")
71
+ mcp_auto_start: bool = os.getenv("MCP_AUTO_START", "true").lower() == "true"
72
+
73
+ # ============================================
74
+ # CRM Data Settings
75
+ # ============================================
76
+ crm_data_dir: str = os.getenv("CRM_DATA_DIR", "./data")
77
+
78
+ # ============================================
79
+ # Hyper-V Settings (Legacy)
80
+ # ============================================
81
+ hyperv_enabled: bool = os.getenv("HYPERV_ENABLED", "false").lower() == "true"
82
+ hyperv_host: str = os.getenv("HYPERV_HOST", "localhost")
83
+ hyperv_username: Optional[str] = os.getenv("HYPERV_USERNAME")
84
+ hyperv_password: Optional[str] = os.getenv("HYPERV_PASSWORD")
85
+
86
+ # ============================================
87
+ # Application Settings
88
+ # ============================================
89
+ max_conversation_history: int = int(os.getenv("MAX_CONVERSATION_HISTORY", "50"))
90
+ temp_dir: str = os.getenv("TEMP_DIR", "./temp")
91
+ log_level: str = os.getenv("LOG_LEVEL", "INFO")
92
+
93
+ # Feature flags
94
+ enable_screen_sharing_button: bool = os.getenv("ENABLE_SCREEN_SHARING_BUTTON", "true").lower() == "true"
95
+ enable_voice_input: bool = os.getenv("ENABLE_VOICE_INPUT", "true").lower() == "true"
96
+
97
+ def __post_init__(self):
98
+ """Initialize directories and validate settings."""
99
+ # Ensure necessary directories exist
100
+ Path(self.temp_dir).mkdir(exist_ok=True, parents=True)
101
+ Path("./config").mkdir(exist_ok=True, parents=True)
102
+ Path("./logs").mkdir(exist_ok=True, parents=True)
103
+ Path(self.crm_data_dir).mkdir(exist_ok=True, parents=True)
104
+
105
+ # 🔁 Refresh dynamic, env-backed values so they pick up changes done at runtime
106
+ self.hf_token = os.getenv("HF_TOKEN", self.hf_token)
107
+ self.NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY", self.NEBIUS_API_KEY)
108
+
109
+
110
+ def is_hf_token_valid(self) -> bool:
111
+ """Check if HuggingFace token is set and looks like a real HF token."""
112
+ token = os.getenv("HF_TOKEN", "") # always read the latest env
113
+ return bool(token and token.startswith("hf_") and len(token) > 20)
114
+
115
+ @property
116
+ def effective_llm_provider(self) -> str:
117
+ if self.llm_provider == "auto":
118
+ return "huggingface" if self.is_hf_token_valid() else "openai"
119
+ return self.llm_provider
120
+
121
+ @property
122
+ def effective_audio_provider(self) -> str:
123
+ if self.audio_provider == "auto":
124
+ return "huggingface" if self.is_hf_token_valid() else "openai"
125
+ return self.audio_provider
126
+
127
+ @property
128
+ def llm_endpoint(self) -> str:
129
+ if self.effective_llm_provider == "huggingface":
130
+ return f"https://api-inference.huggingface.co/models/{self.hf_chat_model}"
131
+ return getattr(self, 'openai_endpoint', '')
132
+
133
+ @property
134
+ def llm_api_key(self) -> str:
135
+ if self.effective_llm_provider == "huggingface":
136
+ return os.getenv("HF_TOKEN", "") # latest HF token
137
+ return getattr(self, "openai_api_key", "")
138
+
139
+ @property
140
+ def effective_model_name(self) -> str:
141
+ return self.hf_chat_model if self.effective_llm_provider == "huggingface" else self.model_name
142
+
143
+ def get_vad_config(self) -> dict:
144
+ """Get VAD configuration as a dictionary."""
145
+ return {
146
+ "sample_rate": self.vad_sample_rate,
147
+ "frame_duration_ms": self.vad_frame_duration_ms,
148
+ "aggressiveness": self.vad_aggressiveness,
149
+ "speech_threshold": self.vad_speech_threshold,
150
+ "silence_threshold": self.vad_silence_threshold,
151
+ "min_speech_duration_ms": self.vad_min_speech_ms,
152
+ "max_speech_duration_s": self.vad_max_speech_s,
153
+ "post_speech_silence_ms": self.vad_post_speech_silence_ms,
154
+ }
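`Settings` parses boolean feature flags with the `os.getenv(...).lower() == "true"` idiom, where only the literal string `true` (any casing) enables a flag. A minimal sketch of that pattern (the `env_flag` helper is an assumption, not part of the file):

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Parse a boolean flag the way Settings does: only the
    literal string 'true' (case-insensitive) enables it."""
    return os.getenv(name, default).lower() == "true"

os.environ["VAD_ENABLED"] = "True"
os.environ["HYPERV_ENABLED"] = "yes"   # anything but 'true' reads as False

print(env_flag("VAD_ENABLED"), env_flag("HYPERV_ENABLED"))  # → True False
```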
crm_mcp_server.py ADDED
@@ -0,0 +1,525 @@
1
+ """
2
+ CRM MCP Server - Local MCP server with mocked CRM data.
3
+ Provides tools for customer, deal, and document management.
4
+ """
5
+
6
+ import json
7
+ import os
8
+ import logging
9
+ from pathlib import Path
10
+ from typing import Optional, List, Dict, Any
11
+ from datetime import datetime
12
+
13
+ from fastapi import FastAPI, HTTPException
14
+ from fastapi.middleware.cors import CORSMiddleware
15
+ from pydantic import BaseModel
16
+
17
+ # Configure logging
18
+ logging.basicConfig(level=logging.INFO)
19
+ logger = logging.getLogger(__name__)
20
+
21
+ # Initialize FastAPI app
22
+ app = FastAPI(title="CRM MCP Server", version="1.0.0")
23
+
24
+ # Add CORS middleware
25
+ app.add_middleware(
26
+ CORSMiddleware,
27
+ allow_origins=["*"],
28
+ allow_credentials=True,
29
+ allow_methods=["*"],
30
+ allow_headers=["*"],
31
+ )
32
+
33
+ # Data directory path
34
+ DATA_DIR = Path(__file__).parent / "data"
35
+ ACCESS_FILE = DATA_DIR / "access.json"
36
+
37
+
38
+ class ToolCallRequest(BaseModel):
39
+ name: str
40
+ arguments: Dict[str, Any] = {}
41
+
42
+
43
+ class ToolResponse(BaseModel):
44
+ success: bool
45
+ result: Any = None
46
+ error: Optional[str] = None
47
+
48
+
49
+ # ============================================
50
+ # Data Loading Functions
51
+ # ============================================
52
+
53
+ def load_customers() -> Dict:
54
+ """Load customers from JSON file."""
55
+ customers_file = DATA_DIR / "customers.json"
56
+ if customers_file.exists():
57
+ with open(customers_file, "r") as f:
58
+ return json.load(f)
59
+ return {"customers": []}
60
+
61
+
62
+ def load_deals() -> Dict:
63
+ """Load deals from JSON file."""
64
+ deals_file = DATA_DIR / "deals.json"
65
+ if deals_file.exists():
66
+ with open(deals_file, "r") as f:
67
+ return json.load(f)
68
+ return {"deals": [], "pipeline_summary": {}}
69
+
70
+
71
+ def load_documents() -> List[Dict]:
72
+ """Load list of available documents."""
73
+ docs_dir = DATA_DIR / "documents"
74
+ documents = []
75
+ if docs_dir.exists():
76
+ for doc_path in docs_dir.glob("*"):
77
+ if doc_path.is_file():
78
+ stat = doc_path.stat()
79
+ documents.append({
80
+ "name": doc_path.name,
81
+ #"path": str(doc_path),
82
+ "size_bytes": stat.st_size,
83
+ "modified_at": datetime.fromtimestamp(stat.st_mtime).isoformat(),
84
+ "type": doc_path.suffix.lstrip(".")
85
+ })
86
+ return documents
87
+
88
+ def load_access_data() -> Dict[str, Any]:
89
+ if ACCESS_FILE.exists():
90
+ with open(ACCESS_FILE, "r", encoding="utf-8") as f:
91
+ return json.load(f)
92
+ return {"access": []}
93
+
94
+
95
+ def save_access_data(data: Dict[str, Any]) -> None:
96
+ ACCESS_FILE.parent.mkdir(parents=True, exist_ok=True)
97
+ with open(ACCESS_FILE, "w", encoding="utf-8") as f:
98
+ json.dump(data, f, indent=2)
99
+
100
+
101
+ # ============================================
102
+ # Tool Implementations
103
+ # ============================================
104
+
105
+ def get_customers(
106
+ status: Optional[str] = None,
107
+ industry: Optional[str] = None,
108
+ limit: int = 50
109
+ ) -> Dict:
110
+ """Get list of customers with optional filtering."""
111
+ data = load_customers()
112
+ customers = data.get("customers", [])
113
+
114
+ # Apply filters
115
+ if status:
116
+ customers = [c for c in customers if c.get("status", "").lower() == status.lower()]
117
+ if industry:
118
+ customers = [c for c in customers if industry.lower() in c.get("industry", "").lower()]
119
+
120
+ # Apply limit
121
+ customers = customers[:limit]
122
+
123
+ return {
124
+ "total": len(customers),
125
+ "customers": customers
126
+ }
127
+
128
+
129
+ def get_customer(customer_id: str) -> Dict:
130
+ """Get a specific customer by ID."""
131
+ data = load_customers()
132
+ for customer in data.get("customers", []):
133
+ if customer.get("id") == customer_id:
134
+ # Also get related deals
135
+ deals_data = load_deals()
136
+ related_deals = [
137
+ d for d in deals_data.get("deals", [])
138
+ if d.get("customer_id") == customer_id
139
+ ]
140
+ customer["related_deals"] = related_deals
141
+ customer.pop("tags")
142
+ return customer
143
+ return {"error": f"Customer {customer_id} not found"}
144
+
145
+
146
+ def get_deals(
147
+ stage: Optional[str] = None,
148
+ customer_id: Optional[str] = None,
149
+ owner: Optional[str] = None,
150
+ min_value: Optional[float] = None,
151
+ limit: int = 50
152
+ ) -> Dict:
153
+ """Get list of deals with optional filtering."""
154
+ data = load_deals()
155
+ deals = data.get("deals", [])
156
+
157
+ # Apply filters
158
+ if stage:
159
+ deals = [d for d in deals if d.get("stage", "").lower() == stage.lower()]
160
+ if customer_id:
161
+ deals = [d for d in deals if d.get("customer_id") == customer_id]
162
+ if owner:
163
+ deals = [d for d in deals if owner.lower() in d.get("owner", "").lower()]
164
+ if min_value is not None:
165
+ deals = [d for d in deals if d.get("value", 0) >= min_value]
166
+
167
+ # Apply limit
168
+ deals = deals[:limit]
169
+
170
+ return {
171
+ "total": len(deals),
172
+ "deals": deals,
173
+ "pipeline_summary": data.get("pipeline_summary", {})
174
+ }
175
+
176
+
177
+ def get_deal(deal_id: str) -> Dict:
178
+ """Get a specific deal by ID."""
179
+ data = load_deals()
180
+ for deal in data.get("deals", []):
181
+ if deal.get("id") == deal_id:
182
+ return deal
183
+ return {"error": f"Deal {deal_id} not found"}
184
+
185
+
186
+ def get_pipeline_summary() -> Dict:
187
+ """Get sales pipeline summary."""
188
+ data = load_deals()
189
+ deals = data.get("deals", [])
190
+
191
+ # Calculate fresh summary
192
+ open_deals = [d for d in deals if d.get("stage") not in ["closed_won", "closed_lost"]]
193
+
194
+ summary = {
195
+ "total_deals": len(deals),
196
+ "open_deals": len(open_deals),
197
+ "total_pipeline_value": sum(d.get("value", 0) for d in open_deals),
198
+ "weighted_value": sum(
199
+ d.get("value", 0) * d.get("probability", 0) / 100
200
+ for d in open_deals
201
+ ),
202
+ "by_stage": {},
203
+ "by_owner": {},
204
+ "expected_closes_this_month": []
205
+ }
206
+
207
+ # Group by stage
208
+ for deal in deals:
209
+ stage = deal.get("stage", "unknown")
210
+ if stage not in summary["by_stage"]:
211
+ summary["by_stage"][stage] = {"count": 0, "value": 0}
212
+ summary["by_stage"][stage]["count"] += 1
213
+ summary["by_stage"][stage]["value"] += deal.get("value", 0)
214
+
215
+ # Group by owner
216
+ for deal in open_deals:
217
+ owner = deal.get("owner", "Unassigned")
218
+ if owner not in summary["by_owner"]:
219
+ summary["by_owner"][owner] = {"count": 0, "value": 0}
220
+ summary["by_owner"][owner]["count"] += 1
221
+ summary["by_owner"][owner]["value"] += deal.get("value", 0)
222
+
223
+ # Deals expected to close this month
224
+ current_month = datetime.now().strftime("%Y-%m")
225
+ for deal in open_deals:
226
+ close_date = deal.get("expected_close", "")
227
+ if close_date.startswith(current_month):
228
+ summary["expected_closes_this_month"].append({
229
+ "id": deal.get("id"),
230
+ "title": deal.get("title"),
231
+ "value": deal.get("value"),
232
+ "probability": deal.get("probability")
233
+ })
234
+
235
+ return summary
236
+
237
+
238
+ def get_documents() -> Dict:
239
+ """Get list of available documents."""
240
+ documents = load_documents()
241
+ return {
242
+ "total": len(documents),
243
+ "documents": documents,
244
+ "instructions": "Use the exact file name as the 'name' parameter for the read_document tool"
245
+ }
246
+
247
+
248
+ def read_document(name: str) -> Dict:
249
+ """Read content of a specific document."""
250
+ docs_dir = DATA_DIR / "documents"
251
+ doc_path = docs_dir / name
252
+
253
+ if not doc_path.exists():
254
+ # Try to find partial match
255
+ for doc in docs_dir.glob("*"):
256
+ if name.lower() in doc.name.lower():
257
+ doc_path = doc
258
+ break
259
+
260
+ if not doc_path.exists():
261
+ return {"error": f"Document '{name}' not found"}
262
+
263
+ try:
264
+ with open(doc_path, "r", encoding="utf-8") as f:
265
+ content = f.read()
266
+ return {
267
+ "name": doc_path.name,
268
+ "content": content,
269
+ "size_bytes": len(content),
270
+ "type": doc_path.suffix.lstrip(".")
271
+ }
272
+ except Exception as e:
273
+ return {"error": f"Failed to read document: {str(e)}"}
274
+
275
+
276
+ def search_documents(query: str) -> Dict:
277
+ """Search documents by content."""
278
+ docs_dir = DATA_DIR / "documents"
279
+ query_lower = query.lower()
280
+
281
+ matches = []
282
+ if docs_dir.exists():
283
+ for doc_path in docs_dir.glob("*"):
284
+ if doc_path.is_file():
285
+ try:
286
+ with open(doc_path, "r", encoding="utf-8") as f:
287
+ content = f.read()
288
+
289
+ if query_lower in content.lower() or query_lower in doc_path.name.lower():
290
+ # Find relevant excerpts
291
+ lines = content.split("\n")
292
+ relevant_lines = [
293
+ line.strip() for line in lines
294
+ if query_lower in line.lower()
295
+ ][:3] # Max 3 relevant lines
296
+
297
+ matches.append({
298
+ "name": doc_path.name,
299
+ "type": doc_path.suffix.lstrip("."),
300
+ "relevant_excerpts": relevant_lines
301
+ })
302
+ except Exception:
303
+ continue
304
+
305
+ return {
306
+ "query": query,
307
+ "total_matches": len(matches),
308
+ "documents": matches
309
+ }
310
+
311
+
312
+ def get_access(customer_name: str) -> Dict[str, Any]:
313
+ """Look up whether access is enabled for the given customer."""
314
+ data = load_access_data()
315
+ name_lower = customer_name.lower()
316
+
317
+ for entry in data.get("access", []):
318
+ if entry.get("customer_name", "").lower() == name_lower:
319
+ return entry
320
+
321
+ return {"customer_name": customer_name, "enabled": False}
322
+
323
+
324
+ def set_access(customer_name: str) -> Dict[str, Any]:
325
+ """Set access enabled = true for the given customer."""
326
+ data = load_access_data()
327
+ name_lower = customer_name.lower()
328
+
329
+ found = None
330
+ for entry in data.get("access", []):
331
+ if entry.get("customer_name", "").lower() == name_lower:
332
+ entry["enabled"] = True
333
+ found = entry
334
+ break
335
+
336
+ if not found:
337
+ found = {"customer_name": customer_name, "enabled": True}
338
+ data.setdefault("access", []).append(found)
339
+
340
+ save_access_data(data)
341
+ return found
342
+
343
+
344
+
345
+ # ============================================
346
+ # Tool Registry
347
+ # ============================================
348
+
349
+ TOOLS = {
350
+ "get_customers": {
351
+ "description": "Get list of customers. Optionally filter by status (active/inactive/prospect) or industry.",
352
+ "function": get_customers,
353
+ "inputSchema": {
354
+ "type": "object",
355
+ "properties": {
356
+ "status": {"type": "string", "description": "Filter by status: active, inactive, or prospect"},
357
+ "industry": {"type": "string", "description": "Filter by industry"},
358
+ "limit": {"type": "integer", "description": "Maximum number of results", "default": 50}
359
+ }
360
+ }
361
+ },
362
+ "get_customer": {
363
+ "description": "Get detailed information about a specific customer by ID (the customer name serves as the ID).",
364
+ "function": get_customer,
365
+ "inputSchema": {
366
+ "type": "object",
367
+ "properties": {
368
+ "customer_id": {"type": "string", "description": "Customer ID (e.g., Walnut)"}
369
+ },
370
+ "required": ["customer_id"]
371
+ }
372
+ },
373
+ "get_deals": {
374
+ "description": "Get list of deals/opportunities. Optionally filter by stage, customer, owner, or minimum value.",
375
+ "function": get_deals,
376
+ "inputSchema": {
377
+ "type": "object",
378
+ "properties": {
379
+ "stage": {"type": "string", "description": "Filter by stage: qualification, demo, proposal, negotiation, closed_won, closed_lost"},
380
+ "customer_id": {"type": "string", "description": "Filter by customer ID"},
381
+ "owner": {"type": "string", "description": "Filter by deal owner name"},
382
+ "min_value": {"type": "number", "description": "Minimum deal value"},
383
+ "limit": {"type": "integer", "description": "Maximum number of results", "default": 50}
384
+ }
385
+ }
386
+ },
387
+ "get_deal": {
388
+ "description": "Get detailed information about a specific deal by ID.",
389
+ "function": get_deal,
390
+ "inputSchema": {
391
+ "type": "object",
392
+ "properties": {
393
+ "deal_id": {"type": "string", "description": "Deal ID (e.g., DEAL-001)"}
394
+ },
395
+ "required": ["deal_id"]
396
+ }
397
+ },
398
+ "get_pipeline_summary": {
399
+ "description": "Get sales pipeline summary including totals by stage and owner.",
400
+ "function": get_pipeline_summary,
401
+ "inputSchema": {
402
+ "type": "object",
403
+ "properties": {}
404
+ }
405
+ },
406
+ "get_documents": {
407
+ "description": "Get list of available CRM documents related to the company at hand.",
408
+ "function": get_documents,
409
+ "inputSchema": {
410
+ "type": "object",
411
+ "properties": {}
412
+ }
413
+ },
414
+ "read_document": {
415
+ "description": "Read the content of a specific document by name.",
416
+ "function": read_document,
417
+ "inputSchema": {
418
+ "type": "object",
419
+ "properties": {
420
+ "name": {"type": "string", "description": "Document name or partial name"}
421
+ },
422
+ "required": ["name"]
423
+ }
424
+ },
425
+ "search_documents": {
426
+ "description": "Search documents by content or title.",
427
+ "function": search_documents,
428
+ "inputSchema": {
429
+ "type": "object",
430
+ "properties": {
431
+ "query": {"type": "string", "description": "Search query"}
432
+ },
433
+ "required": ["query"]
434
+ }
435
+ },
436
+ "get_access": {
437
+ "description": "Check whether endpoint access is enabled for a given customer.",
438
+ "function": get_access,
439
+ "inputSchema": {
440
+ "type": "object",
441
+ "properties": {
442
+ "customer_name": {
443
+ "type": "string",
444
+ "description": "Customer company name"
445
+ }
446
+ },
447
+ "required": ["customer_name"]
448
+ }
449
+ },
450
+ "set_access": {
451
+ "description": "Enable endpoint access for a given customer (sets enabled=true in access.json).",
452
+ "function": set_access,
453
+ "inputSchema": {
454
+ "type": "object",
455
+ "properties": {
456
+ "customer_name": {
457
+ "type": "string",
458
+ "description": "Customer company name"
459
+ }
460
+ },
461
+ "required": ["customer_name"]
462
+ }
463
+ }
464
+ }
465
+
466
+
467
+ # ============================================
468
+ # API Endpoints
469
+ # ============================================
470
+
471
+ @app.get("/")
472
+ async def root():
473
+ """Health check endpoint."""
474
+ return {"status": "ok", "service": "CRM MCP Server", "version": "1.0.0"}
475
+
476
+
477
+ @app.get("/tools")
478
+ async def list_tools():
479
+ """List all available tools."""
480
+ tools_list = []
481
+ for name, config in TOOLS.items():
482
+ tools_list.append({
483
+ "name": name,
484
+ "description": config["description"],
485
+ "inputSchema": config["inputSchema"]
486
+ })
487
+ return {"tools": tools_list}
488
+
489
+
490
+ @app.post("/tools/call")
491
+ async def call_tool(request: ToolCallRequest):
492
+ """Execute a tool by name with arguments."""
493
+ tool_name = request.name
494
+ arguments = request.arguments
495
+
496
+ if tool_name not in TOOLS:
497
+ return ToolResponse(
498
+ success=False,
499
+ error=f"Unknown tool: {tool_name}"
500
+ )
501
+
502
+ try:
503
+ tool_func = TOOLS[tool_name]["function"]
504
+ result = tool_func(**arguments)
505
+ return ToolResponse(success=True, result=result)
506
+ except Exception as e:
507
+ logger.error(f"Tool execution error: {e}")
508
+ return ToolResponse(success=False, error=str(e))
509
+
510
+
511
+ # ============================================
512
+ # Main Entry Point
513
+ # ============================================
514
+
515
+ if __name__ == "__main__":
516
+ import uvicorn
517
+
518
+ # Ensure data directory exists
519
+ DATA_DIR.mkdir(parents=True, exist_ok=True)
520
+ (DATA_DIR / "documents").mkdir(exist_ok=True)
521
+
522
+ logger.info("Starting CRM MCP Server...")
523
+ logger.info(f"Data directory: {DATA_DIR}")
524
+
525
+ uvicorn.run(app, host="0.0.0.0", port=8000)
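The `/tools/call` dispatch above can be exercised locally without HTTP. The sketch below mirrors the server's `ToolResponse` contract against a stand-in registry (the single `get_access` stub is an assumption for illustration); over HTTP, the same payload would go to `POST /tools/call` as `{"name": ..., "arguments": ...}`:

```python
from typing import Any, Callable, Dict

# Stand-in tool mimicking the server's get_access behavior.
def get_access(customer_name: str) -> Dict[str, Any]:
    return {"customer_name": customer_name, "enabled": customer_name == "Walnut"}

TOOLS: Dict[str, Callable[..., Any]] = {"get_access": get_access}

def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the server's success/result/error response shape."""
    if name not in TOOLS:
        return {"success": False, "result": None, "error": f"Unknown tool: {name}"}
    try:
        return {"success": True, "result": TOOLS[name](**arguments), "error": None}
    except Exception as e:  # e.g. schema-mismatched arguments
        return {"success": False, "result": None, "error": str(e)}

print(call_tool("get_access", {"customer_name": "Walnut"}))
# → {'success': True, 'result': {'customer_name': 'Walnut', 'enabled': True}, 'error': None}
```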
data/access.json ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "access": [
3
+ {
4
+ "customer_name": "Walnut",
5
+ "enabled": true
6
+ },
7
+ {
8
+ "customer_name": "TechStart Inc",
9
+ "enabled": false
10
+ }
11
+ ]
12
+ }
data/customers.json ADDED
@@ -0,0 +1,117 @@
1
+ {
2
+ "customers": [
3
+ {
4
+ "id": "Walnut",
5
+ "name": "Walnut",
6
+ "contact_name": "Maria Brown",
7
+ "annual_revenue": "$50M",
8
+ "notes": "Key account. Had previous issues with integration endpoints.",
9
+ "tags": ["enterprise", "manufacturing", "high-value"]
10
+ },
11
+ {
12
+ "id": "TechStart Inc",
13
+ "name": "TechStart Inc",
14
+ "contact_name": "Sarah Johnson",
15
+ "contact_email": "sarah@techstart.io",
16
+ "contact_phone": "+1-555-0102",
17
+ "industry": "Technology",
18
+ "company_size": "50-100",
19
+ "annual_revenue": "$5M-$10M",
20
+ "status": "active",
21
+ "created_at": "2023-06-22",
22
+ "last_contact": "2025-05-18",
23
+ "notes": "Fast-growing startup. Looking for scalable solutions.",
24
+ "tags": ["startup", "tech", "growth"]
25
+ },
26
+ {
27
+ "id": "Global Finance Ltd",
28
+ "name": "Global Finance Ltd",
29
+ "contact_name": "Michael Chen",
30
+ "contact_email": "m.chen@globalfinance.com",
31
+ "contact_phone": "+1-555-0103",
32
+ "industry": "Financial Services",
33
+ "company_size": "1000+",
34
+ "annual_revenue": "$500M+",
35
+ "status": "active",
36
+ "created_at": "2022-11-08",
37
+ "last_contact": "2025-05-22",
38
+ "notes": "Enterprise client. Strict compliance requirements.",
39
+ "tags": ["enterprise", "finance", "compliance", "high-value"]
40
+ },
41
+ {
42
+ "id": "CUST-004",
43
+ "name": "Green Energy Solutions",
44
+ "contact_name": "Emma Davis",
45
+ "contact_email": "emma.davis@greenenergy.com",
46
+ "contact_phone": "+1-555-0104",
47
+ "industry": "Energy",
48
+ "company_size": "100-500",
49
+ "annual_revenue": "$20M-$50M",
50
+ "status": "active",
51
+ "created_at": "2024-02-14",
52
+ "last_contact": "2025-05-15",
53
+ "notes": "Sustainability-focused. Interested in IoT monitoring.",
54
+ "tags": ["energy", "sustainability", "iot"]
55
+ },
56
+ {
57
+ "id": "CUST-005",
58
+ "name": "HealthCare Plus",
59
+ "contact_name": "Dr. Robert Wilson",
60
+ "contact_email": "rwilson@healthcareplus.org",
61
+ "contact_phone": "+1-555-0105",
62
+ "industry": "Healthcare",
63
+ "company_size": "500-1000",
64
+ "annual_revenue": "$100M-$500M",
65
+ "status": "active",
66
+ "created_at": "2023-09-30",
67
+ "last_contact": "2025-05-21",
68
+ "notes": "Hospital network. HIPAA compliance is critical.",
69
+ "tags": ["healthcare", "compliance", "enterprise"]
70
+ },
71
+ {
72
+ "id": "CUST-006",
73
+ "name": "RetailMax",
74
+ "contact_name": "Lisa Anderson",
75
+ "contact_email": "l.anderson@retailmax.com",
76
+ "contact_phone": "+1-555-0106",
77
+ "industry": "Retail",
78
+ "company_size": "1000+",
79
+ "annual_revenue": "$200M-$500M",
80
+ "status": "inactive",
81
+ "created_at": "2022-05-18",
82
+ "last_contact": "2024-12-10",
83
+ "notes": "Churned due to budget cuts. Potential to re-engage in Q3.",
84
+ "tags": ["retail", "enterprise", "churned"]
85
+ },
86
+ {
87
+ "id": "CUST-007",
88
+ "name": "EduLearn Academy",
89
+ "contact_name": "Prof. James Taylor",
90
+ "contact_email": "jtaylor@edulearn.edu",
91
+ "contact_phone": "+1-555-0107",
92
+ "industry": "Education",
93
+ "company_size": "100-500",
94
+ "annual_revenue": "$10M-$20M",
95
+ "status": "prospect",
96
+ "created_at": "2025-03-01",
97
+ "last_contact": "2025-05-19",
98
+ "notes": "University looking for LMS integration. Demo scheduled.",
99
+ "tags": ["education", "prospect", "demo-scheduled"]
100
+ },
101
+ {
102
+ "id": "CUST-008",
103
+ "name": "LogiTrans Shipping",
104
+ "contact_name": "Carlos Rodriguez",
105
+ "contact_email": "carlos@logitrans.com",
106
+ "contact_phone": "+1-555-0108",
107
+ "industry": "Logistics",
108
+ "company_size": "500-1000",
109
+ "annual_revenue": "$50M-$100M",
110
+ "status": "active",
111
+ "created_at": "2023-04-12",
112
+ "last_contact": "2025-05-17",
113
+ "notes": "Fleet management client. Expanding to new regions.",
114
+ "tags": ["logistics", "fleet", "expansion"]
115
+ }
116
+ ]
117
+ }
data/deals.json ADDED
@@ -0,0 +1,146 @@
+ {
+   "deals": [
+     {
+       "id": "DEAL-001",
+       "title": "Wolf Solutions",
+       "customer_id": "Wolf Solutions",
+       "customer_name": "Wolf Solutions",
+       "value": 250000,
+       "currency": "USD",
+       "stage": "negotiation",
+       "probability": 75,
+       "expected_close": "2025-06-30",
+       "created_at": "2025-02-15",
+       "owner": "Maria Brown",
+       "description": "Full cloud infrastructure migration including data center consolidation.",
+       "next_action": "Send revised proposal with volume discount",
+       "competitors": ["CloudCorp", "SkyNet Solutions"]
+     },
+     {
+       "id": "DEAL-002",
+       "title": "TechStart Platform License",
+       "customer_id": "CUST-002",
+       "customer_name": "TechStart Inc",
+       "value": 45000,
+       "currency": "USD",
+       "stage": "proposal",
+       "probability": 60,
+       "expected_close": "2025-07-15",
+       "created_at": "2025-04-01",
+       "owner": "Bob Martinez",
+       "description": "Annual platform license with premium support.",
+       "next_action": "Schedule technical deep-dive with their CTO",
+       "competitors": ["OpenPlatform"]
+     },
+     {
+       "id": "DEAL-003",
+       "title": "Global Finance Security Suite",
+       "customer_id": "CUST-003",
+       "customer_name": "Global Finance Ltd",
+       "value": 750000,
+       "currency": "USD",
+       "stage": "closed_won",
+       "probability": 100,
+       "expected_close": "2025-05-01",
+       "closed_at": "2025-04-28",
+       "created_at": "2024-11-10",
+       "owner": "Alice Brown",
+       "description": "Enterprise security suite with compliance modules.",
+       "next_action": "Kickoff implementation meeting",
+       "competitors": []
+     },
+     {
+       "id": "DEAL-004",
+       "title": "Green Energy IoT Pilot",
+       "customer_id": "CUST-004",
+       "customer_name": "Green Energy Solutions",
+       "value": 85000,
+       "currency": "USD",
+       "stage": "qualification",
+       "probability": 40,
+       "expected_close": "2025-08-30",
+       "created_at": "2025-05-01",
+       "owner": "Carol White",
+       "description": "IoT monitoring pilot for 3 solar farms.",
+       "next_action": "Conduct site assessment",
+       "competitors": ["IoTech", "SensorFlow"]
+     },
+     {
+       "id": "DEAL-005",
+       "title": "HealthCare Plus EHR Integration",
+       "customer_id": "CUST-005",
+       "customer_name": "HealthCare Plus",
+       "value": 320000,
+       "currency": "USD",
+       "stage": "negotiation",
+       "probability": 80,
+       "expected_close": "2025-06-15",
+       "created_at": "2025-01-20",
+       "owner": "Bob Martinez",
+       "description": "EHR system integration with existing hospital network.",
+       "next_action": "Legal review of BAA agreement",
+       "competitors": ["MedTech Systems"]
+     },
+     {
+       "id": "DEAL-006",
+       "title": "EduLearn LMS Package",
+       "customer_id": "CUST-007",
+       "customer_name": "EduLearn Academy",
+       "value": 120000,
+       "currency": "USD",
+       "stage": "demo",
+       "probability": 50,
+       "expected_close": "2025-09-01",
+       "created_at": "2025-03-15",
+       "owner": "Carol White",
+       "description": "Learning management system for 5000 students.",
+       "next_action": "Demo scheduled for May 25th",
+       "competitors": ["Canvas", "Moodle"]
+     },
+     {
+       "id": "DEAL-007",
+       "title": "LogiTrans Fleet Expansion",
+       "customer_id": "CUST-008",
+       "customer_name": "LogiTrans Shipping",
+       "value": 180000,
+       "currency": "USD",
+       "stage": "proposal",
+       "probability": 65,
+       "expected_close": "2025-07-30",
+       "created_at": "2025-04-10",
+       "owner": "Alice Brown",
+       "description": "Expand fleet management to 500 additional vehicles.",
+       "next_action": "Finalize pricing for Latin America region",
+       "competitors": ["FleetTrack Pro"]
+     },
+     {
+       "id": "DEAL-008",
+       "title": "Acme Support Renewal",
+       "customer_id": "CUST-001",
+       "customer_name": "Acme Corporation",
+       "value": 75000,
+       "currency": "USD",
+       "stage": "closed_won",
+       "probability": 100,
+       "expected_close": "2025-03-31",
+       "closed_at": "2025-03-28",
+       "created_at": "2025-02-01",
+       "owner": "Alice Brown",
+       "description": "Annual premium support renewal.",
+       "next_action": "None - deal closed",
+       "competitors": []
+     }
+   ],
+   "pipeline_summary": {
+     "total_pipeline_value": 1825000,
+     "weighted_pipeline_value": 1147500,
+     "deals_by_stage": {
+       "qualification": 1,
+       "demo": 1,
+       "proposal": 2,
+       "negotiation": 2,
+       "closed_won": 2
+     },
+     "average_deal_size": 228125
+   }
+ }
data/documents/walnut integration roadmap.md ADDED
@@ -0,0 +1,17 @@
+ # Integration roadmap – Endpoint services
+ 
+ Phase 1 – Preparation (Week 1) - Done
+ - Confirm scope and target environments.
+ - Enable API / endpoint access for staging tenant.
+ - Share API keys & documentation with customer team.
+ 
+ Phase 2 – Initial integration (Weeks 2–3) - Done
+ - Implement authentication flow against Atlas endpoint services.
+ - Set up test events (customer updates, subscription changes).
+ - Validate logging and error handling.
+ 
+ Phase 3 – Pilot rollout (Weeks 4–5) - Ongoing
+ - The endpoint is enabled for the staging environments.
+ 
+ Phase 4 – Full rollout (Week 6+) - Scheduled: ETA December 1st
+ - Enable the endpoints for the production environments.
data/documents/walnut meeting minutes.md ADDED
@@ -0,0 +1,15 @@
+ # Meeting minutes – Subscription renewal update (10 min check-in)
+ 
+ - Date: 2025-11-15
+ - Attendees: Andrei Zamfir (CSM) + Georgina Espinosa (Technical Account Manager)
+ - Topic: Subscription renewal status
+ 
+ Key points:
+ - Reviewed current 12-month subscription, renewal due 2026-01-31.
+ - Customer confirmed they intend to renew at current ARR.
+ - Discussed progress on new feature integration (endpoint services).
+ - Blocker: pending internal review for access request.
+ 
+ Next steps:
+ - Customer to complete access review by next status update meeting.
+ - CSM to send recap email and share integration roadmap document.
requirements.txt ADDED
@@ -0,0 +1,25 @@
+ # Core dependencies
+ gradio>=6.0.0
+ huggingface_hub>=0.20.0
+ openai>=1.0.0
+ python-dotenv>=1.0.0
+ 
+ # FastAPI for MCP server
+ fastapi>=0.100.0
+ uvicorn>=0.24.0
+ pydantic>=2.0.0
+ 
+ # Audio processing
+ numpy>=1.24.0
+ 
+ # Screen capture
+ mss>=9.0.0
+ Pillow>=10.0.0
+ 
+ # Voice Activity Detection (optional)
+ # Install these for automatic speech detection:
+ # pyaudio>=0.2.13
+ # webrtcvad>=2.0.10
+ 
+ # HTTP client
+ requests>=2.31.0
services/audio_service.py ADDED
@@ -0,0 +1,121 @@
+ """
+ Audio Service - Speech-to-Text and Text-to-Speech.
+ """
+ 
+ import io
+ import logging
+ import tempfile
+ import asyncio
+ from typing import Optional, Union
+ 
+ from huggingface_hub import InferenceClient
+ 
+ from config.settings import Settings
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ class AudioService:
+     """Audio service for STT and TTS."""
+ 
+     def __init__(
+         self,
+         api_key: str,
+         stt_provider: str = "fal-ai",
+         stt_model: str = "openai/whisper-large-v3",
+         tts_model: str = "canopylabs/orpheus-3b-0.1-ft",
+     ):
+         """
+         Initialize audio service.
+ 
+         Args:
+             api_key: Hugging Face API token
+             stt_provider: Provider for speech-to-text
+             stt_model: ASR model ID
+             tts_model: TTS model ID
+         """
+         self.api_key = api_key
+         self.stt_model = stt_model
+         self.tts_model = tts_model
+ 
+         # STT client
+         logger.debug(f"Initializing ASR client with provider={stt_provider}")
+         self.asr_client = InferenceClient(
+             provider=stt_provider,
+             api_key=self.api_key,
+         )
+ 
+         # TTS client
+         logger.debug("Initializing TTS client")
+         self.tts_client = InferenceClient(token=self.api_key)
+ 
+         logger.info(f"AudioService configured: ASR={self.stt_model}, TTS={self.tts_model}")
+ 
+     async def speech_to_text(self, audio_input: Union[str, bytes, io.BytesIO]) -> str:
+         """
+         Convert speech to text.
+ 
+         Args:
+             audio_input: File path, bytes, or BytesIO of audio
+ 
+         Returns:
+             Transcribed text
+         """
+         # Prepare input path
+         if isinstance(audio_input, str):
+             input_path = audio_input
+             logger.debug(f"Using existing file for ASR: {input_path}")
+         else:
+             data = audio_input.getvalue() if isinstance(audio_input, io.BytesIO) else audio_input
+             tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".wav")
+             tmp.write(data)
+             tmp.close()
+             input_path = tmp.name
+             logger.debug(f"Wrote audio to temp file for ASR: {input_path}")
+ 
+         try:
+             logger.info(f"Calling ASR model={self.stt_model}")
+             result = await asyncio.get_event_loop().run_in_executor(
+                 None,
+                 lambda: self.asr_client.automatic_speech_recognition(
+                     input_path,
+                     model=self.stt_model,
+                 )
+             )
+ 
+             # Guard against a missing/None "text" field before taking len()
+             transcript = (result.get("text") if isinstance(result, dict) else getattr(result, "text", "")) or ""
+             logger.info(f"ASR success, transcript length={len(transcript)}")
+             return transcript
+ 
+         except Exception as e:
+             logger.error(f"ASR error: {e}", exc_info=True)
+             return ""
+ 
+     async def text_to_speech(self, text: str) -> Optional[bytes]:
+         """
+         Convert text to speech.
+ 
+         Args:
+             text: Text to synthesize
+ 
+         Returns:
+             Audio bytes or None
+         """
+         if not text.strip():
+             logger.debug("Empty text input for TTS")
+             return None
+ 
+         def _call_tts():
+             try:
+                 return self.tts_client.text_to_speech(text, model=self.tts_model)
+             except StopIteration as e:
+                 raise RuntimeError(f"StopIteration in TTS call: {e}")
+ 
+         try:
+             logger.info(f"Calling TTS model={self.tts_model}, text length={len(text)}")
+             audio = await asyncio.get_event_loop().run_in_executor(None, _call_tts)
+             logger.info(f"TTS success, received {len(audio)} bytes")
+             return audio
+         except Exception as e:
+             logger.error(f"TTS error: {e}", exc_info=True)
+             return None
services/llm_service.py ADDED
@@ -0,0 +1,167 @@
+ """
+ LLM Service - Chat completions via HuggingFace.
+ """
+ 
+ import logging
+ from typing import Dict, List, Optional, Any
+ from dataclasses import dataclass
+ 
+ from huggingface_hub import InferenceClient
+ 
+ from config.settings import Settings
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class LLMConfig:
+     """LLM configuration."""
+     api_key: str
+     model_name: str
+     temperature: float = 0.01
+     max_tokens: int = 512
+ 
+ 
+ class LLMService:
+     """
+     LLM service using HuggingFace InferenceClient.
+     """
+ 
+     def __init__(
+         self,
+         api_key: Optional[str] = None,
+         model_name: Optional[str] = None,
+     ):
+         """
+         Initialize LLM service.
+ 
+         Args:
+             api_key: HuggingFace API key
+             model_name: Model name/ID
+         """
+         settings = Settings()
+ 
+         key = api_key or settings.hf_token
+         name = model_name or settings.effective_model_name
+ 
+         self.config = LLMConfig(
+             api_key=key,
+             model_name=name,
+             temperature=settings.hf_temperature,
+             max_tokens=settings.hf_max_new_tokens,
+         )
+ 
+         self.client = InferenceClient(token=self.config.api_key)
+ 
+         logger.info(f"LLMService initialized with model: {self.config.model_name}")
+ 
+     async def get_chat_completion(
+         self,
+         messages: List[Dict[str, str]],
+         temperature: Optional[float] = None,
+         max_tokens: Optional[int] = None,
+     ) -> str:
+         """
+         Get chat completion from the model.
+ 
+         Args:
+             messages: List of message dicts with 'role' and 'content'
+             temperature: Override temperature
+             max_tokens: Override max tokens
+ 
+         Returns:
+             Assistant response text
+         """
+         logger.debug(f"Chat completion request with model: {self.config.model_name}")
+ 
+         try:
+             # Compare against None so 0 / 0.0 remain valid overrides
+             response = self.client.chat_completion(
+                 messages=messages,
+                 model=self.config.model_name,
+                 max_tokens=max_tokens if max_tokens is not None else self.config.max_tokens,
+                 temperature=temperature if temperature is not None else self.config.temperature
+             )
+ 
+             content = response.choices[0].message.content
+             logger.debug(f"Chat completion response: {content[:200]}...")
+ 
+             return content
+ 
+         except Exception as e:
+             logger.error(f"Chat completion error: {str(e)}")
+             raise Exception(f"LLM completion error: {str(e)}")
+ 
+     async def get_streaming_completion(
+         self,
+         messages: List[Dict[str, str]],
+         temperature: Optional[float] = None,
+         max_tokens: Optional[int] = None,
+     ):
+         """
+         Get streaming chat completion.
+ 
+         Yields:
+             Text chunks as they're generated
+         """
+         logger.debug(f"Streaming completion request with model: {self.config.model_name}")
+ 
+         try:
+             stream = self.client.chat_completion(
+                 messages=messages,
+                 model=self.config.model_name,
+                 max_tokens=max_tokens if max_tokens is not None else self.config.max_tokens,
+                 temperature=temperature if temperature is not None else self.config.temperature,
+                 stream=True
+             )
+ 
+             for chunk in stream:
+                 if chunk.choices and chunk.choices[0].delta.content:
+                     yield chunk.choices[0].delta.content
+ 
+         except Exception as e:
+             logger.error(f"Streaming completion error: {str(e)}")
+             raise Exception(f"LLM streaming error: {str(e)}")
+ 
+     def build_messages_with_tools(
+         self,
+         system_prompt: str,
+         user_input: str,
+         tools_description: str = "",
+         conversation_history: Optional[List[Dict[str, str]]] = None,
+         tool_results: Optional[str] = None,
+     ) -> List[Dict[str, str]]:
+         """
+         Build messages array with tools and context.
+ 
+         Args:
+             system_prompt: System instruction
+             user_input: User's message
+             tools_description: Available tools description
+             conversation_history: Previous messages
+             tool_results: Results from tool execution
+ 
+         Returns:
+             Messages array for chat completion
+         """
+         messages = [{"role": "system", "content": system_prompt}]
+ 
+         if tools_description:
+             messages.append({
+                 "role": "system",
+                 "content": f"Available tools:\n{tools_description}"
+             })
+ 
+         # Add conversation history
+         if conversation_history:
+             for msg in conversation_history[-10:]:  # Last 10 messages
+                 if msg.get("role") in ["user", "assistant"]:
+                     messages.append(msg)
+ 
+         # Add current user input
+         messages.append({"role": "user", "content": user_input})
+ 
+         # Add tool results if present
+         if tool_results:
+             messages.append({"role": "assistant", "content": tool_results})
+ 
+         return messages
services/mcp_client.py ADDED
@@ -0,0 +1,24 @@
+ # services/mcp_client.py
+ import requests
+ from typing import Any, Dict, List
+ from config.settings import Settings
+ 
+ class MCPClient:
+     def __init__(self, base_url: str | None = None):
+         settings = Settings()
+         self.base_url = (base_url or settings.mcp_server_url).rstrip("/")
+ 
+     def list_tools(self) -> List[Dict[str, Any]]:
+         resp = requests.get(f"{self.base_url}/tools", timeout=5)
+         resp.raise_for_status()
+         data = resp.json()
+         return data.get("tools", [])
+ 
+     def call_tool(self, name: str, arguments: Dict[str, Any] | None = None) -> Any:
+         payload = {"name": name, "arguments": arguments or {}}
+         resp = requests.post(f"{self.base_url}/tools/call", json=payload, timeout=30)
+         resp.raise_for_status()
+         data = resp.json()
+         if not data.get("success", False):
+             raise RuntimeError(data.get("error", "Unknown tool error"))
+         return data.get("result")
services/screen_service.py ADDED
@@ -0,0 +1,115 @@
+ """
+ Screen Service - Simple screenshot capture.
+ Just captures screen and returns base64 image.
+ """
+ 
+ import base64
+ import io
+ import logging
+ import time
+ from typing import Optional, Dict, Any
+ from dataclasses import dataclass
+ 
+ try:
+     import mss
+     MSS_AVAILABLE = True
+ except ImportError:
+     MSS_AVAILABLE = False
+ 
+ from PIL import Image
+ 
+ logger = logging.getLogger(__name__)
+ 
+ 
+ @dataclass
+ class ScreenCapture:
+     """Represents a captured screen frame."""
+     timestamp: float
+     image_b64: str
+     width: int
+     height: int
+ 
+ 
+ class ScreenService:
+     """Simple screen capture service."""
+ 
+     def __init__(
+         self,
+         monitor: int = 1,
+         max_width: int = 1920,
+         max_height: int = 1080,
+         compression_quality: int = 85,
+     ):
+         self.monitor = monitor
+         self.max_width = max_width
+         self.max_height = max_height
+         self.compression_quality = compression_quality
+ 
+         if not MSS_AVAILABLE:
+             logger.warning("mss not available. Screen capture disabled.")
+ 
+     def is_available(self) -> bool:
+         """Check if screen capture is available."""
+         return MSS_AVAILABLE
+ 
+     def _process_image(self, img: Image.Image) -> Image.Image:
+         """Process and resize image."""
+         if img.mode != "RGB":
+             img = img.convert("RGB")
+         w, h = img.size
+         ar = w / h
+         if w > self.max_width or h > self.max_height:
+             if ar > 1:
+                 new_w = min(w, self.max_width)
+                 new_h = int(new_w / ar)
+             else:
+                 new_h = min(h, self.max_height)
+                 new_w = int(new_h * ar)
+             img = img.resize((new_w, new_h), Image.Resampling.LANCZOS)
+         return img
+ 
+     def _image_to_base64(self, img: Image.Image) -> str:
+         """Convert image to base64 string."""
+         buf = io.BytesIO()
+         img.save(buf, format="JPEG", quality=self.compression_quality, optimize=True)
+         return base64.b64encode(buf.getvalue()).decode("utf-8")
+ 
+     def capture(self) -> Optional[ScreenCapture]:
+         """
+         Capture a screenshot and return base64.
+ 
+         Returns:
+             ScreenCapture object or None if failed
+         """
+         if not MSS_AVAILABLE:
+             logger.error("Screen capture not available")
+             return None
+ 
+         try:
+             with mss.mss() as sct:
+                 mon = sct.monitors[self.monitor]
+                 frame = sct.grab(mon)
+                 pil = Image.frombytes("RGB", frame.size, frame.bgra, "raw", "BGRX")
+                 pil = self._process_image(pil)
+                 b64 = self._image_to_base64(pil)
+ 
+                 return ScreenCapture(
+                     timestamp=time.time(),
+                     image_b64=b64,
+                     width=pil.width,
+                     height=pil.height
+                 )
+         except Exception as e:
+             logger.error(f"Screen capture error: {e}")
+             return None
+ 
+ 
+ # Singleton
+ _instance: Optional[ScreenService] = None
+ 
+ def get_screen_service() -> ScreenService:
+     """Get singleton screen service."""
+     global _instance
+     if _instance is None:
+         _instance = ScreenService()
+     return _instance