n0v33n committed on
Commit
dafad66
·
1 Parent(s): cf10f4b

initial commit

Files changed (5)
  1. Dockerfile +13 -0
  2. README.md +124 -5
  3. agent.py +645 -0
  4. app.py +239 -0
  5. requirements.txt +9 -0
Dockerfile ADDED
@@ -0,0 +1,13 @@
1
+ FROM python:3.12-slim
2
+ WORKDIR /app
3
+ RUN apt-get update && apt-get install -y \
4
+ gcc \
5
+ g++ \
6
+ libffi-dev \
7
+ libssl-dev \
8
+ python3-dev \
9
+ && rm -rf /var/lib/apt/lists/*
10
+ COPY . .
11
+ RUN pip install --no-cache-dir -r requirements.txt
12
+ EXPOSE 7860
13
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,11 +1,130 @@
1
  ---
2
- title: Misty Climate Agent
3
- emoji: 🏃
4
- colorFrom: pink
5
- colorTo: indigo
6
  sdk: docker
7
  pinned: false
8
  short_description: This is an agent created using Mistral models
9
  ---
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: MistyClimate Agent
3
+ emoji: 📈
4
+ colorFrom: red
5
+ colorTo: pink
6
  sdk: docker
7
  pinned: false
8
  short_description: This is an agent created using Mistral models
9
+ tags:
10
+ - agent-demo-track
11
+ Usage: Mistral
12
  ---
13
+ # MistyClimate Agent 📈
14
+
15
+ This is an agent created using Mistral models, designed to process climate-related documents, analyze images, perform JSON data analysis, and convert text to speech. It provides a multi-agent system for document processing, image analysis, JSON analysis, and text-to-speech functionalities, all integrated into a user-friendly Gradio interface.
16
+
17
+ ## Video Demo
18
+ Below is an embedded YouTube video demonstrating the Link2Doc MCP Server for the Hackathon:
19
+
20
+ <div style="text-align: center; margin: 20px 0;">
21
+ <iframe width="560" height="400" src="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
22
+ </div>
23
+ ## Features
24
+
25
+ - **Document Processing**: Extract structured data from climate-related PDFs using OCR capabilities.
26
+ - **Image Analysis**: Analyze image-based documents (e.g., PNG, JPG, PDF) to extract text, charts, and tables.
27
+ - **JSON Analysis**: Analyze JSON data to extract insights and patterns, with a focus on climate data.
28
+ - **Text-to-Speech**: Convert text analysis into speech using the gTTS library.
29
+ - **Gradio Interface**: A web-based UI to interact with all features seamlessly.
30
+
31
+ ## Setup
32
+
33
+ This project is containerized with Docker and deployed as a Hugging Face Space with a Gradio interface. Follow the steps below to run it locally or on Hugging Face Spaces.
34
+
35
+ ### Prerequisites
36
+
37
+ - Docker (if running locally)
38
+ - A Mistral API key (obtain from [Mistral AI](https://mistral.ai/))
39
+ - Python 3.10+ (if running locally without Docker)
40
+
41
+ ### Installation
42
+
43
+ 1. **Clone the Repository** (if running locally):
44
+ ```bash
45
+ git clone <repository-url>
46
+ cd <repository-directory>
47
+ ```
48
+
49
+ 2. **Install Dependencies**:
50
+ The project uses a `requirements.txt` file to manage dependencies. If running locally without Docker, install the dependencies using:
51
+ ```bash
52
+ pip install -r requirements.txt
53
+ ```
54
+
55
+ 3. **Set Up the Mistral API Key**:
56
+ - You will need a Mistral API key to use the Mistral models.
57
+ - In the Gradio interface, input your API key in the "Mistral API Key" field.
58
+
59
+ 4. **Run with Docker** (recommended for local testing):
60
+ - Build the Docker image:
61
+ ```bash
62
+ docker build -t mistyclimate-agent .
63
+ ```
64
+ - Run the Docker container:
65
+ ```bash
66
+ docker run -p 7860:7860 mistyclimate-agent
67
+ ```
68
+ - Access the Gradio interface at `http://localhost:7860`.
69
+
70
+ 5. **Deploy on Hugging Face Spaces**:
71
+ - This project is already configured for Hugging Face Spaces with the `sdk: docker` setting.
72
+ - Push your code to a Hugging Face Space repository.
73
+ - The Space will automatically build and deploy using the provided `Dockerfile`.
74
+
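The API key step above relies on the user typing the key into the interface. As a minimal sketch of one possible convenience, the key could fall back to an environment variable when the field is left empty. This is an assumption rather than something the commit implements: `resolve_api_key` and the `MISTRAL_API_KEY` variable name are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(ui_value: Optional[str],
                    env: Mapping[str, str] = os.environ,
                    env_var: str = "MISTRAL_API_KEY") -> Optional[str]:
    """Return the key typed into the UI, else the environment value, else None."""
    # Prefer the value typed into the interface, ignoring surrounding whitespace.
    if ui_value and ui_value.strip():
        return ui_value.strip()
    # Fall back to the (hypothetical) environment variable.
    env_value = env.get(env_var, "").strip()
    return env_value or None
```

A caller would treat a `None` result the same way the app treats an empty key field, by returning an error message instead of creating a client.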
75
+ ## Usage
76
+
77
+ 1. **Access the Gradio Interface**:
78
+ - If running locally, open `http://localhost:7860` in your browser.
79
+ - If deployed on Hugging Face Spaces, visit the Space URL.
80
+
81
+ 2. **Enter Your Mistral API Key**:
82
+ - In the Gradio interface, provide your Mistral API key in the designated input field.
83
+
84
+ 3. **Interact with the Tabs**:
85
+ - **Document Processing**:
86
+ - Upload a PDF document (e.g., a climate report).
87
+ - Select the document type (e.g., `climate_report`).
88
+ - Click "Process Document" to extract structured data in JSON format.
89
+ - **Image Analysis**:
90
+ - Upload an image file (PNG, JPG, or PDF).
91
+ - Choose an analysis focus (e.g., `text_extraction`, `chart_analysis`).
92
+ - Click "Analyze Image" to get structured data from the image.
93
+ - **JSON Analysis & Speech**:
94
+ - Input JSON data (e.g., temperature or emissions data).
95
+ - Select an analysis type (e.g., `content`).
96
+ - Click "Run Analysis & Speech" to analyze the JSON and generate a speech output.
97
+ - **Text-to-Speech**:
98
+ - Enter text to convert to speech (e.g., "hello, and good luck for the hackathon").
99
+ - Click "Generate Speech" to produce and play an audio file.
100
+
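The JSON Analysis tab accepts free-form text, so it may help to validate the input before sending anything to the model. The following is a rough sketch only: `parse_json_input` is a hypothetical helper, and the requirement of a top-level object is inferred from the `json_data` parameter being declared as `"type": "object"` in the tool schema.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_json_input(raw: str) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    """Parse user-supplied JSON text into a dict, or return an error message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Error: invalid JSON ({exc.msg} at line {exc.lineno}, column {exc.colno})"
    # The analysis tool schema expects a JSON object, so reject lists and scalars.
    if not isinstance(data, dict):
        return None, "Error: expected a JSON object at the top level"
    return data, None


data, error = parse_json_input('{"temperature_data": [20.1, 20.5], "emissions": [400, 410]}')
```

Returning a `(data, error)` pair keeps the helper easy to call from a Gradio handler, which can show the error string directly in the output box.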
101
+ ## File Structure
102
+
103
+ - `agent.py`: Core logic for the multi-agent system, including document processing, image analysis, JSON analysis, and text-to-speech workflows.
104
+ - `app.py`: Gradio interface setup and workflow orchestration.
105
+ - `requirements.txt`: List of Python dependencies.
106
+ - `Dockerfile`: Docker configuration for containerizing the app.
107
+ - `README.md`: Project documentation (this file).
108
+
109
+ ## Notes
110
+
111
+ - **File Paths**: On a Hugging Face Space, input files such as PDFs and images are provided through uploads in the Gradio interface. Output files (e.g., `.wav` audio files) are saved to `/tmp/` at runtime.
112
+ - **Mistral API Key**: Ensure you have a valid Mistral API key to use the models. Without it, the workflows will fail.
113
+ - **Docker Deployment**: The project is configured to run in a Docker container, making it compatible with Hugging Face Spaces.
114
+
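The audio output paths in `agent.py` are fixed (`/tmp/generated_speech.wav` and `/tmp/output.wav`), so two requests running at the same time would write to the same file. One way to avoid that, sketched here under the assumption that a unique file per request is acceptable, is to ask the standard library for a fresh path. `new_audio_path` is a hypothetical helper, not part of the commit.

```python
import os
import tempfile
from typing import Optional


def new_audio_path(suffix: str = ".wav", directory: Optional[str] = None) -> str:
    """Create an empty, uniquely named file and return its path."""
    # mkstemp creates the file atomically, so two requests never share a name.
    fd, path = tempfile.mkstemp(prefix="speech_", suffix=suffix, dir=directory)
    os.close(fd)  # Only the path is needed; the writer reopens the file later.
    return path
```

With `directory=None` the file lands in the system temporary directory, which is usually `/tmp` inside a Linux container.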
115
+ ## Configuration Reference
116
+
117
+ For more details on configuring Hugging Face Spaces, refer to the [Hugging Face Spaces Config Reference](https://huggingface.co/docs/hub/spaces-config-reference).
118
+
119
+ ## Tags
120
+
121
+ - `agent-demo-track`
122
+
123
+ ## License
124
+
125
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details (if applicable).
126
+
127
+ ---
128
+
129
+ Built with ❤️ by Samudrala Dinesh Naveen Kumar.
130
 
 
agent.py ADDED
@@ -0,0 +1,645 @@
1
+ import os
2
+ import base64
3
+ import json
4
+ import requests
5
+ from typing import Dict, Any, Optional, Union
6
+ from pathlib import Path
7
+ import asyncio
8
+ from mistralai.extra.mcp.sse import MCPClientSSE, SSEServerParams
9
+ from mistralai import Mistral
10
+ from mistralai.models import UserMessage, AssistantMessage, ToolMessage
11
+ from pydantic import BaseModel
12
+ from IPython.display import Audio, display
13
+ import platform
14
+ import subprocess
15
+ import urllib.parse
16
+ from gtts import gTTS
17
+
18
+ # Pydantic Models for structured outputs
19
+ class AnalysisDescription(BaseModel):
20
+ document_type: str
21
+ key_findings: list[str]
22
+ summary: str
23
+ metadata: Dict[str, Any]
24
+ confidence_score: float
25
+
26
+ MODEL = "mistral-medium-latest"
27
+
28
+ def play_wav(url: str, save_path: str = "/tmp/audio.wav"):
29
+ """
30
+ Plays a WAV file from a URL or local file path.
31
+ Args:
32
+ url (str): URL or file path (e.g., file://path/to/file.wav).
33
+ save_path (str, optional): Path to save downloaded files. Defaults to "/tmp/audio.wav".
34
+ Returns:
35
+ str: Status message
36
+ """
37
+ try:
38
+ # Handle local file paths
39
+ if url.startswith("file://"):
40
+ file_path = urllib.parse.urlparse(url).path
41
+ if platform.system() == "Windows":
42
+ # On Windows, remove leading slash AND decode percent-encoding
43
+ file_path = urllib.parse.unquote(file_path.lstrip("/"))
44
+ else:
45
+ file_path = urllib.parse.unquote(file_path)
46
+ print(f"Playing local file: {file_path}")
47
+ else:
48
+ # Download from URL
49
+ print(f"Attempting to download WAV file from {url}...")
50
+ response = requests.get(url, timeout=10)
51
+ response.raise_for_status()
52
+
53
+ with open(save_path, 'wb') as f:
54
+ f.write(response.content)
55
+ print(f"WAV file successfully downloaded and saved to {save_path}")
56
+ file_path = save_path
57
+
58
+ print(f"Attempting to play {file_path}...")
59
+ try:
60
+ # Jupyter playback
61
+ display(Audio(filename=file_path))
62
+ except NameError:
63
+ # Non-Jupyter playback
64
+ if platform.system() == "Windows":
65
+ os.startfile(file_path)
66
+ elif platform.system() == "Darwin": # macOS
67
+ subprocess.run(["open", file_path], check=True)
68
+ else: # Linux
69
+ subprocess.run(["xdg-open", file_path], check=True)
70
+
71
+ return "Audio played successfully"
72
+
73
+ except Exception as e:
74
+ print(f"Error playing audio: {str(e)}")
75
+ return f"Error: {str(e)}"
76
+
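Review note: the `file://` handling inside `play_wav` can be hard to test because it depends on the host platform. A small sketch of the same idea as a standalone function follows. `file_url_to_path` and its `system` parameter are hypothetical names introduced here so the Windows branch can be exercised on any machine.

```python
import platform
import urllib.parse
from typing import Optional


def file_url_to_path(url: str, system: Optional[str] = None) -> str:
    """Convert a file:// URL to a local path, following the approach in play_wav."""
    system = system or platform.system()
    # Extract the path component and decode percent-encoding such as %20.
    path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
    if system == "Windows":
        # file:///C:/dir/a.wav parses to "/C:/dir/a.wav"; drop the leading slash.
        path = path.lstrip("/")
    return path
```

Passing `system` explicitly is only for testing; in normal use the default picks up the current platform.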
77
+ # Create DocAgent for OCR PDF processing
78
+ def create_doc_agent(client: Mistral):
79
+ return client.beta.agents.create(
80
+ model=MODEL,
81
+ name="DocAgent",
82
+ description="Converts OCR PDFs to JSON using document processing capabilities",
83
+ instructions="Process documents by extracting text and structure, then convert to JSON format. Focus on climate-related documents and extract key data points.",
84
+ tools=[
85
+ {
86
+ "type": "function",
87
+ "function": {
88
+ "name": "process_climate_document",
89
+ "description": "Process climate documents from file path or URL and extract structured data",
90
+ "parameters": {
91
+ "type": "object",
92
+ "properties": {
93
+ "file_path": {
94
+ "type": "string",
95
+ "description": "Path to the document file"
96
+ },
97
+ "url": {
98
+ "type": "string",
99
+ "description": "URL to the document"
100
+ },
101
+ "document_type": {
102
+ "type": "string",
103
+ "description": "Type of climate document (report, analysis, data, etc.)"
104
+ }
105
+ }
106
+ }
107
+ }
108
+ }
109
+ ]
110
+ )
111
+
112
+ # Create ImageAgent for image PDF processing
113
+ def create_image_agent(client: Mistral):
114
+ return client.beta.agents.create(
115
+ model=MODEL,
116
+ name="ImageAgent",
117
+ description="Converts image PDFs to JSON using image analysis capabilities",
118
+ instructions="Analyze image-based documents, extract text and visual elements, then structure the data as JSON. Handle charts, graphs, and tabular data effectively.",
119
+ tools=[
120
+ {
121
+ "type": "function",
122
+ "function": {
123
+ "name": "analyze_image",
124
+ "description": "Analyze image documents and extract structured data",
125
+ "parameters": {
126
+ "type": "object",
127
+ "properties": {
128
+ "image_data": {
129
+ "type": "string",
130
+ "description": "Base64-encoded image data"
131
+ },
132
+ "image_format": {
133
+ "type": "string",
134
+ "description": "Image format (png, jpg, pdf, etc.)"
135
+ },
136
+ "analysis_focus": {
137
+ "type": "string",
138
+ "description": "Specific focus for analysis (text_extraction, chart_analysis, table_extraction)"
139
+ }
140
+ },
141
+ "required": ["image_data", "image_format"]
142
+ }
143
+ }
144
+ }
145
+ ]
146
+ )
147
+
148
+ # Create the remaining agents: JsonAnalyzerAgent and SpeechAgent
149
+ def create_json_analyzer_agent(client: Mistral):
150
+ return client.beta.agents.create(
151
+ model=MODEL,
152
+ name="JsonAnalyzerAgent",
153
+ description="Analyzes JSON outputs from DocAgent or ImageAgent, producing detailed descriptions",
154
+ instructions="Analyze JSON data structures, identify patterns, extract insights, and provide comprehensive analysis. Output should be structured and detailed.",
155
+ tools=[
156
+ {
157
+ "type": "function",
158
+ "function": {
159
+ "name": "analyze_json_data",
160
+ "description": "Process and analyze JSON data to extract insights and patterns",
161
+ "parameters": {
162
+ "type": "object",
163
+ "properties": {
164
+ "json_data": {
165
+ "type": "object",
166
+ "description": "JSON data to analyze"
167
+ },
168
+ "analysis_type": {
169
+ "type": "string",
170
+ "description": "Type of analysis to perform (statistical, content, structural)"
171
+ }
172
+ },
173
+ "required": ["json_data"]
174
+ }
175
+ }
176
+ }
177
+ ]
178
+ )
179
+
180
+ def create_speech_agent(client: Mistral):
181
+ return client.beta.agents.create(
182
+ model=MODEL,
183
+ name="SpeechAgent",
184
+ description="Converts text analysis from JsonAnalyzerAgent into speech",
185
+ instructions="Convert text analysis into natural speech format. Optimize text for spoken delivery and handle technical content appropriately.",
186
+ tools=[
187
+ {
188
+ "type": "function",
189
+ "function": {
190
+ "name": "text_to_speech",
191
+ "description": "Convert text to speech audio",
192
+ "parameters": {
193
+ "type": "object",
194
+ "properties": {
195
+ "text": {
196
+ "type": "string",
197
+ "description": "Text to convert to speech"
198
+ },
199
+ "voice_settings": {
200
+ "type": "object",
201
+ "properties": {
202
+ "speed": {"type": "number", "default": 1.0},
203
+ "pitch": {"type": "number", "default": 1.0},
204
+ "voice_type": {"type": "string", "default": "neutral"}
205
+ }
206
+ }
207
+ },
208
+ "required": ["text"]
209
+ }
210
+ }
211
+ }
212
+ ]
213
+ )
214
+
215
+ # Helper functions for agent interactions
216
+ def simulate_process_climate_document(file_path: Optional[str] = None, url: Optional[str] = None, document_type: str = "report") -> Dict[str, Any]:
217
+ """Simulate document processing function"""
218
+ return {
219
+ "document_id": "doc_001",
220
+ "source": file_path or url,
221
+ "type": document_type,
222
+ "extracted_text": "Climate change impacts are increasing globally...",
223
+ "key_data": {
224
+ "temperature_increase": "1.5°C",
225
+ "co2_levels": "420ppm",
226
+ "affected_regions": ["Arctic", "Coastal Areas", "Tropical Regions"]
227
+ },
228
+ "metadata": {
229
+ "pages": 45,
230
+ "extraction_confidence": 0.92,
231
+ "processing_time": "2.3s"
232
+ }
233
+ }
234
+
235
+ def simulate_analyze_image(image_data: str, image_format: str, analysis_focus: str = "text_extraction") -> Dict[str, Any]:
236
+ """Simulate image analysis function"""
237
+ return {
238
+ "image_id": "img_001",
239
+ "format": image_format,
240
+ "analysis_type": analysis_focus,
241
+ "extracted_content": {
242
+ "text": "Global Temperature Anomalies 2020-2024",
243
+ "charts": ["line_chart_temperatures", "bar_chart_emissions"],
244
+ "tables": [{"headers": ["Year", "Temperature", "Anomaly"], "rows": 5}]
245
+ },
246
+ "visual_elements": {
247
+ "charts_detected": 2,
248
+ "tables_detected": 1,
249
+ "text_regions": 8
250
+ },
251
+ "confidence": 0.88
252
+ }
253
+
254
+ def simulate_analyze_json_data(json_data: Dict[str, Any], analysis_type: str = "content") -> Dict[str, Any]:
255
+ """Simulate JSON analysis function"""
256
+ return {
257
+ "analysis_summary": "Comprehensive climate document analysis completed",
258
+ "key_insights": [
259
+ "Temperature data shows accelerating warming trend",
260
+ "Regional variations indicate uneven climate impacts",
261
+ "Emission data correlates with temperature increases"
262
+ ],
263
+ "data_quality": {
264
+ "completeness": 0.91,
265
+ "consistency": 0.87,
266
+ "reliability": 0.89
267
+ },
268
+ "recommendations": [
269
+ "Focus on high-impact regions for intervention",
270
+ "Monitor temperature trends quarterly",
271
+ "Implement emission reduction strategies"
272
+ ]
273
+ }
274
+
275
+ def simulate_text_to_speech(text: str, voice_settings: Optional[Dict[str, Any]] = None) -> str:
276
+ print(f"Converting to speech: {text[:100]}...")
277
+ save_path = "/tmp/generated_speech.wav"
278
+ tts = gTTS(text=text, lang="en")
279
+ tts.save(save_path)
280
+ return f"file://{os.path.abspath(save_path)}"
281
+
282
+ async def process_document_workflow(client: Mistral, file_path: str, document_type: str = "climate_report"):
283
+ print("Starting document processing workflow...")
284
+
285
+ try:
286
+ # Define the tool as a dictionary
287
+ doc_tool = [
288
+ {
289
+ "type": "function",
290
+ "function": {
291
+ "name": "process_climate_document",
292
+ "description": "Process climate documents from file path or URL and extract structured data",
293
+ "parameters": {
294
+ "type": "object",
295
+ "properties": {
296
+ "file_path": {"type": "string", "description": "Path to the document file"},
297
+ "url": {"type": "string", "description": "URL to the document"},
298
+ "document_type": {"type": "string", "description": "Type of climate document"}
299
+ }
300
+ }
301
+ }
302
+ }
303
+ ]
304
+
305
+ messages = [
306
+ UserMessage(content=f"Process the climate document at {file_path} of type {document_type}")
307
+ ]
308
+
309
+ response = await client.chat.complete_async(
310
+ model=MODEL,
311
+ messages=messages,
312
+ tools=doc_tool
313
+ )
314
+
315
+ print("Document processing response:")
316
+ print(response.choices[0].message.content)
317
+
318
+ if response.choices[0].message.tool_calls:
319
+ for tool_call in response.choices[0].message.tool_calls:
320
+ if tool_call.function.name == "process_climate_document":
321
+ doc_result = simulate_process_climate_document(file_path=file_path, document_type=document_type)
322
+ print("Document processing result:")
323
+ print(json.dumps(doc_result, indent=2))
324
+
325
+ return response
326
+
327
+ except Exception as e:
328
+ print(f"Error in document workflow: {str(e)}")
329
+ return None
330
+
331
+ async def process_image_workflow(client: Mistral, image_path: str, analysis_focus: str = "text_extraction"):
332
+ print("Starting image processing workflow...")
333
+
334
+ try:
335
+ # Verify image file exists
336
+ if not os.path.exists(image_path):
337
+ raise FileNotFoundError(f"Image file not found: {image_path}")
338
+
339
+ # Convert image to base64
340
+ with open(image_path, "rb") as image_file:
341
+ image_data = base64.b64encode(image_file.read()).decode("utf-8")
342
+
343
+ # Define image analysis tool
344
+ image_tool = [
345
+ {
346
+ "type": "function",
347
+ "function": {
348
+ "name": "analyze_image",
349
+ "description": "Analyze image documents and extract structured data",
350
+ "parameters": {
351
+ "type": "object",
352
+ "properties": {
353
+ "image_data": {"type": "string", "description": "Base64-encoded image data"},
354
+ "image_format": {"type": "string", "description": "Image format (png, jpg, pdf, etc.)"},
355
+ "analysis_focus": {"type": "string", "description": "Specific focus for analysis"}
356
+ },
357
+ "required": ["image_data", "image_format"]
358
+ }
359
+ }
360
+ }
361
+ ]
362
+
363
+ messages = [
364
+ UserMessage(content=f"Analyze the image document at {image_path} with focus on {analysis_focus}")
365
+ ]
366
+
367
+ response = await client.chat.complete_async(
368
+ model=MODEL,
369
+ messages=messages,
370
+ tools=image_tool
371
+ )
372
+
373
+ print("Image processing response:")
374
+ print(response.choices[0].message.content)
375
+
376
+ if response.choices[0].message.tool_calls:
377
+ for tool_call in response.choices[0].message.tool_calls:
378
+ if tool_call.function.name == "analyze_image":
379
+ image_result = simulate_analyze_image(
380
+ image_data=image_data,
381
+ image_format=Path(image_path).suffix.lstrip(".").lower() or "jpg",
382
+ analysis_focus=analysis_focus
383
+ )
384
+ print("Image analysis result:")
385
+ print(json.dumps(image_result, indent=2))
386
+
387
+ return response
388
+
389
+ except Exception as e:
390
+ print(f"Error in image workflow: {str(e)}")
391
+ return None
392
+
393
+ async def complete_analysis_workflow(client: Mistral, input_data: Dict[str, Any], max_retries: int = 3, initial_delay: float = 5.0):
394
+ print("Starting complete analysis workflow...")
395
+
396
+ async def make_api_call(messages, tools, retry_count=0):
397
+ try:
398
+ response = await client.chat.complete_async(
399
+ model=MODEL,
400
+ messages=messages,
401
+ tools=tools
402
+ )
403
+ return response
404
+ except Exception as e:
405
+ if "429" in str(e) and retry_count < max_retries:
406
+ delay = initial_delay * (2 ** retry_count)
407
+ print(f"Rate limit hit, retrying in {delay} seconds... (Attempt {retry_count + 1}/{max_retries})")
408
+ await asyncio.sleep(delay)
409
+ return await make_api_call(messages, tools, retry_count + 1)
410
+ raise
411
+
412
+ try:
413
+ # Define JSON analysis tool
414
+ json_analysis_tool = [
415
+ {
416
+ "type": "function",
417
+ "function": {
418
+ "name": "analyze_json_data",
419
+ "description": "Process and analyze JSON data to extract insights and patterns",
420
+ "parameters": {
421
+ "type": "object",
422
+ "properties": {
423
+ "json_data": {"type": "object", "description": "JSON data to analyze"},
424
+ "analysis_type": {"type": "string", "description": "Type of analysis to perform"}
425
+ },
426
+ "required": ["json_data"]
427
+ }
428
+ }
429
+ }
430
+ ]
431
+
432
+ # Step 1: Analyze JSON data
433
+ messages = [
434
+ UserMessage(content=f"Analyze the following JSON data and create a comprehensive analysis: {json.dumps(input_data, default=str)}")
435
+ ]
436
+
437
+ json_response = await make_api_call(messages, json_analysis_tool)
438
+
439
+ print("JSON Analysis response:")
440
+ print(json_response.choices[0].message.content)
441
+
442
+ # Simulate JSON analysis
443
+ if json_response.choices[0].message.tool_calls:
444
+ for tool_call in json_response.choices[0].message.tool_calls:
445
+ if tool_call.function.name == "analyze_json_data":
446
+ analysis_result = simulate_analyze_json_data(json_data=input_data)
447
+ print("Analysis result:")
448
+ print(json.dumps(analysis_result, indent=2))
449
+
450
+ # Delay before next API call
451
+ await asyncio.sleep(2.0)
452
+
453
+ # Define speech tool
454
+ speech_tool = [
455
+ {
456
+ "type": "function",
457
+ "function": {
458
+ "name": "text_to_speech",
459
+ "description": "Convert text to speech audio",
460
+ "parameters": {
461
+ "type": "object",
462
+ "properties": {
463
+ "text": {"type": "string", "description": "Text to convert to speech"},
464
+ "voice_settings": {
465
+ "type": "object",
466
+ "properties": {
467
+ "speed": {"type": "number", "default": 1.0},
468
+ "pitch": {"type": "number", "default": 1.0},
469
+ "voice_type": {"type": "string", "default": "neutral"}
470
+ }
471
+ }
472
+ },
473
+ "required": ["text"]
474
+ }
475
+ }
476
+ }
477
+ ]
478
+
479
+ # Step 2: Convert analysis to speech
480
+ analysis_text = "Climate analysis reveals significant warming trends with regional variations requiring immediate attention."
481
+
482
+ speech_messages = [
483
+ UserMessage(content=f"Convert this analysis to speech: {analysis_text}")
484
+ ]
485
+
486
+ speech_response = await make_api_call(speech_messages, speech_tool)
487
+
488
+ print("Speech conversion response:")
489
+ print(speech_response.choices[0].message.content)
490
+
491
+ # Simulate TTS
492
+ if speech_response.choices[0].message.tool_calls:
493
+ for tool_call in speech_response.choices[0].message.tool_calls:
494
+ if tool_call.function.name == "text_to_speech":
495
+ audio_url = simulate_text_to_speech(text=analysis_text)
496
+ print(f"Generated audio URL: {audio_url}")
497
+
498
+ # Play the audio
499
+ play_result = play_wav(audio_url)
500
+ print(f"Audio play result: {play_result}")
501
+
502
+ return json_response, speech_response
503
+
504
+ except Exception as e:
505
+ print(f"Error in complete analysis workflow: {str(e)}")
506
+ return None, None
507
+
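Review note: the nested `make_api_call` retries recursively and doubles the delay on each rate-limit error (5, 10, then 20 seconds with the defaults). The same policy can be written as a loop that is independent of the Mistral client. This is a sketch, not the commit's implementation: `call_with_backoff` is a hypothetical name, and it keeps the original heuristic of looking for "429" in the exception text.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_backoff(call: Callable[[], Awaitable[T]],
                            max_retries: int = 3,
                            initial_delay: float = 5.0) -> T:
    """Await call(), retrying on rate-limit errors with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception as exc:
            # Same heuristic as the workflow: "429" in the message means rate limited.
            if "429" not in str(exc) or attempt == max_retries:
                raise
            await asyncio.sleep(initial_delay * (2 ** attempt))
    # Only reached if max_retries is negative, because the loop body never runs.
    raise RuntimeError("max_retries must be non-negative")
```

A caller would wrap the request in a zero-argument coroutine function, for example `lambda: client.chat.complete_async(model=MODEL, messages=messages, tools=tools)`.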
508
+ async def tts_with_mcp(client: Mistral, text: str = "hello, and good luck for the hackathon"):
509
+ try:
510
+ # Define TTS tool
511
+ tts_tool = [
512
+ {
513
+ "type": "function",
514
+ "function": {
515
+ "name": "text_to_speech",
516
+ "description": "Convert text to speech audio",
517
+ "parameters": {
518
+ "type": "object",
519
+ "properties": {
520
+ "text": {"type": "string", "description": "Text to convert to speech"},
521
+ "voice_settings": {
522
+ "type": "object",
523
+ "properties": {
524
+ "speed": {"type": "number", "default": 1.0},
525
+ "pitch": {"type": "number", "default": 1.0},
526
+ "voice_type": {"type": "string", "default": "neutral"}
527
+ }
528
+ }
529
+ },
530
+ "required": ["text"]
531
+ }
532
+ }
533
+ }
534
+ ]
535
+
536
+ print("Running TTS workflow...")
537
+ messages = [
538
+ UserMessage(content=f"Say '{text}' out loud!")
539
+ ]
540
+
541
+ response = await client.chat.complete_async(
542
+ model=MODEL,
543
+ messages=messages,
544
+ tools=tts_tool
545
+ )
546
+
547
+ print("TTS Agent response:")
548
+ print(response.choices[0].message.content)
549
+
550
+ if response.choices[0].message.tool_calls:
551
+ for tool_call in response.choices[0].message.tool_calls:
552
+ if tool_call.function.name == "text_to_speech":
553
+ audio_url = simulate_text_to_speech(text=text)
554
+ print(f"Generated audio URL: {audio_url}")
555
+ play_result = play_wav(audio_url)
556
+ print(f"Audio play result: {play_result}")
557
+
558
+ return response
559
+
560
+ except Exception as e:
561
+ print(f"Error in TTS workflow: {str(e)}")
562
+ return None
563
+
564
+ async def main(client: Mistral):
565
+ print("Running TTS workflow...")
566
+
567
+ try:
568
+ # Generate speech with gTTS
569
+ text = "hello, and good luck for the hackathon"
570
+ save_path = "/tmp/output.wav"
571
+ tts = gTTS(text=text, lang="en")
572
+ tts.save(save_path)
573
+ print(f"Audio saved to {save_path}")
574
+
575
+ # Play the audio
576
+ play_result = play_wav(f"file://{os.path.abspath(save_path)}")
577
+ print(f"Audio play result: {play_result}")
578
+
579
+ # Optional: Run SpeechAgent to simulate conversational interaction
580
+ run_result = await tts_with_mcp(client, text)
581
+
582
+ if run_result:
583
+ print("All run entries:")
584
+ for entry in run_result.choices[0].message.content.splitlines():
585
+ print(entry)
586
+
587
+ return run_result
588
+
589
+ except Exception as e:
590
+ print(f"Error in TTS workflow: {str(e)}")
591
+ return None
592
+
593
+ async def main_workflow(client: Mistral):
594
+ print("Mistral Multi-Agent Document Processing System Initialized")
595
+ doc_agent = create_doc_agent(client)
596
+ image_agent = create_image_agent(client)
597
+ json_analyzer_agent = create_json_analyzer_agent(client)
598
+ speech_agent = create_speech_agent(client)
599
+
600
+ print("Available agents:")
601
+ print(f"- DocAgent ID: {doc_agent.id}")
602
+ print(f"- ImageAgent ID: {image_agent.id}")
603
+ print(f"- JsonAnalyzerAgent ID: {json_analyzer_agent.id}")
604
+ print(f"- SpeechAgent ID: {speech_agent.id}")
605
+ print("-" * 50)
606
+
607
+ # Skip hardcoded file processing since Gradio handles file uploads
608
+ print("Skipping hardcoded document and image processing workflows in main_workflow.")
609
+ print("Use the Gradio interface to upload and process files.")
610
+ print("-" * 50)
611
+
612
+ # Complete analysis workflow
613
+ print("3. Running complete analysis workflow...")
614
+ sample_data = {
615
+ "temperature_data": [20.1, 20.5, 21.2, 21.8],
616
+ "emissions": [400, 410, 415, 420],
617
+ "regions": ["Global", "Arctic", "Tropical"]
618
+ }
619
+ analysis_response, speech_response = await complete_analysis_workflow(client, sample_data)
620
+ print("-" * 50)
621
+
622
+ if analysis_response:
623
+ print("Analysis Response:")
624
+ print(analysis_response.choices[0].message.content)
625
+ else:
626
+ print("No analysis response received")
627
+
628
+ if speech_response:
629
+ print("Speech Response:")
630
+ print(speech_response.choices[0].message.content)
631
+ else:
632
+ print("No speech response received")
633
+
634
+ print("All workflows completed!")
635
+
636
+ async def full_run(client: Mistral):
637
+ await main_workflow(client)
638
+ print("\n" + "="*50)
639
+ print("Running TTS workflow...")
640
+ await main(client)
641
+
642
+ if __name__ == "__main__":
643
+ # This block is for testing purposes; actual client will be passed from app.py
644
+ client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY", "YOUR_API_KEY"))
645
+ asyncio.run(full_run(client))
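Review note: each workflow in `agent.py` checks `tool_call.function.name` in its own `if` chain and then calls the matching `simulate_*` function with locally known values. A dispatch table is one way to centralize that logic and to use the arguments the model actually produced. The sketch below is illustrative: `dispatch_tool_calls` is a hypothetical helper, the stand-in objects only imitate the `function.name` and `function.arguments` attributes, and it assumes the arguments arrive as a JSON string or an already-parsed dict.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


def dispatch_tool_calls(tool_calls: Optional[List[Any]],
                        handlers: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run the local handler registered for each tool call and collect the results."""
    results = []
    for tool_call in tool_calls or []:
        handler = handlers.get(tool_call.function.name)
        if handler is None:
            continue  # Unknown tool name: skip rather than fail.
        args = tool_call.function.arguments
        # Assumption: arguments arrive as a JSON string or an already-parsed dict.
        if isinstance(args, str):
            args = json.loads(args or "{}")
        results.append(handler(**args))
    return results


# Stand-in object shaped like the tool calls read in agent.py (name + arguments).
fake_call = SimpleNamespace(
    function=SimpleNamespace(name="text_to_speech", arguments='{"text": "hello"}')
)
outputs = dispatch_tool_calls([fake_call], {"text_to_speech": lambda text: text.upper()})
```

In the real workflows the handlers mapping would point at functions such as `simulate_text_to_speech` or `simulate_analyze_json_data`.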
app.py ADDED
@@ -0,0 +1,239 @@
1
+ import gradio as gr
2
+ import asyncio
3
+ import json
4
+ import os
5
+ import base64
6
+ from agent import (create_doc_agent, create_image_agent, create_json_analyzer_agent,
7
+ create_speech_agent, process_document_workflow, process_image_workflow,
8
+ complete_analysis_workflow, tts_with_mcp, simulate_process_climate_document,
9
+ simulate_analyze_image, simulate_analyze_json_data, simulate_text_to_speech, play_wav)
10
+ from mistralai import Mistral
11
+ from typing import Dict, Any
12
+ # Custom dark-theme CSS for the Gradio interface
13
+
14
+ custom_css = """
15
+ body {
16
+ background: #121212;
17
+ color: #ffffff;
18
+ }
19
+ .gradio-container {
20
+ background-color: #1e1e1e;
21
+ border-radius: 12px;
22
+ box-shadow: 0 4px 12px rgba(0,0,0,0.4);
23
+ }
24
+ h1, h2 {
25
+ color: #80cbc4;
26
+ }
27
+ .gr-button {
28
+ background-color: #26a69a;
29
+ color: white;
30
+ }
31
+ .gr-button:hover {
32
+ background-color: #00897b;
33
+ }
34
+ input, textarea, select {
35
+ background-color: #2c2c2c !important;
36
+ color: #ffffff;
37
+ border: 1px solid #4db6ac;
38
+ }
39
+ .gr-file label {
40
+ background-color: #26a69a;
41
+ color: white;
42
+ }
43
+ .gr-audio {
44
+ border-radius: 12px;
45
+ box-shadow: 0 0 8px #4db6ac;
46
+ }
47
+ """
48
+
49
+
+ # Build a Mistral client and create the four agents.
+ # Returns (client, agent_ids) on success or (None, error_message) on failure.
+ def initialize_client_and_agents(api_key: str):
+     try:
+         client = Mistral(api_key=api_key)
+         doc_agent = create_doc_agent(client)
+         image_agent = create_image_agent(client)
+         json_analyzer_agent = create_json_analyzer_agent(client)
+         speech_agent = create_speech_agent(client)
+         return client, {
+             "doc_agent_id": doc_agent.id,
+             "image_agent_id": image_agent.id,
+             "json_analyzer_agent_id": json_analyzer_agent.id,
+             "speech_agent_id": speech_agent.id
+         }
+     except Exception as e:
+         return None, f"Error initializing client: {str(e)}"
+
+ # Function to handle document processing workflow
+ async def run_document_workflow(api_key: str, file, document_type):
+     if not api_key:
+         return "Error: Please provide a valid API key."
+     if file is None:
+         return "Error: Please upload a document file."
+     file_path = file.name
+     client, agents_or_error = initialize_client_and_agents(api_key)
+     if client is None:
+         return agents_or_error
+     try:
+         response = await process_document_workflow(client, file_path, document_type)
+         if response and response.choices and response.choices[0].message.tool_calls:
+             for tool_call in response.choices[0].message.tool_calls:
+                 if tool_call.function.name == "process_climate_document":
+                     result = simulate_process_climate_document(file_path=file_path, document_type=document_type)
+                     return json.dumps(result, indent=2)
+         return response.choices[0].message.content if response and response.choices else "No response received."
+     except Exception as e:
+         return f"Error: {str(e)}"
+
+ # Function to handle image processing workflow
+ async def run_image_workflow(api_key: str, image_file, analysis_focus):
+     if not api_key:
+         return "Error: Please provide a valid API key."
+     if image_file is None:
+         return "Error: Please upload an image file."
+     image_path = image_file.name
+     client, agents_or_error = initialize_client_and_agents(api_key)
+     if client is None:
+         return agents_or_error
+     try:
+         response = await process_image_workflow(client, image_path, analysis_focus)
+         if response and response.choices and response.choices[0].message.tool_calls:
+             for tool_call in response.choices[0].message.tool_calls:
+                 if tool_call.function.name == "analyze_image":
+                     with open(image_path, "rb") as f:
+                         image_data = base64.b64encode(f.read()).decode("utf-8")
+                     # NOTE: image_format is hard-coded to "jpg" although the upload field also accepts PNG and PDF.
+                     result = simulate_analyze_image(image_data, image_format="jpg", analysis_focus=analysis_focus)
+                     return json.dumps(result, indent=2)
+         return response.choices[0].message.content if response and response.choices else "No response received."
+     except Exception as e:
+         return f"Error: {str(e)}"
+
+ # Function to handle JSON analysis and speech workflow
+ async def run_analysis_and_speech_workflow(api_key: str, json_input, analysis_type):
+     if not api_key:
+         return "Error: Please provide a valid API key.", None
+     try:
+         json_data = json.loads(json_input)
+         client, agents_or_error = initialize_client_and_agents(api_key)
+         if client is None:
+             return agents_or_error, None
+         json_response, speech_response = await complete_analysis_workflow(client, json_data, max_retries=3)
+
+         output = []
+         if json_response and json_response.choices:
+             output.append("JSON Analysis Response:")
+             # content can be None when the model only returns tool calls; join() needs strings
+             output.append(json_response.choices[0].message.content or "")
+             for tool_call in json_response.choices[0].message.tool_calls or []:
+                 if tool_call.function.name == "analyze_json_data":
+                     analysis_result = simulate_analyze_json_data(json_data, analysis_type)
+                     output.append("Analysis Result:")
+                     output.append(json.dumps(analysis_result, indent=2))
+
+         if speech_response and speech_response.choices:
+             output.append("\nSpeech Response:")
+             output.append(speech_response.choices[0].message.content or "")
+             for tool_call in speech_response.choices[0].message.tool_calls or []:
+                 if tool_call.function.name == "text_to_speech":
+                     analysis_text = "Climate analysis reveals significant warming trends with regional variations requiring immediate attention."
+                     audio_url = simulate_text_to_speech(analysis_text)
+                     output.append(f"Generated Audio URL: {audio_url}")
+                     play_result = play_wav(audio_url)
+                     output.append(f"Audio Play Result: {play_result}")
+                     if "file://" in audio_url:
+                         audio_path = audio_url.replace("file://", "")
+                         if os.path.exists(audio_path):
+                             return "\n".join(output), audio_path
+                         else:
+                             output.append("Error: Audio file not found.")
+
+         return "\n".join(output), None
+     except Exception as e:
+         return f"Error: {str(e)}", None
+
+ # Function to handle TTS workflow
+ async def run_tts_workflow(api_key: str, text_input):
+     if not api_key:
+         return "Error: Please provide a valid API key.", None
+     client, agents_or_error = initialize_client_and_agents(api_key)
+     if client is None:
+         return agents_or_error, None
+     try:
+         response = await tts_with_mcp(client, text_input)
+         output = []
+         if response and response.choices:
+             output.append("TTS Agent Response:")
+             # content can be None when the model only returns tool calls; join() needs strings
+             output.append(response.choices[0].message.content or "")
+             for tool_call in response.choices[0].message.tool_calls or []:
+                 if tool_call.function.name == "text_to_speech":
+                     audio_url = simulate_text_to_speech(text=text_input)
+                     output.append(f"Generated Audio URL: {audio_url}")
+                     play_result = play_wav(audio_url)
+                     output.append(f"Audio Play Result: {play_result}")
+                     if "file://" in audio_url:
+                         audio_path = audio_url.replace("file://", "")
+                         if os.path.exists(audio_path):
+                             return "\n".join(output), audio_path
+                         else:
+                             output.append("Error: Audio file not found.")
+         return "\n".join(output), None
+     except Exception as e:
+         return f"Error: {str(e)}", None
+
+ # Gradio interface
+ with gr.Blocks(css=custom_css) as demo:
+
+     gr.Markdown("# MistyClimate Multi-Agent System")
+     gr.Markdown("## Mistral Multi-Agent Processing System")
+     gr.Markdown("Enter your Mistral API key and interact with document processing, image analysis, JSON analysis, and text-to-speech functionalities.")
+
+     api_key_input = gr.Textbox(label="Mistral API Key", type="password", placeholder="Enter your Mistral API key here")
+
+     with gr.Tab("Document Processing"):
+         doc_file = gr.File(label="Upload Document (PDF)")
+         doc_type = gr.Dropdown(choices=["climate_report", "analysis", "data"], label="Document Type", value="climate_report")
+         doc_button = gr.Button("Process Document")
+         doc_output = gr.Textbox(label="Document Processing Output", lines=10)
+         doc_button.click(
+             fn=run_document_workflow,
+             inputs=[api_key_input, doc_file, doc_type],
+             outputs=doc_output
+         )
+
+     with gr.Tab("Image Analysis"):
+         img_file = gr.File(label="Upload Image (PNG/JPG/PDF)")
+         analysis_focus = gr.Dropdown(choices=["text_extraction", "chart_analysis", "table_extraction"],
+                                      label="Analysis Focus", value="text_extraction")
+         img_button = gr.Button("Analyze Image")
+         img_output = gr.Textbox(label="Image Analysis Output", lines=10)
+         img_button.click(
+             fn=run_image_workflow,
+             inputs=[api_key_input, img_file, analysis_focus],
+             outputs=img_output
+         )
+
+     with gr.Tab("JSON Analysis & Speech"):
+         json_input = gr.Textbox(label="JSON Data Input", lines=5,
+                                 placeholder='{"temperature_data": [20.1, 20.5, 21.2, 21.8], "emissions": [400, 410, 415, 420], "regions": ["Global", "Arctic", "Tropical"]}')
+         analysis_type = gr.Dropdown(choices=["statistical", "content", "structural"],
+                                     label="Analysis Type", value="content")
+         analysis_button = gr.Button("Run Analysis & Speech")
+         analysis_output = gr.Textbox(label="Analysis and Speech Output", lines=10)
+         audio_output = gr.Audio(label="Generated Audio")
+         analysis_button.click(
+             fn=run_analysis_and_speech_workflow,
+             inputs=[api_key_input, json_input, analysis_type],
+             outputs=[analysis_output, audio_output]
+         )
+
+     with gr.Tab("Text-to-Speech"):
+         tts_input = gr.Textbox(label="Text Input", value="hello, and good luck for the hackathon")
+         tts_button = gr.Button("Generate Speech")
+         tts_output = gr.Textbox(label="TTS Output", lines=5)
+         tts_audio = gr.Audio(label="Generated Audio")
+         tts_button.click(
+             fn=run_tts_workflow,
+             inputs=[api_key_input, tts_input],
+             outputs=[tts_output, tts_audio]
+         )
+
+ if __name__ == "__main__":
+     demo.launch()
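
Both speech handlers in `app.py` turn a `file://` URL into a local path with `audio_url.replace("file://", "")`, which leaves percent-escapes such as `%20` undecoded and would also alter any other `file://` occurrence in the string. A minimal sketch of a stricter conversion using only the standard library is shown here. The helper name `file_url_to_path` is hypothetical and not part of the commit, and the sketch assumes that `simulate_text_to_speech` returns a well-formed URL with an absolute path (for example `file:///tmp/out.wav`), which the diff does not confirm.

```python
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url: str) -> Optional[str]:
    """Return the local path for a file:// URL, or None for any other scheme.

    Hypothetical helper for illustration; not part of the committed app.py.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        return None
    # url2pathname decodes percent-escapes and applies platform path rules
    return url2pathname(parsed.path)
```

In the handlers, the result could be checked with `os.path.exists` before being returned to the `gr.Audio` component, as the existing code already does with the replaced string.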
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ mistralai
+ requests
+ pydantic
+ IPython
+ gtts
+ gradio
+ mcp