hugh2023 committed on
Commit
adec1cb
·
1 Parent(s): 81917a3

Add multi-modal agent system with media analysis, web scraping, and enhanced configuration management

Files changed (10)
  1. README.md +274 -1
  2. SETUP.md +195 -0
  3. api_keys copy.json +12 -0
  4. app.py +971 -84
  5. check_ffmpeg.py +148 -0
  6. config.py +122 -0
  7. prompts.py +61 -0
  8. requirements.txt +23 -1
  9. run.py +138 -0
  10. tools.py +2197 -0
README.md CHANGED
@@ -12,4 +12,277 @@ hf_oauth: true
  hf_oauth_expiration_minutes: 480
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Multi-Modal Agent System
+
+ An intelligent multi-modal agent system built on Hugging Face and LangGraph that can understand videos and images and answer questions with the help of a search engine.
+
+ ## 🚀 Features
+
+ ### 🎥 Video Understanding and Analysis
+ - **Keyframe extraction**: automatically extracts keyframes from a video for analysis
+ - **Video description**: generates a natural-language description of the video content
+ - **Audio analysis**: analyzes the video's audio track
+ - **Metadata**: retrieves basic video information (duration, frame rate, resolution, etc.)
+
+ ### 🖼️ Image Recognition and Description
+ - **Image captioning**: generates natural-language image descriptions with the BLIP model
+ - **Object detection**: detects objects and their locations in an image
+ - **Image classification**: classifies images
+ - **OCR text extraction**: extracts text from images
+ - **Sentiment analysis**: analyzes emotional elements in an image
+
+ ### 📄 PDF Document Processing
+ - **PDF download**: downloads PDF documents from URLs
+ - **Text extraction**: extracts text content from PDFs
+ - **Structure analysis**: analyzes PDF structure and metadata
+ - **Content search**: searches for specific text within a PDF
+ - **Image extraction**: extracts images from PDFs
+ - **Summarization**: automatically summarizes PDF content
+
+ ### 🌐 Web Page Analysis
+ - **Web scraping**: fetches page content and structure
+ - **Text extraction**: extracts plain text from web pages
+ - **Structure analysis**: analyzes page headings, forms, tables, etc.
+ - **Content search**: searches for specific text within a page
+ - **Link extraction**: extracts all links from a page
+ - **Summarization**: automatically summarizes page content
+ - **Accessibility check**: checks pages for accessibility issues
+
+ ### 📺 YouTube Video Processing
+ - **Video info**: retrieves a YouTube video's title, author, duration, view count, etc.
+ - **Video download**: downloads YouTube videos locally
+ - **Audio extraction**: extracts audio from YouTube videos
+ - **Thumbnail download**: downloads video thumbnails
+ - **Video search**: searches YouTube videos
+ - **Comment analysis**: analyzes YouTube video comments
+ - **Playlist handling**: retrieves playlist information and video lists
+
+ ### 📚 Wikipedia Processing
+ - **Page search**: searches Wikipedia pages
+ - **Content retrieval**: fetches the full content of a Wikipedia page
+ - **Summary extraction**: retrieves page summaries
+ - **Category retrieval**: retrieves page categories
+ - **Link extraction**: retrieves related page links
+ - **Search suggestions**: retrieves search suggestions
+ - **English edition**: supports searching English Wikipedia
+ - **Random pages**: fetches a random Wikipedia page
+ - **Geosearch**: searches nearby pages by coordinates
+
+ ### 🔍 Intelligent Search Engine
+ - **Web search**: real-time web search via DuckDuckGo
+ - **Image search**: searches for related images
+ - **Video search**: searches for related videos
+ - **Smart queries**: automatically builds search queries from the question
+
+ ### 🤖 LangGraph Workflow Orchestration
+ - **State management**: manages agent state with AgentState
+ - **Workflow nodes**: media classification → media analysis → information search → tool use → answer synthesis
+ - **Smart routing**: automatically picks a processing path based on the question type
+
+ ### 🛠️ Rich Tool Set
+ - **Text analysis**: sentiment analysis, keyword extraction, summarization
+ - **Translation**: multilingual text translation
+ - **Math**: safe evaluation of mathematical expressions
+ - **Weather**: real-time weather lookup
+
+ ## 📋 System Architecture
+
+ ```
+ User question → Media classification → Media analysis → Information search → Tool use → Answer synthesis → Final answer
+ ```
+
+ Each stage consumes the previous stage's output: the classifier tags the input as text, image, video, PDF, web page, YouTube, or Wikipedia; the matching analyzer extracts information; web search and specialized tools gather supporting facts; and the synthesis step combines everything into a natural-language answer.
+
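The first routing stage can be sketched as a small classifier over the input URL or path. This is an illustrative sketch only; the function name `classify_media` and the category strings are assumptions, not the repository's actual implementation.

```python
from typing import Optional
from urllib.parse import urlparse

# Extension tables used by this hypothetical classifier
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def classify_media(source: Optional[str]) -> str:
    """Route an input to one of the handler categories from the diagram:
    text, image, video, pdf, youtube, wikipedia, or webpage."""
    if not source:
        return "text"
    parsed = urlparse(source)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    # Host-specific handlers take precedence over file extensions
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"
    if "wikipedia.org" in host:
        return "wikipedia"
    ext = path[path.rfind("."):] if "." in path else ""
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    if ext == ".pdf":
        return "pdf"
    # Any other http(s) URL is treated as a generic web page
    return "webpage" if parsed.scheme in ("http", "https") else "text"

print(classify_media("https://example.com/photo.jpg"))        # image
print(classify_media("https://www.youtube.com/watch?v=abc"))  # youtube
print(classify_media(None))                                   # text
```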
+ ## 🛠️ Installation and Configuration
+
+ ### 1. Requirements
+ - Python 3.8+
+ - CUDA support (optional, for GPU acceleration)
+
+ ### 2. Install dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Environment variables
+ Create a `.env` file and configure the following variables:
+ ```env
+ # OpenAI API configuration
+ OPENAI_API_KEY=your_openai_api_key_here
+
+ # Hugging Face configuration (optional)
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
+
+ # Search engine configuration (optional)
+ SERPER_API_KEY=your_serper_api_key_here
+
+ # Debug configuration
+ DEBUG=True
+ LOG_LEVEL=INFO
+ ```
+
+ ### 4. Run the system
+ ```bash
+ python app.py
+ ```
+
+ ## 🎯 Usage Examples
+
+ ### Basic usage
+ ```python
+ from app import MultiModalAgent
+
+ # Initialize the agent
+ agent = MultiModalAgent()
+
+ # Text question
+ answer = agent("What is artificial intelligence?")
+
+ # Image question
+ answer = agent("What is in this picture?", "https://example.com/image.jpg")
+
+ # Video question
+ answer = agent("What is this video about?", "https://youtube.com/watch?v=example")
+
+ # Web page question
+ answer = agent("What is the main content of this page?", "https://example.com")
+
+ # YouTube question
+ answer = agent("What is the info for this YouTube video?", "https://www.youtube.com/watch?v=example")
+
+ # Wikipedia question
+ answer = agent("What does Wikipedia say about artificial intelligence?")
+ ```
+
+ ### Advanced features
+ ```python
+ # Sentiment analysis
+ answer = agent("Analyze the sentiment of this text", "Some text to analyze")
+
+ # Keyword extraction
+ answer = agent("Extract the keywords from this text", "Some text to extract keywords from")
+
+ # Summarization
+ answer = agent("Summarize this text", "A long piece of text that needs summarizing...")
+ ```
+
+ ## 📊 Supported Models
+
+ ### Image processing models
+ - **BLIP**: Salesforce/blip-image-captioning-base
+ - **ResNet**: microsoft/resnet-50
+ - **DETR**: facebook/detr-resnet-50
+ - **GIT**: microsoft/git-base
+
+ ### Text processing models
+ - **Sentiment analysis**: cardiffnlp/twitter-roberta-base-sentiment-latest
+ - **Named entity recognition**: dbmdz/bert-large-cased-finetuned-conll03-english
+ - **Summarization**: facebook/bart-large-cnn
+ - **Translation**: Helsinki-NLP/opus-mt-en-zh
+
+ ### Video processing
+ - **MoviePy**: video editing and processing
+ - **OpenCV**: computer vision processing
+ - **PyTube**: YouTube video download
+
+ ## 🔧 Custom Extensions
+
+ ### Adding a new tool
+ ```python
+ from langchain_core.tools import tool
+ from tools import ToolManager
+
+ class CustomTools:
+     @staticmethod
+     @tool
+     def custom_function(input_text: str) -> str:
+         """A custom tool function."""
+         # Implement your logic here
+         return "result"
+
+ # Register the tool
+ tool_manager = ToolManager()
+ tool_manager.tools["custom_function"] = CustomTools.custom_function
+ ```
+
+ ### Modifying the workflow
+ ```python
+ def _build_workflow(self) -> StateGraph:
+     workflow = StateGraph(AgentState)
+
+     # Add a custom node
+     workflow.add_node("custom_node", self._custom_processing)
+
+     # Reroute the workflow through the new node
+     workflow.add_edge("analyze_media", "custom_node")
+     workflow.add_edge("custom_node", "search_info")
+
+     return workflow.compile()
+ ```
+
+ ## 📈 Performance Optimization
+
+ ### GPU acceleration
+ The system automatically detects CUDA availability and uses the GPU when possible:
+ ```python
+ device = 0 if torch.cuda.is_available() else -1
+ ```
+
+ ### Caching
+ - Model cache: downloaded models are cached automatically
+ - Result cache: analysis results are cached to avoid recomputation
+
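The result cache described above can be as simple as memoizing an analysis function on a digest of its input. The decorator below is an illustrative sketch, not the repository's actual caching code.

```python
import functools
import hashlib

def cached_analysis(func):
    """Memoize an analysis function on a SHA-256 digest of its input,
    so repeated analyses of identical content are free."""
    cache = {}

    @functools.wraps(func)
    def wrapper(data: bytes):
        key = hashlib.sha256(data).hexdigest()
        if key not in cache:
            cache[key] = func(data)
        return cache[key]

    wrapper.cache = cache  # exposed for inspection/clearing
    return wrapper

@cached_analysis
def analyze(data: bytes) -> int:
    # Stand-in for an expensive model call
    return len(data)

analyze(b"frame-1")
analyze(b"frame-1")        # second call is served from the cache
print(len(analyze.cache))  # 1
```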
+ ### Memory optimization
+ - Image size limits: large images are automatically resized
+ - Video frame sampling: keyframes are selected intelligently for analysis
+
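Sampling one frame per second (the strategy `analyze_video` in app.py uses) reduces a long video to a handful of frames. The helper below is a stand-alone sketch of that index computation.

```python
def keyframe_indices(frame_count: int, fps: float) -> list:
    """Pick one frame per second of video: indices 0, fps, 2*fps, ...
    Mirrors the sampling loop in MediaAnalyzer.analyze_video."""
    interval = max(1, int(fps))  # guard against fps reported as 0
    return list(range(0, frame_count, interval))

# A 10-second clip at 30 fps (300 frames) yields 10 keyframes:
print(len(keyframe_indices(300, 30.0)))  # 10
```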
+ ## 🐛 Troubleshooting
+
+ ### Common issues
+
+ 1. **OpenAI API errors**
+    - Check that the API key is correct
+    - Confirm the account has sufficient credit
+
+ 2. **Model download failures**
+    - Check the network connection
+    - Try a mirror source
+
+ 3. **Out of memory**
+    - Reduce the batch size
+    - Run in CPU mode
+
+ 4. **Video processing failures**
+    - Check that the video format is supported
+    - Verify the video file is intact
+
+ ### Debug mode
+ Enable debug mode via environment variables:
+ ```env
+ DEBUG=True
+ LOG_LEVEL=DEBUG
+ ```
+
+ ## 🤝 Contributing
+
+ Issues and pull requests to improve this project are welcome!
+
+ ### Development setup
+ 1. Fork the project
+ 2. Create a feature branch
+ 3. Commit your changes
+ 4. Open a pull request
+
+ ## 📄 License
+
+ This project is released under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgements
+
+ - [Hugging Face](https://huggingface.co/) - excellent pretrained models
+ - [LangGraph](https://github.com/langchain-ai/langgraph) - workflow orchestration framework
+ - [LangChain](https://langchain.com/) - LLM application development framework
+ - [Gradio](https://gradio.app/) - rapid web UI building
+
+ ---
+
+ **Note**: This is an educational project; please comply with the terms of use and privacy policies of the relevant APIs.
SETUP.md ADDED
@@ -0,0 +1,195 @@
+ # Multi-Modal Agent System Configuration Guide
+
+ ## 🚀 Quick Start
+
+ ### 1. Install dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Configure API keys
+
+ #### Method 1: configuration file (recommended)
+
+ 1. Edit the `api_keys.json` file:
+ ```json
+ {
+   "openai": {
+     "api_key": "sk-your-openai-api-key-here"
+   },
+   "huggingface": {
+     "api_key": "hf-your-huggingface-api-key-here"
+   },
+   "search_engine": {
+     "type": "duckduckgo",
+     "api_key": null
+   }
+ }
+ ```
+
+ 2. Replace `sk-your-openai-api-key-here` with your OpenAI API key
+
+ #### Method 2: environment variables
+ ```bash
+ # Windows
+ set OPENAI_API_KEY=sk-your-openai-api-key-here
+
+ # Linux/Mac
+ export OPENAI_API_KEY=sk-your-openai-api-key-here
+ ```
+
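The two methods can be combined by reading `api_keys.json` first and falling back to the environment variable. A minimal sketch, assuming the JSON layout shown above; the helper name `load_openai_key` is illustrative, not part of the project's code.

```python
import json
import os

def load_openai_key(config_path: str = "api_keys.json") -> str:
    """Return the OpenAI key from api_keys.json, falling back to the
    OPENAI_API_KEY environment variable when the file is absent or empty."""
    try:
        with open(config_path, encoding="utf-8") as f:
            key = json.load(f).get("openai", {}).get("api_key", "")
        if key:
            return key
    except (OSError, json.JSONDecodeError):
        pass  # missing or malformed file: fall through to the env var
    return os.environ.get("OPENAI_API_KEY", "")
```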
+ ### 3. Run the system
+
+ #### Web UI mode
+ ```bash
+ python run.py --mode web
+ ```
+
+ #### Test mode
+ ```bash
+ python run.py --mode test
+ ```
+
+ #### Interactive mode
+ ```bash
+ python run.py --mode interactive
+ ```
+
+ ## 🔑 Getting API Keys
+
+ ### OpenAI API key
+ 1. Visit the [OpenAI platform](https://platform.openai.com/)
+ 2. Sign up or log in
+ 3. Go to the "API Keys" page
+ 4. Click "Create new secret key"
+ 5. Copy the generated key (it starts with `sk-`)
+
+ ### Hugging Face API key (optional)
+ 1. Visit [Hugging Face](https://huggingface.co/)
+ 2. Sign up or log in
+ 3. Go to "Settings" → "Access Tokens"
+ 4. Click "New token"
+ 5. Copy the generated token (it starts with `hf_`)
+
+ ## 🔍 Search Engine Configuration
+
+ ### DuckDuckGo search (default, no API key required)
+ - No API key to configure
+ - Free to use
+ - Supports text, image, and video search
+
+ ### Other search engines (optional)
+ To use a different search engine, edit `api_keys.json`:
+
+ ```json
+ {
+   "search_engine": {
+     "type": "serper",
+     "api_key": "your-serper-api-key"
+   }
+ }
+ ```
+
+ ## ⚙️ Advanced Configuration
+
+ ### Model configuration
+ The models in use can be changed in `config.py`:
+
+ ```python
+ # Image captioning model
+ IMAGE_CAPTION_MODEL = "Salesforce/blip-image-captioning-base"
+
+ # Image classification model
+ IMAGE_CLASSIFICATION_MODEL = "microsoft/resnet-50"
+
+ # Object detection model
+ OBJECT_DETECTION_MODEL = "facebook/detr-resnet-50"
+ ```
+
+ ### System configuration
+ ```python
+ # Debug mode
+ DEBUG = True
+
+ # Log level
+ LOG_LEVEL = "DEBUG"
+
+ # Video processing configuration
+ MAX_VIDEO_DURATION = 300  # maximum video duration (seconds)
+ FRAMES_TO_ANALYZE = 5     # number of frames to analyze
+ ```
+
+ ## 🐛 FAQ
+
+ ### 1. API key errors
+ **Error**: `OpenAI API key not configured`
+ **Fix**:
+ - Check that the `api_keys.json` file exists
+ - Confirm the key format is correct (OpenAI keys start with `sk-`)
+ - Verify the API key is valid
+
+ ### 2. Dependency installation failures
+ **Error**: `ModuleNotFoundError`
+ **Fix**:
+ ```bash
+ # Upgrade pip
+ pip install --upgrade pip
+
+ # Reinstall dependencies
+ pip install -r requirements.txt --force-reinstall
+ ```
+
+ ### 3. Model download failures
+ **Error**: `Model download failed`
+ **Fix**:
+ - Check the network connection
+ - Use a VPN or proxy
+ - Manually download the model into the local cache directory
+
+ ### 4. Out of memory
+ **Error**: `CUDA out of memory`
+ **Fix**:
+ - Reduce the batch size
+ - Run in CPU mode
+ - Close other memory-hungry programs
+
+ ## 📁 File Structure
+
+ ```
+ Final_Assignment_Agent/
+ ├── api_keys.json      # API key configuration file
+ ├── config.py          # system configuration
+ ├── app.py             # main application
+ ├── tools.py           # tool module
+ ├── test_agent.py      # test script
+ ├── run.py             # launcher script
+ ├── requirements.txt   # dependency list
+ ├── README.md          # project description
+ └── SETUP.md           # configuration guide
+ ```
+
+ ## 🔒 Security Notes
+
+ 1. **Do not commit API keys to version control**
+    - Add `api_keys.json` to `.gitignore`
+    - Use environment variables or a configuration file
+
+ 2. **Rotate API keys regularly**
+    - Periodically check that API keys are still valid
+    - Replace expired keys promptly
+
+ 3. **Limit API usage**
+    - Set API usage limits
+    - Monitor API call counts and spending
+
+ ## 📞 Support
+
+ If you run into problems:
+ 1. Check the error logs
+ 2. Check the configuration files
+ 3. Run the test script
+ 4. Consult the FAQ
+
+ ---
+
+ **Note**: Please comply with the terms of use and privacy policies of the relevant APIs.
api_keys copy.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "openai": {
+     "api_key": ""
+   },
+   "huggingface": {
+     "api_key": ""
+   },
+   "search_engine": {
+     "type": "duckduckgo",
+     "api_key": null
+   }
+ }
app.py CHANGED
@@ -1,54 +1,946 @@
  import os
  import gradio as gr
  import requests
- import inspect
  import pandas as pd
 
- # (Keep Constants as is)
- # --- Constants ---
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
- # --- Basic Agent Definition ---
- # ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
- class BasicAgent:
      def __init__(self):
-         print("BasicAgent initialized.")
-     def __call__(self, question: str) -> str:
-         print(f"Agent received question (first 50 chars): {question[:50]}...")
-         fixed_answer = "This is a default answer."
-         print(f"Agent returning fixed answer: {fixed_answer}")
-         return fixed_answer
-
- def run_and_submit_all( profile: gr.OAuthProfile | None):
-     """
-     Fetches all questions, runs the BasicAgent on them, submits all answers,
-     and displays the results.
-     """
-     # --- Determine HF Space Runtime URL and Repo URL ---
-     space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
 
      if profile:
-         username= f"{profile.username}"
          print(f"User logged in: {username}")
      else:
          print("User not logged in.")
          return "Please Login to Hugging Face with the button.", None
 
      api_url = DEFAULT_API_URL
      questions_url = f"{api_url}/questions"
      submit_url = f"{api_url}/submit"
 
-     # 1. Instantiate Agent ( modify this part to create your agent)
      try:
-         agent = BasicAgent()
      except Exception as e:
          print(f"Error instantiating agent: {e}")
          return f"Error initializing agent: {e}", None
-     # In the case of an app running as a hugging Face space, this link points toward your codebase ( usefull for others so please keep it public)
      agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
      print(agent_code)
 
-     # 2. Fetch Questions
      print(f"Fetching questions from: {questions_url}")
      try:
          response = requests.get(questions_url, timeout=15)
@@ -58,27 +950,22 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
          print("Fetched questions list is empty.")
          return "Fetched questions list is empty or invalid format.", None
      print(f"Fetched {len(questions_data)} questions.")
-     except requests.exceptions.RequestException as e:
      print(f"Error fetching questions: {e}")
      return f"Error fetching questions: {e}", None
-     except requests.exceptions.JSONDecodeError as e:
-         print(f"Error decoding JSON response from questions endpoint: {e}")
-         print(f"Response text: {response.text[:500]}")
-         return f"Error decoding server response for questions: {e}", None
-     except Exception as e:
-         print(f"An unexpected error occurred fetching questions: {e}")
-         return f"An unexpected error occurred fetching questions: {e}", None
 
-     # 3. Run your Agent
      results_log = []
      answers_payload = []
      print(f"Running agent on {len(questions_data)} questions...")
      for item in questions_data:
          task_id = item.get("task_id")
          question_text = item.get("question")
          if not task_id or question_text is None:
              print(f"Skipping item with missing task_id or question: {item}")
              continue
          try:
              submitted_answer = agent(question_text)
              answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
@@ -91,12 +978,12 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
      print("Agent did not produce any answers to submit.")
      return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
 
-     # 4. Prepare Submission
      submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
      status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
      print(status_update)
 
-     # 5. Submit
      print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
      try:
          response = requests.post(submit_url, json=submission_data, timeout=60)
@@ -112,85 +999,85 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
          print("Submission successful.")
          results_df = pd.DataFrame(results_log)
          return final_status, results_df
-     except requests.exceptions.HTTPError as e:
-         error_detail = f"Server responded with status {e.response.status_code}."
-         try:
-             error_json = e.response.json()
-             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
-         except requests.exceptions.JSONDecodeError:
-             error_detail += f" Response: {e.response.text[:500]}"
-         status_message = f"Submission Failed: {error_detail}"
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
-     except requests.exceptions.Timeout:
-         status_message = "Submission Failed: The request timed out."
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
-     except requests.exceptions.RequestException as e:
-         status_message = f"Submission Failed: Network error - {e}"
-         print(status_message)
-         results_df = pd.DataFrame(results_log)
-         return status_message, results_df
      except Exception as e:
-         status_message = f"An unexpected error occurred during submission: {e}"
          print(status_message)
          results_df = pd.DataFrame(results_log)
          return status_message, results_df
 
- # --- Build Gradio Interface using Blocks ---
  with gr.Blocks() as demo:
-     gr.Markdown("# Basic Agent Evaluation Runner")
      gr.Markdown(
          """
-         **Instructions:**
-
-         1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
-         2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
-         3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
-
-         ---
-         **Disclaimers:**
-         Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
-         This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
          """
      )
 
      gr.LoginButton()
 
-     run_button = gr.Button("Run Evaluation & Submit All Answers")
 
-     status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
-     # Removed max_rows=10 from DataFrame constructor
-     results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
 
      run_button.click(
          fn=run_and_submit_all,
          outputs=[status_output, results_table]
      )
 
  if __name__ == "__main__":
-     print("\n" + "-"*30 + " App Starting " + "-"*30)
-     # Check for SPACE_HOST and SPACE_ID at startup for information
      space_host_startup = os.getenv("SPACE_HOST")
-     space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
 
      if space_host_startup:
          print(f"✅ SPACE_HOST found: {space_host_startup}")
-         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
      else:
          print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
 
-     if space_id_startup: # Print repo URLs if SPACE_ID is found
      print(f"✅ SPACE_ID found: {space_id_startup}")
      print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
-     print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
      else:
-         print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
-
-     print("-"*(60 + len(" App Starting ")) + "\n")
 
-     print("Launching Gradio Interface for Basic Agent Evaluation...")
      demo.launch(debug=True, share=False)
  import os
  import gradio as gr
  import requests
  import pandas as pd
+ import json
+ import base64
+ import io
+ from typing import Dict, List, Any, Optional, Union
+ from dataclasses import dataclass
+ from pathlib import Path
+ import tempfile
+ import cv2
+ import numpy as np
+ from PIL import Image
+ import torch
+ from transformers import pipeline, AutoProcessor, AutoModel
+ # import moviepy.editor as mp  # temporarily commented out; requires moviepy
+ # from pytube import YouTube  # temporarily commented out; requires pytube
+ import urllib.request
+ from langgraph.graph import StateGraph, END
+ from langchain_core.messages import HumanMessage, AIMessage
+ from langchain_openai import ChatOpenAI
+ from langchain_community.tools import DuckDuckGoSearchRun
+ from langchain_core.tools import tool
+ import matplotlib.pyplot as plt
+ import seaborn as sns
 
+ # Environment variable setup
+ from dotenv import load_dotenv
+ load_dotenv()
+
+ # Import custom modules
+ from config import Config
+ from tools import ToolManager
+ from prompts import get_answer_prompt, ERROR_ANSWER_TEMPLATE
+
+ # Constants
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
 
+ @dataclass
+ class AgentState:
+     """Agent state class."""
+     question: str
+     media_type: Optional[str] = None  # 'image', 'video', 'text'
+     media_path: Optional[str] = None
+     extracted_info: Dict[str, Any] = None
+     search_results: List[str] = None
+     analysis_results: Dict[str, Any] = None
+     workflow_plan: List[Dict[str, Any]] = None  # workflow plan
+     current_step: int = 0  # current execution step
+     final_answer: str = ""
+     error: Optional[str] = None
+
+     def __post_init__(self):
+         if self.extracted_info is None:
+             self.extracted_info = {}
+         if self.search_results is None:
+             self.search_results = []
+         if self.analysis_results is None:
+             self.analysis_results = {}
+         if self.workflow_plan is None:
+             self.workflow_plan = []
+
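The `None`-plus-`__post_init__` pattern above works, but `dataclasses.field(default_factory=...)` expresses the same mutable defaults directly. A minimal sketch under that assumption; `AgentStateAlt` is an illustrative name, not part of the app:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class AgentStateAlt:
    """Equivalent state container using default_factory
    instead of None checks in __post_init__."""
    question: str
    extracted_info: Dict[str, Any] = field(default_factory=dict)
    search_results: List[str] = field(default_factory=list)
    workflow_plan: List[Dict[str, Any]] = field(default_factory=list)
    current_step: int = 0
    final_answer: str = ""

state = AgentStateAlt(question="What is AI?")
print(state.extracted_info)  # {}
```

Each instance gets its own fresh dict/list, so state is never shared between agent runs.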
+ class MediaAnalyzer:
+     """Media analyzer class."""
+
+     def __init__(self):
+         # Initialize the image analysis model
+         self.image_processor = AutoProcessor.from_pretrained("microsoft/git-base")
+         self.image_model = AutoModel.from_pretrained("microsoft/git-base")
+
+         # Initialize the image captioning model
+         self.image_caption_pipeline = pipeline(
+             "image-to-text",
+             model="Salesforce/blip-image-captioning-base",
+             device=0 if torch.cuda.is_available() else -1
+         )
+
+         # Initialize the image classification model
+         self.image_classification_pipeline = pipeline(
+             "image-classification",
+             model="microsoft/resnet-50",
+             device=0 if torch.cuda.is_available() else -1
+         )
+
+         # Initialize the object detection model
+         self.object_detection_pipeline = pipeline(
+             "object-detection",
+             model="facebook/detr-resnet-50",
+             device=0 if torch.cuda.is_available() else -1
+         )
+
+         print("MediaAnalyzer initialized successfully")
+
+     def analyze_image(self, image_path: str) -> Dict[str, Any]:
+         """Analyze image content."""
+         try:
+             # Load the image
+             image = Image.open(image_path)
+
+             # Image caption
+             caption_result = self.image_caption_pipeline(image)
+             caption = caption_result[0]['generated_text']
+
+             # Image classification
+             classification_result = self.image_classification_pipeline(image)
+             top_classes = classification_result[:5]
+
+             # Object detection
+             detection_result = self.object_detection_pipeline(image)
+             detected_objects = []
+             for detection in detection_result:
+                 detected_objects.append({
+                     'label': detection['label'],
+                     'confidence': detection['score'],
+                     'box': detection['box']
+                 })
+
+             # Basic image information
+             image_info = {
+                 'size': image.size,
+                 'mode': image.mode,
+                 'format': image.format
+             }
+
+             return {
+                 'caption': caption,
+                 'classification': top_classes,
+                 'detected_objects': detected_objects,
+                 'image_info': image_info
+             }
+
+         except Exception as e:
+             return {'error': f"Image analysis failed: {str(e)}"}
+
+     def analyze_video(self, video_path: str) -> Dict[str, Any]:
+         """Analyze video content - actually let the vision-language model look at the video."""
+         try:
+             # Analyze the video with OpenCV
+             cap = cv2.VideoCapture(video_path)
+             if not cap.isOpened():
+                 return {'error': "Could not open video file"}
+
+             # Basic video information
+             fps = cap.get(cv2.CAP_PROP_FPS)
+             frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+             duration = frame_count / fps if fps > 0 else 0
+             width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+             height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+
+             print(f"🎬 Starting video analysis: {frame_count} frames, {fps} fps, duration {duration:.1f}s")
+
+             # Extract keyframes for analysis (one frame per second)
+             frames_analyzed = []
+             frame_interval = max(1, int(fps))  # one frame per second
+
+             for i in range(0, frame_count, frame_interval):
+                 cap.set(cv2.CAP_PROP_POS_FRAMES, i)
+                 ret, frame = cap.read()
+                 if ret:
+                     # Convert to a PIL image for analysis
+                     frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+                     pil_image = Image.fromarray(frame_rgb)
+
+                     # Analyze the frame with the vision-language model
+                     try:
+                         caption_result = self.image_caption_pipeline(pil_image)
+                         frame_info = {
+                             "frame_number": i,
+                             "timestamp": i / fps if fps > 0 else 0,
+                             "caption": caption_result[0]['generated_text']
+                         }
+                         frames_analyzed.append(frame_info)
+
+                         print(f"📸 Frame {i//frame_interval} ({i/fps:.1f}s): {frame_info['caption']}")
+
+                     except Exception as e:
+                         print(f"Frame analysis failed: {e}")
+                         frames_analyzed.append({
+                             "frame_number": i,
+                             "timestamp": i / fps if fps > 0 else 0,
+                             "caption": "Could not analyze this frame"
+                         })
+
+             cap.release()
+
+             # Generate a summary of the video content
+             if frames_analyzed:
+                 # Collect all captions
+                 descriptions = [frame['caption'] for frame in frames_analyzed if frame['caption'] != "Could not analyze this frame"]
+                 if descriptions:
+                     # Summarize the video content with the LLM
+                     summary_prompt = f"""
+                     Based on the following video frame captions, summarize the main content of this video:
+
+                     {chr(10).join([f"At {frame['timestamp']:.1f}s: {frame['caption']}" for frame in frames_analyzed[:10]])}
+
+                     Please summarize the main content of this video:
+                     """
+                     try:
+                         from langchain_openai import ChatOpenAI
+                         llm = ChatOpenAI(
+                             model="gpt-3.5-turbo",
+                             temperature=0.7,
+                             api_key=Config.OPENAI_API_KEY
+                         )
+                         summary_response = llm.invoke(summary_prompt)
+                         video_summary = summary_response.content
+                     except:
+                         video_summary = f"The video contains {len(frames_analyzed)} scenes showing a variety of visual content"
+                 else:
+                     video_summary = "Could not analyze the video content"
+             else:
+                 video_summary = "Video analysis failed"
+
+             return {
+                 'type': 'video',
+                 'video_info': {
+                     'duration': duration,
+                     'fps': fps,
+                     'frame_count': frame_count,
+                     'resolution': f"{width}x{height}"
+                 },
+                 'frames_analyzed': frames_analyzed[:10],  # return only the first 10 frames
+                 'video_summary': video_summary,
+                 'analysis_method': 'OpenCV + vision-language model',
+                 'summary': f"Video duration {duration:.1f}s; analyzed {len(frames_analyzed)} keyframes; content: {video_summary}"
+             }
+
+         except Exception as e:
+             return {'error': f"Video analysis failed: {str(e)}"}
+
+     def download_media(self, url: str, media_type: str) -> str:
+         """Download a media file."""
+         try:
+             if media_type == 'video':
+                 # Simplified version: for videos, just return the URL
+                 print("⚠️ Video download requires moviepy and pytube")
+                 return url
+             else:
+                 # Download an image file
+                 temp_path = tempfile.mktemp(suffix='.jpg')
+                 urllib.request.urlretrieve(url, temp_path)
+                 return temp_path
+         except Exception as e:
+             raise Exception(f"Media download failed: {str(e)}")
 
+ class SearchEngine:
249
+ """搜索引擎类"""
250
+
251
+ def __init__(self):
252
+ self.search_tool = DuckDuckGoSearchRun()
253
+
254
+ def search(self, query: str) -> List[str]:
255
+ """执行搜索"""
256
+ try:
257
+ results = self.search_tool.run(query)
258
+ return [results] if isinstance(results, str) else results
259
+ except Exception as e:
260
+ return [f"搜索失败: {str(e)}"]
261
+
262
+ class MultiModalAgent:
263
+ """多模态智能体主类"""
264
+
265
+ def __init__(self):
266
+ # 验证配置
267
+ if not Config.validate():
268
+ raise ValueError("配置验证失败,请检查环境变量")
269
+
270
+ self.media_analyzer = MediaAnalyzer()
271
+ self.search_engine = SearchEngine()
272
+ self.tool_manager = ToolManager()
273
+
274
+ self.llm = ChatOpenAI(
275
+ model=Config.OPENAI_MODEL,
276
+ temperature=Config.OPENAI_TEMPERATURE,
277
+ api_key=Config.OPENAI_API_KEY
278
+ )
279
+
280
+ # 构建LangGraph工作流
281
+ self.workflow = self._build_workflow()
282
+
283
+ print("MultiModalAgent initialized successfully")
284
+
285
+ def _build_workflow(self) -> StateGraph:
286
+ """构建LangGraph工作流"""
287
+
288
+ # 创建状态图
289
+ workflow = StateGraph(AgentState)
290
+
291
+ # 添加节点
292
+ workflow.add_node("plan_workflow", self._plan_workflow)
293
+ workflow.add_node("classify_media", self._classify_media)
294
+ workflow.add_node("analyze_media", self._analyze_media)
295
+ workflow.add_node("search_info", self._search_info)
296
+ workflow.add_node("use_tools", self._use_tools)
297
+ workflow.add_node("synthesize_answer", self._synthesize_answer)
298
+
299
+ # 设置入口点
300
+ workflow.set_entry_point("plan_workflow")
301
+
302
+ # 添加边
303
+ workflow.add_edge("plan_workflow", "classify_media")
304
+ workflow.add_edge("classify_media", "analyze_media")
305
+ workflow.add_edge("analyze_media", "search_info")
306
+ workflow.add_edge("search_info", "use_tools")
307
+ workflow.add_edge("use_tools", "synthesize_answer")
308
+ workflow.add_edge("synthesize_answer", END)
309
+
310
+ return workflow.compile()
311
+
+    def _plan_workflow(self, state: AgentState) -> AgentState:
+        """智能规划工作流"""
+        try:
+            # 使用LLM分析任务并制定工作流计划
+            planning_prompt = f"""
+            你是一个智能工作流规划专家。请分析以下任务,并制定一个详细的工作流计划。
+
+            任务: {state.question}
+
+            请根据任务类型和需求,设计一个合适的工作流。工作流应该包含以下信息:
+            1. 步骤编号
+            2. 步骤名称
+            3. 步骤描述
+            4. 是否需要搜索网络
+            5. 需要使用哪些工具
+            6. 预期输出
+
+            请以JSON格式返回工作流计划,格式如下:
+            {{
+                "workflow": [
+                    {{
+                        "step": 1,
+                        "name": "步骤名称",
+                        "description": "步骤描述",
+                        "needs_search": true/false,
+                        "tools": ["工具1", "工具2"],
+                        "expected_output": "预期输出"
+                    }}
+                ]
+            }}
+
+            请确保工作流是合理的、高效的,并且能够完成任务。
+            """
+
+            # 调用LLM进行工作流规划
+            response = self.llm.invoke(planning_prompt)
+
+            # 解析工作流计划
+            try:
+                import json
+                # 尝试从响应中提取JSON
+                if "```json" in response.content:
+                    json_start = response.content.find("```json") + 7
+                    json_end = response.content.find("```", json_start)
+                    json_str = response.content[json_start:json_end].strip()
+                else:
+                    # 尝试直接解析
+                    json_str = response.content.strip()
+
+                workflow_data = json.loads(json_str)
+                state.workflow_plan = workflow_data.get("workflow", [])
+
+                print(f"🤖 工作流规划完成,共 {len(state.workflow_plan)} 个步骤:")
+                for step in state.workflow_plan:
+                    print(f"  📋 步骤 {step.get('step', '?')}: {step.get('name', 'Unknown')}")
+                    print(f"     {step.get('description', 'No description')}")
+                    if step.get('needs_search', False):
+                        print(f"     🔍 需要搜索: 是")
+                    if step.get('tools'):
+                        print(f"     🛠️ 工具: {', '.join(step['tools'])}")
+                    print()
+
+            except json.JSONDecodeError:
+                # 如果JSON解析失败,使用默认工作流
+                print("⚠️ 工作流规划解析失败,使用默认工作流")
+                state.workflow_plan = [
+                    {
+                        "step": 1,
+                        "name": "媒体分类",
+                        "description": "分析任务中的媒体类型",
+                        "needs_search": False,
+                        "tools": [],
+                        "expected_output": "确定媒体类型"
+                    },
+                    {
+                        "step": 2,
+                        "name": "媒体分析",
+                        "description": "分析媒体内容",
+                        "needs_search": False,
+                        "tools": ["媒体分析工具"],
+                        "expected_output": "提取媒体信息"
+                    },
+                    {
+                        "step": 3,
+                        "name": "信息搜索",
+                        "description": "搜索相关信息",
+                        "needs_search": True,
+                        "tools": ["搜索引擎"],
+                        "expected_output": "搜索结果"
+                    },
+                    {
+                        "step": 4,
+                        "name": "工具使用",
+                        "description": "使用专业工具",
+                        "needs_search": False,
+                        "tools": ["各种专业工具"],
+                        "expected_output": "工具分析结果"
+                    },
+                    {
+                        "step": 5,
+                        "name": "答案合成",
+                        "description": "综合所有信息生成答案",
+                        "needs_search": False,
+                        "tools": [],
+                        "expected_output": "最终答案"
+                    }
+                ]
+
+        except Exception as e:
+            print(f"❌ 工作流规划失败: {e}")
+            # 使用默认工作流
+            state.workflow_plan = [
+                {
+                    "step": 1,
+                    "name": "默认工作流",
+                    "description": "使用默认工作流处理任务",
+                    "needs_search": True,
+                    "tools": [],
+                    "expected_output": "任务完成"
+                }
+            ]
+
+        return state
+
+    def _classify_media(self, state: AgentState) -> AgentState:
+        """分类媒体类型"""
+        question = state.question.lower()
+
+        # 提取URL
+        import re
+        url_pattern = r'https?://[^\s]+'
+        urls = re.findall(url_pattern, state.question)
+
+        # 检测媒体类型(先判断更具体的YouTube,避免被通用的'视频/video'分支抢先匹配)
+        if 'youtube.com' in question or 'youtu.be' in question:
+            state.media_type = 'youtube'
+        elif any(keyword in question for keyword in ['图片', '图像', 'image', 'photo', 'img']):
+            state.media_type = 'image'
+        elif any(keyword in question for keyword in ['视频', 'video', 'movie', 'clip']):
+            state.media_type = 'video'
+        elif any(keyword in question for keyword in ['pdf', '文档', 'document', '报告', 'report']):
+            state.media_type = 'pdf'
+        elif any(keyword in question for keyword in ['网页', '网站', 'webpage', 'website', 'url', 'http', 'https']):
+            state.media_type = 'webpage'
+        elif any(keyword in question for keyword in ['wikipedia', 'wiki', '维基', '百科']):
+            state.media_type = 'wikipedia'
+        else:
+            state.media_type = 'text'
+
+        # 设置媒体路径
+        if urls:
+            state.media_path = urls[0]  # 使用第一个URL
+        else:
+            state.media_path = None
+
+        return state
+
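`_classify_media` 的关键词启发式可以抽出来单独测试。下面是一个独立的最小版本(函数名与关键词子集均为示例假设,关键词表取自上文):

```python
import re


def classify(question: str) -> str:
    """基于关键词/域名的朴素媒体类型分类(示例简化版)"""
    q = question.lower()
    if 'youtube.com' in q or 'youtu.be' in q:   # 先匹配更具体的类型
        return 'youtube'
    if any(k in q for k in ['图片', '图像', 'image', 'photo']):
        return 'image'
    if any(k in q for k in ['视频', 'video', 'movie', 'clip']):
        return 'video'
    if any(k in q for k in ['pdf', '文档', 'document']):
        return 'pdf'
    return 'text'


def extract_urls(question: str) -> list:
    # 与上文相同的URL正则:http(s)后接任意非空白字符
    return re.findall(r'https?://[^\s]+', question)


print(classify("请分析这个视频 https://www.youtube.com/watch?v=abc"))  # → youtube
print(extract_urls("见 https://example.com/a.pdf 第3页"))  # → ['https://example.com/a.pdf']
```

这种关键词启发式便于单测,但对未覆盖的措辞会退回 'text',实际效果取决于关键词表的完备程度。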
+    def _analyze_media(self, state: AgentState) -> AgentState:
+        """分析媒体内容"""
+        if state.media_type == 'image' and state.media_path:
+            state.extracted_info = self.media_analyzer.analyze_image(state.media_path)
+        elif state.media_type == 'video' and state.media_path:
+            state.extracted_info = self.media_analyzer.analyze_video(state.media_path)
+        elif state.media_type == 'pdf' and state.media_path:
+            # PDF分析
+            pdf_info = self.tool_manager.execute_tool('analyze_pdf_structure', pdf_path=state.media_path)
+            pdf_text = self.tool_manager.execute_tool('extract_text_from_pdf', pdf_path=state.media_path)
+            state.extracted_info = {
+                'type': 'pdf',
+                'pdf_info': pdf_info,
+                'text_content': pdf_text[:2000] if len(pdf_text) > 2000 else pdf_text  # 限制文本长度
+            }
+        elif state.media_type == 'webpage' and state.media_path:
+            # 网页分析
+            webpage_content = self.tool_manager.execute_tool('fetch_webpage_content', url=state.media_path)
+            webpage_structure = self.tool_manager.execute_tool('analyze_webpage_structure', url=state.media_path)
+            state.extracted_info = {
+                'type': 'webpage',
+                'webpage_content': webpage_content,
+                'webpage_structure': webpage_structure
+            }
+        elif state.media_type == 'youtube' and state.media_path:
+            # YouTube分析
+            youtube_info = self.tool_manager.execute_tool('get_youtube_info', url=state.media_path)
+            youtube_thumbnail = self.tool_manager.execute_tool('download_youtube_thumbnail', url=state.media_path)
+            state.extracted_info = {
+                'type': 'youtube',
+                'youtube_info': youtube_info,
+                'thumbnail_path': youtube_thumbnail
+            }
+        elif state.media_type == 'wikipedia':
+            # Wikipedia分析 - 从问题中提取搜索词
+            import re
+            # 提取可能的Wikipedia页面标题
+            wiki_pattern = r'(?:wikipedia|wiki|维基|百科)\s*(?:关于|的|页面|词条)?\s*[::]\s*(.+)'
+            match = re.search(wiki_pattern, state.question, re.IGNORECASE)
+            if match:
+                search_term = match.group(1).strip()
+            else:
+                # 如果没有明确格式,尝试提取关键词
+                words = state.question.split()
+                search_term = ' '.join([w for w in words if w not in ['wikipedia', 'wiki', '维基', '百科', '的', '是', '什么', '关于']])
+
+            if search_term:
+                # 搜索Wikipedia
+                wiki_search = self.tool_manager.execute_tool('search_wikipedia', query=search_term, max_results=3)
+                if wiki_search and 'error' not in wiki_search[0]:
+                    # 获取第一个结果的详细信息
+                    first_result = wiki_search[0]
+                    wiki_page = self.tool_manager.execute_tool('get_wikipedia_page', title=first_result['title'])
+                    state.extracted_info = {
+                        'type': 'wikipedia',
+                        'search_term': search_term,
+                        'search_results': wiki_search,
+                        'page_content': wiki_page
+                    }
+                else:
+                    state.extracted_info = {
+                        'type': 'wikipedia',
+                        'search_term': search_term,
+                        'error': '未找到相关Wikipedia页面'
+                    }
+            else:
+                state.extracted_info = {
+                    'type': 'wikipedia',
+                    'error': '无法提取搜索词'
+                }
+        else:
+            state.extracted_info = {'type': 'text', 'content': state.question}
+
+        return state
+
+    def _search_info(self, state: AgentState) -> AgentState:
+        """智能搜索相关信息"""
+        # 根据工作流计划决定是否搜索
+        should_search = False
+
+        # 检查当前步骤是否需要搜索
+        if state.workflow_plan and state.current_step < len(state.workflow_plan):
+            current_step_plan = state.workflow_plan[state.current_step]
+            should_search = current_step_plan.get('needs_search', False)
+
+        # 如果没有工作流计划,使用原来的逻辑
+        if not state.workflow_plan:
+            should_search = self.tool_manager.should_use_search(state.question, {'extracted_info': state.extracted_info})
+
+        if should_search:
+            print(f"🔍 执行搜索 (步骤 {state.current_step + 1})")
+            # 构建搜索查询
+            search_query = state.question
+            if state.extracted_info and 'caption' in state.extracted_info:
+                search_query += f" {state.extracted_info['caption']}"
+
+            state.search_results = self.search_engine.search(search_query)
+            print(f"✅ 搜索完成,找到 {len(state.search_results)} 个结果")
+        else:
+            print(f"⏭️ 跳过搜索 (步骤 {state.current_step + 1})")
+            # 不需要搜索,设置为空
+            state.search_results = []
+
+        # 更新当前步骤
+        state.current_step += 1
+
+        return state
+
+    def _use_tools(self, state: AgentState) -> AgentState:
+        """使用工具进行额外分析"""
+        try:
+            tool_results = {}
+
+            # 根据工作流计划选择工具
+            current_tools = []
+            if state.workflow_plan and state.current_step < len(state.workflow_plan):
+                current_step_plan = state.workflow_plan[state.current_step]
+                current_tools = current_step_plan.get('tools', [])
+                print(f"🛠️ 使用工具 (步骤 {state.current_step + 1}): {', '.join(current_tools) if current_tools else '无'}")
+
+            # 如果没有工作流计划或工具列表为空,使用原来的逻辑
+            if not current_tools:
+                question_lower = state.question.lower()
+
+                # 代码分析工具
+                if any(keyword in question_lower for keyword in ['代码', 'code', 'python', '程序', 'program']):
+                    # 检查是否有代码内容
+                    if '```python' in state.question or 'def ' in state.question or 'import ' in state.question:
+                        # 提取代码块('```python' 长度为9,原来的 +8 会多留一个字符)
+                        code_start = state.question.find('```python')
+                        if code_start != -1:
+                            code_end = state.question.find('```', code_start + 9)
+                            if code_end != -1:
+                                code = state.question[code_start + 9:code_end].strip()
+                            else:
+                                code = state.question[code_start + 9:].strip()
+                        else:
+                            # 尝试提取代码片段
+                            lines = state.question.split('\n')
+                            code_lines = []
+                            for line in lines:
+                                if line.strip().startswith(('def ', 'import ', 'class ', 'if ', 'for ', 'while ')):
+                                    code_lines.append(line)
+                            code = '\n'.join(code_lines)
+
+                        if code.strip():
+                            # 分析代码
+                            tool_results['code_analysis'] = self.tool_manager.execute_tool(
+                                'analyze_python_code',
+                                code=code
+                            )
+
+                            # 解释代码
+                            tool_results['code_explanation'] = self.tool_manager.execute_tool(
+                                'explain_code',
+                                code=code
+                            )
+
+                            # 如果需要执行代码
+                            if any(keyword in question_lower for keyword in ['运行', '执行', 'execute', 'run']):
+                                tool_results['code_execution'] = self.tool_manager.execute_tool(
+                                    'execute_python_code',
+                                    code=code
+                                )
+
+                # 视频内容分析
+                if state.media_type == 'video' and state.media_path:
+                    if any(keyword in question_lower for keyword in ['视频', 'video', '内容', 'content']):
+                        tool_results['video_analysis'] = self.tool_manager.execute_tool(
+                            'analyze_video_content',
+                            video_path=state.media_path
+                        )
+
+                # PDF内容分析
+                if state.media_type == 'pdf' and state.media_path:
+                    if any(keyword in question_lower for keyword in ['pdf', '文档', 'document', '内容', 'content', '总结', 'summary']):
+                        tool_results['pdf_summary'] = self.tool_manager.execute_tool(
+                            'summarize_pdf_content',
+                            pdf_path=state.media_path
+                        )
+
+                    # PDF文本搜索
+                    if any(keyword in question_lower for keyword in ['搜索', '查找', 'search', 'find']):
+                        # 尝试从问题中提取搜索词
+                        search_terms = []
+                        for word in question_lower.split():
+                            if len(word) > 2 and word not in ['搜索', '查找', 'search', 'find', 'pdf', '文档']:
+                                search_terms.append(word)
+
+                        if search_terms:
+                            search_term = ' '.join(search_terms[:3])  # 最多3个词
+                            tool_results['pdf_search'] = self.tool_manager.execute_tool(
+                                'search_text_in_pdf',
+                                pdf_path=state.media_path,
+                                search_term=search_term
+                            )
+
+                    # PDF图像提取
+                    if any(keyword in question_lower for keyword in ['图像', '图片', 'image', '图', '图表']):
+                        tool_results['pdf_images'] = self.tool_manager.execute_tool(
+                            'extract_images_from_pdf',
+                            pdf_path=state.media_path
+                        )
+
+                # 网页内容分析
+                if state.media_type == 'webpage' and state.media_path:
+                    if any(keyword in question_lower for keyword in ['网页', '网站', 'webpage', 'website', '内容', 'content', '总结', 'summary']):
+                        tool_results['webpage_summary'] = self.tool_manager.execute_tool(
+                            'summarize_webpage_content',
+                            url=state.media_path
+                        )
+
+                    # 网页文本搜索
+                    if any(keyword in question_lower for keyword in ['搜索', '查找', 'search', 'find']):
+                        # 尝试从问题中提取搜索词
+                        search_terms = []
+                        for word in question_lower.split():
+                            if len(word) > 2 and word not in ['搜索', '查找', 'search', 'find', '网页', '网站']:
+                                search_terms.append(word)
+
+                        if search_terms:
+                            search_term = ' '.join(search_terms[:3])  # 最多3个词
+                            tool_results['webpage_search'] = self.tool_manager.execute_tool(
+                                'search_content_in_webpage',
+                                url=state.media_path,
+                                search_term=search_term
+                            )
+
+                    # 网页链接提取
+                    if any(keyword in question_lower for keyword in ['链接', 'link', 'url', '地址']):
+                        tool_results['webpage_links'] = self.tool_manager.execute_tool(
+                            'extract_links_from_webpage',
+                            url=state.media_path
+                        )
+
+                    # 网页可访问性检查
+                    if any(keyword in question_lower for keyword in ['可访问性', 'accessibility', '无障碍', '检查']):
+                        tool_results['webpage_accessibility'] = self.tool_manager.execute_tool(
+                            'check_webpage_accessibility',
+                            url=state.media_path
+                        )
+
710
+
711
+ # YouTube内容分析
712
+ if state.media_type == 'youtube' and state.media_path:
713
+ if any(keyword in question_lower for keyword in ['youtube', '视频', 'video', '内容', 'content', '信息', 'info']):
714
+ # 获取YouTube信息已经在_analyze_media中完成
715
+ pass
716
+
717
+ # YouTube视频下载
718
+ if any(keyword in question_lower for keyword in ['下载', 'download', '保存', 'save']):
719
+ tool_results['youtube_download'] = self.tool_manager.execute_tool(
720
+ 'download_youtube_video',
721
+ url=state.media_path
722
+ )
723
+
724
+ # YouTube音频提取
725
+ if any(keyword in question_lower for keyword in ['音频', 'audio', '声音', 'sound', '提取', 'extract']):
726
+ tool_results['youtube_audio'] = self.tool_manager.execute_tool(
727
+ 'extract_youtube_audio',
728
+ url=state.media_path
729
+ )
730
+
731
+ # YouTube评论分析
732
+ if any(keyword in question_lower for keyword in ['评论', 'comment', '反馈', 'feedback']):
733
+ tool_results['youtube_comments'] = self.tool_manager.execute_tool(
734
+ 'analyze_youtube_comments',
735
+ url=state.media_path
736
+ )
737
+
738
+ # Wikipedia内容分析
739
+ if state.media_type == 'wikipedia':
740
+ if any(keyword in question_lower for keyword in ['wikipedia', 'wiki', '维基', '百科', '搜索', 'search']):
741
+ # Wikipedia搜索已经在_analyze_media中完成
742
+ pass
743
+
744
+ # Wikipedia页面分类
745
+ if any(keyword in question_lower for keyword in ['分类', 'category', '类别']):
746
+ if state.extracted_info and 'page_content' in state.extracted_info and 'title' in state.extracted_info['page_content']:
747
+ tool_results['wikipedia_categories'] = self.tool_manager.execute_tool(
748
+ 'get_wikipedia_categories',
749
+ title=state.extracted_info['page_content']['title']
750
+ )
751
+
752
+ # Wikipedia页面链接
753
+ if any(keyword in question_lower for keyword in ['链接', 'link', '相关', 'related']):
754
+ if state.extracted_info and 'page_content' in state.extracted_info and 'title' in state.extracted_info['page_content']:
755
+ tool_results['wikipedia_links'] = self.tool_manager.execute_tool(
756
+ 'get_wikipedia_links',
757
+ title=state.extracted_info['page_content']['title']
758
+ )
759
+
760
+ # Wikipedia搜索建议
761
+ if any(keyword in question_lower for keyword in ['建议', 'suggestion', '推荐', 'recommend']):
762
+ if state.extracted_info and 'search_term' in state.extracted_info:
763
+ tool_results['wikipedia_suggestions'] = self.tool_manager.execute_tool(
764
+ 'get_wikipedia_suggestions',
765
+ query=state.extracted_info['search_term']
766
+ )
767
+
768
+ # 英文Wikipedia搜索
769
+ if any(keyword in question_lower for keyword in ['英文', 'english', '英文版']):
770
+ if state.extracted_info and 'search_term' in state.extracted_info:
771
+ tool_results['wikipedia_english_search'] = self.tool_manager.execute_tool(
772
+ 'search_wikipedia_english',
773
+ query=state.extracted_info['search_term']
774
+ )
775
+
776
+ # 随机Wikipedia页面
777
+ if any(keyword in question_lower for keyword in ['随机', 'random', '随便', '任意']):
778
+ tool_results['wikipedia_random'] = self.tool_manager.execute_tool(
779
+ 'get_wikipedia_random_page'
780
+ )
781
+
+                # 文本分析工具
+                if any(keyword in question_lower for keyword in ['情感', '情绪', 'sentiment', 'emotion']):
+                    if state.extracted_info and 'caption' in state.extracted_info:
+                        tool_results['sentiment'] = self.tool_manager.execute_tool(
+                            'analyze_text_sentiment',
+                            text=state.extracted_info['caption']
+                        )
+
+                # 关键词提取
+                if any(keyword in question_lower for keyword in ['关键词', '关键', 'keywords', 'key']):
+                    tool_results['keywords'] = self.tool_manager.execute_tool(
+                        'extract_keywords',
+                        text=state.question
+                    )
+
+                # 文本摘要
+                if any(keyword in question_lower for keyword in ['摘要', '总结', 'summary', 'summarize']):
+                    if state.search_results:
+                        combined_text = " ".join(state.search_results)
+                        tool_results['summary'] = self.tool_manager.execute_tool(
+                            'summarize_text',
+                            text=combined_text
+                        )
+
+                # 图像文本提取
+                if state.media_type == 'image' and state.media_path:
+                    if any(keyword in question_lower for keyword in ['文字', '文本', 'text', 'ocr']):
+                        tool_results['ocr_text'] = self.tool_manager.execute_tool(
+                            'extract_text_from_image',
+                            image_path=state.media_path
+                        )
+
+                # 视频音频分析
+                if state.media_type == 'video' and state.media_path:
+                    if any(keyword in question_lower for keyword in ['音频', '声音', 'audio', 'sound']):
+                        tool_results['audio_info'] = self.tool_manager.execute_tool(
+                            'extract_video_audio',
+                            video_path=state.media_path
+                        )
+
+                # 数学计算
+                if any(keyword in question_lower for keyword in ['计算', 'calculate', 'math', '数学']):
+                    # 尝试提取数学表达式
+                    import re
+                    math_pattern = r'[\d\+\-\*\/\(\)\.\s]+'
+                    math_matches = re.findall(math_pattern, state.question)
+                    for match in math_matches:
+                        if len(match.strip()) > 3:  # 至少4个字符
+                            try:
+                                tool_results['math_calculation'] = self.tool_manager.execute_tool(
+                                    'calculate_math_expression',
+                                    expression=match.strip()
+                                )
+                                break
+                            except Exception:
+                                continue
+
+                # 翻译
+                if any(keyword in question_lower for keyword in ['翻译', 'translate']):
+                    # 提取需要翻译的文本
+                    text_to_translate = state.question
+                    if '翻译' in text_to_translate:
+                        text_to_translate = text_to_translate.split('翻译')[-1].strip()
+                    elif 'translate' in text_to_translate:
+                        text_to_translate = text_to_translate.split('translate')[-1].strip()
+
+                    if text_to_translate and len(text_to_translate) > 2:
+                        tool_results['translation'] = self.tool_manager.execute_tool(
+                            'translate_text',
+                            text=text_to_translate
+                        )
+
+            state.analysis_results = tool_results
+
+        except Exception as e:
+            state.error = f"工具使用失败: {str(e)}"
+            state.analysis_results = {}
+
+        return state
+
+    def _synthesize_answer(self, state: AgentState) -> AgentState:
+        """综合生成答案"""
+        try:
+            # 使用提示词函数生成提示
+            prompt = get_answer_prompt(
+                question=state.question,
+                media_analysis=json.dumps(state.extracted_info, ensure_ascii=False, indent=2),
+                search_results=json.dumps(state.search_results, ensure_ascii=False, indent=2),
+                tool_analysis=json.dumps(state.analysis_results, ensure_ascii=False, indent=2)
+            )
+
+            # 使用LLM生成答案
+            response = self.llm.invoke([HumanMessage(content=prompt)])
+            state.final_answer = response.content
+
+        except Exception as e:
+            state.error = f"答案生成失败: {str(e)}"
+            state.final_answer = ERROR_ANSWER_TEMPLATE
+
+        return state
+
+    def __call__(self, question: str, media_url: Optional[str] = None) -> str:
+        """主调用方法"""
+        try:
+            # 初始化状态
+            state = AgentState(question=question)
+
+            # 如果有媒体URL,下载并设置路径
+            if media_url:
+                if any(ext in media_url.lower() for ext in ['.pdf']):
+                    media_type = 'pdf'
+                    state.media_path = self.tool_manager.execute_tool('download_pdf_from_url', url=media_url)
+                elif 'youtube.com' in media_url.lower() or 'youtu.be' in media_url.lower():
+                    media_type = 'youtube'
+                    state.media_path = media_url  # 直接使用URL
+                elif any(ext in media_url.lower() for ext in ['.mp4', '.avi', '.mov']):
+                    media_type = 'video'
+                    state.media_path = self.media_analyzer.download_media(media_url, media_type)
+                elif any(prefix in media_url.lower() for prefix in ['http://', 'https://', 'www.']):
+                    media_type = 'webpage'
+                    state.media_path = media_url  # 直接使用URL
+                else:
+                    media_type = 'image'
+                    state.media_path = self.media_analyzer.download_media(media_url, media_type)
+                state.media_type = media_type
+
+            # 执行工作流
+            final_state = self.workflow.invoke(state)
+
+            # LangGraph返回的是字典,因此使用键来访问
+            return final_state['final_answer']
+
+        except Exception as e:
+            return f"智能体执行失败: {str(e)}"
+
+def run_and_submit_all(profile: gr.OAuthProfile | None):
+    """运行评估并提交所有答案"""
+
+    # 获取用户信息
     if profile:
+        username = f"{profile.username}"
         print(f"User logged in: {username}")
     else:
         print("User not logged in.")
         return "Please Login to Hugging Face with the button.", None

+    space_id = os.getenv("SPACE_ID")
     api_url = DEFAULT_API_URL
     questions_url = f"{api_url}/questions"
     submit_url = f"{api_url}/submit"

+    # 初始化多模态智能体
     try:
+        agent = MultiModalAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
+
     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
     print(agent_code)

+    # 获取问题
     print(f"Fetching questions from: {questions_url}")
     try:
         response = requests.get(questions_url, timeout=15)
             print("Fetched questions list is empty.")
             return "Fetched questions list is empty or invalid format.", None
         print(f"Fetched {len(questions_data)} questions.")
+    except Exception as e:
         print(f"Error fetching questions: {e}")
         return f"Error fetching questions: {e}", None

+    # 运行智能体
     results_log = []
     answers_payload = []
     print(f"Running agent on {len(questions_data)} questions...")
+
     for item in questions_data:
         task_id = item.get("task_id")
         question_text = item.get("question")
         if not task_id or question_text is None:
             print(f"Skipping item with missing task_id or question: {item}")
             continue
+
         try:
             submitted_answer = agent(question_text)
             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
         print("Agent did not produce any answers to submit.")
         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)

+    # 准备提交
     submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
     status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
     print(status_update)

+    # 提交答案
     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
     try:
         response = requests.post(submit_url, json=submission_data, timeout=60)
         print("Submission successful.")
         results_df = pd.DataFrame(results_log)
         return final_status, results_df
     except Exception as e:
+        status_message = f"Submission Failed: {e}"
         print(status_message)
         results_df = pd.DataFrame(results_log)
         return status_message, results_df

+def test_agent(question: str, media_url: str = ""):
+    """测试智能体功能"""
+    try:
+        agent = MultiModalAgent()
+        answer = agent(question, media_url if media_url else None)
+        return answer
+    except Exception as e:
+        return f"测试失败: {str(e)}"

+# 构建Gradio界面
 with gr.Blocks() as demo:
+    gr.Markdown("# 多模态智能体系统")
     gr.Markdown(
         """
+        **功能特性:**
+        - 🎥 视频理解与分析
+        - 🖼️ 图像识别与描述
+        - 🔍 智能搜索引擎
+        - 🤖 LangGraph工作流编排
+        - 🧠 多模态信息融合
+
+        **使用说明:**
+        1. 登录你的Hugging Face账户
+        2. 在测试区域输入问题(可选媒体URL)
+        3. 点击"运行评估"进行批量测试
         """
     )

     gr.LoginButton()

+    with gr.Tab("智能体测试"):
+        with gr.Row():
+            with gr.Column():
+                test_question = gr.Textbox(label="问题", placeholder="请输入你的问题...")
+                test_media_url = gr.Textbox(label="媒体URL(可选)", placeholder="图片或视频URL...")
+                test_button = gr.Button("测试智能体")
+
+            with gr.Column():
+                test_output = gr.Textbox(label="智能体回答", lines=10)

+        test_button.click(
+            fn=test_agent,
+            inputs=[test_question, test_media_url],
+            outputs=test_output
+        )

+    with gr.Tab("批量评估"):
+        run_button = gr.Button("运行评估 & 提交所有答案")
+        status_output = gr.Textbox(label="运行状态 / 提交结果", lines=5, interactive=False)
+        results_table = gr.DataFrame(label="问题和智能体答案", wrap=True)
         run_button.click(
             fn=run_and_submit_all,
             outputs=[status_output, results_table]
         )

 if __name__ == "__main__":
+    print("\n" + "-"*30 + " 多模态智能体系统启动 " + "-"*30)
+
     space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID")

     if space_host_startup:
         print(f"✅ SPACE_HOST found: {space_host_startup}")
+        print(f"   Runtime URL: https://{space_host_startup}.hf.space")
     else:
         print("ℹ️ SPACE_HOST environment variable not found (running locally?).")

+    if space_id_startup:
         print(f"✅ SPACE_ID found: {space_id_startup}")
         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
     else:
+        print("ℹ️ SPACE_ID environment variable not found (running locally?).")

+    print("-"*(60 + len(" 多模态智能体系统启动 ")) + "\n")
+    print("启动多模态智能体系统...")
     demo.launch(debug=True, share=False)
check_ffmpeg.py ADDED
@@ -0,0 +1,148 @@
+ #!/usr/bin/env python3
+ """
+ 检查ffmpeg安装情况
+ """
+
+ import subprocess
+ import os
+ import sys
+
+ def check_ffmpeg():
+     """检查ffmpeg是否可用"""
+     print("🔍 检查ffmpeg安装情况...")
+
+     # 方法1: 检查系统PATH中的ffmpeg
+     try:
+         result = subprocess.run(['ffmpeg', '-version'],
+                                 capture_output=True, text=True, timeout=10)
+         if result.returncode == 0:
+             print("✅ ffmpeg在系统PATH中可用")
+             # f-string表达式中不能直接写 '\n'(Python 3.12 之前是语法错误),先取首行
+             version_line = result.stdout.splitlines()[0] if result.stdout else ""
+             print(f"   版本信息: {version_line}")
+             return True
+     except (subprocess.TimeoutExpired, FileNotFoundError, subprocess.CalledProcessError):
+         print("❌ ffmpeg不在系统PATH中")
+
+     # 方法2: 检查conda环境中的ffmpeg
+     try:
+         conda_prefix = os.environ.get('CONDA_PREFIX')
+         if conda_prefix:
+             ffmpeg_path = os.path.join(conda_prefix, 'bin', 'ffmpeg')
+             if os.path.exists(ffmpeg_path):
+                 print(f"✅ 在conda环境中找到ffmpeg: {ffmpeg_path}")
+                 return True
+             else:
+                 print(f"❌ conda环境中没有ffmpeg: {ffmpeg_path}")
+     except Exception as e:
+         print(f"❌ 检查conda环境失败: {e}")
+
+     # 方法3: 检查常见的ffmpeg安装路径
+     common_paths = [
+         r"C:\ffmpeg\bin\ffmpeg.exe",
+         r"C:\Program Files\ffmpeg\bin\ffmpeg.exe",
+         r"C:\Program Files (x86)\ffmpeg\bin\ffmpeg.exe",
+         os.path.expanduser(r"~\ffmpeg\bin\ffmpeg.exe")
+     ]
+
+     for path in common_paths:
+         if os.path.exists(path):
+             print(f"✅ 找到ffmpeg: {path}")
+             return True
+
+     print("❌ 未找到ffmpeg")
+     return False
+
+ def install_ffmpeg_conda():
+     """通过conda安装ffmpeg"""
+     print("\n📦 尝试通过conda安装ffmpeg...")
+     try:
+         result = subprocess.run(['conda', 'install', '-c', 'conda-forge', 'ffmpeg', '-y'],
+                                 capture_output=True, text=True, timeout=60)
+         if result.returncode == 0:
+             print("✅ ffmpeg安装成功")
+             return True
+         else:
+             print(f"❌ ffmpeg安装失败: {result.stderr}")
+             return False
+     except Exception as e:
+         print(f"❌ conda安装失败: {e}")
+         return False
+
+ def install_ffmpeg_pip():
+     """通过pip安装ffmpeg-python"""
+     print("\n📦 尝试通过pip安装ffmpeg-python...")
+     try:
+         result = subprocess.run([sys.executable, '-m', 'pip', 'install', 'ffmpeg-python'],
+                                 capture_output=True, text=True, timeout=60)
+         if result.returncode == 0:
+             print("✅ ffmpeg-python安装成功")
+             return True
+         else:
+             print(f"❌ ffmpeg-python安装失败: {result.stderr}")
+             return False
+     except Exception as e:
+         print(f"❌ pip安装失败: {e}")
+         return False
+
+ def test_audio_without_ffmpeg():
+     """测试不使用ffmpeg的音频处理"""
+     print("\n🎵 测试不使用ffmpeg的音频处理...")
+
+     try:
+         import yt_dlp
+         print("✅ yt-dlp可用")
+
+         # 测试下载音频(不转换格式)
+         test_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+
+         ydl_opts = {
+             'format': 'bestaudio/best',
+             'outtmpl': 'downloads/test_audio.%(ext)s',
+             'quiet': True
+         }
+
+         with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+             info = ydl.extract_info(test_url, download=True)
+             audio_path = ydl.prepare_filename(info)
+
+         if os.path.exists(audio_path):
+             print(f"✅ 音频下载成功: {audio_path}")
+             print(f"   文件大小: {os.path.getsize(audio_path)} bytes")
+             return True
+         else:
+             print("❌ 音频下载失败")
+             return False
+
+     except Exception as e:
+         print(f"❌ 音频处理测试失败: {e}")
+         return False
+
+ def main():
+     print("🔧 ffmpeg检查和安装工具")
+     print("="*50)
+
+     # 检查ffmpeg
+     ffmpeg_available = check_ffmpeg()
+
+     if not ffmpeg_available:
+         print("\n📋 解决方案:")
+         print("1. 通过conda安装ffmpeg")
+         print("2. 手动下载ffmpeg并添加到PATH")
+         print("3. 使用不依赖ffmpeg的音频处理方法")
+
+         choice = input("\n选择解决方案 (1/2/3): ").strip()
+
+         if choice == "1":
+             install_ffmpeg_conda()
+         elif choice == "2":
+             print("请手动下载ffmpeg并添加到系统PATH")
+             print("下载地址: https://ffmpeg.org/download.html")
+         elif choice == "3":
+             test_audio_without_ffmpeg()
+         else:
+             print("无效选择")
+     else:
+         print("\n✅ ffmpeg已可用,可以正常进行音频处理")
+         test_audio_without_ffmpeg()
+
+ if __name__ == "__main__":
+     main()
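`check_ffmpeg` 里按 PATH 探测的那一步也可以用标准库 `shutil.which` 一行完成(它按 PATH 查找可执行文件,跨平台且自动处理 Windows 的 `.exe` 后缀);这只是一个补充示意,不替代上面的 conda 环境和常见安装路径检查:

```python
import shutil


def ffmpeg_on_path() -> bool:
    """ffmpeg 在 PATH 上可用时返回 True,否则返回 False"""
    return shutil.which("ffmpeg") is not None


print(ffmpeg_on_path())
```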
config.py ADDED
@@ -0,0 +1,122 @@
+ """
+ 多模态智能体系统配置文件
+ """
+ import os
+ import json
+ from typing import Optional, Dict, Any
+ from pathlib import Path
+
+ class Config:
+     """系统配置类"""
+
+     # API配置文件路径
+     API_KEYS_FILE: str = "api_keys.json"
+
+     # OpenAI配置
+     OPENAI_API_KEY: Optional[str] = None
+     OPENAI_MODEL: str = "gpt-4o"
+     OPENAI_TEMPERATURE: float = 0.7
+
+     # Hugging Face配置
+     HUGGINGFACE_API_KEY: Optional[str] = None
+
+     # 搜索引擎配置
+     SEARCH_ENGINE_TYPE: str = "duckduckgo"
+     SEARCH_ENGINE_API_KEY: Optional[str] = None
+
+     # 模型配置
+     IMAGE_CAPTION_MODEL: str = "Salesforce/blip-image-captioning-base"
+     IMAGE_CLASSIFICATION_MODEL: str = "microsoft/resnet-50"
+     OBJECT_DETECTION_MODEL: str = "facebook/detr-resnet-50"
+     GIT_MODEL: str = "microsoft/git-base"
+
+     # 系统配置
+     DEBUG: bool = os.getenv("DEBUG", "False").lower() == "true"
+     LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
+
+     # 媒体处理配置
+     MAX_VIDEO_DURATION: int = 300  # 最大视频时长(秒)
+     FRAMES_TO_ANALYZE: int = 5  # 视频分析帧数
+     MAX_IMAGE_SIZE: int = 1024  # 最大图像尺寸
+
+     # 缓存配置
+     CACHE_DIR: str = "./cache"
+     TEMP_DIR: str = "./temp"
+
+     @classmethod
+     def load_api_keys(cls) -> bool:
+         """从文件加载API密钥"""
+         try:
+             api_file = Path(cls.API_KEYS_FILE)
+             if not api_file.exists():
+                 print(f"⚠️ API配置文件 {cls.API_KEYS_FILE} 不存在")
+                 print("请创建该文件并配置你的API密钥")
+                 return False
+
+             with open(api_file, 'r', encoding='utf-8') as f:
+                 api_config = json.load(f)
+
+             # 加载OpenAI配置
+             if 'openai' in api_config and api_config['openai'].get('api_key'):
+                 cls.OPENAI_API_KEY = api_config['openai']['api_key']
+                 print("✅ OpenAI API密钥已加载")
+             else:
+                 print("⚠️ OpenAI API密钥未配置")
+
+             # 加载Hugging Face配置
+             if 'huggingface' in api_config and api_config['huggingface'].get('api_key'):
+                 cls.HUGGINGFACE_API_KEY = api_config['huggingface']['api_key']
+                 print("✅ Hugging Face API密钥已加载")
+
+             # 加载搜索引擎配置
+             if 'search_engine' in api_config:
+                 search_config = api_config['search_engine']
+                 cls.SEARCH_ENGINE_TYPE = search_config.get('type', 'duckduckgo')
+                 cls.SEARCH_ENGINE_API_KEY = search_config.get('api_key')
+                 print(f"✅ 搜索引擎类型: {cls.SEARCH_ENGINE_TYPE}")
+
+             return True
+
+         except json.JSONDecodeError as e:
+             print(f"❌ API配置文件格式错误: {e}")
+             return False
+         except Exception as e:
+             print(f"❌ 加载API配置失败: {e}")
+             return False
+
+     @classmethod
+     def validate(cls) -> bool:
+         """验证配置是否完整"""
+         # 首先尝试从文件加载API密钥
+         cls.load_api_keys()
+
+         # 如果文件加载失败,尝试从环境变量加载
+         if not cls.OPENAI_API_KEY:
+             cls.OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+
+         if not cls.HUGGINGFACE_API_KEY:
+             cls.HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")
+
+         # 验证必要的配置
+         if not cls.OPENAI_API_KEY:
+             print("❌ 缺少OpenAI API密钥")
+             print("请在 api_keys.json 文件中配置或设置环境变量 OPENAI_API_KEY")
+             return False
+
+         return True
+
+     @classmethod
+     def print_config(cls):
+         """打印当前配置"""
+         print("=== 多模态智能体系统配置 ===")
+         print(f"OpenAI模型: {cls.OPENAI_MODEL}")
+         print(f"OpenAI温度: {cls.OPENAI_TEMPERATURE}")
+         print(f"OpenAI API密钥: {'已配置' if cls.OPENAI_API_KEY else '未配置'}")
+         print(f"Hugging Face API密钥: {'已配置' if cls.HUGGINGFACE_API_KEY else '未配置'}")
+         print(f"搜索引擎类型: {cls.SEARCH_ENGINE_TYPE}")
+         print(f"图像描述模型: {cls.IMAGE_CAPTION_MODEL}")
+         print(f"图像分类模型: {cls.IMAGE_CLASSIFICATION_MODEL}")
+         print(f"对象检测模型: {cls.OBJECT_DETECTION_MODEL}")
+         print(f"调试模式: {cls.DEBUG}")
+         print(f"日志级别: {cls.LOG_LEVEL}")
+         print("=" * 30)
prompts.py ADDED
@@ -0,0 +1,61 @@
+ """
+ Prompt configuration file
+ Contains the system prompt and assorted prompt templates
+ """
+ 
+ # System prompt - format rules for the agent's answers
+ SYSTEM_PROMPT = """You are a helpful assistant tasked with answering questions using a set of tools.
+ Now, I will ask you a question. Report your thoughts, and finish your answer with the following template:
+ [YOUR FINAL ANSWER].
+ YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use a comma in your number or units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles or abbreviations (e.g. for cities), and write digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the rules above to each element (number or string), and ensure there is exactly one space after each comma.
+ Your answer should only contain the final answer without any prefix or additional text.
+ 
+ IMPORTANT: Only provide the final answer without any explanations, reasoning, or additional text."""
+ 
+ # Answer generation template
+ ANSWER_GENERATION_TEMPLATE = f"""{SYSTEM_PROMPT}
+ 
+ Answer the question based on the following information:
+ 
+ Question: {{question}}
+ 
+ Media analysis results: {{media_analysis}}
+ 
+ Search results: {{search_results}}
+ 
+ Tool analysis results: {{tool_analysis}}
+ 
+ Analyze the information above, then give the final answer directly in the required format. Do not include any explanation or reasoning."""
+ 
+ # Fallback answer template
+ ERROR_ANSWER_TEMPLATE = "Sorry, I was unable to generate an answer."
+ 
+ # Tool usage prompt
+ TOOL_USAGE_PROMPT = """You are an intelligent assistant that can use a variety of tools to answer questions.
+ Choose the appropriate tool based on the question type and the available information.
+ Remember that the final answer must be concise and carry no prefix."""
+ 
+ # Media analysis prompt
+ MEDIA_ANALYSIS_PROMPT = """Analyze the provided media (image or video) and extract the key information.
+ Focus on:
+ - Descriptions of the visual content
+ - Any text content
+ - Object recognition
+ - Scene understanding
+ - Any relevant numbers or textual details"""
+ 
+ # Search prompt
+ SEARCH_PROMPT = """Use the search engine to find information relevant to the question.
+ Search queries should be:
+ - Short and specific
+ - Built from the key terms of the question
+ - Neither too broad nor too narrow"""
+ 
+ def get_answer_prompt(question: str, media_analysis: str, search_results: str, tool_analysis: str) -> str:
+     """Build the answer prompt."""
+     return ANSWER_GENERATION_TEMPLATE.format(
+         question=question,
+         media_analysis=media_analysis,
+         search_results=search_results,
+         tool_analysis=tool_analysis
+     )
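The template trick in prompts.py is worth noting: a module-level f-string bakes `SYSTEM_PROMPT` in once at import time, while doubled braces (`{{question}}`) survive as `.format()` placeholders filled per request. A reduced, self-contained sketch of the same pattern (the abbreviated system prompt here is a stand-in):

```python
SYSTEM_PROMPT = "You are a helpful assistant."  # abbreviated stand-in

# The f-string substitutes {SYSTEM_PROMPT} now; {{question}} becomes the
# literal "{question}" placeholder, filled later by .format().
ANSWER_GENERATION_TEMPLATE = f"""{SYSTEM_PROMPT}

Question: {{question}}

Search results: {{search_results}}"""

def get_answer_prompt(question: str, search_results: str) -> str:
    return ANSWER_GENERATION_TEMPLATE.format(
        question=question, search_results=search_results
    )

prompt = get_answer_prompt("What is 2+2?", "none")
print(prompt.splitlines()[0])  # → You are a helpful assistant.
```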
requirements.txt CHANGED
@@ -1,2 +1,24 @@
  gradio
- requests
+ requests
+ langgraph
+ langchain
+ langchain-community
+ langchain-openai
+ transformers
+ torch
+ torchvision
+ pillow
+ opencv-python
+ duckduckgo-search
+ python-dotenv
+ numpy
+ pandas
+ matplotlib
+ seaborn
+ PyPDF2
+ PyMuPDF
+ pdf2image
+ beautifulsoup4
+ pytube
+ yt-dlp
+ wikipedia-api
run.py ADDED
@@ -0,0 +1,138 @@
+ #!/usr/bin/env python3
+ """
+ Launch script for the multi-modal agent system
+ """
+ import os
+ import sys
+ import argparse
+ from pathlib import Path
+ 
+ # Add the project root to the Python path
+ sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+ 
+ from config import Config
+ 
+ def check_environment():
+     """Check the runtime environment."""
+     print("🔍 Checking runtime environment...")
+ 
+     # Check the Python version
+     if sys.version_info < (3, 8):
+         print("❌ Python version too old; Python 3.8+ is required")
+         return False
+ 
+     print(f"✅ Python version: {sys.version}")
+ 
+     # Check the required configuration
+     if not Config.validate():
+         print("❌ Configuration is incomplete")
+         print("Please set the following environment variable:")
+         print("  - OPENAI_API_KEY")
+         return False
+ 
+     print("✅ Configuration is valid")
+ 
+     # Check required packages
+     try:
+         import torch
+         import transformers
+         import langchain
+         import langgraph
+         import gradio
+         print("✅ Core dependencies are installed")
+     except ImportError as e:
+         print(f"❌ Missing dependency: {e}")
+         print("Please run: pip install -r requirements.txt")
+         return False
+ 
+     return True
+ 
+ def run_web_interface():
+     """Run the web interface."""
+     print("🌐 Launching web interface...")
+     from app import demo
+     demo.launch(debug=Config.DEBUG, share=False)
+ 
+ def run_test():
+     """Run the test suite."""
+     print("🧪 Running system tests...")
+     from test_agent import main as test_main
+     test_main()
+ 
+ def run_interactive():
+     """Run interactive mode."""
+     print("💬 Starting interactive mode...")
+     from app import MultiModalAgent
+ 
+     agent = MultiModalAgent()
+     print("Agent initialized; type 'quit' to exit")
+ 
+     while True:
+         try:
+             question = input("\nEnter a question: ").strip()
+             if question.lower() in ['quit', 'exit', 'q']:
+                 break
+ 
+             if not question:
+                 continue
+ 
+             print("🤖 Processing...")
+             answer = agent(question)
+             print(f"Answer: {answer}")
+ 
+         except KeyboardInterrupt:
+             print("\n👋 Goodbye!")
+             break
+         except Exception as e:
+             print(f"❌ Error: {str(e)}")
+ 
+ def main():
+     """Entry point."""
+     parser = argparse.ArgumentParser(description="Multi-modal agent system")
+     parser.add_argument(
+         "--mode",
+         choices=["web", "test", "interactive"],
+         default="web",
+         help="Run mode: web (web interface), test (tests), interactive (REPL)"
+     )
+     parser.add_argument(
+         "--debug",
+         action="store_true",
+         help="Enable debug mode"
+     )
+ 
+     args = parser.parse_args()
+ 
+     # Enable debug mode
+     if args.debug:
+         os.environ["DEBUG"] = "True"
+         os.environ["LOG_LEVEL"] = "DEBUG"
+ 
+     print("🚀 Multi-Modal Agent System")
+     print("=" * 40)
+ 
+     # Check the environment
+     if not check_environment():
+         sys.exit(1)
+ 
+     # Print the configuration
+     Config.print_config()
+ 
+     # Dispatch on the selected mode
+     try:
+         if args.mode == "web":
+             run_web_interface()
+         elif args.mode == "test":
+             run_test()
+         elif args.mode == "interactive":
+             run_interactive()
+     except KeyboardInterrupt:
+         print("\n👋 Interrupted by user")
+     except Exception as e:
+         print(f"❌ Runtime error: {str(e)}")
+         if Config.DEBUG:
+             import traceback
+             traceback.print_exc()
+ 
+ if __name__ == "__main__":
+     main()
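The dependency check in `check_environment` imports each heavy package eagerly just to see whether it is installed. The same check can be sketched data-driven with `importlib`, which avoids repeating the try/except per package (module names here are illustrative; `definitely_not_installed_pkg` is a deliberately missing placeholder):

```python
import importlib

def check_dependencies(modules):
    """Return the subset of `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# Stdlib modules stand in for the heavy dependencies listed in run.py.
missing = check_dependencies(["json", "argparse", "definitely_not_installed_pkg"])
print(missing)  # → ['definitely_not_installed_pkg']
```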
tools.py ADDED
@@ -0,0 +1,2197 @@
+ """
+ Tool module for the multi-modal agent
+ """
+ import os
+ import json
+ import requests
+ import tempfile
+ import ast
+ import subprocess
+ import sys
+ from typing import Dict, List, Any, Optional
+ from pathlib import Path
+ import cv2
+ import numpy as np
+ from PIL import Image
+ import torch
+ from transformers import pipeline
+ from langchain_core.tools import tool
+ from langchain_community.tools import DuckDuckGoSearchRun
+ from config import Config
+ 
+ # PDF processing imports
+ try:
+     import PyPDF2
+     import fitz  # PyMuPDF
+     from pdf2image import convert_from_path
+     PDF_AVAILABLE = True
+ except ImportError:
+     PDF_AVAILABLE = False
+     print("⚠️ PDF processing requires: pip install PyPDF2 PyMuPDF pdf2image")
+ 
+ # Web scraping imports
+ try:
+     import requests
+     from bs4 import BeautifulSoup
+     import urllib.parse
+     from urllib.parse import urljoin, urlparse
+     import re
+     import time
+     WEB_AVAILABLE = True
+ except ImportError:
+     WEB_AVAILABLE = False
+     print("⚠️ Web scraping requires: pip install beautifulsoup4 requests")
+ 
+ # YouTube processing imports
+ try:
+     from pytube import YouTube
+     YOUTUBE_AVAILABLE = True
+     YT_DLP_AVAILABLE = False
+     try:
+         import yt_dlp
+         YT_DLP_AVAILABLE = True
+     except ImportError:
+         pass
+ except ImportError:
+     YOUTUBE_AVAILABLE = False
+     YT_DLP_AVAILABLE = False
+     print("⚠️ YouTube processing requires: pip install pytube")
+ 
+ # Audio processing imports
+ try:
+     import speech_recognition as sr
+     from pydub import AudioSegment
+     AUDIO_PROCESSING_AVAILABLE = True
+ except ImportError:
+     AUDIO_PROCESSING_AVAILABLE = False
+     print("⚠️ Audio processing requires: pip install SpeechRecognition pydub")
+ 
+ # Wikipedia processing imports
+ try:
+     import wikipediaapi
+     import requests
+     from bs4 import BeautifulSoup
+     WIKIPEDIA_AVAILABLE = True
+ except ImportError:
+     WIKIPEDIA_AVAILABLE = False
+     print("⚠️ Wikipedia processing requires: pip install wikipedia-api requests beautifulsoup4")
+ 
+ class WebTools:
+     """Tools for analyzing web page content"""
+ 
+     @staticmethod
+     @tool
+     def fetch_webpage_content(url: str) -> Dict[str, Any]:
+         """Fetch the content of a web page."""
+         try:
+             if not WEB_AVAILABLE:
+                 return {"error": "Web scraping support is not installed; run: pip install beautifulsoup4 requests"}
+ 
+             # Set request headers to mimic a browser
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             # Send the request
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             # Parse the HTML
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             # Extract basic information
+             title = soup.find('title')
+             title_text = title.get_text().strip() if title else "No title"
+ 
+             # Extract the main text content:
+             # remove script and style tags first
+             for script in soup(["script", "style"]):
+                 script.decompose()
+ 
+             # Collect and normalize the text content
+             text_content = soup.get_text()
+             lines = (line.strip() for line in text_content.splitlines())
+             chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
+             text_content = ' '.join(chunk for chunk in chunks if chunk)
+ 
+             # Extract links
+             links = []
+             for link in soup.find_all('a', href=True):
+                 href = link.get('href')
+                 text = link.get_text().strip()
+                 if href and text:
+                     full_url = urljoin(url, href)
+                     links.append({
+                         'url': full_url,
+                         'text': text[:100]  # cap the text length
+                     })
+ 
+             # Extract images
+             images = []
+             for img in soup.find_all('img', src=True):
+                 src = img.get('src')
+                 alt = img.get('alt', '')
+                 if src:
+                     full_url = urljoin(url, src)
+                     images.append({
+                         'url': full_url,
+                         'alt': alt[:100]
+                     })
+ 
+             # Extract metadata
+             meta_data = {}
+             for meta in soup.find_all('meta'):
+                 name = meta.get('name') or meta.get('property')
+                 content = meta.get('content')
+                 if name and content:
+                     meta_data[name] = content
+ 
+             return {
+                 'url': url,
+                 'title': title_text,
+                 'text_content': text_content[:5000],  # cap the text length
+                 'links_count': len(links),
+                 'images_count': len(images),
+                 'links': links[:20],    # cap the number of links
+                 'images': images[:10],  # cap the number of images
+                 'meta_data': meta_data,
+                 'status_code': response.status_code,
+                 'content_type': response.headers.get('content-type', ''),
+                 'encoding': response.encoding
+             }
+ 
+         except Exception as e:
+             return {"error": f"Failed to fetch web page content: {str(e)}"}
+ 
+     @staticmethod
+     @tool
+     def extract_text_from_webpage(url: str) -> str:
+         """Extract plain text from a web page."""
+         try:
+             if not WEB_AVAILABLE:
+                 return "Web scraping support is not installed; run: pip install beautifulsoup4 requests"
+ 
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             # Remove unwanted tags
+             for tag in soup(['script', 'style', 'nav', 'footer', 'header']):
+                 tag.decompose()
+ 
+             # Extract and normalize the text
+             text = soup.get_text()
+             lines = (line.strip() for line in text.splitlines())
+             chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
+             text = ' '.join(chunk for chunk in chunks if chunk)
+ 
+             return text if text.strip() else "No text content found on the page"
+ 
+         except Exception as e:
+             return f"Text extraction failed: {str(e)}"
+ 
+     @staticmethod
+     @tool
+     def analyze_webpage_structure(url: str) -> Dict[str, Any]:
+         """Analyze the structure of a web page."""
+         try:
+             if not WEB_AVAILABLE:
+                 return {"error": "Web scraping support is not installed; run: pip install beautifulsoup4 requests"}
+ 
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             # Analyze the page structure
+             structure = {
+                 'url': url,
+                 'title': soup.find('title').get_text().strip() if soup.find('title') else "No title",
+                 'headings': {},
+                 'sections': [],
+                 'forms': [],
+                 'tables': [],
+                 'lists': []
+             }
+ 
+             # Count the heading levels
+             for i in range(1, 7):
+                 headings = soup.find_all(f'h{i}')
+                 structure['headings'][f'h{i}'] = len(headings)
+ 
+             # Analyze the main regions
+             main_sections = soup.find_all(['main', 'article', 'section', 'div'], class_=re.compile(r'main|content|article|post'))
+             for section in main_sections[:5]:  # cap the count
+                 section_text = section.get_text().strip()[:200]
+                 structure['sections'].append({
+                     'tag': section.name,
+                     'class': section.get('class', []),
+                     'text_preview': section_text
+                 })
+ 
+             # Analyze forms
+             forms = soup.find_all('form')
+             for form in forms[:3]:
+                 inputs = form.find_all('input')
+                 structure['forms'].append({
+                     'action': form.get('action', ''),
+                     'method': form.get('method', ''),
+                     'input_count': len(inputs)
+                 })
+ 
+             # Analyze tables
+             tables = soup.find_all('table')
+             for table in tables[:3]:
+                 rows = table.find_all('tr')
+                 structure['tables'].append({
+                     'row_count': len(rows),
+                     'has_header': bool(table.find('th'))
+                 })
+ 
+             # Analyze lists
+             lists = soup.find_all(['ul', 'ol'])
+             for lst in lists[:5]:
+                 items = lst.find_all('li')
+                 structure['lists'].append({
+                     'type': lst.name,
+                     'item_count': len(items)
+                 })
+ 
+             return structure
+ 
+         except Exception as e:
+             return {"error": f"Web page structure analysis failed: {str(e)}"}
+ 
+     @staticmethod
+     @tool
+     def search_content_in_webpage(url: str, search_term: str) -> List[Dict[str, Any]]:
+         """Search a web page for a specific term."""
+         try:
+             if not WEB_AVAILABLE:
+                 return [{"error": "Web scraping support is not installed; run: pip install beautifulsoup4 requests"}]
+ 
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             # Remove scripts and styles
+             for script in soup(["script", "style"]):
+                 script.decompose()
+ 
+             text = soup.get_text()
+ 
+             # Search for matches
+             search_results = []
+             lines = text.split('\n')
+ 
+             for i, line in enumerate(lines):
+                 if search_term.lower() in line.lower():
+                     # Grab the surrounding context
+                     start = max(0, i - 1)
+                     end = min(len(lines), i + 2)
+                     context = '\n'.join(lines[start:end])
+ 
+                     search_results.append({
+                         'line_number': i + 1,
+                         'matched_text': line.strip(),
+                         'context': context.strip()
+                     })
+ 
+                 if len(search_results) >= 10:  # cap the result count
+                     break
+ 
+             return search_results
+ 
+         except Exception as e:
+             return [{"error": f"Web page content search failed: {str(e)}"}]
+ 
+     @staticmethod
+     @tool
+     def extract_links_from_webpage(url: str) -> List[Dict[str, str]]:
+         """Extract all links from a web page."""
+         try:
+             if not WEB_AVAILABLE:
+                 return [{"error": "Web scraping support is not installed; run: pip install beautifulsoup4 requests"}]
+ 
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             links = []
+             for link in soup.find_all('a', href=True):
+                 href = link.get('href')
+                 text = link.get_text().strip()
+ 
+                 if href and text:
+                     full_url = urljoin(url, href)
+                     parsed_url = urlparse(full_url)
+ 
+                     links.append({
+                         'url': full_url,
+                         'text': text[:100],
+                         'domain': parsed_url.netloc,
+                         'path': parsed_url.path
+                     })
+ 
+             return links[:50]  # cap the number of links
+ 
+         except Exception as e:
+             return [{"error": f"Link extraction failed: {str(e)}"}]
+ 
+     @staticmethod
+     @tool
+     def summarize_webpage_content(url: str) -> str:
+         """Summarize the content of a web page."""
+         try:
+             if not WEB_AVAILABLE:
+                 return "Web scraping support is not installed; run: pip install beautifulsoup4 requests"
+ 
+             # Fetch the page content
+             content_result = WebTools.fetch_webpage_content(url)
+             if "error" in content_result:
+                 return content_result["error"]
+ 
+             # Extract the text content
+             text_content = content_result.get('text_content', '')
+             if not text_content:
+                 return "No summarizable content found on the page"
+ 
+             # Summarize the content with an LLM
+             from langchain_openai import ChatOpenAI
+             from langchain_core.messages import HumanMessage
+ 
+             llm = ChatOpenAI(
+                 model=Config.OPENAI_MODEL,
+                 temperature=0.3,
+                 api_key=Config.OPENAI_API_KEY
+             )
+ 
+             # Truncate overly long text
+             if len(text_content) > 4000:
+                 text_content = text_content[:4000] + "..."
+ 
+             prompt = f"""
+             Summarize the main content of the following web page:
+ 
+             Title: {content_result.get('title', 'No title')}
+             URL: {url}
+ 
+             Content:
+             {text_content}
+ 
+             Please provide:
+             1. The page's main topic
+             2. Key information points
+             3. A summary of the important content
+             4. The page type and purpose
+             """
+ 
+             response = llm.invoke([HumanMessage(content=prompt)])
+             return response.content
+ 
+         except Exception as e:
+             return f"Web page summarization failed: {str(e)}"
+ 
+     @staticmethod
+     @tool
+     def check_webpage_accessibility(url: str) -> Dict[str, Any]:
+         """Check a web page for accessibility issues."""
+         try:
+             if not WEB_AVAILABLE:
+                 return {"error": "Web scraping support is not installed; run: pip install beautifulsoup4 requests"}
+ 
+             headers = {
+                 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+             }
+ 
+             response = requests.get(url, headers=headers, timeout=30)
+             response.raise_for_status()
+ 
+             soup = BeautifulSoup(response.content, 'html.parser')
+ 
+             accessibility_report = {
+                 'url': url,
+                 'status_code': response.status_code,
+                 'load_time': response.elapsed.total_seconds(),
+                 'issues': [],
+                 'recommendations': []
+             }
+ 
+             # Check the title
+             title = soup.find('title')
+             if not title or not title.get_text().strip():
+                 accessibility_report['issues'].append("Missing page title")
+                 accessibility_report['recommendations'].append("Add a meaningful page title")
+ 
+             # Check image alt attributes
+             images = soup.find_all('img')
+             images_without_alt = [img for img in images if not img.get('alt')]
+             if images_without_alt:
+                 accessibility_report['issues'].append(f"Found {len(images_without_alt)} images missing alt attributes")
+                 accessibility_report['recommendations'].append("Add alt attributes to all images")
+ 
+             # Check link text
+             links = soup.find_all('a', href=True)
+             empty_links = [link for link in links if not link.get_text().strip()]
+             if empty_links:
+                 accessibility_report['issues'].append(f"Found {len(empty_links)} empty links")
+                 accessibility_report['recommendations'].append("Add descriptive text to all links")
+ 
+             # Check form labels
+             forms = soup.find_all('form')
+             for form in forms:
+                 inputs = form.find_all('input')
+                 for input_field in inputs:
+                     if input_field.get('type') in ['text', 'email', 'password']:
+                         if not input_field.get('id') or not soup.find('label', {'for': input_field.get('id')}):
+                             accessibility_report['issues'].append("Form input field is missing a label")
+                             accessibility_report['recommendations'].append("Add label tags to form fields")
+                             break
+ 
+             # Check color contrast (simplified)
+             style_tags = soup.find_all('style')
+             if not style_tags:
+                 accessibility_report['recommendations'].append("Consider adding CSS styles to improve readability")
+ 
+             return accessibility_report
+ 
+         except Exception as e:
+             return {"error": f"Accessibility check failed: {str(e)}"}
+ 
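The text-cleanup idiom that WebTools applies after `soup.get_text()` (strip each line, split within lines, rejoin the non-empty chunks) is easy to see in isolation, without any network access:

```python
# Normalize whitespace the way the WebTools methods do:
# strip each line, split lines into chunks, keep only non-empty ones.
raw = "  Title  \n\n   Some   body text \n\t\n links  "

lines = (line.strip() for line in raw.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = ' '.join(chunk for chunk in chunks if chunk)

print(text)  # → Title Some body text links
```

Because every stage is a generator, nothing is materialized until the final `join`, which keeps the pass cheap even on large pages.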
+ class PDFTools:
+     """PDF processing tools"""
+ 
+     @staticmethod
+     @tool
+     def download_pdf_from_url(url: str) -> str:
+         """Download a PDF file from a URL."""
+         try:
+             if not PDF_AVAILABLE:
+                 return "PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"
+ 
+             # Create a temporary file
+             temp_path = tempfile.mktemp(suffix='.pdf')
+ 
+             # Download the PDF
+             response = requests.get(url, stream=True, timeout=30)
+             response.raise_for_status()
+ 
+             with open(temp_path, 'wb') as f:
+                 for chunk in response.iter_content(chunk_size=8192):
+                     f.write(chunk)
+ 
+             return temp_path
+ 
+         except Exception as e:
+             return f"PDF download failed: {str(e)}"
+ 
+     @staticmethod
+     @tool
+     def extract_text_from_pdf(pdf_path: str) -> str:
+         """Extract text from a PDF."""
+         try:
+             if not PDF_AVAILABLE:
+                 return "PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"
+ 
+             # Extract text with PyMuPDF
+             doc = fitz.open(pdf_path)
+             text = ""
+ 
+             for page_num in range(len(doc)):
+                 page = doc.load_page(page_num)
+                 text += page.get_text()
+ 
+             doc.close()
+ 
+             return text if text.strip() else "No text content found in the PDF"
+ 
+         except Exception as e:
+             return f"PDF text extraction failed: {str(e)}"
+ 
+     @staticmethod
+     @tool
+     def extract_images_from_pdf(pdf_path: str) -> List[str]:
+         """Extract images from a PDF."""
+         try:
+             if not PDF_AVAILABLE:
+                 return ["PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"]
+ 
+             # Render PDF pages to images with pdf2image
+             images = convert_from_path(pdf_path, dpi=200)
+             image_paths = []
+ 
+             for i, image in enumerate(images):
+                 temp_path = tempfile.mktemp(suffix=f'_page_{i+1}.jpg')
+                 image.save(temp_path, 'JPEG')
+                 image_paths.append(temp_path)
+ 
+             return image_paths
+ 
+         except Exception as e:
+             return [f"PDF image extraction failed: {str(e)}"]
+ 
+     @staticmethod
+     @tool
+     def analyze_pdf_structure(pdf_path: str) -> Dict[str, Any]:
+         """Analyze the structure of a PDF."""
+         try:
+             if not PDF_AVAILABLE:
+                 return {"error": "PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"}
+ 
+             # Analyze the PDF structure with PyPDF2
+             with open(pdf_path, 'rb') as file:
+                 pdf_reader = PyPDF2.PdfReader(file)
+ 
+                 # Collect basic information
+                 info = {
+                     "page_count": len(pdf_reader.pages),
+                     "title": pdf_reader.metadata.get('/Title', 'Unknown'),
+                     "author": pdf_reader.metadata.get('/Author', 'Unknown'),
+                     "subject": pdf_reader.metadata.get('/Subject', 'Unknown'),
+                     "creator": pdf_reader.metadata.get('/Creator', 'Unknown'),
+                     "producer": pdf_reader.metadata.get('/Producer', 'Unknown'),
+                     "creation_date": pdf_reader.metadata.get('/CreationDate', 'Unknown'),
+                     "modification_date": pdf_reader.metadata.get('/ModDate', 'Unknown')
+                 }
+ 
+                 # Inspect each page
+                 pages_info = []
+                 for i, page in enumerate(pdf_reader.pages):
+                     page_text = page.extract_text()
+                     pages_info.append({
+                         "page_number": i + 1,
+                         "text_length": len(page_text),
+                         "has_text": bool(page_text.strip()),
+                         "rotation": page.get('/Rotate', 0)
+                     })
+ 
+                 info["pages_info"] = pages_info
+                 return info
+ 
+         except Exception as e:
+             return {"error": f"PDF structure analysis failed: {str(e)}"}
+ 
+     @staticmethod
+     @tool
+     def search_text_in_pdf(pdf_path: str, search_term: str) -> List[Dict[str, Any]]:
+         """Search a PDF for specific text."""
+         try:
+             if not PDF_AVAILABLE:
+                 return [{"error": "PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"}]
+ 
+             # Search the text with PyMuPDF
+             doc = fitz.open(pdf_path)
+             search_results = []
+ 
+             for page_num in range(len(doc)):
+                 page = doc.load_page(page_num)
+                 text_instances = page.search_for(search_term)
+ 
+                 for inst in text_instances:
+                     search_results.append({
+                         "page_number": page_num + 1,
+                         "text": search_term,
+                         "bbox": inst,
+                         "context": page.get_text("text", clip=inst)
+                     })
+ 
+             doc.close()
+             return search_results
+ 
+         except Exception as e:
+             return [{"error": f"PDF text search failed: {str(e)}"}]
+ 
+     @staticmethod
+     @tool
+     def summarize_pdf_content(pdf_path: str) -> str:
+         """Summarize the content of a PDF."""
+         try:
+             if not PDF_AVAILABLE:
+                 return "PDF support is not installed; run: pip install PyPDF2 PyMuPDF pdf2image"
+ 
+             # Extract the text
+             doc = fitz.open(pdf_path)
+             text = ""
+ 
+             for page_num in range(len(doc)):
+                 page = doc.load_page(page_num)
+                 text += page.get_text()
+ 
+             doc.close()
+ 
+             if not text.strip():
+                 return "No text content found in the PDF"
+ 
+             # Summarize the content with an LLM
+             from langchain_openai import ChatOpenAI
+             from langchain_core.messages import HumanMessage
+ 
+             llm = ChatOpenAI(
+                 model=Config.OPENAI_MODEL,
+                 temperature=0.3,
+                 api_key=Config.OPENAI_API_KEY
+             )
+ 
+             # Truncate overly long text
+             if len(text) > 4000:
+                 text = text[:4000] + "..."
+ 
+             prompt = f"""
+             Summarize the main content of the following PDF document:
+ 
+             {text}
+ 
+             Please provide:
+             1. The document's main topic
+             2. Key points
+             3. A summary of the important information
+             4. The document type and purpose
+             """
+ 
+             response = llm.invoke([HumanMessage(content=prompt)])
+             return response.content
+ 
+         except Exception as e:
+             return f"PDF summarization failed: {str(e)}"
+ 
+class MediaTools:
+    """媒体处理工具类"""
+
+    @staticmethod
+    @tool
+    def extract_text_from_image(image_path: str) -> str:
+        """从图像中提取文本"""
+        try:
+            # 使用OCR模型提取文本
+            ocr_pipeline = pipeline(
+                "image-to-text",
+                model="microsoft/trocr-base-handwritten",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            image = Image.open(image_path)
+            result = ocr_pipeline(image)
+            return result[0]['generated_text']
+        except Exception as e:
+            return f"文本提取失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def analyze_image_emotion(image_path: str) -> Dict[str, Any]:
+        """分析图像中的情感"""
+        try:
+            # 使用图像分类模型
+            # 注意:原先填的 microsoft/DialoGPT-medium 是文本对话模型,不能做图像分类;
+            # 这里换成通用的ViT分类器,如需真正的情感识别请替换为专用人脸情感模型
+            emotion_pipeline = pipeline(
+                "image-classification",
+                model="google/vit-base-patch16-224",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            image = Image.open(image_path)
+            result = emotion_pipeline(image)
+            return {
+                "emotions": result[:3],  # 返回前3个最可能的情感
+                "confidence": result[0]['score'] if result else 0.0
+            }
+        except Exception as e:
+            return {"error": f"情感分析失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def extract_video_audio(video_path: str) -> str:
+        """从视频中提取音频信息"""
+        try:
+            # 简化版本:返回提示信息
+            return "视频音频分析功能需要安装moviepy包"
+        except Exception as e:
+            return f"音频提取失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def analyze_video_content(video_path: str) -> Dict[str, Any]:
+        """分析视频内容"""
+        try:
+            # 使用OpenCV分析视频
+            cap = cv2.VideoCapture(video_path)
+            if not cap.isOpened():
+                return {"error": "无法打开视频文件"}
+
+            # 获取视频基本信息
+            fps = cap.get(cv2.CAP_PROP_FPS)
+            frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+            duration = frame_count / fps if fps > 0 else 0
+            width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+            height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+
+            # 使用图像描述模型(在循环外只加载一次,避免每帧重复初始化)
+            caption_pipeline = pipeline(
+                "image-to-text",
+                model="Salesforce/blip-image-captioning-base",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            # 分析前几帧
+            frames_analyzed = []
+            frame_interval = max(1, frame_count // 10)  # 分析10帧
+
+            for i in range(0, min(frame_count, 10)):
+                cap.set(cv2.CAP_PROP_POS_FRAMES, i * frame_interval)
+                ret, frame = cap.read()
+                if ret:
+                    # 转换为PIL图像进行分析
+                    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+                    pil_image = Image.fromarray(frame_rgb)
+
+                    caption_result = caption_pipeline(pil_image)
+                    frames_analyzed.append({
+                        "frame_number": i * frame_interval,
+                        "caption": caption_result[0]['generated_text']
+                    })
+
+            cap.release()
+
+            return {
+                "video_info": {
+                    "duration": duration,
+                    "fps": fps,
+                    "frame_count": frame_count,
+                    "resolution": f"{width}x{height}"
+                },
+                "frames_analyzed": frames_analyzed,
+                "analysis_method": "OpenCV + BLIP"
+            }
+
+        except Exception as e:
+            return {"error": f"视频分析失败: {str(e)}"}
+
+class CodeAnalysisTools:
+    """代码分析工具类"""
+
+    @staticmethod
+    @tool
+    def analyze_python_code(code: str) -> Dict[str, Any]:
+        """分析Python代码"""
+        try:
+            # 语法检查
+            try:
+                ast.parse(code)
+                syntax_valid = True
+                syntax_error = None
+            except SyntaxError as e:
+                syntax_valid = False
+                syntax_error = str(e)
+
+            # 代码复杂度分析
+            tree = ast.parse(code) if syntax_valid else None
+            if tree:
+                functions = [node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]
+                classes = [node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
+                imports = [node for node in ast.walk(tree) if isinstance(node, (ast.Import, ast.ImportFrom))]
+
+                # 计算圈复杂度(简化版)
+                complexity = 0
+                for node in ast.walk(tree):
+                    if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
+                        complexity += 1
+
+                analysis = {
+                    "syntax_valid": syntax_valid,
+                    "syntax_error": syntax_error,
+                    "function_count": len(functions),
+                    "class_count": len(classes),
+                    "import_count": len(imports),
+                    "complexity": complexity,
+                    "functions": [f.name for f in functions],
+                    "classes": [c.name for c in classes]
+                }
+            else:
+                analysis = {
+                    "syntax_valid": syntax_valid,
+                    "syntax_error": syntax_error
+                }
+
+            return analysis
+
+        except Exception as e:
+            return {"error": f"代码分析失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def execute_python_code(code: str) -> str:
+        """执行Python代码"""
+        try:
+            # 创建临时文件
+            with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
+                f.write(code)
+                temp_file = f.name
+
+            # 执行代码
+            result = subprocess.run(
+                [sys.executable, temp_file],
+                capture_output=True,
+                text=True,
+                timeout=30  # 30秒超时
+            )
+
+            # 清理临时文件
+            os.unlink(temp_file)
+
+            if result.returncode == 0:
+                return f"执行成功:\n{result.stdout}"
+            else:
+                return f"执行失败:\n{result.stderr}"
+
+        except subprocess.TimeoutExpired:
+            return "代码执行超时"
+        except Exception as e:
+            return f"代码执行失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def explain_code(code: str) -> str:
+        """解释代码功能"""
+        try:
+            # 使用LLM解释代码
+            from langchain_openai import ChatOpenAI
+            from langchain_core.messages import HumanMessage
+
+            llm = ChatOpenAI(
+                model=Config.OPENAI_MODEL,
+                temperature=0.3,
+                api_key=Config.OPENAI_API_KEY
+            )
+
+            prompt = f"""
+            请分析以下Python代码的功能和作用:
+
+            ```python
+            {code}
+            ```
+
+            请提供:
+            1. 代码的主要功能
+            2. 关键部分的解释
+            3. 可能的改进建议
+            """
+
+            response = llm.invoke([HumanMessage(content=prompt)])
+            return response.content
+
+        except Exception as e:
+            return f"代码解释失败: {str(e)}"
+
+class SearchTools:
+    """搜索工具类"""
+
+    def __init__(self):
+        # 使用DuckDuckGo搜索,无需API密钥
+        self.search_tool = DuckDuckGoSearchRun()
+        print("✅ DuckDuckGo搜索引擎已初始化")
+
+    # 注意:@tool 不能直接装饰带 self 的实例方法(self 会被当成工具参数),
+    # 因此以下工具改为静态方法,并在函数内部创建搜索器
+    @staticmethod
+    @tool
+    def web_search(query: str) -> str:
+        """执行网络搜索"""
+        try:
+            print(f"🔍 搜索查询: {query}")
+            results = DuckDuckGoSearchRun().run(query)
+            return results if isinstance(results, str) else str(results)
+        except Exception as e:
+            print(f"❌ 搜索失败: {str(e)}")
+            return f"搜索失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def search_images(query: str) -> List[str]:
+        """搜索相关图像"""
+        try:
+            search_query = f"{query} images"
+            print(f"🖼️ 图像搜索查询: {search_query}")
+            results = DuckDuckGoSearchRun().run(search_query)
+            # 简单返回搜索结果,实际应用中需要解析图像URL
+            return [results] if isinstance(results, str) else results
+        except Exception as e:
+            print(f"❌ 图像搜索失败: {str(e)}")
+            return [f"图像搜索失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def search_videos(query: str) -> List[str]:
+        """搜索相关视频"""
+        try:
+            search_query = f"{query} videos"
+            print(f"🎥 视频搜索查询: {search_query}")
+            results = DuckDuckGoSearchRun().run(search_query)
+            return [results] if isinstance(results, str) else results
+        except Exception as e:
+            print(f"❌ 视频搜索失败: {str(e)}")
+            return [f"视频搜索失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def search_pdfs(query: str) -> List[str]:
+        """搜索PDF文档"""
+        try:
+            search_query = f"{query} filetype:pdf"
+            print(f"📄 PDF搜索查询: {search_query}")
+            results = DuckDuckGoSearchRun().run(search_query)
+            return [results] if isinstance(results, str) else results
+        except Exception as e:
+            print(f"❌ PDF搜索失败: {str(e)}")
+            return [f"PDF搜索失败: {str(e)}"]
+
+class AnalysisTools:
+    """分析工具类"""
+
+    @staticmethod
+    @tool
+    def analyze_text_sentiment(text: str) -> Dict[str, Any]:
+        """分析文本情感"""
+        try:
+            # 使用情感分析模型
+            sentiment_pipeline = pipeline(
+                "sentiment-analysis",
+                model="cardiffnlp/twitter-roberta-base-sentiment-latest",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            result = sentiment_pipeline(text)
+            return {
+                "sentiment": result[0]['label'],
+                "confidence": result[0]['score'],
+                "text": text
+            }
+        except Exception as e:
+            return {"error": f"情感分析失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def extract_keywords(text: str) -> List[str]:
+        """提取关键词"""
+        try:
+            # 使用关键词提取模型
+            keyword_pipeline = pipeline(
+                "token-classification",
+                model="dbmdz/bert-large-cased-finetuned-conll03-english",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            result = keyword_pipeline(text)
+            keywords = []
+            for item in result:
+                if item['entity'] in ['B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']:
+                    keywords.append(item['word'])
+
+            return list(set(keywords)) if keywords else ["无关键词"]
+        except Exception as e:
+            return [f"关键词提取失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def summarize_text(text: str, max_length: int = 150) -> str:
+        """文本摘要"""
+        try:
+            # 使用摘要模型
+            summarizer = pipeline(
+                "summarization",
+                model="facebook/bart-large-cnn",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            # 如果文本太长,分段处理
+            if len(text) > 1000:
+                chunks = [text[i:i+1000] for i in range(0, len(text), 1000)]
+                summaries = []
+                for chunk in chunks[:3]:  # 只处理前3段
+                    result = summarizer(chunk, max_length=max_length//3, min_length=30, do_sample=False)
+                    summaries.append(result[0]['summary_text'])
+                return " ".join(summaries)
+            else:
+                result = summarizer(text, max_length=max_length, min_length=30, do_sample=False)
+                return result[0]['summary_text']
+        except Exception as e:
+            return f"摘要生成失败: {str(e)}"
+
+class UtilityTools:
+    """实用工具类"""
+
+    @staticmethod
+    @tool
+    def get_current_weather(location: str) -> str:
+        """获取当前天气"""
+        try:
+            # 这里可以集成天气API
+            return f"天气查询功能需要配置天气API密钥,查询位置: {location}"
+        except Exception as e:
+            return f"天气查询失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def translate_text(text: str, target_language: str = "中文") -> str:
+        """翻译文本"""
+        try:
+            # 使用翻译模型
+            translator = pipeline(
+                "translation",
+                model="Helsinki-NLP/opus-mt-en-zh" if target_language == "中文" else "Helsinki-NLP/opus-mt-en-fr",
+                device=0 if torch.cuda.is_available() else -1
+            )
+
+            result = translator(text)
+            return result[0]['translation_text']
+        except Exception as e:
+            return f"翻译失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def calculate_math_expression(expression: str) -> str:
+        """计算数学表达式"""
+        try:
+            # 安全地计算数学表达式
+            # 注意:__builtins__ 在模块中可能是 module 而不是 dict,
+            # 所以显式从 builtins 模块取白名单函数
+            import builtins
+            allowed_names = {
+                k: getattr(builtins, k)
+                for k in ['abs', 'round', 'min', 'max', 'sum', 'pow']
+            }
+            allowed_names.update({
+                'sin': lambda x: np.sin(x),
+                'cos': lambda x: np.cos(x),
+                'tan': lambda x: np.tan(x),
+                'sqrt': lambda x: np.sqrt(x),
+                'log': lambda x: np.log(x),
+                'pi': np.pi,
+                'e': np.e
+            })
+
+            result = eval(expression, {"__builtins__": {}}, allowed_names)
+            return str(result)
+        except Exception as e:
+            return f"计算失败: {str(e)}"
+
+class WikipediaTools:
+    """Wikipedia处理工具类"""
+
+    @staticmethod
+    @tool
+    def search_wikipedia(query: str, max_results: int = 5) -> List[Dict[str, Any]]:
+        """搜索Wikipedia页面"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return [{"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}]
+
+            # 创建Wikipedia API实例
+            wiki = wikipediaapi.Wikipedia(
+                language='zh',
+                user_agent='MultiModalAgent/1.0 (https://github.com/your-repo; your-email@example.com)'
+            )
+
+            # wikipediaapi 本身不提供搜索接口,先用 wikipedia 库搜索标题
+            wikipedia.set_lang("zh")
+            search_results = wikipedia.search(query, results=max_results)
+
+            results = []
+            for title in search_results:
+                try:
+                    # 获取页面
+                    page = wiki.page(title)
+                    if page.exists():
+                        results.append({
+                            'title': page.title,
+                            'url': page.fullurl,
+                            'summary': page.summary[:300] + "..." if len(page.summary) > 300 else page.summary,
+                            'page_id': page.pageid
+                        })
+                    else:
+                        results.append({
+                            'title': title,
+                            'url': f"https://zh.wikipedia.org/wiki/{title.replace(' ', '_')}",
+                            'summary': "页面不存在",
+                            'page_id': None
+                        })
+                except Exception as e:
+                    # 如果获取页面失败,只返回标题
+                    results.append({
+                        'title': title,
+                        'url': f"https://zh.wikipedia.org/wiki/{title.replace(' ', '_')}",
+                        'summary': f"无法获取摘要: {str(e)}",
+                        'page_id': None
+                    })
+
+            return results
+
+        except Exception as e:
+            return [{"error": f"Wikipedia搜索失败: {str(e)}"}]
+
+    @staticmethod
+    @tool
+    def get_wikipedia_page(title: str) -> Dict[str, Any]:
+        """获取Wikipedia页面内容"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return {"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}
+
+            # 创建Wikipedia API实例
+            wiki = wikipediaapi.Wikipedia(
+                language='zh',
+                user_agent='MultiModalAgent/1.0 (https://github.com/your-repo; your-email@example.com)'
+            )
+
+            # 获取页面
+            page = wiki.page(title)
+
+            if not page.exists():
+                return {"error": f"Wikipedia页面 '{title}' 不存在"}
+
+            # 获取页面信息
+            page_info = {
+                'title': page.title,
+                'url': page.fullurl,
+                'summary': page.summary,
+                'content': page.text[:5000] + "..." if len(page.text) > 5000 else page.text,  # 限制内容长度
+                'page_id': page.pageid,
+                'categories': list(page.categories.keys())[:10],  # 限制分类数量
+                'links': list(page.links.keys())[:20],  # 限制链接数量
+                'content_length': len(page.text)
+            }
+
+            return page_info
+
+        except Exception as e:
+            return {"error": f"Wikipedia页面获取失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def get_wikipedia_summary(title: str) -> str:
+        """获取Wikipedia页面摘要"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 获取页面摘要
+            summary = wikipedia.summary(title, sentences=5, auto_suggest=False)
+
+            return summary
+
+        except Exception as e:
+            return f"Wikipedia摘要获取失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def get_wikipedia_random_page() -> Dict[str, Any]:
+        """获取随机Wikipedia页面"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return {"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 获取随机页面(pages=1 时返回的是单个标题字符串)
+            random_title = wikipedia.random(pages=1)
+            if random_title:
+                title = random_title if isinstance(random_title, str) else random_title[0]
+                # get_wikipedia_page 被 @tool 装饰后需通过 invoke 调用
+                return WikipediaTools.get_wikipedia_page.invoke({"title": title})
+            else:
+                return {"error": "无法获取随机页面"}
+
+        except Exception as e:
+            return {"error": f"随机Wikipedia页面获取失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def search_wikipedia_english(query: str, max_results: int = 5) -> List[Dict[str, Any]]:
+        """搜索英文Wikipedia页面"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return [{"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}]
+
+            # 设置语言为英文
+            wikipedia.set_lang("en")
+
+            # 搜索Wikipedia页面
+            search_results = wikipedia.search(query, results=max_results)
+
+            results = []
+            for title in search_results:
+                try:
+                    # 获取页面摘要
+                    page = wikipedia.page(title, auto_suggest=False)
+                    results.append({
+                        'title': title,
+                        'url': page.url,
+                        'summary': page.summary[:300] + "..." if len(page.summary) > 300 else page.summary,
+                        'page_id': page.pageid
+                    })
+                except Exception as e:
+                    # 如果获取页面失败,只返回标题
+                    results.append({
+                        'title': title,
+                        'url': f"https://en.wikipedia.org/wiki/{title.replace(' ', '_')}",
+                        'summary': f"无法获取摘要: {str(e)}",
+                        'page_id': None
+                    })
+
+            return results
+
+        except Exception as e:
+            return [{"error": f"英文Wikipedia搜索失败: {str(e)}"}]
+
+    @staticmethod
+    @tool
+    def get_wikipedia_page_english(title: str) -> Dict[str, Any]:
+        """获取英文Wikipedia页面内容"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return {"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}
+
+            # 设置语言为英文
+            wikipedia.set_lang("en")
+
+            # 获取页面
+            page = wikipedia.page(title, auto_suggest=False)
+
+            # 获取页面内容
+            content = page.content
+
+            # 获取页面信息
+            page_info = {
+                'title': page.title,
+                'url': page.url,
+                'summary': page.summary,
+                'content': content[:5000] + "..." if len(content) > 5000 else content,  # 限制内容长度
+                'page_id': page.pageid,
+                'categories': page.categories[:10],  # 限制分类数量
+                'links': page.links[:20],  # 限制链接数量
+                'references': page.references[:10] if hasattr(page, 'references') else [],  # 限制引用数量
+                'images': page.images[:10] if hasattr(page, 'images') else [],  # 限制图片数量
+                'content_length': len(content)
+            }
+
+            return page_info
+
+        except Exception as e:
+            return {"error": f"英文Wikipedia页面获取失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def get_wikipedia_suggestions(query: str) -> List[str]:
+        """获取Wikipedia搜索建议"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return ["Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"]
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 获取搜索建议
+            suggestions = wikipedia.search(query, results=10)
+
+            return suggestions
+
+        except Exception as e:
+            return [f"Wikipedia搜索建议获取失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def get_wikipedia_categories(title: str) -> List[str]:
+        """获取Wikipedia页面分类"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return ["Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"]
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 获取页面
+            page = wikipedia.page(title, auto_suggest=False)
+
+            # 获取分类
+            categories = page.categories
+
+            return categories[:20]  # 限制分类数量
+
+        except Exception as e:
+            return [f"Wikipedia分类获取失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def get_wikipedia_links(title: str) -> List[str]:
+        """获取Wikipedia页面链接"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return ["Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"]
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 获取页面
+            page = wikipedia.page(title, auto_suggest=False)
+
+            # 获取链接
+            links = page.links
+
+            return links[:30]  # 限制链接数量
+
+        except Exception as e:
+            return [f"Wikipedia链接获取失败: {str(e)}"]
+
+    @staticmethod
+    @tool
+    def get_wikipedia_geosearch(latitude: float, longitude: float, radius: int = 1000) -> List[Dict[str, Any]]:
+        """根据地理坐标搜索附近的Wikipedia页面"""
+        try:
+            if not WIKIPEDIA_AVAILABLE:
+                return [{"error": "Wikipedia处理功能未安装,请运行: pip install wikipedia-api requests beautifulsoup4"}]
+
+            # 设置语言为中文
+            wikipedia.set_lang("zh")
+
+            # 地理搜索(geosearch 返回的是标题字符串列表,需要逐个取页面)
+            nearby_titles = wikipedia.geosearch(latitude, longitude, radius=radius)
+
+            results = []
+            for title in nearby_titles:
+                try:
+                    page = wikipedia.page(title, auto_suggest=False)
+                    results.append({
+                        'title': page.title,
+                        'url': page.url,
+                        'summary': page.summary[:200] + "..." if len(page.summary) > 200 else page.summary,
+                        'distance': None,  # wikipedia 库不返回距离信息
+                        'coordinates': page.coordinates if hasattr(page, 'coordinates') else None
+                    })
+                except Exception as e:
+                    results.append({
+                        'title': title,
+                        'url': f"https://zh.wikipedia.org/wiki/{title.replace(' ', '_')}",
+                        'summary': f"无法获取摘要: {str(e)}",
+                        'distance': None,
+                        'coordinates': None
+                    })
+
+            return results
+
+        except Exception as e:
+            return [{"error": f"Wikipedia地理搜索失败: {str(e)}"}]
+
+class YouTubeTools:
+    """YouTube视频处理工具类"""
+
+    @staticmethod
+    @tool
+    def download_youtube_video(url: str) -> str:
+        """下载YouTube视频"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube处理功能未安装,请运行: pip install pytube"
+
+            if not YT_DLP_AVAILABLE:
+                return "YouTube视频下载需要安装yt-dlp,请运行: pip install yt-dlp"
+
+            # 使用yt-dlp下载视频(更稳定)
+            ydl_opts = {
+                'format': 'best[height<=720]',  # 限制分辨率
+                'outtmpl': '%(title)s.%(ext)s',
+                'quiet': True,
+                'no_warnings': True
+            }
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=True)
+                video_path = ydl.prepare_filename(info)
+
+            return video_path
+
+        except Exception as e:
+            return f"YouTube视频下载失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def get_youtube_info(url: str) -> Dict[str, Any]:
+        """获取YouTube视频信息"""
+        try:
+            # 提取视频ID
+            import re
+            video_id_match = re.search(r'(?:youtube\.com\/watch\?v=|youtu\.be\/)([^&\n?#]+)', url)
+            if not video_id_match:
+                return {"error": "无效的YouTube URL"}
+
+            video_id = video_id_match.group(1)
+
+            # 首先尝试使用yt-dlp(更稳定)
+            if YT_DLP_AVAILABLE:
+                try:
+                    import yt_dlp
+                    ydl_opts = {
+                        'quiet': True,
+                        'no_warnings': True,
+                        'extract_flat': True
+                    }
+
+                    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                        info = ydl.extract_info(url, download=False)
+
+                    video_info = {
+                        'title': info.get('title', f'YouTube视频 {video_id}'),
+                        'author': info.get('uploader', 'Unknown'),
+                        'length': info.get('duration', 0),
+                        'views': info.get('view_count', 0),
+                        'description': info.get('description', '')[:500] + "..." if len(info.get('description', '')) > 500 else info.get('description', ''),
+                        'publish_date': str(info.get('upload_date', 'Unknown')),
+                        'rating': info.get('average_rating', 0),
+                        'keywords': info.get('tags', []),
+                        'thumbnail_url': info.get('thumbnail', f"https://img.youtube.com/vi/{video_id}/maxresdefault.jpg"),
+                        'video_id': video_id,
+                        'url': url,
+                        'method': 'yt-dlp'
+                    }
+
+                    return video_info
+
+                except Exception as e:
+                    print(f"yt-dlp获取失败: {e}")
+
+            # 如果yt-dlp失败,尝试使用pytube
+            if YOUTUBE_AVAILABLE:
+                try:
+                    from pytube import YouTube
+                    yt = YouTube(url)
+
+                    # 获取视频信息
+                    video_info = {
+                        'title': yt.title,
+                        'author': yt.author,
+                        'length': yt.length,  # 秒
+                        'views': yt.views,
+                        'description': yt.description[:500] + "..." if len(yt.description) > 500 else yt.description,
+                        'publish_date': str(yt.publish_date) if yt.publish_date else "Unknown",
+                        'rating': yt.rating,
+                        'keywords': yt.keywords,
+                        'thumbnail_url': yt.thumbnail_url,
+                        'video_id': video_id,
+                        'url': url,
+                        'method': 'pytube'
+                    }
+
+                    return video_info
+
+                except Exception as e:
+                    print(f"pytube获取失败: {e}")
+
+            # 如果都失败了,返回基本信息
+            return {
+                'title': f"YouTube视频 {video_id}",
+                'author': "Unknown",
+                'length': 0,
+                'views': 0,
+                'description': "无法获取详细信息,可能需要更新YouTube处理库",
+                'publish_date': "Unknown",
+                'rating': 0,
+                'keywords': [],
+                'thumbnail_url': f"https://img.youtube.com/vi/{video_id}/maxresdefault.jpg",
+                'video_id': video_id,
+                'url': url,
+                'note': "所有YouTube处理库都失败,建议更新pytube或安装yt-dlp"
+            }
+
+        except Exception as e:
+            return {"error": f"YouTube信息获取失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def extract_youtube_audio(url: str) -> str:
+        """提取YouTube视频音频"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube处理功能未安装,请运行: pip install pytube"
+
+            if not YT_DLP_AVAILABLE:
+                return "YouTube音频提取需要安装yt-dlp,请运行: pip install yt-dlp"
+
+            # 使用yt-dlp提取音频
+            ydl_opts = {
+                'format': 'bestaudio/best',
+                'postprocessors': [{
+                    'key': 'FFmpegExtractAudio',
+                    'preferredcodec': 'mp3',
+                    'preferredquality': '192',
+                }],
+                'outtmpl': '%(title)s.%(ext)s',
+                'quiet': True,
+                'no_warnings': True
+            }
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=True)
+                audio_path = ydl.prepare_filename(info).replace('.webm', '.mp3').replace('.m4a', '.mp3')
+
+            return audio_path
+
+        except Exception as e:
+            return f"YouTube音频提取失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def download_youtube_thumbnail(url: str) -> str:
+        """下载YouTube视频缩略图"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube处理功能未安装,请运行: pip install pytube"
+
+            # 提取视频ID
+            import re
+            video_id_match = re.search(r'(?:youtube\.com\/watch\?v=|youtu\.be\/)([^&\n?#]+)', url)
+            if not video_id_match:
+                return "无效的YouTube URL"
+
+            video_id = video_id_match.group(1)
+
+            # 尝试使用pytube获取缩略图URL
+            try:
+                yt = YouTube(url)
+                thumbnail_url = yt.thumbnail_url
+            except Exception as e:
+                # 如果pytube失败,使用标准缩略图URL
+                thumbnail_url = f"https://img.youtube.com/vi/{video_id}/maxresdefault.jpg"
+
+            # 下载缩略图
+            import tempfile
+            import urllib.request
+
+            temp_path = tempfile.mktemp(suffix='.jpg')
+            urllib.request.urlretrieve(thumbnail_url, temp_path)
+
+            return temp_path
+
+        except Exception as e:
+            return f"YouTube缩略图下载失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def search_youtube_videos(query: str, max_results: int = 5) -> List[Dict[str, Any]]:
+        """搜索YouTube视频"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return [{"error": "YouTube处理功能未安装,请运行: pip install pytube"}]
+
+            # 使用DuckDuckGo搜索YouTube视频
+            from duckduckgo_search import DDGS
+
+            try:
+                with DDGS() as ddgs:
+                    search_results = list(ddgs.text(f"{query} site:youtube.com", max_results=max_results))
+
+                videos = []
+                for result in search_results:
+                    if result and 'youtube.com/watch' in result.get('link', ''):
+                        videos.append({
+                            'title': result.get('title', 'Unknown'),
+                            'url': result.get('link', ''),
+                            'duration': 0,
+                            'view_count': 0,
+                            'uploader': 'Unknown',
+                            'thumbnail': '',
+                            'description': result.get('body', '')[:200] + "..." if len(result.get('body', '')) > 200 else result.get('body', '')
+                        })
+
+                return videos
+            except Exception as search_error:
+                return [{"error": f"DuckDuckGo搜索失败: {str(search_error)}"}]
+
+        except Exception as e:
+            return [{"error": f"YouTube搜索失败: {str(e)}"}]
+
+    @staticmethod
+    @tool
+    def analyze_youtube_comments(url: str, max_comments: int = 10) -> List[Dict[str, Any]]:
+        """分析YouTube视频评论"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return [{"error": "YouTube处理功能未安装,请运行: pip install pytube yt-dlp"}]
+
+            # 使用yt-dlp获取评论
+            ydl_opts = {
+                'quiet': True,
+                'no_warnings': True,
+                'extract_flat': False,
+                'writecomments': True
+            }
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=False)
+
+            comments = []
+            if 'comments' in info:
+                for comment in info['comments'][:max_comments]:
+                    comments.append({
+                        'author': comment.get('author', 'Unknown'),
+                        'text': comment.get('text', ''),
+                        'like_count': comment.get('like_count', 0),
+                        'time': comment.get('time', ''),
+                        'reply_count': comment.get('reply_count', 0)
+                    })
+
+            return comments
+
+        except Exception as e:
+            return [{"error": f"YouTube评论分析失败: {str(e)}"}]
+
+    @staticmethod
+    @tool
+    def get_youtube_playlist_info(playlist_url: str) -> Dict[str, Any]:
+        """获取YouTube播放列表信息"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return {"error": "YouTube处理功能未安装,请运行: pip install pytube"}
+
+            if not YT_DLP_AVAILABLE:
+                return {"error": "YouTube播放列表功能需要安装yt-dlp,请运行: pip install yt-dlp"}
+
+            # 使用yt-dlp获取播放列表信息
+            ydl_opts = {
+                'quiet': True,
+                'no_warnings': True,
+                'extract_flat': True,
+                'playlist_items': '1-10'  # 只获取前10个视频
+            }
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(playlist_url, download=False)
+
+            playlist_info = {
+                'title': info.get('title', 'Unknown'),
+                'description': info.get('description', '')[:500] + "..." if len(info.get('description', '')) > 500 else info.get('description', ''),
+                'video_count': info.get('playlist_count', 0),
+                'uploader': info.get('uploader', 'Unknown'),
+                'videos': []
+            }
+
+            if 'entries' in info:
+                for entry in info['entries']:
+                    if entry:
+                        playlist_info['videos'].append({
+                            'title': entry.get('title', 'Unknown'),
+                            'url': entry.get('url', ''),
+                            'duration': entry.get('duration', 0),
+                            'uploader': entry.get('uploader', 'Unknown')
+                        })
+
+            return playlist_info
+
+        except Exception as e:
+            return {"error": f"YouTube播放列表信息获取失败: {str(e)}"}
+
+    @staticmethod
+    @tool
+    def download_youtube_video_for_watching(url: str, quality: str = "720p") -> str:
+        """下载YouTube视频用于观看"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube处理功能未安装,请运行: pip install pytube"
+
+            if not YT_DLP_AVAILABLE:
+                return "YouTube视频下载需要安装yt-dlp,请运行: pip install yt-dlp"
+
+            # 设置下载选项
+            ydl_opts = {
+                'format': f'best[height<={quality.replace("p", "")}]',
+                'outtmpl': 'downloads/%(title)s.%(ext)s',
+                'quiet': False,
+                'no_warnings': False,
+                'progress_hooks': [lambda d: print(f"下载进度: {d.get('_percent_str', '0%')}") if d['status'] == 'downloading' else None]
+            }
+
+            # 创建下载目录
+            import os
+            os.makedirs('downloads', exist_ok=True)
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=True)
+                video_path = ydl.prepare_filename(info)
+
+            return f"视频已下载到: {video_path}"
+
+        except Exception as e:
+            return f"YouTube视频下载失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def extract_youtube_audio_for_listening(url: str, format: str = "mp3") -> str:
+        """提取YouTube视频音频用于听取"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube处理功能未安装,请运行: pip install pytube"
+
+            if not YT_DLP_AVAILABLE:
+                return "YouTube音频提取需要安装yt-dlp,请运行: pip install yt-dlp"
+
+            # 设置下载选项(不使用ffmpeg后处理)
+            ydl_opts = {
+                'format': 'bestaudio/best',
+                'outtmpl': 'downloads/%(title)s.%(ext)s',
+                'quiet': False,
+                'no_warnings': False
+            }
+
+            # 创建下载目录
+            import os
+            os.makedirs('downloads', exist_ok=True)
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=True)
+                audio_path = ydl.prepare_filename(info)
+
+            return f"音频已提取到: {audio_path} (原始格式,可用播放器播放)"
+
+        except Exception as e:
+            return f"YouTube音频提取失败: {str(e)}"
+
+    @staticmethod
+    @tool
+    def transcribe_youtube_video(url: str) -> str:
+        """Transcribe a YouTube video to text"""
+        try:
+            if not YOUTUBE_AVAILABLE:
+                return "YouTube support is not installed; run: pip install pytube"
+
+            if not YT_DLP_AVAILABLE:
+                return "YouTube transcription requires yt-dlp; run: pip install yt-dlp"
+
+            if not AUDIO_PROCESSING_AVAILABLE:
+                return "Audio transcription requires SpeechRecognition and pydub; run: pip install SpeechRecognition pydub"
+
+            # First download the audio
+            ydl_opts = {
+                'format': 'bestaudio/best',
+                'outtmpl': 'downloads/%(title)s.%(ext)s',
+                'quiet': True,
+                'no_warnings': True
+            }
+
+            import os
+            os.makedirs('downloads', exist_ok=True)
+
+            with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                info = ydl.extract_info(url, download=True)
+                audio_path = ydl.prepare_filename(info)
+
+            # Convert to WAV for speech recognition (splitext handles any source extension)
+            audio = AudioSegment.from_file(audio_path)
+            wav_path = os.path.splitext(audio_path)[0] + '.wav'
+            audio.export(wav_path, format="wav")
+
+            # Speech recognition
+            recognizer = sr.Recognizer()
+            with sr.AudioFile(wav_path) as source:
+                audio_data = recognizer.record(source)
+                text = recognizer.recognize_google(audio_data, language='zh-CN')
+
+            # Clean up the temporary file
+            os.remove(wav_path)
+
+            return f"Transcription result:\n{text}"
+
+        except Exception as e:
+            return f"YouTube video transcription failed: {str(e)}"
+
+    @staticmethod
+    @tool
+    def analyze_youtube_video_content(url: str) -> Dict[str, Any]:
+        """Analyze YouTube video content: let the vision-language model actually watch and listen to the video"""
+        try:
+            # Fetch video metadata
+            video_info = YouTubeTools.get_youtube_info(url)
+            if 'error' in video_info:
+                return video_info
+
+            analysis_result = {
+                'video_info': video_info,
+                'visual_analysis': "Visual analysis is unavailable",
+                'audio_analysis': "Audio analysis is unavailable",
+                'transcription': "Audio transcription is unavailable"
+            }
+
+            # 1. Download the video for visual analysis
+            if YT_DLP_AVAILABLE:
+                try:
+                    ydl_opts = {
+                        'format': 'best[height<=720]',  # cap the resolution
+                        'outtmpl': 'downloads/%(title)s.%(ext)s',
+                        'quiet': True,
+                        'no_warnings': True
+                    }
+
+                    import os
+                    os.makedirs('downloads', exist_ok=True)
+
+                    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                        info = ydl.extract_info(url, download=True)
+                        video_path = ydl.prepare_filename(info)
+
+                    # 2. Extract key frames for visual analysis
+                    try:
+                        import cv2
+                        from PIL import Image
+
+                        cap = cv2.VideoCapture(video_path)
+                        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+                        fps = cap.get(cv2.CAP_PROP_FPS)
+                        duration = frame_count / fps if fps > 0 else 0
+
+                        # Extract key frames (one per second)
+                        key_frames = []
+                        frame_interval = max(1, int(fps))
+
+                        for i in range(0, frame_count, frame_interval):
+                            cap.set(cv2.CAP_PROP_POS_FRAMES, i)
+                            ret, frame = cap.read()
+                            if ret:
+                                # Convert to a PIL image
+                                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+                                pil_image = Image.fromarray(frame_rgb)
+
+                                # Save the key frame
+                                frame_path = f"downloads/frame_{i//frame_interval:03d}.jpg"
+                                pil_image.save(frame_path, "JPEG", quality=85)
+                                key_frames.append({
+                                    'frame_number': i,
+                                    'timestamp': i / fps if fps > 0 else 0,
+                                    'path': frame_path
+                                })
+
+                        cap.release()
+
+                        # 3. Analyze the key frames with a vision-language model
+                        try:
+                            from transformers import pipeline
+
+                            # Image captioning model
+                            image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
+
+                            visual_descriptions = []
+                            for frame_info in key_frames[:10]:  # limit analysis to the first 10 frames
+                                try:
+                                    description = image_to_text(frame_info['path'])[0]['generated_text']
+                                    visual_descriptions.append({
+                                        'timestamp': frame_info['timestamp'],
+                                        'description': description
+                                    })
+                                except Exception as e:
+                                    print(f"Frame analysis failed: {e}")
+
+                            analysis_result['visual_analysis'] = {
+                                'video_path': video_path,
+                                'duration': duration,
+                                'fps': fps,
+                                'frame_count': frame_count,
+                                'key_frames_analyzed': len(visual_descriptions),
+                                'visual_descriptions': visual_descriptions,
+                                'summary': f"The video contains {len(visual_descriptions)} key scenes"
+                            }
+
+                        except Exception as e:
+                            analysis_result['visual_analysis'] = f"Vision-language model analysis failed: {str(e)}"
+
+                    except Exception as e:
+                        analysis_result['visual_analysis'] = f"Frame extraction failed: {str(e)}"
+
+                except Exception as e:
+                    analysis_result['visual_analysis'] = f"Video download failed: {str(e)}"
+
+            # 4. Audio analysis and transcription (no hard ffmpeg dependency)
+            if YT_DLP_AVAILABLE:
+                try:
+                    # Download the audio
+                    ydl_opts = {
+                        'format': 'bestaudio/best',
+                        'outtmpl': 'downloads/%(title)s_audio.%(ext)s',
+                        'quiet': True,
+                        'no_warnings': True
+                    }
+
+                    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+                        info = ydl.extract_info(url, download=True)
+                        audio_path = ydl.prepare_filename(info)
+
+                    # Transcribe the audio, trying several methods
+                    try:
+                        # Method 1: try whisper (preferred)
+                        try:
+                            import whisper
+                            print("🎤 Transcribing audio with whisper...")
+                            model = whisper.load_model("base")
+                            result = model.transcribe(audio_path)
+                            transcription_text = result["text"]
+
+                            analysis_result['transcription'] = transcription_text
+                            analysis_result['audio_analysis'] = {
+                                'audio_path': audio_path,
+                                'duration': result.get('duration', 0),
+                                'transcription': transcription_text,
+                                'method': 'whisper',
+                                'summary': f"Audio is {result.get('duration', 0):.1f}s long and has been transcribed"
+                            }
+                            print("✅ whisper transcription succeeded")
+
+                        except ImportError:
+                            print("⚠️ whisper is not installed, trying other methods...")
+                            # Method 2: pydub + speech_recognition (requires ffmpeg)
+                            try:
+                                from pydub import AudioSegment
+                                import speech_recognition as sr
+
+                                # Check whether ffmpeg is available
+                                import subprocess
+                                try:
+                                    subprocess.run(['ffmpeg', '-version'], capture_output=True, check=True)
+                                    ffmpeg_available = True
+                                    print("✅ ffmpeg is available, using pydub + speech_recognition")
+                                except (subprocess.CalledProcessError, FileNotFoundError):
+                                    ffmpeg_available = False
+                                    print("❌ ffmpeg is not available")
+
+                                if ffmpeg_available:
+                                    import os
+
+                                    # Convert to WAV (splitext handles any source extension)
+                                    audio = AudioSegment.from_file(audio_path)
+                                    wav_path = os.path.splitext(audio_path)[0] + '.wav'
+                                    audio.export(wav_path, format="wav")
+
+                                    # Speech recognition
+                                    recognizer = sr.Recognizer()
+                                    with sr.AudioFile(wav_path) as source:
+                                        audio_data = recognizer.record(source)
+                                        transcription_text = recognizer.recognize_google(audio_data, language='zh-CN')
+
+                                    analysis_result['transcription'] = transcription_text
+                                    analysis_result['audio_analysis'] = {
+                                        'audio_path': audio_path,
+                                        'duration': len(audio) / 1000,  # seconds
+                                        'transcription': transcription_text,
+                                        'method': 'pydub+speech_recognition',
+                                        'summary': f"Audio is {len(audio)/1000:.1f}s long and has been transcribed"
+                                    }
+
+                                    # Clean up the temporary file
+                                    if os.path.exists(wav_path):
+                                        os.remove(wav_path)
+                                else:
+                                    # Method 3: report the audio file only, without transcription
+                                    analysis_result['transcription'] = "Transcription requires whisper or ffmpeg"
+                                    analysis_result['audio_analysis'] = {
+                                        'audio_path': audio_path,
+                                        'duration': 'unknown',
+                                        'transcription': 'ffmpeg or whisper is required for transcription',
+                                        'method': 'audio_only',
+                                        'summary': f"Audio downloaded to: {audio_path}; install whisper or ffmpeg to transcribe it"
+                                    }
+
+                            except Exception as e:
+                                print(f"❌ pydub + speech_recognition failed: {e}")
+                                analysis_result['transcription'] = f"Audio transcription failed: {str(e)}"
+                                analysis_result['audio_analysis'] = {
+                                    'audio_path': audio_path,
+                                    'duration': 'unknown',
+                                    'transcription': f'Transcription failed: {str(e)}',
+                                    'method': 'failed',
+                                    'summary': f"Audio downloaded, but transcription failed: {str(e)}"
+                                }
+
+                    except Exception as e:
+                        analysis_result['transcription'] = f"Audio transcription failed: {str(e)}"
+                        analysis_result['audio_analysis'] = {
+                            'audio_path': audio_path,
+                            'duration': 'unknown',
+                            'transcription': f'Transcription failed: {str(e)}',
+                            'method': 'failed',
+                            'summary': f"Audio downloaded, but transcription failed: {str(e)}"
+                        }
+
+                except Exception as e:
+                    analysis_result['audio_analysis'] = f"Audio download failed: {str(e)}"
+
+            # 5. Aggregate the results
+            analysis_result['summary'] = f"This is a video about {video_info.get('title', 'an unknown topic')}, {video_info.get('length', 0)}s long"
+            analysis_result['key_points'] = [
+                "Title: " + video_info.get('title', 'Unknown'),
+                "Author: " + video_info.get('author', 'Unknown'),
+                "Length: " + str(video_info.get('length', 0)) + "s",
+                "Views: " + str(video_info.get('views', 0)),
+                "Visual analysis: " + ("completed" if isinstance(analysis_result['visual_analysis'], dict) else "failed"),
+                "Audio analysis: " + ("completed" if isinstance(analysis_result['audio_analysis'], dict) else "failed")
+            ]
+
+            return analysis_result
+
+        except Exception as e:
+            return {"error": f"YouTube video content analysis failed: {str(e)}"}
+
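The per-second frame sampling in step 2 above reduces to simple index arithmetic over the frame count and fps. A minimal standalone sketch (the function name is illustrative, not part of `tools.py`):

```python
# Illustrative sketch of the key-frame sampling above: pick one frame
# per second of video, derived only from fps and the total frame count.
def key_frame_indices(frame_count: int, fps: float) -> list:
    # Sample every int(fps) frames; for fps < 1 fall back to every frame.
    interval = max(1, int(fps))
    return list(range(0, frame_count, interval))

print(key_frame_indices(100, 30.0))  # [0, 30, 60, 90]
```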
+class ToolManager:
+    """Tool manager"""
+
+    def __init__(self):
+        self.media_tools = MediaTools()
+        self.code_tools = CodeAnalysisTools()
+        self.pdf_tools = PDFTools()
+        self.search_tools = SearchTools()
+        self.analysis_tools = AnalysisTools()
+        self.utility_tools = UtilityTools()
+        self.web_tools = WebTools()  # register WebTools with the manager
+        self.youtube_tools = YouTubeTools()  # register YouTubeTools with the manager
+        self.wikipedia_tools = WikipediaTools()  # register WikipediaTools with the manager
+
+        # Register every tool
+        self.tools = {
+            # PDF tools
+            'download_pdf_from_url': self.pdf_tools.download_pdf_from_url,
+            'extract_text_from_pdf': self.pdf_tools.extract_text_from_pdf,
+            'extract_images_from_pdf': self.pdf_tools.extract_images_from_pdf,
+            'analyze_pdf_structure': self.pdf_tools.analyze_pdf_structure,
+            'search_text_in_pdf': self.pdf_tools.search_text_in_pdf,
+            'summarize_pdf_content': self.pdf_tools.summarize_pdf_content,
+
+            # Media tools
+            'extract_text_from_image': self.media_tools.extract_text_from_image,
+            'analyze_image_emotion': self.media_tools.analyze_image_emotion,
+            'extract_video_audio': self.media_tools.extract_video_audio,
+            'analyze_video_content': self.media_tools.analyze_video_content,
+
+            # Code tools
+            'analyze_python_code': self.code_tools.analyze_python_code,
+            'execute_python_code': self.code_tools.execute_python_code,
+            'explain_code': self.code_tools.explain_code,
+
+            # Search tools
+            'web_search': self.search_tools.web_search,
+            'search_images': self.search_tools.search_images,
+            'search_videos': self.search_tools.search_videos,
+            'search_pdfs': self.search_tools.search_pdfs,
+
+            # Analysis tools
+            'analyze_text_sentiment': self.analysis_tools.analyze_text_sentiment,
+            'extract_keywords': self.analysis_tools.extract_keywords,
+            'summarize_text': self.analysis_tools.summarize_text,
+
+            # Utility tools
+            'get_current_weather': self.utility_tools.get_current_weather,
+            'translate_text': self.utility_tools.translate_text,
+            'calculate_math_expression': self.utility_tools.calculate_math_expression,
+
+            # Web tools
+            'fetch_webpage_content': self.web_tools.fetch_webpage_content,
+            'extract_text_from_webpage': self.web_tools.extract_text_from_webpage,
+            'analyze_webpage_structure': self.web_tools.analyze_webpage_structure,
+            'search_content_in_webpage': self.web_tools.search_content_in_webpage,
+            'extract_links_from_webpage': self.web_tools.extract_links_from_webpage,
+            'summarize_webpage_content': self.web_tools.summarize_webpage_content,
+            'check_webpage_accessibility': self.web_tools.check_webpage_accessibility,
+
+            # YouTube tools
+            'download_youtube_video': self.youtube_tools.download_youtube_video,
+            'get_youtube_info': self.youtube_tools.get_youtube_info,
+            'extract_youtube_audio': self.youtube_tools.extract_youtube_audio,
+            'download_youtube_thumbnail': self.youtube_tools.download_youtube_thumbnail,
+            'search_youtube_videos': self.youtube_tools.search_youtube_videos,
+            'analyze_youtube_comments': self.youtube_tools.analyze_youtube_comments,
+            'get_youtube_playlist_info': self.youtube_tools.get_youtube_playlist_info,
+            'download_youtube_video_for_watching': self.youtube_tools.download_youtube_video_for_watching,
+            'extract_youtube_audio_for_listening': self.youtube_tools.extract_youtube_audio_for_listening,
+            'transcribe_youtube_video': self.youtube_tools.transcribe_youtube_video,
+            'analyze_youtube_video_content': self.youtube_tools.analyze_youtube_video_content,
+
+            # Wikipedia tools
+            'search_wikipedia': self.wikipedia_tools.search_wikipedia,
+            'get_wikipedia_page': self.wikipedia_tools.get_wikipedia_page,
+            'get_wikipedia_summary': self.wikipedia_tools.get_wikipedia_summary,
+            'get_wikipedia_random_page': self.wikipedia_tools.get_wikipedia_random_page,
+            'search_wikipedia_english': self.wikipedia_tools.search_wikipedia_english,
+            'get_wikipedia_page_english': self.wikipedia_tools.get_wikipedia_page_english,
+            'get_wikipedia_suggestions': self.wikipedia_tools.get_wikipedia_suggestions,
+            'get_wikipedia_categories': self.wikipedia_tools.get_wikipedia_categories,
+            'get_wikipedia_links': self.wikipedia_tools.get_wikipedia_links,
+            'get_wikipedia_geosearch': self.wikipedia_tools.get_wikipedia_geosearch,
+        }
+
+    def get_tool(self, tool_name: str):
+        """Return a registered tool by name"""
+        return self.tools.get(tool_name)
+
+    def list_tools(self) -> List[str]:
+        """List all available tools"""
+        return list(self.tools.keys())
+
+    def execute_tool(self, tool_name: str, **kwargs) -> Any:
+        """Execute a tool by name"""
+        tool = self.get_tool(tool_name)
+        if tool:
+            # Call the tool's underlying function directly
+            if hasattr(tool, 'func'):
+                # @tool-decorated functions expose the original callable as .func
+                return tool.func(**kwargs)
+            elif hasattr(tool, '__wrapped__'):
+                # Fallback for plain wrapped callables
+                return tool.__wrapped__(**kwargs)
+            else:
+                # Last resort: try the run method
+                return tool.run(**kwargs)
+        else:
+            raise ValueError(f"Tool '{tool_name}' does not exist")
+
+    def should_use_search(self, question: str, context: Dict[str, Any]) -> bool:
+        """Decide whether answering the question requires a web search"""
+        question_lower = question.lower()
+
+        # Cases that do not need a search
+        no_search_keywords = [
+            '计算', 'calculate', 'math', '数学',
+            '代码', 'code', 'python', 'program',
+            '翻译', 'translate',
+            '天气', 'weather',
+            '情感', 'sentiment', 'emotion',
+            '关键词', 'keywords',
+            '摘要', 'summary', 'summarize',
+            'pdf', '文档', 'document'
+        ]
+
+        # Cases that do need a search
+        search_keywords = [
+            '最新', 'latest', 'news', '新闻',
+            '什么是', 'what is', 'how to', '如何',
+            '价格', 'price', 'cost',
+            '地点', 'location', 'where',
+            '时间', 'time', 'when',
+            '比较', 'compare', 'vs',
+            '推荐', 'recommend', 'best'
+        ]
+
+        # Check the question type
+        for keyword in no_search_keywords:
+            if keyword in question_lower:
+                return False
+
+        for keyword in search_keywords:
+            if keyword in question_lower:
+                return True
+
+        # Questions mentioning specific dates or needing real-time information use search
+        if any(word in question_lower for word in ['2024', '2023', 'today', 'now', 'current']):
+            return True
+
+        # Default to no search, unless the question is long or complex
+        return len(question) > 50
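The routing heuristic above can be exercised in isolation. A trimmed, dependency-free sketch with abbreviated keyword lists (the full lists live in `should_use_search`):

```python
# Abbreviated sketch of ToolManager.should_use_search: local-task
# keywords veto search, lookup/freshness keywords trigger it, and
# long questions default to search.
def should_use_search(question: str) -> bool:
    q = question.lower()
    no_search = ['calculate', '计算', 'translate', 'weather', 'pdf']
    search = ['latest', 'news', 'what is', 'price', 'compare']
    if any(k in q for k in no_search):
        return False
    if any(k in q for k in search):
        return True
    if any(w in q for w in ['2024', '2023', 'today', 'now', 'current']):
        return True
    return len(question) > 50

print(should_use_search("Please calculate 2 + 2"))    # False
print(should_use_search("What is the latest news?"))  # True
```

Note that the veto list is checked first, so a question matching both lists (e.g. "translate the latest news") is answered locally.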