Spaces:
Upload 7 files
Browse files
- README.md +110 -12
- __init__.py +0 -0
- app.py +354 -64
- graph_demo_ui.py +87 -0
- requirements.txt +10 -1
- webui-test-graph.py +283 -0
- webui-test.py +354 -0
README.md
CHANGED
@@ -1,12 +1,110 @@
# Easy-RAG

A RAG (Retrieval-Augmented Generation) system built for learning, everyday use, and self-extension; it can also go online for AI search!



Update history

2024/09/04 Added AI web search (online queries)
2024/09/04 Optimized asynchronous webui calls for faster responses
2024/08/21 Added Elasticsearch support, configured in config
2024/07/23 Added a real-time knowledge-graph extraction tool (graph_demo_ui.py), following the meet-libai project; extraction only for now, nothing is stored yet
2024/07/11 Added FAISS vector-database support (Chroma and FAISS supported at this point)
2024/07/10 Updated the rerank retrieval mode
2024/07/09 First release

1. Current features

Knowledge base (supported formats: txt, csv, pdf, md, doc, docx, mp3, mp4, wav, excel):

1. Create a knowledge base (Chroma, FAISS, and Elasticsearch are supported)
2. Update a knowledge base
3. Delete a single file from a knowledge base
4. Delete a whole knowledge base
5. Vectorize a knowledge base
6. Speech-to-text for audio and video files, followed by vectorization
   Speech-to-text uses funasr; on first launch the model is downloaded from ModelScope, which can be slow, and afterwards it is loaded automatically

chat

1. Multi-turn chat with the plain LLM
2. Knowledge-base Q&A with three retrieval modes: "复杂召回方式" (complex recall), "简单召回方式" (simple recall), "rerank"
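Vectorizing a knowledge base cuts each document into overlapping chunks (the chunk_size / chunk_overlap knobs in the web UI) before embedding. A character-level sketch of that windowing, for illustration only; the project itself delegates splitting to its LangChain-based loader:

```python
def chunk_text(text, chunk_size=200, chunk_overlap=50):
    """Slide a chunk_size window over text, stepping by size minus overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # each chunk starts `step` characters after the previous one, so
    # consecutive chunks share `chunk_overlap` characters
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With the UI defaults (chunk_size=200, chunk_overlap=50) consecutive chunks share 50 characters, so a sentence cut at one boundary still appears whole in a neighbouring chunk.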
AI web search

Web search is supported; you can tune the prompt to apply different degrees of summarization.
The LLM runs on Ollama, so different models can be selected.
Note: web access relies on searxng, which must be started locally or on a server first (I run it with Docker).
See https://github.com/searxng/searxng-docker



3. Reranking to improve retrieval quality

Rerank uses the bge-reranker-large model: download it locally, then configure its path in rag/rerank.py.
Model: https://hf-mirror.com/BAAI/bge-reranker-large

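Reranking boils down to scoring each (query, passage) pair and re-sorting by that score. A toy sketch of the control flow; the word-overlap scorer here is a stand-in assumption, not the project's actual scorer, which runs the bge-reranker-large cross-encoder configured in rag/rerank.py:

```python
def rerank(query, docs, score_fn, top_k=3):
    """Order docs by score_fn(query, doc), highest first, keep top_k."""
    ranked = sorted(docs, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_k]

def overlap_score(query, doc):
    """Stand-in scorer: count of shared lowercase words.
    The real project scores each pair with a cross-encoder instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))
```

Swapping overlap_score for a real cross-encoder forward pass changes only score_fn; the sort-and-slice control flow stays the same.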
2. Roadmap

Knowledge base:

0. Support for Elasticsearch, Milvus, MongoDB and other vector databases


chat:

1. Spoken (audio) answer output
2. Routing questions to the appropriate knowledge base

Installation and usage

Install Ollama: pick the installer for your machine from the page below; installation is point-and-click.

https://ollama.com/download

Install the two models we need (run in a terminal):

ollama run qwen2:7b
ollama run mofanke/acge_text_embedding:latest

Download the bge-reranker-large model and configure its path in rag/rerank.py:

https://hf-mirror.com/BAAI/bge-reranker-large

Choose the vector database you want to use (Chroma, FAISS, or Elasticsearch).

Configure it in Config/config.py.
If you choose Elasticsearch, start Elasticsearch first (I use Docker):

docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -e "xpack.security.http.ssl.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.12.1

Remember to adjust es_url to match.

Set up the Python environment:

conda create -n Easy-RAG python=3.10.9
conda activate Easy-RAG

The project was developed on Python 3.10.9; Python 3.8+ has been tested and works.

git clone https://github.com/yuntianhe2014/Easy-RAG.git

Install the dependencies:

pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple

Deploy the searxng project that web search depends on.
See https://github.com/searxng/searxng-docker

Start the project:

python webui.py

Real-time knowledge-graph extraction tool:

python graph_demo_ui.py



For more background, see the WeChat official-account articles: 世界大模型



Related projects:
https://github.com/BinNong/meet-libai
https://github.com/searxng/searxng-docker
__init__.py
ADDED
File without changes
app.py
CHANGED
@@ -1,64 +1,354 @@
import gradio as gr
import threading
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
import requests
import json

# Project-specific modules; adjust these imports to your own setup
from Config.config import VECTOR_DB, DB_directory
from Ollama_api.ollama_api import *
from rag.rag_class import *

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Pick the vector-database backend according to VECTOR_DB
if VECTOR_DB == 1:
    from embeding.chromadb import ChromaDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
elif VECTOR_DB == 2:
    from embeding.faissdb import FaissDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
elif VECTOR_DB == 3:
    from embeding.elasticsearchStore import ElsStore as vectorDB
    vectordb = vectorDB()

# Uploaded files
uploaded_files = []

@lru_cache(maxsize=100)
def get_knowledge_base_files():
    cl_dict = {}
    cols = vectordb.get_all_collections_name()
    for c_name in cols:
        cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
    return cl_dict

knowledge_base_files = get_knowledge_base_files()

def upload_files(files):
    if files:
        new_files = [file.name for file in files]
        uploaded_files.extend(new_files)
        update_knowledge_base_files()
        logger.info(f"Uploaded files: {new_files}")
        return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"

def delete_files(selected_files):
    global uploaded_files
    uploaded_files = [f for f in uploaded_files if f not in selected_files]
    if selected_files:
        update_knowledge_base_files()
        logger.info(f"Deleted files: {selected_files}")
        return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"

def delete_collection(selected_knowledge_base):
    if selected_knowledge_base and selected_knowledge_base != "创建知识库":
        vectordb.delete_collection(selected_knowledge_base)
        update_knowledge_base_files()
        logger.info(f"Deleted collection: {selected_knowledge_base}")
        return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
    return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"

async def async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
    if selected_files:
        if selected_knowledge_base == "创建知识库":
            knowledge_base = new_kb_name
            vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        else:
            knowledge_base = selected_knowledge_base
            vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

        if knowledge_base not in knowledge_base_files:
            knowledge_base_files[knowledge_base] = []
        knowledge_base_files[knowledge_base].extend(selected_files)

        logger.info(f"Vectorized files: {selected_files} for knowledge base: {knowledge_base}")
        await asyncio.sleep(0)  # yield control so other tasks can run
        return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
    return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"

def update_file_list():
    return gr.update(choices=uploaded_files, value=[])

def search_knowledge_base(selected_knowledge_base):
    if selected_knowledge_base in knowledge_base_files:
        kb_files = knowledge_base_files[selected_knowledge_base]
        return gr.update(choices=kb_files, value=[])
    return gr.update(choices=[], value=[])

def update_knowledge_base_files():
    global knowledge_base_files
    # clear the lru_cache first, otherwise the stale listing would be returned
    get_knowledge_base_files.cache_clear()
    knowledge_base_files = get_knowledge_base_files()

# Chat message handling
chat_history = []

def safe_chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
    try:
        return chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message)
    except Exception as e:
        logger.error(f"Error in chat response: {str(e)}")
        return f"<div style='color: red;'>Error: {str(e)}</div>", ""

def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
    global chat_history
    if message:
        chat_history.append(("User", message))
        if chat_knowledge_base_dropdown == "仅使用模型":
            rag = RAG_class(model=model_dropdown, persist_directory=DB_directory)
            answer = rag.mult_chat(chat_history)
        elif chat_knowledge_base_dropdown:
            rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
            if chain_dropdown == "复杂召回方式":
                questions = rag.decomposition_chain(message)
                answer = rag.rag_chain(questions)
            elif chain_dropdown == "简单召回方式":
                answer = rag.simple_chain(message)
            else:
                answer = rag.rerank_chain(message)

        response = f" {answer}"
        chat_history.append(("Bot", response))
    return format_chat_history(chat_history), ""

def clear_chat():
    global chat_history
    chat_history = []
    return format_chat_history(chat_history)

def format_chat_history(history):
    formatted_history = ""
    for user, msg in history:
        if user == "User":
            formatted_history += f'''
            <div style="text-align: right; margin: 10px;">
                <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
                    {msg}
                </div>
                <b>:User</b>
            </div>
            '''
        else:
            if "```" in msg:  # the message contains a fenced code snippet
                code_content = msg.split("```")[1]
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        <pre><code>{code_content}</code></pre>
                    </div>
                </div>
                '''
            else:
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        {msg}
                    </div>
                </div>
                '''
    return formatted_history

def clear_status():
    upload_status.update("")
    delete_status.update("")
    vectorize_status.update("")
    delete_collection_status.update("")

def handle_knowledge_base_selection(selected_knowledge_base):
    if selected_knowledge_base == "创建知识库":
        return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
    elif selected_knowledge_base == "仅使用模型":
        return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
    else:
        return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)

def update_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["创建知识库"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)

def update_chat_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["仅使用模型"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)


# SearxNG search
def search_searxng(query):
    searxng_url = 'http://localhost:8080/search'  # replace with your SearxNG instance URL
    params = {
        'q': query,
        'format': 'json'
    }
    response = requests.get(searxng_url, params=params)
    response.raise_for_status()
    return response.json()


# Summarization via Ollama
def summarize_with_ollama(model_dropdown, text, question):
    prompt = """
    根据下边的内容,回答用户问题,
    内容为:'{0}'\n
    问题为:{1}
    """.format(text, question)
    ollama_url = 'http://localhost:11434/api/generate'  # replace with your Ollama instance URL
    data = {
        'model': model_dropdown,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(ollama_url, json=data)
    response.raise_for_status()
    return response.json()


# AI web-search handler: search, then summarize
def ai_web_search(model_dropdown, user_query):
    # Search with SearxNG
    search_results = search_searxng(user_query)
    search_texts = [result['title'] + "\n" + result['content'] for result in search_results['results']]
    combined_text = "\n\n".join(search_texts)

    # Summarize with Ollama
    summary = summarize_with_ollama(model_dropdown, combined_text, user_query)
    return summary['response']


# Build the Gradio interface
with gr.Blocks() as demo:
    with gr.Column():
        # Title
        title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
        # Announcement banner
        announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: RAG精致系统,【检索增强生成】系统!<br/>莫大大</div>")

    with gr.Tabs():
        with gr.TabItem("知识库"):
            knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
                                                  label="选择知识库")
            new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
            file_input = gr.Files(label="Upload files")
            upload_btn = gr.Button("Upload")
            file_list = gr.CheckboxGroup(label="Uploaded Files")
            delete_btn = gr.Button("Delete Selected Files")
            with gr.Row():
                chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
                chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
            vectorize_btn = gr.Button("Vectorize Selected Files")
            delete_collection_btn = gr.Button("Delete Collection")
            upload_status = gr.HTML()
            delete_status = gr.HTML()
            vectorize_status = gr.HTML()
            delete_collection_status = gr.HTML()

        with gr.TabItem("Chat"):
            with gr.Row():
                model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
                vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
                chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
                chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式", "rerank"], label="chain方式", visible=False)
            chat_display = gr.HTML(label="Chat History")
            chat_input = gr.Textbox(label="Type a message")
            chat_btn = gr.Button("Send")
            clear_btn = gr.Button("Clear Chat History")

        with gr.TabItem("AI网络搜索"):
            with gr.Row():
                web_search_model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
            web_search_output = gr.Textbox(label="搜索结果和AI回答", lines=10)
            web_search_input = gr.Textbox(label="输入搜索查询")
            web_search_btn = gr.Button("搜索")

    def handle_upload(files):
        upload_result, new_files, status = upload_files(files)
        threading.Thread(target=clear_status).start()
        return upload_result, new_files, status, update_chat_knowledge_base_dropdown()

    def handle_delete(selected_knowledge_base, selected_files):
        # only ask the vector store to delete files it actually holds
        tmp = []
        cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
        for i in selected_files:
            if i in cols_files_tmp:
                tmp.append(i)
        del cols_files_tmp
        if tmp:
            vectordb.del_files(tmp, c_name=selected_knowledge_base)
        del tmp
        delete_result, status = delete_files(selected_files)
        threading.Thread(target=clear_status).start()
        return delete_result, status, update_chat_knowledge_base_dropdown()

    def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
        vectorize_result, status = asyncio.run(async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap))
        threading.Thread(target=clear_status).start()
        return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()

    def handle_delete_collection(selected_knowledge_base):
        result, status = delete_collection(selected_knowledge_base)
        threading.Thread(target=clear_status).start()
        return result, status, update_chat_knowledge_base_dropdown()

    knowledge_base_dropdown.change(
        handle_knowledge_base_selection,
        inputs=knowledge_base_dropdown,
        outputs=[new_kb_input, file_list, chain_dropdown]
    )
    upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
    delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
    vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, chunk_size_dropdown, chunk_overlap_dropdown],
                        outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
    delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
                                outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])

    # route chat through the exception-safe wrapper
    chat_btn.click(safe_chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
    clear_btn.click(clear_chat, outputs=chat_display)

    chat_knowledge_base_dropdown.change(
        fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
        inputs=chat_knowledge_base_dropdown,
        outputs=chain_dropdown
    )

    # AI web-search click handler
    web_search_btn.click(
        ai_web_search,
        inputs=[web_search_model_dropdown, web_search_input],
        outputs=web_search_output
    )

demo.launch(debug=True, share=True)
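The web-search tab glues SearxNG hits into one context block before summarization. A self-contained sketch of that glue step (the `results` / `title` / `content` field names match the SearxNG JSON used in ai_web_search above; the defensive skip of incomplete hits is an addition, not project behavior):

```python
def combine_search_results(payload):
    """Join SearxNG hits into one text block, one blank line between hits.

    Mirrors the list comprehension in ai_web_search; hits missing a
    'title' or 'content' field are skipped instead of raising KeyError.
    """
    texts = [
        hit["title"] + "\n" + hit["content"]
        for hit in payload.get("results", [])
        if "title" in hit and "content" in hit
    ]
    return "\n\n".join(texts)
```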
graph_demo_ui.py
ADDED
@@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
from flask import Flask, render_template, request, jsonify
import json
from dotenv import load_dotenv
from langchain_community.llms import Ollama


load_dotenv()

app = Flask(__name__)

# Tested llama3:8b, gemma2:9b, qwen2:7b, glm4:9b and arcee-ai/arcee-agent:latest; qwen2:7b works best so far
llm = Ollama(model="qwen2:7b")


json_example = {'edges': [{'data': {'color': '#FFA07A',
                                    'label': 'label 1',
                                    'source': 'source 1',
                                    'target': 'target 1'}},
                          {'data': {'color': '#FFA07A',
                                    'label': 'label 2',
                                    'source': 'source 2',
                                    'target': 'target 2'}}
                          ],
                'nodes': [{'data': {'color': '#FFC0CB', 'id': 'id 1', 'label': 'label 1'}},
                          {'data': {'color': '#90EE90', 'id': 'id 2', 'label': 'label 2'}},
                          {'data': {'color': '#87CEEB', 'id': 'id 3', 'label': 'label 3'}}]}


# Extraction prompt, kept in Chinese since the model receives it verbatim
__retriever_prompt = f"""
您是一名专门从事知识图谱创建的人工智能专家,目标是根据给定的输入或请求捕获关系。
基于各种形式的用户输入,如段落、电子邮件、文本文件等。
你的任务是根据输入创建一个知识图谱。
nodes必须具有label参数,并且label是来自输入的词语或短语,nodes必须具有id参数,id的格式是"id_数字",不能重复。
edges还必须有一个label参数,其中label是输入中的直接词语或短语,edges中的source和target取自nodes中的id。
仅使用JSON进行响应,其格式可以在python中进行jsonify,并直接输入cy.add(data),包括"color"属性,以在前端显示图形。
您可以参考给定的示例:{json_example}。存储node和edge的数组中,最后一个元素后边不要有逗号,
确保边的目标和源与现有节点匹配。
不要在JSON的上方和下方包含markdown三引号,直接用花括号括起来。
"""


def generate_graph_info(raw_text: str) -> str | None:
    """
    Generate graph info from raw text.
    :param raw_text: the text to extract a knowledge graph from
    :return: the raw LLM response, expected to be JSON
    """
    messages = [
        {"role": "system", "content": "你现在扮演信息抽取的角色,要求根据用户输入和AI的回答,正确提取出信息,记得不多对实体进行翻译。"},
        {"role": "user", "content": raw_text},
        {"role": "user", "content": __retriever_prompt}
    ]
    print("Parsing...")
    for i in range(3):  # retry up to three times when the reply looks truncated
        graph_info_result = llm.invoke(messages)
        if len(graph_info_result) < 10:
            print("-------", i, "-------------------")
            continue
        else:
            break
    print(graph_info_result)
    return graph_info_result


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/update_graph', methods=['POST'])
def update_graph():
    raw_text = request.json.get('text', '')
    try:
        result = generate_graph_info(raw_text)
        if '```' in result:
            graph_data = json.loads(result.split('```', 2)[1].replace("json", ''))
        else:
            graph_data = json.loads(result)
        return graph_data
    except Exception as e:
        return {'error': f"Error parsing graph data: {str(e)}"}


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=7860)
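update_graph strips an optional markdown fence from the LLM reply before json.loads. A standalone sketch of that step; it uses removeprefix instead of the blanket replace("json", '') above, so a payload that legitimately contains the word "json" (say, in a node label) survives intact:

```python
import json

def parse_graph_json(result: str) -> dict:
    """Parse the LLM reply, tolerating a ```json ... ``` fence around it."""
    if "```" in result:
        # take the text between the first pair of fences,
        # then drop only the leading language tag
        inner = result.split("```", 2)[1]
        inner = inner.removeprefix("json")
        return json.loads(inner)
    return json.loads(result)
```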
requirements.txt
CHANGED
@@ -1 +1,10 @@
gradio==4.29.0
langchain-community==0.2.6
langchain==0.2.6
langchain-core==0.2.11
requests
transformers==4.41.1
unstructured==0.7.12
funasr==1.0.24
modelscope
chromadb
webui-test-graph.py
ADDED
@@ -0,0 +1,283 @@
import gradio as gr
import threading
from Config.config import VECTOR_DB, DB_directory

if VECTOR_DB == 1:
    from embeding.chromadb import ChromaDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
elif VECTOR_DB == 2:
    from embeding.faissdb import FaissDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
from Ollama_api.ollama_api import *
from rag.rag_class import *

# Uploaded files
uploaded_files = []

# Fetch the latest knowledge-base files
def get_knowledge_base_files():
    cl_dict = {}
    cols = vectordb.get_all_collections_name()
    for c_name in cols:
        cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
    return cl_dict

knowledge_base_files = get_knowledge_base_files()

def upload_files(files):
    if files:
        new_files = [file.name for file in files]
        uploaded_files.extend(new_files)
        update_knowledge_base_files()
        return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"

def delete_files(selected_files):
    global uploaded_files
    uploaded_files = [f for f in uploaded_files if f not in selected_files]
    if selected_files:
        update_knowledge_base_files()
        return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"

def delete_collection(selected_knowledge_base):
    if selected_knowledge_base and selected_knowledge_base != "创建知识库":
        vectordb.delete_collection(selected_knowledge_base)
        update_knowledge_base_files()
        return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
    return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"

def create_graph(selected_files):
    from Neo4j.neo4j_op import KnowledgeGraph
    from Neo4j.graph_extract import update_graph
    from Config.config import neo4j_host, neo4j_name, neo4j_pwd
    import tqdm

    kg = KnowledgeGraph(neo4j_host, neo4j_name, neo4j_pwd)
    data = kg.split_files(selected_files)
    for doc in tqdm.tqdm(data):
        text = doc.page_content
        try:
            res = update_graph(text)
            # Batch-create nodes
            nodes = kg.create_nodes("node", res["nodes"])

            # Batch-create relationships
            relationships = kg.create_relationships([
                ("node", {"name": edge["source"]}, "node", {"name": edge["target"]}, edge["label"]) for edge in res["edges"]
            ])
        except Exception:
            print("错误----------------------------------")


def vectorize_files(selected_files, selected_knowledge_base, new_kb_name, choice_graph, chunk_size, chunk_overlap):
    if selected_files:
        if selected_knowledge_base == "创建知识库":
            knowledge_base = new_kb_name
            vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
            if choice_graph == '是':
                create_graph(selected_files)
        else:
            knowledge_base = selected_knowledge_base
            vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
            if choice_graph == '是':
                create_graph(selected_files)
        if knowledge_base not in knowledge_base_files:
            knowledge_base_files[knowledge_base] = []
        knowledge_base_files[knowledge_base].extend(selected_files)

        return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
    return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"

def update_file_list():
    return gr.update(choices=uploaded_files, value=[])

def search_knowledge_base(selected_knowledge_base):
    if selected_knowledge_base in knowledge_base_files:
        kb_files = knowledge_base_files[selected_knowledge_base]
        return gr.update(choices=kb_files, value=[])
    return gr.update(choices=[], value=[])

def update_knowledge_base_files():
    global knowledge_base_files
    knowledge_base_files = get_knowledge_base_files()

# Chat-message handling
chat_history = []

def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
    global chat_history
    if message:
        chat_history.append(("User", message))
        if chat_knowledge_base_dropdown == "仅使用模型":
            rag = RAG_class(model=model_dropdown, persist_directory=DB_directory)
            answer = rag.mult_chat(chat_history)
        if chat_knowledge_base_dropdown and chat_knowledge_base_dropdown != "仅使用模型":
            rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
            if chain_dropdown == "复杂召回方式":
                questions = rag.decomposition_chain(message)
                answer = rag.rag_chain(questions)
            elif chain_dropdown == "简单召回方式":
                answer = rag.simple_chain(message)
            else:
                answer = rag.rerank_chain(message)

        response = f" {answer}"
        chat_history.append(("Bot", response))
    return format_chat_history(chat_history), ""

def clear_chat():
    global chat_history
    chat_history = []
    return format_chat_history(chat_history)

def format_chat_history(history):
    formatted_history = ""
    for user, msg in history:
        if user == "User":
            formatted_history += f'''
            <div style="text-align: right; margin: 10px;">
                <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
                    {msg}
                </div>
                <b>:User</b>
            </div>
            '''
        else:
            if "```" in msg:  # Check whether the message contains a code snippet
                code_content = msg.split("```")[1]
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        <pre><code>{code_content}</code></pre>
                    </div>
                </div>
                '''
            else:
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        {msg}
                    </div>
                </div>
                '''
    return formatted_history

def clear_status():
    upload_status.update("")
    delete_status.update("")
    vectorize_status.update("")
    delete_collection_status.update("")

def handle_knowledge_base_selection(selected_knowledge_base):
    if selected_knowledge_base == "创建知识库":
        return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
    elif selected_knowledge_base == "仅使用模型":
        return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
    else:
        return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)

def update_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["创建知识库"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)

def update_chat_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["仅使用模型"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)

# Build the Gradio UI
with gr.Blocks() as demo:
    with gr.Column():
        # Title
        title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
        # Announcement banner
        announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: 欢迎使用RAG精致系统</div>")

        with gr.Tabs():
            with gr.TabItem("知识库"):
                knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
                                                      label="选择知识库")
                new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
                choice_graph = gr.Radio(choices=["否", "是"], value="否", label="是否同时提取知识图谱(会比较慢)")
                file_input = gr.Files(label="Upload files")
                upload_btn = gr.Button("Upload")
                file_list = gr.CheckboxGroup(label="Uploaded Files")
                delete_btn = gr.Button("Delete Selected Files")
                with gr.Row():
                    chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
                    chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
                vectorize_btn = gr.Button("Vectorize Selected Files")
                delete_collection_btn = gr.Button("Delete Collection")
                upload_status = gr.HTML()
                delete_status = gr.HTML()
                vectorize_status = gr.HTML()
                delete_collection_status = gr.HTML()

            with gr.TabItem("Chat"):
                with gr.Row():
                    model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
                    vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
                    chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
                    chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式", "rerank"], label="chain方式", visible=False)
                chat_display = gr.HTML(label="Chat History")
                chat_input = gr.Textbox(label="Type a message")
                chat_btn = gr.Button("Send")
                clear_btn = gr.Button("Clear Chat History")

    def handle_upload(files):
        upload_result, new_files, status = upload_files(files)
        threading.Thread(target=clear_status).start()
        return upload_result, new_files, status, update_chat_knowledge_base_dropdown()

    def handle_delete(selected_knowledge_base, selected_files):
        tmp = []
        cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
        for i in selected_files:
            if i in cols_files_tmp:
                tmp.append(i)
        del cols_files_tmp
        if tmp:
            vectordb.del_files(tmp, c_name=selected_knowledge_base)
            del tmp
        delete_result, status = delete_files(selected_files)
        threading.Thread(target=clear_status).start()
        return delete_result, status, update_chat_knowledge_base_dropdown()

    def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, choice_graph, chunk_size, chunk_overlap):
        vectorize_result, status = vectorize_files(selected_files, selected_knowledge_base, new_kb_name, choice_graph, chunk_size, chunk_overlap)
        threading.Thread(target=clear_status).start()
        return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()

    def handle_delete_collection(selected_knowledge_base):
        result, status = delete_collection(selected_knowledge_base)
        threading.Thread(target=clear_status).start()
        return result, status, update_chat_knowledge_base_dropdown()

    knowledge_base_dropdown.change(
        handle_knowledge_base_selection,
        inputs=knowledge_base_dropdown,
        outputs=[new_kb_input, file_list, chain_dropdown]
    )
    upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
    delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
    vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, choice_graph, chunk_size_dropdown, chunk_overlap_dropdown],
                        outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
    delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
                                outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])

    chat_btn.click(chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
    clear_btn.click(clear_chat, outputs=chat_display)

    chat_knowledge_base_dropdown.change(
        fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
        inputs=chat_knowledge_base_dropdown,
        outputs=chain_dropdown
    )

demo.launch(debug=True, share=True)
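The `create_graph` loop above hands `create_relationships` a list of 5-tuples built from the extracted edges. That conversion can be sketched in isolation; `edges_to_relationship_args` is a hypothetical helper name, and the `res` dict shape (an `edges` list of dicts with `source`/`target`/`label` keys) is taken from the code above, while `KnowledgeGraph` itself is the project's Neo4j wrapper and is not reproduced here:

```python
def edges_to_relationship_args(res):
    # Turn each extracted edge into the (label, props, label, props, rel_type)
    # tuple shape that create_relationships() consumes in create_graph() above.
    return [
        ("node", {"name": edge["source"]}, "node", {"name": edge["target"]}, edge["label"])
        for edge in res["edges"]
    ]

# Example with a hypothetical extraction result
res = {
    "nodes": ["李白", "杜甫"],
    "edges": [{"source": "李白", "target": "杜甫", "label": "朋友"}],
}
print(edges_to_relationship_args(res)[0])
```

Keeping this mapping in one place makes it easier to change the node label or property key without touching the extraction loop.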
webui-test.py ADDED
@@ -0,0 +1,354 @@
import gradio as gr
import threading
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
import requests
import json

# These are the project's own modules; adjust them to your setup
from Config.config import VECTOR_DB, DB_directory
from Ollama_api.ollama_api import *
from rag.rag_class import *

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Pick the vector database according to VECTOR_DB
if VECTOR_DB == 1:
    from embeding.chromadb import ChromaDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
elif VECTOR_DB == 2:
    from embeding.faissdb import FaissDB as vectorDB
    vectordb = vectorDB(persist_directory=DB_directory)
elif VECTOR_DB == 3:
    from embeding.elasticsearchStore import ElsStore as vectorDB
    vectordb = vectorDB()

# Uploaded files
uploaded_files = []

@lru_cache(maxsize=100)
def get_knowledge_base_files():
    cl_dict = {}
    cols = vectordb.get_all_collections_name()
    for c_name in cols:
        cl_dict[c_name] = vectordb.get_collcetion_content_files(c_name)
    return cl_dict

knowledge_base_files = get_knowledge_base_files()

def upload_files(files):
    if files:
        new_files = [file.name for file in files]
        uploaded_files.extend(new_files)
        update_knowledge_base_files()
        logger.info(f"Uploaded files: {new_files}")
        return update_file_list(), new_files, "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Upload successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), [], "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Upload failed!</div>"

def delete_files(selected_files):
    global uploaded_files
    uploaded_files = [f for f in uploaded_files if f not in selected_files]
    if selected_files:
        update_knowledge_base_files()
        logger.info(f"Deleted files: {selected_files}")
        return update_file_list(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Delete successful!</div>"
    update_knowledge_base_files()
    return update_file_list(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete failed!</div>"

def delete_collection(selected_knowledge_base):
    if selected_knowledge_base and selected_knowledge_base != "创建知识库":
        vectordb.delete_collection(selected_knowledge_base)
        update_knowledge_base_files()
        logger.info(f"Deleted collection: {selected_knowledge_base}")
        return update_knowledge_base_dropdown(), "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Collection deleted successfully!</div>"
    return update_knowledge_base_dropdown(), "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Delete collection failed!</div>"

async def async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
    if selected_files:
        if selected_knowledge_base == "创建知识库":
            knowledge_base = new_kb_name
            vectordb.create_collection(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        else:
            knowledge_base = selected_knowledge_base
            vectordb.add_chroma(selected_files, knowledge_base, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

        if knowledge_base not in knowledge_base_files:
            knowledge_base_files[knowledge_base] = []
        knowledge_base_files[knowledge_base].extend(selected_files)

        logger.info(f"Vectorized files: {selected_files} for knowledge base: {knowledge_base}")
        await asyncio.sleep(0)  # Yield so other tasks can run
        return f"Vectorized files: {', '.join(selected_files)}\nKnowledge Base: {knowledge_base}\nUploaded Files: {', '.join(uploaded_files)}", "<div style='color: green; padding: 10px; border: 2px solid green; border-radius: 5px;'>Vectorization successful!</div>"
    return "", "<div style='color: red; padding: 10px; border: 2px solid red; border-radius: 5px;'>Vectorization failed!</div>"

def update_file_list():
    return gr.update(choices=uploaded_files, value=[])

def search_knowledge_base(selected_knowledge_base):
    if selected_knowledge_base in knowledge_base_files:
        kb_files = knowledge_base_files[selected_knowledge_base]
        return gr.update(choices=kb_files, value=[])
    return gr.update(choices=[], value=[])

def update_knowledge_base_files():
    global knowledge_base_files
    knowledge_base_files = get_knowledge_base_files()

# Chat-message handling
chat_history = []

def safe_chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
    try:
        return chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message)
    except Exception as e:
        logger.error(f"Error in chat response: {str(e)}")
        return f"<div style='color: red;'>Error: {str(e)}</div>", ""

def chat_response(model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, message):
    global chat_history
    if message:
        chat_history.append(("User", message))
        if chat_knowledge_base_dropdown == "仅使用模型":
            rag = RAG_class(model=model_dropdown, persist_directory=DB_directory)
            answer = rag.mult_chat(chat_history)
        if chat_knowledge_base_dropdown and chat_knowledge_base_dropdown != "仅使用模型":
            rag = RAG_class(model=model_dropdown, embed=vector_dropdown, c_name=chat_knowledge_base_dropdown, persist_directory=DB_directory)
            if chain_dropdown == "复杂召回方式":
                questions = rag.decomposition_chain(message)
                answer = rag.rag_chain(questions)
            elif chain_dropdown == "简单召回方式":
                answer = rag.simple_chain(message)
            else:
                answer = rag.rerank_chain(message)

        response = f" {answer}"
        chat_history.append(("Bot", response))
    return format_chat_history(chat_history), ""

def clear_chat():
    global chat_history
    chat_history = []
    return format_chat_history(chat_history)

def format_chat_history(history):
    formatted_history = ""
    for user, msg in history:
        if user == "User":
            formatted_history += f'''
            <div style="text-align: right; margin: 10px;">
                <div style="display: inline-block; background-color: #DCF8C6; padding: 10px; border-radius: 10px; max-width: 60%;">
                    {msg}
                </div>
                <b>:User</b>
            </div>
            '''
        else:
            if "```" in msg:  # Check whether the message contains a code snippet
                code_content = msg.split("```")[1]
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        <pre><code>{code_content}</code></pre>
                    </div>
                </div>
                '''
            else:
                formatted_history += f'''
                <div style="text-align: left; margin: 10px;">
                    <b>Bot:</b>
                    <div style="display: inline-block; background-color: #F1F0F0; padding: 10px; border-radius: 10px; max-width: 60%;">
                        {msg}
                    </div>
                </div>
                '''
    return formatted_history

def clear_status():
    upload_status.update("")
    delete_status.update("")
    vectorize_status.update("")
    delete_collection_status.update("")

def handle_knowledge_base_selection(selected_knowledge_base):
    if selected_knowledge_base == "创建知识库":
        return gr.update(visible=True, interactive=True), gr.update(choices=[], value=[]), gr.update(visible=False)
    elif selected_knowledge_base == "仅使用模型":
        return gr.update(visible=False, interactive=False), gr.update(choices=[], value=[]), gr.update(visible=False)
    else:
        return gr.update(visible=False, interactive=False), search_knowledge_base(selected_knowledge_base), gr.update(visible=True)

def update_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["创建知识库"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)

def update_chat_knowledge_base_dropdown():
    global knowledge_base_files
    choices = ["仅使用模型"] + list(knowledge_base_files.keys())
    return gr.update(choices=choices)


# SearxNG search helper
def search_searxng(query):
    searxng_url = 'http://localhost:8080/search'  # Replace with your SearxNG instance URL
    params = {
        'q': query,
        'format': 'json'
    }
    response = requests.get(searxng_url, params=params)
    response.raise_for_status()
    return response.json()


# Ollama summarization helper
def summarize_with_ollama(model_dropdown, text, question):
    prompt = """
    根据下边的内容,回答用户问题,
    内容为:‘{0}‘\n
    问题为:{1}
    """.format(text, question)
    ollama_url = 'http://localhost:11434/api/generate'  # Replace with your Ollama instance URL
    data = {
        'model': model_dropdown,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(ollama_url, json=data)
    response.raise_for_status()
    return response.json()


# AI web-search handler
def ai_web_search(model_dropdown, user_query):
    # Search with SearxNG
    search_results = search_searxng(user_query)
    search_texts = [result['title'] + "\n" + result['content'] for result in search_results['results']]
    combined_text = "\n\n".join(search_texts)

    # Summarize with Ollama
    summary = summarize_with_ollama(model_dropdown, combined_text, user_query)
    # print(summary)
    # Return the result
    return summary['response']

# Alternative AI web-search handler, kept for reference
# def ai_web_search(model_dropdown, query):
#     try:
#         # Add the real web-search and AI logic here;
#         # this is only an example -- implement it for your setup
#         search_result = f"搜索结果: {query}"
#         ai_response = f"AI回答: 基于搜索结果,对于'{query}'的回答是..."
#         return f"{search_result}\n\n{ai_response}"
#     except Exception as e:
#         logger.error(f"Error in AI web search: {str(e)}")
#         return f"<div style='color: red;'>Error: {str(e)}</div>"

# Build the Gradio UI
with gr.Blocks() as demo:
    with gr.Column():
        # Title
        title = gr.HTML("<h1 style='text-align: center; font-size: 32px; font-weight: bold;'>RAG精致系统</h1>")
        # Announcement banner
        announcement = gr.HTML("<div style='text-align: center; font-size: 18px; color: red;'>公告栏: 欢迎使用RAG精致系统,一个适合学习、使用、自主扩展的【检索增强生成】系统!<br/>公众号:世界大模型</div>")

        with gr.Tabs():
            with gr.TabItem("知识库"):
                knowledge_base_dropdown = gr.Dropdown(choices=["创建知识库"] + list(knowledge_base_files.keys()),
                                                      label="选择知识库")
                new_kb_input = gr.Textbox(label="输入新的知识库名称", visible=False, interactive=True)
                file_input = gr.Files(label="Upload files")
                upload_btn = gr.Button("Upload")
                file_list = gr.CheckboxGroup(label="Uploaded Files")
                delete_btn = gr.Button("Delete Selected Files")
                with gr.Row():
                    chunk_size_dropdown = gr.Dropdown(choices=[50, 100, 200, 300, 500, 700], label="chunk_size", value=200)
                    chunk_overlap_dropdown = gr.Dropdown(choices=[20, 50, 100, 200], label="chunk_overlap", value=50)
                vectorize_btn = gr.Button("Vectorize Selected Files")
                delete_collection_btn = gr.Button("Delete Collection")
                upload_status = gr.HTML()
                delete_status = gr.HTML()
                vectorize_status = gr.HTML()
                delete_collection_status = gr.HTML()

            with gr.TabItem("Chat"):
                with gr.Row():
                    model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
                    vector_dropdown = gr.Dropdown(choices=get_embeding_model(), label="向量")
                    chat_knowledge_base_dropdown = gr.Dropdown(choices=["仅使用模型"] + vectordb.get_all_collections_name(), label="知识库")
                    chain_dropdown = gr.Dropdown(choices=["复杂召回方式", "简单召回方式", "rerank"], label="chain方式", visible=False)
                chat_display = gr.HTML(label="Chat History")
                chat_input = gr.Textbox(label="Type a message")
                chat_btn = gr.Button("Send")
                clear_btn = gr.Button("Clear Chat History")

            with gr.TabItem("AI网络搜索"):
                with gr.Row():
                    web_search_model_dropdown = gr.Dropdown(choices=get_llm(), label="模型")
                web_search_output = gr.Textbox(label="搜索结果和AI回答", lines=10)
                web_search_input = gr.Textbox(label="输入搜索查询")

                web_search_btn = gr.Button("搜索")

    def handle_upload(files):
        upload_result, new_files, status = upload_files(files)
        threading.Thread(target=clear_status).start()
        return upload_result, new_files, status, update_chat_knowledge_base_dropdown()

    def handle_delete(selected_knowledge_base, selected_files):
        tmp = []
        cols_files_tmp = vectordb.get_collcetion_content_files(c_name=selected_knowledge_base)
        for i in selected_files:
            if i in cols_files_tmp:
                tmp.append(i)
        del cols_files_tmp
        if tmp:
            vectordb.del_files(tmp, c_name=selected_knowledge_base)
            del tmp
        delete_result, status = delete_files(selected_files)
        threading.Thread(target=clear_status).start()
        return delete_result, status, update_chat_knowledge_base_dropdown()

    def handle_vectorize(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap):
        vectorize_result, status = asyncio.run(async_vectorize_files(selected_files, selected_knowledge_base, new_kb_name, chunk_size, chunk_overlap))
        threading.Thread(target=clear_status).start()
        return vectorize_result, status, update_knowledge_base_dropdown(), update_chat_knowledge_base_dropdown()

    def handle_delete_collection(selected_knowledge_base):
        result, status = delete_collection(selected_knowledge_base)
        threading.Thread(target=clear_status).start()
        return result, status, update_chat_knowledge_base_dropdown()

    knowledge_base_dropdown.change(
        handle_knowledge_base_selection,
        inputs=knowledge_base_dropdown,
        outputs=[new_kb_input, file_list, chain_dropdown]
    )
    upload_btn.click(handle_upload, inputs=file_input, outputs=[file_list, file_list, upload_status, chat_knowledge_base_dropdown])
    delete_btn.click(handle_delete, inputs=[knowledge_base_dropdown, file_list], outputs=[file_list, delete_status, chat_knowledge_base_dropdown])
    vectorize_btn.click(handle_vectorize, inputs=[file_list, knowledge_base_dropdown, new_kb_input, chunk_size_dropdown, chunk_overlap_dropdown],
                        outputs=[gr.Textbox(visible=False), vectorize_status, knowledge_base_dropdown, chat_knowledge_base_dropdown])
    delete_collection_btn.click(handle_delete_collection, inputs=knowledge_base_dropdown,
                                outputs=[knowledge_base_dropdown, delete_collection_status, chat_knowledge_base_dropdown])

    chat_btn.click(chat_response, inputs=[model_dropdown, vector_dropdown, chat_knowledge_base_dropdown, chain_dropdown, chat_input], outputs=[chat_display, chat_input])
    clear_btn.click(clear_chat, outputs=chat_display)

    chat_knowledge_base_dropdown.change(
        fn=lambda selected: gr.update(visible=selected != "仅使用模型"),
        inputs=chat_knowledge_base_dropdown,
        outputs=chain_dropdown
    )

    # Click handler for AI web search
    web_search_btn.click(
        ai_web_search,
        inputs=[web_search_model_dropdown, web_search_input],
        outputs=web_search_output
    )

demo.launch(debug=True, share=True)
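The `ai_web_search` handler above flattens the SearxNG hits into one context string before prompting Ollama. That aggregation step can be sketched on its own, without any network calls; `combine_search_results` is a hypothetical helper name, and the input shape (a `results` list of dicts with `title` and `content`) mirrors what `search_searxng` returns from SearxNG's JSON format:

```python
def combine_search_results(search_results):
    # Join each hit's title and snippet into one context block,
    # mirroring the aggregation inside ai_web_search().
    search_texts = [r["title"] + "\n" + r["content"] for r in search_results["results"]]
    return "\n\n".join(search_texts)

# Example with a hypothetical SearxNG-style response
sample = {"results": [
    {"title": "Easy-RAG", "content": "A learnable RAG system."},
    {"title": "SearxNG", "content": "A metasearch engine."},
]}
print(combine_search_results(sample))
```

Factoring this out makes it easy to cap the context length or drop low-quality hits before the text reaches the summarization prompt.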