Rulga committed on
Commit
cb3ba0c
·
1 Parent(s): 8db23d8

Update README and settings.py for improved project structure and model paths

Files changed (2)
  1. README.md +74 -45
  2. config/settings.py +6 -4
README.md CHANGED
@@ -9,76 +9,105 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
13
-
14
  # Status Law Assistant
15
 
16
- A chatbot based on Hugging Face and LangChain for legal consultations, built on information from the Status Law company website.
17
-
18
- ## 📝 Description
19
-
20
- Status Law Assistant is an intelligent chatbot that answers user questions about Status Law's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base built from the official company website's content, and generates answers from it with a language model.
21
-
22
- ## ✨ Features
23

24
- - Automatic creation and updating of the knowledge base from status.law website content
25
- - Search for relevant information to answer user questions
26
- - Context-aware response generation
27
- - Multi-language query support (responds in the language of the question)
28
- - Customizable text generation parameters (temperature, token count, etc.)
29
 
30
- ## 🚀 Technologies
31

32
- - **LangChain**: for building query processing chains and managing the knowledge base
33
- - **Hugging Face**: for language model access and application hosting
34
- - **FAISS**: for efficient vector search
35
- - **Gradio**: for building the user interface
36
- - **BeautifulSoup**: for extracting information from web pages
37

38
- ## 🏗️ Project Structure
39

40
- - `app.py`: the main application file defining the interface and request-handling logic
41
- - `config/`: directory with configuration files
42
- - `src/`: directory with source code
43
- - `knowledge_base/`: module for knowledge base management
44
- - `models/`: module for model management
45
- # Status Law Knowledge Base Dataset
46
 
47
- This dataset serves as storage for the Status Law Assistant chatbot, containing vector embeddings and chat history.
48
 
49
- ## 📁 Structure
50
 
51
  ```
52
- status-law-knowledge-base/
53
- ├── vector_store/
54
- │   ├── index.faiss # FAISS vector store for document embeddings
55
- │   └── index.pkl # Metadata and configuration for the vector store
56
-
57
- └── chat_history/
58
-     └── logs.json # Chat history logs
59
  ```
60
 
61
- ## 🔍 Description
62
 
63
- - `vector_store/`: Contains FAISS embeddings of legal documents from status.law website
64
- - `index.faiss`: Vector embeddings for semantic search
65
- - `index.pkl`: Metadata and configuration information
66
 
67
- - `chat_history/`: Stores conversation logs
68
- - `logs.json`: JSON file containing chat history and metadata
69
 
70
  ## 🚀 Usage
71
 
72
- This dataset is used by the Status Law Assistant chatbot to:
73
  1. Store and retrieve document embeddings for context-aware responses
74
  2. Maintain chat history for conversation continuity
75
  3. Track user interactions and improve response quality
76
 
77
  ## 🔗 Related Links
78
 
79
  - [Status Law Website](https://status.law)
80
- - [Status Law Assistant Repository](https://huggingface.co/spaces/Rulga/status-law-assistant)
81
 
82
  ## 📝 License
83
 
84
- Private dataset for Status Law Assistant usage only.
 
9
  pinned: false
10
  ---
11
 
12
  # Status Law Assistant
13
 
14
+ An intelligent chatbot based on Hugging Face and LangChain for legal consultations using information from the Status Law company website.
15
 
16
+ ## 📝 Description
17
 
18
+ Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content and generates responses using a language model.
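The RAG flow described above can be sketched in a few lines. The bag-of-words "embeddings" and the `retrieve` helper below are illustrative stand-ins for the real sentence-transformers model and FAISS index, kept dependency-free so the idea is clear:

```python
# Minimal RAG sketch: embed the question, retrieve the most similar
# knowledge-base chunk, and build a prompt for the language model.
# Toy bag-of-words vectors stand in for real embeddings.
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: token counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    # Rank knowledge-base chunks by similarity to the question
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

knowledge_base = [
    "Status Law provides legal consultations on corporate law.",
    "The office is closed on public holidays.",
]
context = retrieve("What legal consultations do you offer?", knowledge_base)
prompt = f"Context: {context[0]}\nQuestion: What legal consultations do you offer?"
```

In the real application the retrieved context comes from the FAISS store and the prompt is sent to a Hugging Face language model.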
19
 
20
+ ## ✨ Features
21
 
22
+ - Automatic creation and updating of knowledge base from status.law website content
23
+ - Relevant information search for user queries
24
+ - Context-aware response generation
25
+ - Multi-language query support (responds in the language of the question)
26
+ - Customizable text generation parameters (temperature, token count, etc.)
27
 
28
+ ## 🚀 Technologies
29
 
30
+ - **LangChain**: For query processing chains and knowledge base management
31
+ - **Hugging Face**: For language model access and application hosting
32
+ - **FAISS**: For efficient vector search
33
+ - **Gradio**: For user interface creation
34
+ - **BeautifulSoup**: For web page information extraction
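The kind of text extraction the BeautifulSoup-based loader performs can be illustrated with the standard-library `HTMLParser`, so this sketch runs without extra dependencies; the actual loader uses BeautifulSoup against status.law pages:

```python
# Extract visible text from HTML, skipping <script>/<style> content,
# as a stand-in for what the BeautifulSoup loader does.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

The extracted text is what gets chunked and embedded into the FAISS store.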
35
 
36
+ ## 🏗️ Project Structure
37
 
38
  ```
39
+ status-law-gbot/
40
+ ├── app.py # Main application file with interface and request handling logic
41
+ ├── requirements.txt # Project dependencies
42
+ ├── config/ # Configuration files
43
+ │   ├── settings.py # Application settings
44
+ │   └── constants.py # Constants and default values
45
+ ├── src/ # Source code
46
+ │   ├── analytics/ # Analytics module
47
+ │   │   └── chat_analyzer.py
48
+ │   ├── knowledge_base/ # Knowledge base management
49
+ │   │   ├── loader.py
50
+ │   │   └── vector_store.py
51
+ │   ├── training/ # Model training module
52
+ │   │   ├── fine_tuner.py
53
+ │   │   └── model_manager.py
54
+ │   └── models/ # Model-related code
55
+ ├── web/ # Web interface components
56
+ │   └── training_interface.py
57
+ └── data/ # Data storage
58
+     ├── vector_store/ # FAISS vector storage
59
+     │   ├── index.faiss
60
+     │   └── index.pkl
61
+     └── chat_history/ # Conversation logs
62
+         └── logs.json
63
  ```
64
 
65
+ ## 💾 Data Storage
66
 
67
+ ### Vector Store
68
+ - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
69
+ - `data/vector_store/index.pkl`: Metadata and configuration for the vector store
70
 
71
+ ### Chat History
72
+ - `data/chat_history/logs.json`: JSON file containing chat history and metadata
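A plausible way to maintain the `logs.json` chat history is sketched below. The exact schema is not shown in this commit, so the fields (`timestamp`, `question`, `answer`) are illustrative assumptions:

```python
# Append a conversation turn to a JSON chat-history log.
# The record fields are hypothetical; the real logs.json schema may differ.
import json
import os
import tempfile
from datetime import datetime, timezone

def append_log(path, question, answer):
    logs = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            logs = json.load(f)
    logs.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
    })
    with open(path, "w", encoding="utf-8") as f:
        json.dump(logs, f, ensure_ascii=False, indent=2)

# Demonstrated against a temporary directory rather than data/chat_history/
log_path = os.path.join(tempfile.mkdtemp(), "logs.json")
append_log(log_path, "What services do you offer?", "Legal consultations.")
```

Keeping the whole history as one JSON array keeps reads simple; a large deployment might prefer append-only JSON Lines instead.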
73
 
74
  ## 🚀 Usage
75
 
76
+ The Status Law Assistant chatbot uses this structure to:
77
  1. Store and retrieve document embeddings for context-aware responses
78
  2. Maintain chat history for conversation continuity
79
  3. Track user interactions and improve response quality
80
+ 4. Fine-tune models based on conversation history
81
+
82
+ ## 🛠️ Setup
83
+
84
+ 1. Clone the repository:
85
+ ```bash
86
+ git clone https://github.com/yourusername/status-law-gbot.git
87
+ cd status-law-gbot
88
+ ```
89
+
90
+ 2. Install dependencies:
91
+ ```bash
92
+ pip install -r requirements.txt
93
+ ```
94
+
95
+ 3. Set up environment variables:
96
+ ```bash
97
+ cp .env.example .env
98
+ # Edit .env with your configuration
99
+ ```
100
+
101
+ 4. Run the application:
102
+ ```bash
103
+ python app.py
104
+ ```
105
 
106
  ## 🔗 Related Links
107
 
108
  - [Status Law Website](https://status.law)
109
+ - [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-assistant)
110
 
111
  ## 📝 License
112
 
113
+ Private repository for Status Law Assistant usage only.
config/settings.py CHANGED
@@ -2,10 +2,10 @@ import os
2
  from dotenv import load_dotenv
3
 
4
  # Debug information
5
- print("Current directory:", os.getcwd())
6
  env_path = os.path.join(os.getcwd(), '.env')
7
- print("Path to .env:", env_path)
8
- print(".env file exists:", os.path.exists(env_path))
9
 
10
  if os.path.exists(env_path):
11
  with open(env_path, 'r') as f:
@@ -18,14 +18,16 @@ load_dotenv(verbose=True)
18
  BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
  VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
20
 
21
- # Add missing paths for training models
22
  MODEL_PATH = os.path.join(BASE_DIR, "models")
23
  TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
 
24
 
25
  # Create directories if they don't exist
26
  os.makedirs(VECTOR_STORE_PATH, exist_ok=True)
27
  os.makedirs(MODEL_PATH, exist_ok=True)
28
  os.makedirs(TRAINING_OUTPUT_DIR, exist_ok=True)
 
29
 
30
  # Model settings
31
  EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
 
2
  from dotenv import load_dotenv
3
 
4
  # Debug information
5
+ #print("Current directory:", os.getcwd())
6
  env_path = os.path.join(os.getcwd(), '.env')
7
+ #print("Path to .env:", env_path)
8
+ #print(".env file exists:", os.path.exists(env_path))
9
 
10
  if os.path.exists(env_path):
11
  with open(env_path, 'r') as f:
 
18
  BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
  VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
20
 
21
+ # Add missing paths for model training
22
  MODEL_PATH = os.path.join(BASE_DIR, "models")
23
  TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
24
+ MODELS_REGISTRY_PATH = os.path.join(BASE_DIR, "data", "models_registry.json")
25
 
26
  # Create directories if they don't exist
27
  os.makedirs(VECTOR_STORE_PATH, exist_ok=True)
28
  os.makedirs(MODEL_PATH, exist_ok=True)
29
  os.makedirs(TRAINING_OUTPUT_DIR, exist_ok=True)
30
+ os.makedirs(os.path.dirname(MODELS_REGISTRY_PATH), exist_ok=True)
31
 
32
  # Model settings
33
  EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
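The path layout the updated settings.py establishes follows a single pattern: build every path from `BASE_DIR` with `os.path.join`, then create directories idempotently. A self-contained sketch of that pattern, with `BASE_DIR` pointed at a temporary directory instead of the repository root:

```python
# Mirror of the settings.py path-setup pattern, run against a temp dir.
import os
import tempfile

BASE_DIR = tempfile.mkdtemp()  # stands in for the repository root
VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
MODEL_PATH = os.path.join(BASE_DIR, "models")
TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
MODELS_REGISTRY_PATH = os.path.join(BASE_DIR, "data", "models_registry.json")

# exist_ok=True makes repeated startup idempotent; note that for the
# registry only the parent directory is created -- the JSON file itself
# is written later by the application.
for d in (VECTOR_STORE_PATH, MODEL_PATH, TRAINING_OUTPUT_DIR,
          os.path.dirname(MODELS_REGISTRY_PATH)):
    os.makedirs(d, exist_ok=True)
```

Creating `models/trained` after `models/` is safe because `os.makedirs` builds intermediate directories as needed.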