Rulga committed on
Commit
cb3ba0c
·
1 Parent(s): 8db23d8

Update README and settings.py for improved project structure and model paths

Files changed (2)
  1. README.md +74 -45
  2. config/settings.py +6 -4
README.md CHANGED
@@ -9,76 +9,105 @@ app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
13
-
14
  # Status Law Assistant
15
 
16
- A chatbot based on Hugging Face and LangChain for legal consultations, built on information from the Status Law company website.
17
-
18
- ## 📝 Description
19
-
20
- Status Law Assistant is an intelligent chatbot that answers user questions about Status Law's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base built from the official company website's content, and generates answers from it with a language model.
21
-
22
- ## ✨ Features
23

24
- - Automatic creation and updating of the knowledge base from status.law website content
25
- - Search for relevant information to answer user questions
26
- - Context-aware response generation
27
- - Multi-language query support (responds in the language of the question)
28
- - Customizable text generation parameters (temperature, token count, etc.)
29
 
30
- ## 🚀 Technologies
31

32
- - **LangChain**: for building query processing chains and managing the knowledge base
33
- - **Hugging Face**: for language model access and application hosting
34
- - **FAISS**: for efficient vector search
35
- - **Gradio**: for building the user interface
36
- - **BeautifulSoup**: for extracting information from web pages
37

38
- ## 🏗️ Project Structure
39

40
- - `app.py`: the main application file defining the interface and request-handling logic
41
- - `config/`: directory with configuration files
42
- - `src/`: directory with source code
43
- - `knowledge_base/`: module for knowledge base management
44
- - `models/`: module for model management
45
- # Status Law Knowledge Base Dataset
46
 
47
- This dataset serves as storage for the Status Law Assistant chatbot, containing vector embeddings and chat history.
48
 
49
- ## 📁 Structure
50
 
51
  ```
52
- status-law-knowledge-base/
53
- ├── vector_store/
54
- │   ├── index.faiss # FAISS vector store for document embeddings
55
- │   └── index.pkl # Metadata and configuration for the vector store
56
-
57
- └── chat_history/
58
-     └── logs.json # Chat history logs
59
  ```
60
 
61
- ## 🔍 Description
62
 
63
- - `vector_store/`: Contains FAISS embeddings of legal documents from status.law website
64
- - `index.faiss`: Vector embeddings for semantic search
65
- - `index.pkl`: Metadata and configuration information
66
 
67
- - `chat_history/`: Stores conversation logs
68
- - `logs.json`: JSON file containing chat history and metadata
69
 
70
  ## 🚀 Usage
71
 
72
- This dataset is used by the Status Law Assistant chatbot to:
73
  1. Store and retrieve document embeddings for context-aware responses
74
  2. Maintain chat history for conversation continuity
75
  3. Track user interactions and improve response quality
76
 
77
  ## 🔗 Related Links
78
 
79
  - [Status Law Website](https://status.law)
80
- - [Status Law Assistant Repository](https://huggingface.co/spaces/Rulga/status-law-assistant)
81
 
82
  ## 📝 License
83
 
84
- Private dataset for Status Law Assistant usage only.
 
9
  pinned: false
10
  ---
11
 
12
  # Status Law Assistant
13
 
14
+ An intelligent chatbot based on Hugging Face and LangChain for legal consultations using information from the Status Law company website.
15
 
16
+ ## 📝 Description
17
 
18
+ Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content and generates responses using a language model.
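The RAG flow described above can be sketched in a few lines. The bag-of-words "embeddings" and the `retrieve` helper below are illustrative stand-ins for the real sentence-transformers model and FAISS index, kept dependency-free so the idea is clear:

```python
# Minimal RAG sketch: embed the question, retrieve the most similar
# knowledge-base chunk, and build a prompt for the language model.
# Toy bag-of-words vectors stand in for real embeddings.
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: token counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    # Rank knowledge-base chunks by similarity to the question
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

knowledge_base = [
    "Status Law provides legal consultations on corporate law.",
    "The office is closed on public holidays.",
]
context = retrieve("What legal consultations do you offer?", knowledge_base)
prompt = f"Context: {context[0]}\nQuestion: What legal consultations do you offer?"
```

In the real application the retrieved context comes from the FAISS store and the prompt is sent to a Hugging Face language model.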
19
 
20
+ ## ✨ Features
21
 
22
+ - Automatic creation and updating of knowledge base from status.law website content
23
+ - Relevant information search for user queries
24
+ - Context-aware response generation
25
+ - Multi-language query support (responds in the language of the question)
26
+ - Customizable text generation parameters (temperature, token count, etc.)
27
 
28
+ ## 🚀 Technologies
29
 
30
+ - **LangChain**: For query processing chains and knowledge base management
31
+ - **Hugging Face**: For language model access and application hosting
32
+ - **FAISS**: For efficient vector search
33
+ - **Gradio**: For user interface creation
34
+ - **BeautifulSoup**: For web page information extraction
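The kind of text extraction the BeautifulSoup-based loader performs can be illustrated with the standard-library `HTMLParser`, so this sketch runs without extra dependencies; the actual loader uses BeautifulSoup against status.law pages:

```python
# Extract visible text from HTML, skipping <script>/<style> content,
# as a stand-in for what the BeautifulSoup loader does.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

The extracted text is what gets chunked and embedded into the FAISS store.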
35
 
36
+ ## 🏗️ Project Structure
37
 
38
  ```
39
+ status-law-gbot/
40
+ ├── app.py # Main application file with interface and request handling logic
41
+ ├── requirements.txt # Project dependencies
42
+ ├── config/ # Configuration files
43
+ │   ├── settings.py # Application settings
44
+ │   └── constants.py # Constants and default values
45
+ ├── src/ # Source code
46
+ │   ├── analytics/ # Analytics module
47
+ │   │   └── chat_analyzer.py
48
+ │   ├── knowledge_base/ # Knowledge base management
49
+ │   │   ├── loader.py
50
+ │   │   └── vector_store.py
51
+ │   ├── training/ # Model training module
52
+ │   │   ├── fine_tuner.py
53
+ │   │   └── model_manager.py
54
+ │   └── models/ # Model-related code
55
+ ├── web/ # Web interface components
56
+ │   └── training_interface.py
57
+ └── data/ # Data storage
58
+     ├── vector_store/ # FAISS vector storage
59
+     │   ├── index.faiss
60
+     │   └── index.pkl
61
+     └── chat_history/ # Conversation logs
62
+         └── logs.json
63
  ```
64
 
65
+ ## 💾 Data Storage
66
 
67
+ ### Vector Store
68
+ - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
69
+ - `data/vector_store/index.pkl`: Metadata and configuration for the vector store
70
 
71
+ ### Chat History
72
+ - `data/chat_history/logs.json`: JSON file containing chat history and metadata
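A plausible way to maintain the `logs.json` chat history is sketched below. The exact schema is not shown in this commit, so the fields (`timestamp`, `question`, `answer`) are illustrative assumptions:

```python
# Append a conversation turn to a JSON chat-history log.
# The record fields are hypothetical; the real logs.json schema may differ.
import json
import os
import tempfile
from datetime import datetime, timezone

def append_log(path, question, answer):
    logs = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            logs = json.load(f)
    logs.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
    })
    with open(path, "w", encoding="utf-8") as f:
        json.dump(logs, f, ensure_ascii=False, indent=2)

# Demonstrated against a temporary directory rather than data/chat_history/
log_path = os.path.join(tempfile.mkdtemp(), "logs.json")
append_log(log_path, "What services do you offer?", "Legal consultations.")
```

Keeping the whole history as one JSON array keeps reads simple; a large deployment might prefer append-only JSON Lines instead.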
73
 
74
  ## 🚀 Usage
75
 
76
+ The Status Law Assistant chatbot uses this structure to:
77
  1. Store and retrieve document embeddings for context-aware responses
78
  2. Maintain chat history for conversation continuity
79
  3. Track user interactions and improve response quality
80
+ 4. Fine-tune models based on conversation history
81
+
82
+ ## 🛠️ Setup
83
+
84
+ 1. Clone the repository:
85
+ ```bash
86
+ git clone https://github.com/yourusername/status-law-gbot.git
87
+ cd status-law-gbot
88
+ ```
89
+
90
+ 2. Install dependencies:
91
+ ```bash
92
+ pip install -r requirements.txt
93
+ ```
94
+
95
+ 3. Set up environment variables:
96
+ ```bash
97
+ cp .env.example .env
98
+ # Edit .env with your configuration
99
+ ```
100
+
101
+ 4. Run the application:
102
+ ```bash
103
+ python app.py
104
+ ```
105
 
106
  ## 🔗 Related Links
107
 
108
  - [Status Law Website](https://status.law)
109
+ - [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-assistant)
110
 
111
  ## 📝 License
112
 
113
+ Private repository for Status Law Assistant usage only.
config/settings.py CHANGED
@@ -2,10 +2,10 @@ import os
2
  from dotenv import load_dotenv
3
 
4
  # Debug information
5
- print("Current directory:", os.getcwd())
6
  env_path = os.path.join(os.getcwd(), '.env')
7
- print("Path to .env:", env_path)
8
- print(".env file exists:", os.path.exists(env_path))
9
 
10
  if os.path.exists(env_path):
11
  with open(env_path, 'r') as f:
@@ -18,14 +18,16 @@ load_dotenv(verbose=True)
18
  BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
  VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
20
 
21
- # Add missing paths for training models
22
  MODEL_PATH = os.path.join(BASE_DIR, "models")
23
  TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
 
24
 
25
  # Create directories if they don't exist
26
  os.makedirs(VECTOR_STORE_PATH, exist_ok=True)
27
  os.makedirs(MODEL_PATH, exist_ok=True)
28
  os.makedirs(TRAINING_OUTPUT_DIR, exist_ok=True)
 
29
 
30
  # Model settings
31
  EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
 
2
  from dotenv import load_dotenv
3
 
4
  # Debug information
5
+ #print("Current directory:", os.getcwd())
6
  env_path = os.path.join(os.getcwd(), '.env')
7
+ #print("Path to .env:", env_path)
8
+ #print(".env file exists:", os.path.exists(env_path))
9
 
10
  if os.path.exists(env_path):
11
  with open(env_path, 'r') as f:
 
18
  BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
19
  VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
20
 
21
+ # Add missing paths for model training
22
  MODEL_PATH = os.path.join(BASE_DIR, "models")
23
  TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
24
+ MODELS_REGISTRY_PATH = os.path.join(BASE_DIR, "data", "models_registry.json")
25
 
26
  # Create directories if they don't exist
27
  os.makedirs(VECTOR_STORE_PATH, exist_ok=True)
28
  os.makedirs(MODEL_PATH, exist_ok=True)
29
  os.makedirs(TRAINING_OUTPUT_DIR, exist_ok=True)
30
+ os.makedirs(os.path.dirname(MODELS_REGISTRY_PATH), exist_ok=True)
31
 
32
  # Model settings
33
  EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
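The path layout the updated settings.py establishes follows a single pattern: build every path from `BASE_DIR` with `os.path.join`, then create directories idempotently. A self-contained sketch of that pattern, with `BASE_DIR` pointed at a temporary directory instead of the repository root:

```python
# Mirror of the settings.py path-setup pattern, run against a temp dir.
import os
import tempfile

BASE_DIR = tempfile.mkdtemp()  # stands in for the repository root
VECTOR_STORE_PATH = os.path.join(BASE_DIR, "data", "vector_store")
MODEL_PATH = os.path.join(BASE_DIR, "models")
TRAINING_OUTPUT_DIR = os.path.join(BASE_DIR, "models", "trained")
MODELS_REGISTRY_PATH = os.path.join(BASE_DIR, "data", "models_registry.json")

# exist_ok=True makes repeated startup idempotent; note that for the
# registry only the parent directory is created -- the JSON file itself
# is written later by the application.
for d in (VECTOR_STORE_PATH, MODEL_PATH, TRAINING_OUTPUT_DIR,
          os.path.dirname(MODELS_REGISTRY_PATH)):
    os.makedirs(d, exist_ok=True)
```

Creating `models/trained` after `models/` is safe because `os.makedirs` builds intermediate directories as needed.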