---
title: Status Law Gbot
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
---
# Status Law Assistant

An intelligent chatbot built on Hugging Face and LangChain for legal consultations, using information from the Status Law company website.

## 📝 Description

Status Law Assistant answers user questions about Status Law's legal services. The bot uses RAG (Retrieval-Augmented Generation) to find relevant information in a knowledge base built from the official website content.
## ✨ Key Features

- Knowledge base management:
  - Dynamic URL management for knowledge base sources
  - Ability to add custom URLs for information extraction
  - Selective source inclusion/exclusion
  - Two modes of knowledge base updates:
    - Update mode: adds new information while preserving existing knowledge
    - Rebuild mode: complete recreation of the knowledge base from selected sources
  - Real-time status tracking for knowledge base operations
  - Automatic metadata management and versioning
  - Automatic creation and updating of the knowledge base from status.law website content
- Intelligent information retrieval for query responses
- Context-aware response generation
- Advanced multilingual support:
  - Automatic language detection
  - Native-language response generation
  - Built-in translation system with a fallback mechanism
  - Support for 30+ languages
- Customizable text generation parameters
- Model switching system with automatic fallback
- Fine-tuning capabilities based on chat history
- Multiple model support:
  - Zephyr 7B: enhanced performance and response quality
  - TinyLlama 1.1B Chat: lightweight model for resource-constrained environments
  - Neural Mistral 7B: superior reasoning and instruction following
  - Mixtral 8x7B: advanced mixture-of-experts architecture
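The multilingual flow above (detect the language, answer natively, fall back to translation) can be sketched as follows. This is a deliberately simplified illustration: the Unicode-range heuristic stands in for whatever real language detector the app uses, and `generate` and `translate` are hypothetical callables, not actual functions from this repository.

```python
# Simplified sketch of the "detect, answer natively, fall back" flow.
# The Unicode-range check is a stand-in for a real language detector.

def detect_language(text):
    """Crude script-based detection; illustration only."""
    for ch in text:
        if "\u0400" <= ch <= "\u04FF":   # Cyrillic block
            return "ru"
        if "\u4E00" <= ch <= "\u9FFF":   # CJK Unified Ideographs
            return "zh"
    return "en"

def answer(question, generate, translate):
    """Answer in the user's language, translating from English as a fallback."""
    lang = detect_language(question)
    reply = generate(question, target_lang=lang)
    if reply is None:                      # model could not answer natively
        english = generate(question, target_lang="en")
        reply = translate(english, target_lang=lang)
    return lang, reply
```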
## 🚀 Technologies

- **LangChain**: query processing chains and knowledge base management
- **Hugging Face**: language model access and hosting
- **FAISS**: efficient vector search
- **Gradio**: user interface creation
- **BeautifulSoup**: web page information extraction
- **PEFT**: efficient fine-tuning using LoRA
- **SentencePiece**: tokenization
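Conceptually, the FAISS step in this stack is nearest-neighbour search over embedding vectors. The pure-Python sketch below shows that idea with cosine similarity; the real application uses a FAISS index and learned embeddings, so treat this only as an illustration of what "efficient vector search" retrieves.

```python
# What FAISS-based retrieval does conceptually: rank stored document
# vectors by similarity to the query vector and keep the top k.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```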
## 🏗️ Project Structure

```
status-law-gbot/
├── app.py                  # Main application file
├── requirements.txt        # Project dependencies
├── config/                 # Configuration files
│   ├── settings.py         # Application and model settings
│   └── constants.py        # Constants and default values
├── src/                    # Source code
│   ├── analytics/          # Analytics module
│   │   └── chat_analyzer.py
│   ├── knowledge_base/     # Knowledge base management
│   │   ├── loader.py
│   │   └── vector_store.py
│   └── training/           # Training module
│       ├── fine_tuner.py
│       └── model_manager.py
└── dataset/                # HuggingFace dataset structure
    ├── annotations/        # Conversation annotations
    ├── chat_history/       # Chat logs and conversations
    ├── fine_tuned_models/  # Fine-tuned model storage
    ├── preferences/        # User preferences
    ├── training_data/      # Processed training data
    ├── training_logs/      # Training process logs
    └── vector_store/       # FAISS vector storage
```
## 💾 Data Storage

### Dataset Organization

- `annotations/`: conversation quality metrics and annotations
- `chat_history/`: JSON files containing chat conversations
- `fine_tuned_models/`: storage for LoRA adapters and model checkpoints
- `preferences/`: user preferences and settings
- `training_data/`: processed data ready for model training
- `training_logs/`: detailed training process logs
- `vector_store/`: FAISS indexes for semantic search
## 🛠️ Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/PtOlga/status-law-gbot.git
   cd status-law-gbot
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration, including HUGGINGFACE_TOKEN
   ```

4. Run the application:

   ```bash
   python app.py
   ```
## 🔧 Model Fine-tuning

To fine-tune the model on your chat history:

```python
from src.training.fine_tuner import finetune_from_chat_history

success, message = finetune_from_chat_history(epochs=3)
print(message)
```

The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training with minimal resource requirements.
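The reason LoRA is cheap is that it never updates the full weight matrix W: it trains two small matrices A (r × d_in) and B (d_out × r) of low rank r, and the effective weight becomes W + B·A. The toy sketch below shows that arithmetic on plain list-of-lists matrices; the actual training in this project goes through the PEFT library, not this function.

```python
# Toy illustration of the LoRA update: effective weight = W + scale * (B @ A),
# where A and B are low-rank matrices. The real app uses PEFT for this.

def lora_forward_weight(W, B, A, scale=1.0):
    """Compute W + scale * (B @ A) for list-of-lists matrices."""
    d_out, d_in = len(W), len(W[0])
    r = len(A)                            # low rank, r << min(d_out, d_in)
    out = [row[:] for row in W]           # copy W so it is not mutated
    for i in range(d_out):
        for j in range(d_in):
            out[i][j] += scale * sum(B[i][k] * A[k][j] for k in range(r))
    return out
```

Because only A and B are trained (r·(d_in + d_out) parameters instead of d_in·d_out), the adapters stored under `fine_tuned_models/` stay small.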
## 🔄 Model Switching

The application supports multiple models with automatic fallback:

- Zephyr 7B: enhanced performance and response quality
- TinyLlama 1.1B Chat: lightweight model for resource-constrained environments
- Neural Mistral 7B: superior reasoning and instruction following
- Mixtral 8x7B: advanced mixture-of-experts architecture

Models can be switched dynamically through the interface or programmatically:

```python
from src.training.model_manager import switch_to_model

switch_to_model("zephyr-7b")  # or "tinyllama-1.1b", "neural-mistral-7b", "mixtral-8x7b"
```
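The automatic fallback can be pictured as trying each model in order and moving on when one fails. This is a hypothetical sketch, not the repository's actual implementation: the chain order and the `query_model` callable (standing in for the real inference call) are assumptions.

```python
# Hypothetical sketch of an ordered model-fallback chain.
MODEL_CHAIN = [
    "HuggingFaceH4/zephyr-7b-beta",           # primary
    "mistralai/Mixtral-8x7B-Instruct-v0.1",   # secondary
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",     # lightweight last resort
]

def generate_with_fallback(prompt, query_model):
    """Try each model in order; return the first successful response."""
    last_error = None
    for model_id in MODEL_CHAIN:
        try:
            return model_id, query_model(model_id, prompt)
        except Exception as exc:              # e.g. rate limit, model loading
            last_error = exc
    raise RuntimeError(f"All models failed: {last_error}")
```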
## 📚 Knowledge Base Management

The application provides a flexible interface for managing knowledge sources:

1. **Source Management**:
   - View and edit the list of source URLs
   - Enable/disable specific sources
   - Add custom URLs for information extraction
   - Monitor source status and availability

2. **Update Operations**:
   - **Update Knowledge Base**: incrementally add new information while preserving existing knowledge
   - **Rebuild Knowledge Base**: completely recreate the knowledge base using only the selected sources
   - Real-time operation status tracking
   - Automatic backup of previous versions

3. **Usage**:

   ```python
   # Add a new URL to the source table
   sources_df.append({"URL": "https://example.com", "Include": True, "Status": "Ready"})

   # Update the knowledge base with the selected sources
   update_kb_with_selected(sources_df)

   # Rebuild the knowledge base from scratch
   rebuild_kb_with_selected(sources_df)
   ```

All changes to the knowledge base are automatically synchronized with the Hugging Face dataset, ensuring data persistence and version control.
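That synchronization could look roughly like the sketch below, which pushes the local `dataset/` tree to a Hub dataset repository via `huggingface_hub`. It assumes the library is installed and `HUGGINGFACE_TOKEN` is configured; the repo id shown is a placeholder, not this project's actual dataset name.

```python
# Hedged sketch of syncing local dataset folders to the Hugging Face Hub.
# The repo_id below is a placeholder assumption.

def sync_dataset(folder="dataset", repo_id="your-username/status-law-gbot-data"):
    """Push the local dataset tree to a Hub dataset repo."""
    from huggingface_hub import HfApi   # imported lazily: only needed on sync

    api = HfApi()                        # uses HUGGINGFACE_TOKEN from the env
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message="Sync knowledge base and chat history",
    )
```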
## 🔗 Related Links

- [Status Law Website](https://status.law)
- [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-gbot)

## 📄 License

Public repository for Status Law Assistant.