Spaces:
Running
Running
File size: 6,581 Bytes
5c0a82c 1559b48 5c0a82c 60cfc29 5c0a82c 0f93e9d cb3ba0c 0f93e9d cb3ba0c 0f93e9d 9a1d867 0f93e9d 9a1d867 0f93e9d 0e0b35b cb3ba0c 9a1d867 cb3ba0c 9a1d867 a442043 7a5f6c0 9a1d867 e6ceacc 0f93e9d cb3ba0c 0d2d420 9a1d867 0d2d420 cb3ba0c 0d2d420 cb3ba0c 9a1d867 cb3ba0c 9a1d867 cb3ba0c 9a1d867 0dd9926 0d2d420 cb3ba0c 0d2d420 0dd9926 a442043 cb3ba0c 9a1d867 cb3ba0c a442043 cb3ba0c 0d2d420 a442043 9a1d867 e6ceacc a442043 e6ceacc a442043 0e0b35b 0d2d420 9a1d867 0d2d420 9a1d867 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 |
---
title: Status Law Gbot
emoji: π¬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
---
# Status Law Assistant
An intelligent chatbot based on Hugging Face and LangChain for legal consultations using information from the Status Law company website.
## π Description
Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content.
## β¨ Key Features
- Knowledge Base Management:
- Dynamic URL management for knowledge base sources
- Ability to add custom URLs for information extraction
- Selective source inclusion/exclusion
- Two modes of knowledge base updates:
- Update Mode: Adds new information while preserving existing knowledge
- Rebuild Mode: Complete recreation of knowledge base from selected sources
- Real-time status tracking for knowledge base operations
- Automatic metadata management and versioning
- Automatic creation and updating of knowledge base from status.law website content
- Intelligent information retrieval for query responses
- Context-aware response generation
- Advanced multilingual support:
- Automatic language detection
- Native language response generation
- Built-in translation system with fallback mechanism
- Support for 30+ languages
- Customizable text generation parameters
- Model switching system with automatic fallback
- Fine-tuning capabilities based on chat history
- Multiple model support:
- Zephyr 7B: Enhanced performance and response quality
- TinyLlama 1.1B Chat: Lightweight model for resource-constrained environments
- Neural Mistral 7B: Superior reasoning and instruction following capabilities
- Mixtral 8x7B: Advanced mixture-of-experts architecture
## π Technologies
- **LangChain**: Query processing chains and knowledge base management
- **Hugging Face**: Language model access and hosting
- **FAISS**: Efficient vector search
- **Gradio**: User interface creation
- **BeautifulSoup**: Web page information extraction
- **PEFT**: Efficient fine-tuning using LoRA
- **SentencePiece**: Tokenization
## ποΈ Project Structure
```
status-law-gbot/
βββ app.py # Main application file
βββ requirements.txt # Project dependencies
βββ config/ # Configuration files
β βββ settings.py # Application and model settings
β βββ constants.py # Constants and default values
βββ src/ # Source code
β βββ analytics/ # Analytics module
β β βββ chat_analyzer.py
β βββ knowledge_base/ # Knowledge base management
β β βββ loader.py
β β βββ vector_store.py
β βββ training/ # Training module
β βββ fine_tuner.py
β βββ model_manager.py
βββ dataset/ # HuggingFace dataset structure
βββ annotations/ # Conversation annotations
βββ chat_history/ # Chat logs and conversations
βββ fine_tuned_models/ # Fine-tuned model storage
βββ preferences/ # User preferences
βββ training_data/ # Processed training data
βββ training_logs/ # Training process logs
βββ vector_store/ # FAISS vector storage
```
## πΎ Data Storage
### Dataset Organization
- `annotations/`: Conversation quality metrics and annotations
- `chat_history/`: JSON files containing chat conversations
- `fine_tuned_models/`: Storage for LoRA adapters and model checkpoints
- `preferences/`: User preferences and settings
- `training_data/`: Processed data ready for model training
- `training_logs/`: Detailed training process logs
- `vector_store/`: FAISS indexes for semantic search
## π οΈ Setup
1. Clone the repository:
```bash
git clone https://github.com/PtOlga/status-law-gbot.git
cd status-law-gbot
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Set up environment variables:
```bash
cp .env.example .env
# Edit .env with your configuration, including HUGGINGFACE_TOKEN
```
4. Run the application:
```bash
python app.py
```
## π§ Model Fine-tuning
To fine-tune the model on your chat history:
```python
from src.training.fine_tuner import finetune_from_chat_history
success, message = finetune_from_chat_history(epochs=3)
print(message)
```
The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training with minimal resource requirements.
## π Model Switching
The application supports multiple models with automatic fallback:
- Zephyr 7B: Enhanced performance and response quality
- TinyLlama 1.1B Chat: Lightweight model for resource-constrained environments
- Neural Mistral 7B: Superior reasoning and instruction following capabilities
- Mixtral 8x7B: Advanced mixture-of-experts architecture
Models can be switched dynamically through the interface or programmatically:
```python
from src.training.model_manager import switch_to_model
switch_to_model("zephyr-7b") # or "tinyllama-1.1b", "neural-mistral-7b", "mixtral-8x7b"
```
## π Knowledge Base Management
The application provides a flexible interface for managing knowledge sources:
1. **Source Management**:
- View and edit the list of source URLs
- Enable/disable specific sources
- Add custom URLs for information extraction
- Monitor source status and availability
2. **Update Operations**:
- **Update Knowledge Base**: Incrementally add new information while preserving existing knowledge
- **Rebuild Knowledge Base**: Completely recreate the knowledge base using only selected sources
- Real-time operation status tracking
- Automatic backup of previous versions
3. **Usage**:
```python
# Add new URL to knowledge base
sources_df.append({"URL": "https://example.com", "Include": True, "Status": "Ready"})
# Update knowledge base with selected sources
update_kb_with_selected(sources_df)
# Rebuild knowledge base from scratch
rebuild_kb_with_selected(sources_df)
```
All changes to the knowledge base are automatically synchronized with the Hugging Face dataset, ensuring data persistence and version control.
## π Related Links
- [Status Law Website](https://status.law)
- [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-gbot)
## π License
Public repository for Status Law Assistant.
|