Refine README.md for clarity and accuracy in feature descriptions, enhance multilingual support details, and update project structure information.
README.md
CHANGED

@@ -15,41 +15,45 @@ An intelligent chatbot based on Hugging Face and LangChain for legal consultatio

 ## Description

-Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content

-## ✨ Features

 - Automatic creation and updating of knowledge base from status.law website content
--
 - Context-aware response generation
--
--
--
 - Fine-tuning capabilities based on chat history
 - Multiple model support:
-  - Llama 2 7B Chat:
-  - Zephyr 7B:
-  - Mistral 7B Instruct v0.2:
-  - XGLM 7.5B: Specialized

 ## Technologies

-- **LangChain**:
-- **Hugging Face**:
-- **FAISS**:
-- **Gradio**:
-- **BeautifulSoup**:
-- **PEFT**:
-- **SentencePiece**:

 ## Project Structure

 ```
 status-law-gbot/
-├── app.py                  # Main application file
 ├── requirements.txt        # Project dependencies
 ├── config/                 # Configuration files
-│   ├── settings.py         # Application
 │   └── constants.py        # Constants and default values
 ├── src/                    # Source code
 │   ├── analytics/          # Analytics module
@@ -57,26 +61,24 @@ status-law-gbot/
 │   ├── knowledge_base/     # Knowledge base management
 │   │   ├── loader.py
 │   │   └── vector_store.py
-│
-│
-│
-│   ├── models/             # Model storage
-│   ├── fine_tuned/         # Fine-tuned model storage
-├── web/                    # Web interface components
-│   └── training_interface.py
 └── data/                   # Data storage
     ├── vector_store/       # FAISS vector storage
     │   ├── index.faiss
     │   └── index.pkl
-
-
 ```

 ## 💾 Data Storage

 ### Vector Store
 - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
-- `data/vector_store/index.pkl`: Metadata and configuration for

 ### Chat History
 - `data/chat_history/logs.json`: JSON file containing chat history and metadata
@@ -85,21 +87,11 @@ status-law-gbot/
 - `src/models/fine_tuned/`: Directory for storing fine-tuned models
 - `src/models/registry.json`: Model registry and configuration

-## Usage
-
-The Status Law Assistant chatbot uses this structure to:
-1. Store and retrieve document embeddings for context-aware responses
-2. Maintain chat history for conversation continuity
-3. Track user interactions and improve response quality
-4. Fine-tune models based on conversation history
-5. Provide automatic model fallback in case of API errors
-6. Support multiple language models with easy switching
-
 ## 🛠️ Setup

 1. Clone the repository:
 ```bash
-git clone https://github.com/
 cd status-law-gbot
 ```

@@ -136,23 +128,24 @@ The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training w

 The application supports multiple models with automatic fallback:

-- Llama 2 7B Chat (default)
-- Zephyr 7B
--

 Models can be switched dynamically through the interface or programmatically:

 ```python
 from src.training.model_manager import switch_to_model

-switch_to_model("llama-7b") # or "zephyr-7b"
 ```

 ## Related Links

 - [Status Law Website](https://status.law)
-- [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-

 ## License

-

@@ -15,41 +15,45 @@ An intelligent chatbot based on Hugging Face and LangChain for legal consultatio

 ## Description

+Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content.

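The retrieval step of RAG mentioned in the description can be sketched in a few lines. This is a toy stand-in, not the project's implementation: it ranks documents by bag-of-words cosine similarity using only the standard library, whereas the README lists FAISS over document embeddings for the real knowledge base. The sample documents and the `retrieve` function name are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]


# Hypothetical knowledge-base snippets, for illustration only.
docs = [
    "We assist clients with company registration and corporate law.",
    "Our office hours are Monday to Friday.",
    "Immigration and residence permit consultations are available.",
]
question = "How do I get a residence permit?"
context = retrieve(question, docs, k=1)

# The retrieved text is placed in front of the question before generation.
prompt = f"Context: {context[0]}\nQuestion: {question}"
```

In a real pipeline the retrieved context would be passed to the language model together with the question; only the ranking function changes when embeddings and a vector index replace the word counts.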
+## ✨ Key Features

 - Automatic creation and updating of knowledge base from status.law website content
+- Intelligent information retrieval for query responses
 - Context-aware response generation
+- Advanced multilingual support:
+  - Automatic language detection
+  - Native language response generation
+  - Built-in translation system with fallback mechanism
+  - Support for 30+ languages
+- Customizable text generation parameters
+- Model switching system with automatic fallback
 - Fine-tuning capabilities based on chat history
 - Multiple model support:
+  - Llama 2 7B Chat (primary): Optimized for dialogues
+  - Zephyr 7B: Enhanced performance and response quality
+  - Mistral 7B Instruct v0.2: Superior multilingual capabilities
+  - XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)

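Automatic language detection is the first step of the multilingual flow listed in the features. As a rough illustration only, the sketch below guesses the writing script of a message from Unicode character names. It is a crude, dependency-free stand-in and not the detector this project uses; a script is also not a language, since Latin covers English, Spanish, and many others. The function name `guess_script` and its `default` parameter are hypothetical.

```python
import unicodedata
from collections import Counter


def guess_script(text: str, default: str = "LATIN") -> str:
    """Return the most common Unicode script label among the letters in text."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # Character names look like "CYRILLIC SMALL LETTER A"; the first
            # word is used as a rough script label.
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split()[0]] += 1
    if not scripts:
        return default  # fall back when there are no letters to inspect
    return scripts.most_common(1)[0][0]
```

A production system would more likely rely on a dedicated language-identification library and keep a heuristic like this only as a last-resort fallback.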
 ## Technologies

+- **LangChain**: Query processing chains and knowledge base management
+- **Hugging Face**: Language model access and hosting
+- **FAISS**: Efficient vector search
+- **Gradio**: User interface creation
+- **BeautifulSoup**: Web page information extraction
+- **PEFT**: Efficient fine-tuning using LoRA
+- **SentencePiece**: Tokenization

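To see why LoRA makes fine-tuning cheaper, consider one weight matrix of shape d × k. Full fine-tuning updates all d·k entries, while a rank-r LoRA adapter trains two factors of shapes d × r and r × k, which is r·(d + k) parameters. The sketch below only does that arithmetic; the 4096 × 4096 shape and rank 8 are illustrative assumptions, not settings taken from this project.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in a rank-r LoRA adapter for a d x k weight matrix."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k matrix is fine-tuned."""
    return d * k


# Illustrative numbers only: one 4096 x 4096 projection with rank 8.
d, k, r = 4096, 4096, 8
adapter = lora_trainable_params(d, k, r)  # 65,536
full = full_params(d, k)                  # 16,777,216
ratio = adapter / full                    # 1/256, roughly 0.4%
```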
 ## Project Structure

 ```
 status-law-gbot/
+├── app.py                  # Main application file
 ├── requirements.txt        # Project dependencies
 ├── config/                 # Configuration files
+│   ├── settings.py         # Application and model settings
 │   └── constants.py        # Constants and default values
 ├── src/                    # Source code
 │   ├── analytics/          # Analytics module
@@ -57,26 +61,24 @@ status-law-gbot/
 │   ├── knowledge_base/     # Knowledge base management
 │   │   ├── loader.py
 │   │   └── vector_store.py
+│   └── training/           # Training module
+│       ├── fine_tuner.py
+│       └── model_manager.py
 └── data/                   # Data storage
     ├── vector_store/       # FAISS vector storage
     │   ├── index.faiss
     │   └── index.pkl
+    ├── chat_history/       # Conversation logs
+    │   └── logs.json
+    └── fine_tuned_models/  # Fine-tuned model storage
+        └── model_registry.json
 ```

 ## 💾 Data Storage

 ### Vector Store
 - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
+- `data/vector_store/index.pkl`: Metadata and configuration for vector store

 ### Chat History
 - `data/chat_history/logs.json`: JSON file containing chat history and metadata
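
A minimal sketch of how a chat turn might be appended to a JSON log such as `data/chat_history/logs.json`. The record fields (`timestamp`, `question`, `answer`) and the `append_chat_log` helper are assumptions made for illustration; the README does not specify the actual schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_chat_log(path: Path, question: str, answer: str) -> list[dict]:
    """Append one chat turn to a JSON array file and return the full log."""
    # Start from an empty list when the file does not exist yet.
    log = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),  # hypothetical field
        "question": question,                                  # hypothetical field
        "answer": answer,                                      # hypothetical field
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-English messages readable in the file.
    path.write_text(json.dumps(log, ensure_ascii=False, indent=2), encoding="utf-8")
    return log
```

Rewriting the whole file on every turn is simple but scales poorly; a line-delimited format would be a reasonable alternative if the log grows large.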
@@ -85,21 +87,11 @@ status-law-gbot/
 - `src/models/fine_tuned/`: Directory for storing fine-tuned models
 - `src/models/registry.json`: Model registry and configuration

 ## 🛠️ Setup

 1. Clone the repository:
 ```bash
+git clone https://github.com/PtOlga/status-law-gbot.git
 cd status-law-gbot
 ```

@@ -136,23 +128,24 @@ The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training w

 The application supports multiple models with automatic fallback:

+- Llama 2 7B Chat (default): Optimized for dialogues
+- Zephyr 7B: Enhanced performance and response quality
+- Mistral 7B Instruct v0.2: Superior multilingual capabilities
+- XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)

 Models can be switched dynamically through the interface or programmatically:

 ```python
 from src.training.model_manager import switch_to_model

+switch_to_model("llama-7b") # or "zephyr-7b", "mistral-7b", "xglm-7b"
 ```

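The automatic fallback mentioned in this section could work roughly as follows: try each model in priority order and move on when a call fails. This is a sketch under assumptions, not the project's `model_manager` code. `generate_with_fallback` and the stub backends are hypothetical, the priority order is assumed to follow the README's list order, and only the model keys come from the README snippet.

```python
from typing import Callable, Dict, List, Tuple

# Model keys as they appear in the README; the priority order is an assumption.
FALLBACK_ORDER: List[str] = ["llama-7b", "zephyr-7b", "mistral-7b", "xglm-7b"]


def generate_with_fallback(
    prompt: str,
    backends: Dict[str, Callable[[str], str]],
    order: List[str] = FALLBACK_ORDER,
) -> Tuple[str, str]:
    """Return (model_key, reply) from the first backend that succeeds."""
    errors = []
    for key in order:
        backend = backends.get(key)
        if backend is None:
            continue  # model not configured, try the next one
        try:
            return key, backend(prompt)
        except Exception as exc:  # e.g. an API error from a hosted model
            errors.append(f"{key}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical stub backends standing in for real API calls.
def failing_backend(prompt: str) -> str:
    raise ConnectionError("API unavailable")


def echo_backend(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = generate_with_fallback(
    "Hello", {"llama-7b": failing_backend, "zephyr-7b": echo_backend}
)
```

Returning the key of the model that actually answered makes it easy to log which fallback was used.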
 ## Related Links

 - [Status Law Website](https://status.law)
+- [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-gbot)

 ## License

+Public repository for Status Law Assistant.