Rulga committed on
Commit 9a1d867 · 1 Parent(s): 5757546

Refine README.md for clarity and accuracy in feature descriptions, enhance multilingual support details, and update project structure information.

  1. README.md +39 -46
README.md CHANGED
@@ -15,41 +15,45 @@ An intelligent chatbot based on Hugging Face and LangChain for legal consultatio
 
  ## 📝 Description
 
- Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content and generates responses using a language model.
 
- ## ✨ Features
 
  - Automatic creation and updating of knowledge base from status.law website content
- - Relevant information search for user queries
  - Context-aware response generation
- - Multi-language query support (responds in the language of the question)
- - Customizable text generation parameters (temperature, token count, etc.)
- - Model switching with fallback mechanism
  - Fine-tuning capabilities based on chat history
  - Multiple model support:
-   - Llama 2 7B Chat: Meta's general-purpose model optimized for chat
-   - Zephyr 7B: State-of-the-art model with strong performance
-   - Mistral 7B Instruct v0.2: Advanced model with superior multilingual capabilities
-   - XGLM 7.5B: Specialized model for cross-lingual generation with 30+ language support
 
  ## 🚀 Technologies
 
- - **LangChain**: For query processing chains and knowledge base management
- - **Hugging Face**: For language model access and application hosting
- - **FAISS**: For efficient vector search
- - **Gradio**: For user interface creation
- - **BeautifulSoup**: For web page information extraction
- - **PEFT**: For efficient fine-tuning using LoRA
- - **SentencePiece**: For tokenization
 
  ## 🏗️ Project Structure
 
  ```
  status-law-gbot/
- ├── app.py # Main application file with interface and request handling logic
  ├── requirements.txt # Project dependencies
  ├── config/ # Configuration files
- │   ├── settings.py # Application settings and model configurations
  │   └── constants.py # Constants and default values
  ├── src/ # Source code
  │   ├── analytics/ # Analytics module
@@ -57,26 +61,24 @@ status-law-gbot/
  │   ├── knowledge_base/ # Knowledge base management
  │   │   ├── loader.py
  │   │   └── vector_store.py
- │   ├── training/ # Model training module
- │   │   ├── fine_tuner.py # LoRA fine-tuning implementation
- │   │   └── model_manager.py # Model switching and management
- │   └── models/ # Model storage
- │       └── fine_tuned/ # Fine-tuned model storage
- ├── web/ # Web interface components
- │   └── training_interface.py
  └── data/ # Data storage
      ├── vector_store/ # FAISS vector storage
      │   ├── index.faiss
      │   └── index.pkl
-     └── chat_history/ # Conversation logs
-         └── logs.json
  ```
 
  ## 💾 Data Storage
 
  ### Vector Store
  - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
- - `data/vector_store/index.pkl`: Metadata and configuration for the vector store
 
  ### Chat History
  - `data/chat_history/logs.json`: JSON file containing chat history and metadata
@@ -85,21 +87,11 @@ status-law-gbot/
  - `src/models/fine_tuned/`: Directory for storing fine-tuned models
  - `src/models/registry.json`: Model registry and configuration
 
- ## 🚀 Usage
-
- The Status Law Assistant chatbot uses this structure to:
- 1. Store and retrieve document embeddings for context-aware responses
- 2. Maintain chat history for conversation continuity
- 3. Track user interactions and improve response quality
- 4. Fine-tune models based on conversation history
- 5. Provide automatic model fallback in case of API errors
- 6. Support multiple language models with easy switching
-
  ## 🛠️ Setup
 
  1. Clone the repository:
  ```bash
- git clone https://github.com/yourusername/status-law-gbot.git
  cd status-law-gbot
  ```
 
@@ -136,23 +128,24 @@ The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training w
 
  The application supports multiple models with automatic fallback:
 
- - Llama 2 7B Chat (default)
- - Zephyr 7B
- - Custom fine-tuned versions
 
  Models can be switched dynamically through the interface or programmatically:
 
  ```python
  from src.training.model_manager import switch_to_model
 
- switch_to_model("llama-7b") # or "zephyr-7b"
  ```
 
  ## 🔗 Related Links
 
  - [Status Law Website](https://status.law)
- - [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-assistant)
 
  ## 📝 License
 
- Private repository for Status Law Assistant usage only.
 
 
  ## 📝 Description
 
+ Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content.
 
+ ## ✨ Key Features
 
  - Automatic creation and updating of knowledge base from status.law website content
+ - Intelligent information retrieval for query responses
  - Context-aware response generation
+ - Advanced multilingual support:
+   - Automatic language detection
+   - Native language response generation
+   - Built-in translation system with fallback mechanism
+   - Support for 30+ languages
+ - Customizable text generation parameters
+ - Model switching system with automatic fallback
  - Fine-tuning capabilities based on chat history
  - Multiple model support:
+   - Llama 2 7B Chat (primary): Optimized for dialogues
+   - Zephyr 7B: Enhanced performance and response quality
+   - Mistral 7B Instruct v0.2: Superior multilingual capabilities
+   - XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)
 
  ## 🚀 Technologies
 
+ - **LangChain**: Query processing chains and knowledge base management
+ - **Hugging Face**: Language model access and hosting
+ - **FAISS**: Efficient vector search
+ - **Gradio**: User interface creation
+ - **BeautifulSoup**: Web page information extraction
+ - **PEFT**: Efficient fine-tuning using LoRA
+ - **SentencePiece**: Tokenization
 
  ## 🏗️ Project Structure
 
  ```
  status-law-gbot/
+ ├── app.py # Main application file
  ├── requirements.txt # Project dependencies
  ├── config/ # Configuration files
+ │   ├── settings.py # Application and model settings
  │   └── constants.py # Constants and default values
  ├── src/ # Source code
  │   ├── analytics/ # Analytics module
 
  │   ├── knowledge_base/ # Knowledge base management
  │   │   ├── loader.py
  │   │   └── vector_store.py
+ │   └── training/ # Training module
+ │       ├── fine_tuner.py
+ │       └── model_manager.py
  └── data/ # Data storage
      ├── vector_store/ # FAISS vector storage
      │   ├── index.faiss
      │   └── index.pkl
+     ├── chat_history/ # Conversation logs
+     │   └── logs.json
+     └── fine_tuned_models/ # Fine-tuned model storage
+         └── model_registry.json
  ```
 
  ## 💾 Data Storage
 
  ### Vector Store
  - `data/vector_store/index.faiss`: FAISS vector store for document embeddings
+ - `data/vector_store/index.pkl`: Metadata and configuration for vector store
 
  ### Chat History
  - `data/chat_history/logs.json`: JSON file containing chat history and metadata
 
  - `src/models/fine_tuned/`: Directory for storing fine-tuned models
  - `src/models/registry.json`: Model registry and configuration
 
  ## 🛠️ Setup
 
  1. Clone the repository:
  ```bash
+ git clone https://github.com/PtOlga/status-law-gbot.git
  cd status-law-gbot
  ```
 
 
 
  The application supports multiple models with automatic fallback:
 
+ - Llama 2 7B Chat (default): Optimized for dialogues
+ - Zephyr 7B: Enhanced performance and response quality
+ - Mistral 7B Instruct v0.2: Superior multilingual capabilities
+ - XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)
 
  Models can be switched dynamically through the interface or programmatically:
 
  ```python
  from src.training.model_manager import switch_to_model
 
+ switch_to_model("llama-7b") # or "zephyr-7b", "mistral-7b", "xglm-7b"
  ```
 
  ## 🔗 Related Links
 
  - [Status Law Website](https://status.law)
+ - [Status Law Assistant on Hugging Face](https://huggingface.co/spaces/Rulga/status-law-gbot)
 
  ## 📝 License
 
+ Public repository for Status Law Assistant.
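
The README says the application supports multiple models "with automatic fallback" but the diff does not show how that fallback works. The sketch below is one plausible reading, not the project's actual implementation: it tries each model key in order and returns the first successful response. The function `generate_with_fallback`, the `generate` callback, the `FALLBACK_ORDER` tuple, and the `fake_generate` stand-in are all hypothetical names introduced for illustration; only the model keys come from the `switch_to_model` example in the README.

```python
from typing import Callable, Sequence

# Model keys as they appear in the README's switch_to_model example.
# The ordering here is an assumption (the README only names the default model).
FALLBACK_ORDER = ("llama-7b", "zephyr-7b", "mistral-7b", "xglm-7b")


def generate_with_fallback(
    prompt: str,
    generate: Callable[[str, str], str],
    model_keys: Sequence[str] = FALLBACK_ORDER,
) -> tuple[str, str]:
    """Try each model in order; return (model_key, response) from the first success."""
    errors: list[str] = []
    for key in model_keys:
        try:
            return key, generate(key, prompt)
        except Exception as exc:  # a real application would catch narrower API errors
            errors.append(f"{key}: {exc}")
    # Every candidate failed, so report all collected errors at once.
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_generate(model_key: str, prompt: str) -> str:
    """Hypothetical backend that pretends the default model's API is unavailable."""
    if model_key == "llama-7b":
        raise ConnectionError("API unavailable")
    return f"[{model_key}] answer to: {prompt}"


used_model, answer = generate_with_fallback("What services do you offer?", fake_generate)
print(used_model, "->", answer)
```

Passing the backend in as a callback keeps the fallback loop independent of any particular inference client, so the same loop could wrap whatever call the project's `model_manager` module actually makes.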