---
title: Status Law Gbot
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
---

# Status Law Assistant
An intelligent chatbot built on Hugging Face and LangChain that provides legal consultations using information from the Status Law company website.
## 📋 Description

Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content.
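The retrieval step at the heart of RAG can be illustrated with a minimal sketch. This is plain Python with toy bag-of-words vectors standing in for real embeddings; the function names are illustrative and not taken from the project:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system uses a neural embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (FAISS does this at scale)."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Status Law provides legal consultations on extradition cases.",
    "Our office hours are Monday to Friday.",
]
# The most relevant document is returned first:
print(retrieve("extradition legal help", docs))
```

In the real application, the retrieved passages are then inserted into the prompt so the language model can generate a grounded, context-aware answer.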
## ✨ Key Features
- Automatic creation and updating of knowledge base from status.law website content
- Intelligent information retrieval for query responses
- Context-aware response generation
- Advanced multilingual support:
  - Automatic language detection
  - Native-language response generation
  - Built-in translation system with fallback mechanism
  - Support for 30+ languages
- Customizable text generation parameters
- Model switching system with automatic fallback
- Fine-tuning capabilities based on chat history
- Multiple model support:
  - Llama 2 7B Chat (primary): Optimized for dialogues
  - Zephyr 7B: Enhanced performance and response quality
  - Mistral 7B Instruct v0.2: Superior multilingual capabilities
  - XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)
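Automatic language detection with a fallback can be sketched as a simple Unicode-script heuristic. The project may use a dedicated detection library instead; the function below is purely illustrative:

```python
def detect_language(text: str, fallback: str = "en") -> str:
    """Crude script-based language guess; returns the fallback when unsure."""
    if not text.strip():
        return fallback
    cyrillic = sum("а" <= ch.lower() <= "я" or ch.lower() == "ё" for ch in text)
    latin = sum("a" <= ch.lower() <= "z" for ch in text)
    if cyrillic > latin:
        return "ru"   # Cyrillic-dominant text: assume Russian for this sketch
    if latin > 0:
        return "en"   # Latin-dominant text: assume English
    return fallback   # unknown script: fall back to the default language

print(detect_language("Здравствуйте, мне нужна юридическая помощь"))  # ru
print(detect_language("Hello, I need legal help"))                    # en
```

A production detector would cover all 30+ supported languages; the point here is the shape of the fallback logic, not the heuristic itself.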
## 🚀 Technologies
- LangChain: Query processing chains and knowledge base management
- Hugging Face: Language model access and hosting
- FAISS: Efficient vector search
- Gradio: User interface creation
- BeautifulSoup: Web page information extraction
- PEFT: Efficient fine-tuning using LoRA
- SentencePiece: Tokenization
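The web-extraction step can be illustrated with the standard library alone. The project lists BeautifulSoup; this stdlib sketch shows the same idea of stripping tags to harvest page text:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

page = "<html><body><h1>Status Law</h1><script>var x=1;</script><p>Legal services.</p></body></html>"
parser = TextExtractor()
parser.feed(page)
print(" ".join(parser.parts))  # Status Law Legal services.
```

The extracted text is then chunked, embedded, and stored in the FAISS index that powers retrieval.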
## 🏗️ Project Structure
```
status-law-gbot/
├── app.py                  # Main application file
├── requirements.txt        # Project dependencies
├── config/                 # Configuration files
│   ├── settings.py         # Application and model settings
│   └── constants.py        # Constants and default values
├── src/                    # Source code
│   ├── analytics/          # Analytics module
│   │   └── chat_analyzer.py
│   ├── knowledge_base/     # Knowledge base management
│   │   ├── loader.py
│   │   └── vector_store.py
│   └── training/           # Training module
│       ├── fine_tuner.py
│       └── model_manager.py
└── data/                   # Data storage
    ├── vector_store/       # FAISS vector storage
    │   ├── index.faiss
    │   └── index.pkl
    ├── chat_history/       # Conversation logs
    │   └── logs.json
    └── fine_tuned_models/  # Fine-tuned model storage
        └── model_registry.json
```
## 💾 Data Storage
### Vector Store

- `data/vector_store/index.faiss`: FAISS vector store for document embeddings
- `data/vector_store/index.pkl`: Metadata and configuration for the vector store
### Chat History

- `data/chat_history/logs.json`: JSON file containing chat history and metadata
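Appending a conversation turn to such a log can be sketched with the standard library. The field names below are assumptions for illustration, not the project's actual schema:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("data/chat_history/logs.json")

def log_turn(question: str, answer: str, path: Path = LOG_PATH) -> None:
    """Append one Q/A pair with a UTC timestamp to the JSON log file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    history = json.loads(path.read_text()) if path.exists() else []
    history.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
    })
    path.write_text(json.dumps(history, ensure_ascii=False, indent=2))

log_turn("What services do you offer?", "Status Law offers legal consultations.")
```

Reading the whole file and rewriting it keeps the log valid JSON; a higher-volume system would use an append-only format such as JSON Lines instead.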
### Models

- `src/models/fine_tuned/`: Directory for storing fine-tuned models
- `src/models/registry.json`: Model registry and configuration
## 🛠️ Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/PtOlga/status-law-gbot.git
   cd status-law-gbot
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration, including HUGGINGFACE_TOKEN
   ```

4. Run the application:

   ```bash
   python app.py
   ```
## 🔧 Model Fine-tuning

To fine-tune the model on your chat history:

```python
from src.training.fine_tuner import finetune_from_chat_history

success, message = finetune_from_chat_history(epochs=3)
print(message)
```

The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training with minimal resource requirements.
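The core idea behind LoRA's efficiency can be seen in a few lines of NumPy: instead of updating a full weight matrix `W`, only a low-rank pair `B·A` is trained and added to the frozen `W`. This is a toy illustration of the math, not the PEFT API:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                       # hidden size and LoRA rank (r << d)

W = rng.normal(size=(d, d))         # frozen pretrained weight (not trained)
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized

W_adapted = W + B @ A               # effective weight during fine-tuning

# Trainable parameters drop from d*d to 2*d*r:
full = d * d
lora = d * r + r * d
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.3%}")
```

Because `B` starts at zero, `W_adapted` equals `W` before training, so fine-tuning begins exactly at the pretrained model and only the small `A` and `B` matrices accumulate updates.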
## 🔄 Model Switching

The application supports multiple models with automatic fallback:

- Llama 2 7B Chat (default): Optimized for dialogues
- Zephyr 7B: Enhanced performance and response quality
- Mistral 7B Instruct v0.2: Superior multilingual capabilities
- XGLM 7.5B: Specialized cross-lingual generation model (requires paid API access)

Models can be switched dynamically through the interface or programmatically:

```python
from src.training.model_manager import switch_to_model

switch_to_model("llama-7b")  # or "zephyr-7b", "mistral-7b", "xglm-7b"
```
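The automatic-fallback behaviour can be sketched as an ordered chain. The model IDs mirror the ones above, but the availability check is a stand-in for the real model/API call:

```python
FALLBACK_ORDER = ["llama-7b", "zephyr-7b", "mistral-7b", "xglm-7b"]

def try_load(model_id: str, available: set[str]) -> bool:
    """Stand-in for a real model-loading or API health check."""
    return model_id in available

def select_model(preferred: str, available: set[str]) -> str:
    """Return the preferred model if usable, else the first working fallback."""
    chain = [preferred] + [m for m in FALLBACK_ORDER if m != preferred]
    for model_id in chain:
        if try_load(model_id, available):
            return model_id
    raise RuntimeError("No model available")

# llama-7b is unavailable, so the chain falls back to zephyr-7b:
print(select_model("llama-7b", available={"zephyr-7b", "mistral-7b"}))  # zephyr-7b
```

Keeping the fallback order in one list makes the chain easy to reorder or extend when new models are added.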
## 🔗 Related Links
## 📄 License
Public repository for Status Law Assistant.