---
title: Status Law Gbot
emoji: πŸ’¬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.0
app_file: app.py
pinned: false
---

# Status Law Assistant

An intelligent chatbot built on Hugging Face and LangChain that provides legal consultations based on information from the Status Law company website.

πŸ“ Description

Status Law Assistant is a smart chatbot that answers user questions about Status Law company's legal services. The bot uses RAG (Retrieval-Augmented Generation) technology to find relevant information in a knowledge base created from the official website content.
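Conceptually, the retrieval step of RAG ranks stored text chunks by vector similarity to the user's query and feeds the best matches to the language model. The app does this with FAISS indexes built through LangChain; the sketch below illustrates only the idea in plain Python, with toy hand-written vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

# Toy knowledge base: (text, embedding) pairs; real embeddings come from a model.
chunks = [
    {"text": "Status Law handles extradition cases.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Office hours are 9 to 5.",              "vec": [0.0, 0.2, 0.9]},
    {"text": "We assist with Interpol notices.",      "vec": [0.8, 0.3, 0.1]},
]

print(retrieve([1.0, 0.2, 0.0], chunks, k=2))
```

In the real app, FAISS replaces the linear scan above with an approximate-nearest-neighbor index, which keeps retrieval fast as the knowledge base grows.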

## ✨ Key Features

- **Knowledge Base Management**:
  - Dynamic URL management for knowledge base sources
  - Ability to add custom URLs for information extraction
  - Selective source inclusion/exclusion
  - Two modes of knowledge base updates:
    - **Update Mode**: adds new information while preserving existing knowledge
    - **Rebuild Mode**: complete recreation of the knowledge base from selected sources
  - Real-time status tracking for knowledge base operations
  - Automatic metadata management and versioning
- Automatic creation and updating of the knowledge base from status.law website content
- Intelligent information retrieval for query responses
- Context-aware response generation
- **Advanced multilingual support**:
  - Automatic language detection
  - Native-language response generation
  - Built-in translation system with fallback mechanism
  - Support for 30+ languages
- Customizable text generation parameters
- Model switching system with automatic fallback
- Fine-tuning capabilities based on chat history
- **Multiple model support**:
  - **Zephyr 7B**: enhanced performance and response quality
  - **TinyLlama 1.1B Chat**: lightweight model for resource-constrained environments
  - **Neural Mistral 7B**: superior reasoning and instruction-following capabilities
  - **Mixtral 8x7B**: advanced mixture-of-experts architecture
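The translation fallback mentioned above can be sketched as a simple chain: try each translation backend in order and fall back to the next on failure, returning the original text as a last resort. The translator callables here are hypothetical stand-ins, not the app's actual API:

```python
def translate_with_fallback(text, lang, translators):
    """Try each translator in order; return the first successful result.

    `translators` is a list of callables (hypothetical stand-ins for the
    app's primary and backup translation backends). If every backend
    fails, the original text is returned unchanged.
    """
    for translate in translators:
        try:
            return translate(text, lang)
        except Exception:
            continue  # this backend failed; fall through to the next one
    return text

# Toy backends: the first always fails, the second succeeds.
def primary(text, lang):
    raise RuntimeError("primary translation service unavailable")

def backup(text, lang):
    return f"[{lang}] {text}"

print(translate_with_fallback("Hello", "fr", [primary, backup]))
# → [fr] Hello
```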

## πŸš€ Technologies

- **LangChain**: query processing chains and knowledge base management
- **Hugging Face**: language model access and hosting
- **FAISS**: efficient vector search
- **Gradio**: user interface creation
- **BeautifulSoup**: web page information extraction
- **PEFT**: efficient fine-tuning using LoRA
- **SentencePiece**: tokenization

πŸ—οΈ Project Structure

status-law-gbot/
β”œβ”€β”€ app.py                 # Main application file
β”œβ”€β”€ requirements.txt       # Project dependencies
β”œβ”€β”€ config/               # Configuration files
β”‚   β”œβ”€β”€ settings.py       # Application and model settings
β”‚   └── constants.py      # Constants and default values
β”œβ”€β”€ src/                  # Source code
β”‚   β”œβ”€β”€ analytics/        # Analytics module
β”‚   β”‚   └── chat_analyzer.py
β”‚   β”œβ”€β”€ knowledge_base/   # Knowledge base management
β”‚   β”‚   β”œβ”€β”€ loader.py
β”‚   β”‚   └── vector_store.py
β”‚   └── training/         # Training module
β”‚       β”œβ”€β”€ fine_tuner.py
β”‚       └── model_manager.py
└── dataset/             # HuggingFace dataset structure
    β”œβ”€β”€ annotations/     # Conversation annotations
    β”œβ”€β”€ chat_history/    # Chat logs and conversations
    β”œβ”€β”€ fine_tuned_models/ # Fine-tuned model storage
    β”œβ”€β”€ preferences/     # User preferences
    β”œβ”€β”€ training_data/   # Processed training data
    β”œβ”€β”€ training_logs/   # Training process logs
    └── vector_store/    # FAISS vector storage

## πŸ’Ύ Data Storage

### Dataset Organization

- `annotations/`: conversation quality metrics and annotations
- `chat_history/`: JSON files containing chat conversations
- `fine_tuned_models/`: storage for LoRA adapters and model checkpoints
- `preferences/`: user preferences and settings
- `training_data/`: processed data ready for model training
- `training_logs/`: detailed training process logs
- `vector_store/`: FAISS indexes for semantic search
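A `chat_history/` record might look like the round-trip below. Note that the exact schema is an assumption for illustration only; the field names `timestamp`, `model`, and `messages` are hypothetical and may differ from what the app actually writes:

```python
import json

# Hypothetical shape of a single chat_history record; the real schema
# used by the app may differ.
record = {
    "timestamp": "2024-05-01T12:00:00Z",
    "model": "zephyr-7b",
    "messages": [
        {"role": "user", "content": "Can you help with an Interpol notice?"},
        {"role": "assistant", "content": "Yes, Status Law handles such cases."},
    ],
}

# Round-trip through JSON, as files in chat_history/ would be stored on disk.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
restored = json.loads(serialized)
print(restored["messages"][0]["role"])
# → user
```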

πŸ› οΈ Setup

  1. Clone the repository:
git clone https://github.com/PtOlga/status-law-gbot.git
cd status-law-gbot
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up environment variables:
cp .env.example .env
# Edit .env with your configuration, including HUGGINGFACE_TOKEN
  1. Run the application:
python app.py

## πŸ”§ Model Fine-tuning

To fine-tune the model on your chat history:

```python
from src.training.fine_tuner import finetune_from_chat_history

success, message = finetune_from_chat_history(epochs=3)
print(message)
```

The fine-tuning process uses LoRA (Low-Rank Adaptation) for efficient training with minimal resource requirements.
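To see why LoRA is cheap, compare parameter counts: instead of updating a full `d_in × d_out` weight matrix, LoRA trains two low-rank matrices `A` (`d_in × r`) and `B` (`r × d_out`). A quick back-of-the-envelope check (the layer dimensions below are illustrative, not the model's actual shapes):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for a rank-r LoRA adapter on a d_in x d_out layer."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_in * d_out

# Illustrative 4096x4096 projection layer with rank-8 adapters
d = 4096
full = full_params(d, d)          # 16,777,216 weights
lora = lora_params(d, d, rank=8)  # 65,536 weights
print(f"LoRA trains {lora / full:.2%} of the full layer's parameters")
# → LoRA trains 0.39% of the full layer's parameters
```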

## πŸ”„ Model Switching

The application supports multiple models with automatic fallback:

- **Zephyr 7B**: enhanced performance and response quality
- **TinyLlama 1.1B Chat**: lightweight model for resource-constrained environments
- **Neural Mistral 7B**: superior reasoning and instruction-following capabilities
- **Mixtral 8x7B**: advanced mixture-of-experts architecture

Models can be switched dynamically through the interface or programmatically:

```python
from src.training.model_manager import switch_to_model

switch_to_model("zephyr-7b")  # or "tinyllama-1.1b", "neural-mistral-7b", "mixtral-8x7b"
```

## πŸ”„ Knowledge Base Management

The application provides a flexible interface for managing knowledge sources:

1. **Source Management**:
   - View and edit the list of source URLs
   - Enable/disable specific sources
   - Add custom URLs for information extraction
   - Monitor source status and availability

2. **Update Operations**:
   - **Update Knowledge Base**: incrementally add new information while preserving existing knowledge
   - **Rebuild Knowledge Base**: completely recreate the knowledge base using only selected sources
   - Real-time operation status tracking
   - Automatic backup of previous versions

3. **Usage**:

   ```python
   import pandas as pd

   # Add a new URL to the source table (assuming sources_df is a pandas
   # DataFrame; DataFrame.append was removed in pandas 2.0, so use concat)
   new_row = pd.DataFrame([{"URL": "https://example.com", "Include": True, "Status": "Ready"}])
   sources_df = pd.concat([sources_df, new_row], ignore_index=True)

   # Update knowledge base with selected sources
   update_kb_with_selected(sources_df)

   # Rebuild knowledge base from scratch
   rebuild_kb_with_selected(sources_df)
   ```

All changes to the knowledge base are automatically synchronized with the Hugging Face dataset, ensuring data persistence and version control.

## πŸ”— Related Links

πŸ“ License

Public repository for Status Law Assistant.