---
title: Business_Chatbot
app_file: run_gui.py
sdk: gradio
sdk_version: 5.34.2
---
# LLM Chatbot with SQL Database and Vector Store

A research-grade chatbot that extracts entities from user queries, stores transactions in a SQL database, and supports semantic search over past events through a vector store.
## Features
- Entity Extraction: Automatically extracts products, quantities, suppliers, customers, and prices from natural language
- Interactive Transaction Completion: LLM-powered clarification for missing information
- SQL Database: Stores structured sales and purchase data with relationships
- Natural Language to SQL: Uses OpenAI GPT to convert plain English queries to SQL
- RAG (Retrieval-Augmented Generation): Intelligent responses using LLM with retrieved context
- Vector Store: Enables semantic search of events and transactions
- Query Validation: Ensures generated SQL queries are safe and valid
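The query-validation feature can be illustrated with a minimal read-only check. This is a hypothetical sketch for illustration, not the repository's actual validator:

```python
import re

# Allow only a single read-only SELECT statement: reject statement chaining
# (';') and any keyword that could modify data or schema.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|PRAGMA|EXEC)\b|;",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Return True only for a single read-only SELECT statement."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    return FORBIDDEN.search(stripped) is None

print(is_safe_select("SELECT * FROM purchases;"))   # True
print(is_safe_select("SELECT 1; DELETE FROM sales"))  # False
```

A trailing semicolon is stripped before the check, so normal single-statement queries pass while embedded semicolons (a common injection vector) are rejected.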
## Architecture

```
User Input: "Add a purchase of 20 USB drives from TechMart at €5 each"
        ↓
1. Entity Extraction → product: USB drives, quantity: 20, supplier: TechMart, unit price: €5
        ↓
2. SQL Generation → INSERT INTO purchases (supplier_id, product_id, quantity, unit_price, total_cost)
        ↓
3. Vector Storage → embed and store a semantic summary of the event
```
## Setup

- Install dependencies: `pip install -r requirements.txt`
- Download the spaCy model: `python -m spacy download en_core_web_sm`
- Set your OpenAI API key: `export OPENAI_API_KEY='your-openai-api-key-here'`
- Run the chatbot: `python main.py`
- Run the tests:
  - `python test_chatbot.py` (main test suite)
  - `python test_nl_search.py` (natural language search)
  - `python test_intent_classifier.py` (intent classification)
  - `python test_rag_search.py` (RAG functionality)
  - `python test_interactive_transactions.py` (interactive transaction completion)
## Usage Examples
### Adding Transactions (Interactive)

- "Add a purchase of 20 USB drives from TechMart at €5 each" (complete)
- "I bought some laptops" (will ask for: quantity, supplier, price)
- "Sold items to a customer" (will ask for: product, quantity, customer, price)

You can answer with specific details or say "N/A" for optional fields.
### Querying Data (Natural Language to SQL)

- "How many USB drives did we purchase?"
- "What's the total value of all purchases?"
- "Show me all sales to John Smith"
- "Which suppliers have we bought from?"
- "What's our total spending on electronics?"
- "Show me the most expensive purchases"
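NL-to-SQL systems typically embed a schema summary in the prompt sent to the model. The wording and column list below are assumptions for illustration, not the project's actual prompt:

```python
# Hypothetical prompt construction for the NL→SQL step. The schema summary
# is an assumed condensation; database/schema.sql is authoritative.
SCHEMA_SUMMARY = (
    "suppliers(id, name), customers(id, name), products(id, name), "
    "purchases(id, supplier_id, product_id, quantity, unit_price, total_cost), "
    "sales(id, customer_id, product_id, quantity, unit_price, total_price)"
)

def build_sql_prompt(question: str) -> str:
    """Assemble the text that would be sent to the LLM."""
    return (
        "Given the SQLite schema:\n"
        f"{SCHEMA_SUMMARY}\n"
        "Write a single read-only SELECT statement answering:\n"
        f"{question}"
    )

print(build_sql_prompt("How many USB drives did we purchase?"))
```

The model's reply would then be passed through query validation before execution.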
### Semantic Search (RAG-powered)

- "When is my meeting with George?"
- "What do we know about TechMart as a supplier?"
- "Tell me about recent meetings and discussions"
- "Show me customer feedback and satisfaction information"
### General Information

- "Meeting with new supplier scheduled for next week"
- "Important: Check inventory levels before next order"
## Database Schema
- suppliers: Company information
- customers: Customer details
- products: Product catalog
- purchases: Purchase transactions
- sales: Sales transactions
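A hedged sketch of these five tables in SQLite. Column names beyond those shown in the Architecture section are assumptions; `database/schema.sql` is the authoritative definition:

```python
import sqlite3

# Assumed table layout: purchases/sales reference the three catalog tables.
SCHEMA = """
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES suppliers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity    INTEGER,
    unit_price  REAL,
    total_cost  REAL
);
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity    INTEGER,
    unit_price  REAL,
    total_price REAL
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.executescript(SCHEMA)
```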
## Vector Store

Uses ChromaDB with sentence-transformers for semantic similarity search over:
- Transaction summaries
- General business events
- Meeting notes and reminders
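The retrieval idea (return the stored event nearest to the query) can be demonstrated without ChromaDB or sentence-transformers. A toy bag-of-words "embedding" with cosine similarity stands in for real sentence embeddings here:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. The real system uses sentence vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stored event summaries (illustrative examples).
events = [
    "Meeting with George scheduled for Tuesday",
    "Purchased 20 USB drives from TechMart",
]

def search(query: str) -> str:
    """Return the stored event most similar to the query."""
    q = embed(query)
    return max(events, key=lambda e: cosine(q, embed(e)))

print(search("when is my meeting with George?"))
# → Meeting with George scheduled for Tuesday
```

In the project, the retrieved events are then handed to the LLM as context for the RAG answer rather than returned verbatim.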
## File Structure

```
├── src/
│   ├── models.py                      # Data models and schemas
│   ├── entity_extractor.py            # NLP entity extraction
│   ├── database_manager.py            # SQL database operations
│   ├── vector_store.py                # Semantic search functionality
│   ├── nl_to_sql.py                   # OpenAI-powered natural language to SQL
│   ├── intent_classifier.py           # OpenAI-powered intent classification
│   ├── rag_handler.py                 # RAG (Retrieval-Augmented Generation)
│   ├── transaction_clarifier.py       # Interactive transaction completion
│   └── chatbot.py                     # Main chatbot logic
├── database/
│   └── schema.sql                     # Database schema
├── main.py                            # Interactive chatbot interface
├── test_chatbot.py                    # Test suite
├── test_nl_search.py                  # Natural language search tests
├── test_intent_classifier.py          # Intent classification tests
├── test_rag_search.py                 # RAG functionality tests
├── test_interactive_transactions.py   # Interactive transaction tests
├── .env.example                       # Environment variables template
└── requirements.txt                   # Python dependencies
```
## Research Applications
This system demonstrates:
- Multi-modal data storage (structured + vector)
- LLM-powered natural language to SQL conversion
- RAG (Retrieval-Augmented Generation) for intelligent responses
- Interactive transaction completion with missing information handling
- OpenAI-based intent classification
- Multi-turn conversation state management
- Semantic similarity search with embeddings
- Named entity recognition and extraction
- Query validation and SQL injection prevention
- Conversational business process automation
## Future Work

### Intent Classification Improvements

The current system uses the OpenAI API for intent classification, which is accurate but has several limitations:
Current Limitations:
- Requires internet connectivity and API calls for each message
- Dependent on OpenAI service availability and costs
- May have latency for real-time applications
- Limited customization for domain-specific intents
Potential Improvements:
#### Fine-tuned Classification Models
- Train a smaller, specialized model on business transaction data
- Use frameworks like Hugging Face Transformers with custom datasets
- Deploy locally for faster inference and offline capability
- Examples: DistilBERT, RoBERTa fine-tuned on business intent data
#### Local LLM Integration
- Replace OpenAI API with local models (Llama, Mistral, etc.)
- Use frameworks like Ollama, LangChain, or vLLM for local deployment
- Maintain privacy while reducing external dependencies
- Cost-effective for high-volume applications
#### Intent Embedding Approaches
- Create vector embeddings for known intent patterns
- Use similarity search instead of generative classification
- Combine with few-shot learning for new intent types
- More efficient for simple intent detection scenarios
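The embedding-based alternative can be sketched as nearest-labelled-example classification. A real version would compare sentence embeddings; token-set Jaccard overlap stands in here, and the example utterances are assumptions:

```python
# Classify by similarity to labelled examples instead of calling a
# generative model. Intent names and examples are illustrative.
EXAMPLES = {
    "add_transaction": ["add a purchase of usb drives", "i sold laptops to a customer"],
    "query_data": ["how many usb drives did we purchase", "show me all sales"],
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def classify(message: str) -> str:
    """Return the intent whose best example overlaps the message most."""
    tokens = set(message.lower().split())
    best = lambda intent: max(jaccard(tokens, set(ex.split())) for ex in EXAMPLES[intent])
    return max(EXAMPLES, key=best)

print(classify("how many laptops did we purchase?"))
```

New intents can be added by appending a few labelled examples, which is the few-shot angle mentioned above.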
#### Hybrid Approaches
- Combine rule-based filtering with LLM classification
- Use confidence thresholds to decide when to query LLM
- Cache common patterns to reduce API calls
- Implement progressive enhancement from simple to complex classification
#### Domain-Specific Enhancements
- Add business context and domain knowledge
- Implement multi-intent detection for complex queries
- Add conversation history context for better classification
- Support for industry-specific terminology and patterns