---
title: Business_Chatbot
app_file: run_gui.py
sdk: gradio
sdk_version: 5.34.2
---
# LLM Chatbot with SQL Database and Vector Store

A research-grade chatbot system that processes user queries to extract entities, stores transactions in a SQL database, and maintains semantic search capabilities through vector storage.

## Features

- **Entity Extraction**: Automatically extracts products, quantities, suppliers, customers, and prices from natural language
- **Interactive Transaction Completion**: LLM-powered clarification for missing information
- **SQL Database**: Stores structured sales and purchase data with relationships
- **Natural Language to SQL**: Uses OpenAI GPT to convert plain English queries to SQL
- **RAG (Retrieval-Augmented Generation)**: Intelligent responses using LLM with retrieved context
- **Vector Store**: Enables semantic search of events and transactions
- **Query Validation**: Ensures generated SQL queries are safe and valid
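A minimal sketch of what the query-validation step could look like, assuming an allow-list of statement types and a deny-list of destructive patterns (the names and patterns here are illustrative, not the project's actual API):

```python
import re

# Illustrative guardrails for LLM-generated SQL: only single SELECT/INSERT
# statements pass, and destructive keywords or comment markers are rejected.
ALLOWED_STATEMENTS = ("SELECT", "INSERT")
FORBIDDEN_PATTERNS = [r";\s*\S", r"\bDROP\b", r"\bDELETE\b", r"--", r"/\*"]

def is_safe_sql(query: str) -> bool:
    """Reject multi-statement queries and destructive keywords."""
    stripped = query.strip().rstrip(";")
    if not stripped.upper().startswith(ALLOWED_STATEMENTS):
        return False
    return not any(re.search(p, stripped, re.IGNORECASE)
                   for p in FORBIDDEN_PATTERNS)
```

In practice a validator like this would sit between SQL generation and execution, so a hallucinated `DROP TABLE` never reaches the database.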
## Architecture

```
User Input: "Add a purchase of 20 USB drives from TechMart at €5 each"
        ↓
1. Entity Extraction → product: USB drives, quantity: 20, supplier: TechMart, unit price: €5
        ↓
2. SQL Generation → INSERT INTO purchases (supplier_id, product_id, quantity, unit_price, total_cost)
        ↓
3. Vector Storage → Embed and store semantic summary of the event
```
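The first two stages of the pipeline above can be sketched end to end. Here a simple regular expression stands in for the real spaCy/LLM extractor, and the SQL string is built naively for illustration (a production system would use parameterized queries):

```python
import re

# Toy extractor: a regex pattern stands in for spaCy/LLM entity extraction.
def extract_entities(text: str) -> dict:
    m = re.search(
        r"purchase of (\d+) (.+?) from (\w+) at €(\d+(?:\.\d+)?) each", text
    )
    if not m:
        return {}
    qty, product, supplier, price = m.groups()
    return {
        "quantity": int(qty),
        "product": product,
        "supplier": supplier,
        "unit_price": float(price),
    }

# Toy SQL generation: string interpolation for readability only;
# real code should bind values as query parameters.
def to_insert_sql(e: dict) -> str:
    total = e["quantity"] * e["unit_price"]
    return (
        "INSERT INTO purchases (supplier, product, quantity, unit_price, total_cost) "
        f"VALUES ('{e['supplier']}', '{e['product']}', {e['quantity']}, "
        f"{e['unit_price']}, {total})"
    )

entities = extract_entities(
    "Add a purchase of 20 USB drives from TechMart at €5 each"
)
```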
## Setup

1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Download the spaCy model:
```bash
python -m spacy download en_core_web_sm
```
3. Set your OpenAI API key:
```bash
export OPENAI_API_KEY='your-openai-api-key-here'
```
4. Run the chatbot:
```bash
python main.py
```
5. Test the system:
```bash
python test_chatbot.py
python test_nl_search.py                  # Test natural language search
python test_intent_classifier.py          # Test intent classification
python test_rag_search.py                 # Test RAG functionality
python test_interactive_transactions.py   # Test interactive transaction completion
```
## Usage Examples

### Adding Transactions (Interactive)

- `"Add a purchase of 20 USB drives from TechMart at €5 each"` (complete)
- `"I bought some laptops"` (will ask for: quantity, supplier, price)
- `"Sold items to a customer"` (will ask for: product, quantity, customer, price)
- Users can respond with specific details or say "N/A" for optional fields
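The clarification flow for incomplete transactions can be sketched as a simple missing-field loop (the field names and question wording below are assumptions; the actual system phrases its follow-ups with an LLM):

```python
# Fields a purchase needs before it can be written to the database
# (illustrative; the real list lives in the transaction clarifier).
REQUIRED = ["product", "quantity", "supplier", "unit_price"]

def missing_fields(extracted: dict) -> list:
    """Return the required fields the user has not yet provided."""
    return [f for f in REQUIRED if extracted.get(f) in (None, "", "N/A")]

def next_question(extracted: dict):
    """Ask about one missing field at a time; None means we're done."""
    missing = missing_fields(extracted)
    if not missing:
        return None  # transaction is complete
    return f"Could you tell me the {missing[0].replace('_', ' ')}?"

partial = {"product": "laptops"}   # e.g. "I bought some laptops"
question = next_question(partial)
```

Each user reply fills in one field, and the loop repeats until `next_question` returns `None`, at which point the transaction is stored.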
### Querying Data (Natural Language to SQL)

- `"How many USB drives did we purchase?"`
- `"What's the total value of all purchases?"`
- `"Show me all sales to John Smith"`
- `"Which suppliers have we bought from?"`
- `"What's our total spending on electronics?"`
- `"Show me the most expensive purchases"`
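Questions like these are typically answered by prompting the LLM with a schema summary. A hedged sketch of the prompt construction (the schema text and wording are assumptions, and the actual OpenAI call is omitted):

```python
# Condensed schema description given to the model (illustrative; the
# authoritative schema is in database/schema.sql).
SCHEMA_SUMMARY = """\
suppliers(id, name), customers(id, name), products(id, name),
purchases(id, supplier_id, product_id, quantity, unit_price, total_cost),
sales(id, customer_id, product_id, quantity, unit_price, total_price)"""

def build_sql_prompt(question: str) -> str:
    """Assemble the prompt sent to the LLM for SQL generation."""
    return (
        "Given this SQLite schema:\n"
        f"{SCHEMA_SUMMARY}\n\n"
        "Write a single read-only SQL query answering:\n"
        f"{question}\n"
        "Return only the SQL."
    )

prompt = build_sql_prompt("How many USB drives did we purchase?")
```

The generated SQL would then pass through the query-validation step before being executed against the database.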
### Semantic Search (RAG-powered)

- `"When is my meeting with George?"`
- `"What do we know about TechMart as a supplier?"`
- `"Tell me about recent meetings and discussions"`
- `"Show me customer feedback and satisfaction information"`
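The retrieve-then-generate flow behind these queries can be sketched with a toy retriever; word overlap stands in here for the real ChromaDB embedding similarity, and the documents are made up for illustration:

```python
# Toy document store (illustrative stand-in for the ChromaDB collection).
DOCUMENTS = [
    "Meeting with George scheduled for Friday at 10:00",
    "TechMart supplied 20 USB drives at €5 each",
    "Customer John Smith reported high satisfaction",
]

def retrieve(query: str, k: int = 1) -> list:
    """Rank documents by word overlap with the query (embedding stand-in)."""
    q_words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

context = retrieve("When is my meeting with George?")
# In the real RAG handler, this retrieved context plus the question
# would be passed to the LLM to generate the final answer.
```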
### General Information

- `"Meeting with new supplier scheduled for next week"`
- `"Important: Check inventory levels before next order"`

## Database Schema

- **suppliers**: Company information
- **customers**: Customer details
- **products**: Product catalog
- **purchases**: Purchase transactions
- **sales**: Sales transactions
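A hedged sketch of how these five tables might relate, using an in-memory SQLite database (the column names are illustrative; the authoritative definitions live in `database/schema.sql`):

```python
import sqlite3

# Illustrative schema: purchases link suppliers to products, sales link
# customers to products (assumed column names, not the project's exact DDL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES suppliers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity INTEGER, unit_price REAL, total_cost REAL
);
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity INTEGER, unit_price REAL, total_price REAL
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```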
## Vector Store

Uses ChromaDB with sentence transformers for semantic similarity search of:

- Transaction summaries
- General business events
- Meeting notes and reminders

## File Structure

```
├── src/
│   ├── models.py                        # Data models and schemas
│   ├── entity_extractor.py              # NLP entity extraction
│   ├── database_manager.py              # SQL database operations
│   ├── vector_store.py                  # Semantic search functionality
│   ├── nl_to_sql.py                     # OpenAI-powered natural language to SQL
│   ├── intent_classifier.py             # OpenAI-powered intent classification
│   ├── rag_handler.py                   # RAG (Retrieval-Augmented Generation)
│   ├── transaction_clarifier.py         # Interactive transaction completion
│   └── chatbot.py                       # Main chatbot logic
├── database/
│   └── schema.sql                       # Database schema
├── main.py                              # Interactive chatbot interface
├── test_chatbot.py                      # Test suite
├── test_nl_search.py                    # Natural language search tests
├── test_intent_classifier.py            # Intent classification tests
├── test_rag_search.py                   # RAG functionality tests
├── test_interactive_transactions.py     # Interactive transaction tests
├── .env.example                         # Environment variables template
└── requirements.txt                     # Python dependencies
```
## Research Applications

This system demonstrates:

- Multi-modal data storage (structured + vector)
- LLM-powered natural language to SQL conversion
- RAG (Retrieval-Augmented Generation) for intelligent responses
- Interactive transaction completion with missing-information handling
- OpenAI-based intent classification
- Multi-turn conversation state management
- Semantic similarity search with embeddings
- Named entity recognition and extraction
- Query validation and SQL injection prevention
- Conversational business process automation
## Future Work
### Intent Classification Improvements

The current system uses the OpenAI API for intent classification, which provides excellent accuracy but has some limitations:

**Current Limitations:**

- Requires internet connectivity and an API call for each message
- Dependent on OpenAI service availability and costs
- May add latency for real-time applications
- Limited customization for domain-specific intents

**Potential Improvements:**

1. **Fine-tuned Classification Models**
   - Train a smaller, specialized model on business transaction data
   - Use frameworks like Hugging Face Transformers with custom datasets
   - Deploy locally for faster inference and offline capability
   - Examples: DistilBERT or RoBERTa fine-tuned on business intent data
2. **Local LLM Integration**
   - Replace the OpenAI API with local models (Llama, Mistral, etc.)
   - Use frameworks like Ollama, LangChain, or vLLM for local deployment
   - Maintain privacy while reducing external dependencies
   - Cost-effective for high-volume applications
3. **Intent Embedding Approaches**
   - Create vector embeddings for known intent patterns
   - Use similarity search instead of generative classification
   - Combine with few-shot learning for new intent types
   - More efficient for simple intent detection scenarios
4. **Hybrid Approaches**
   - Combine rule-based filtering with LLM classification
   - Use confidence thresholds to decide when to query the LLM
   - Cache common patterns to reduce API calls
   - Implement progressive enhancement from simple to complex classification
5. **Domain-Specific Enhancements**
   - Add business context and domain knowledge
   - Implement multi-intent detection for complex queries
   - Add conversation-history context for better classification
   - Support industry-specific terminology and patterns
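The intent-embedding idea (item 3 above) can be sketched with a toy bag-of-words "embedding" standing in for a real sentence-transformer model; the intent names and example phrases below are assumptions:

```python
from collections import Counter
import math

# One canonical example phrase per intent (illustrative; a real system
# would embed many examples per intent with a sentence transformer).
INTENT_EXAMPLES = {
    "add_transaction": "add a purchase of items from a supplier",
    "query_data": "how many items did we purchase",
    "semantic_search": "what do we know about this supplier",
}

def embed(text: str) -> Counter:
    """Bag-of-words vector (stand-in for a learned embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text: str) -> str:
    """Pick the intent whose example is most similar to the message."""
    q = embed(text)
    return max(INTENT_EXAMPLES, key=lambda i: cosine(q, embed(INTENT_EXAMPLES[i])))

intent = classify("how many USB drives did we purchase")
```

With real embeddings the same nearest-neighbor logic applies, and new intents can be added by embedding a few example phrases rather than retraining a model.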
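The hybrid caching and rule-based-fallback ideas (item 4 above) can be sketched together; the stub below stands in for the expensive OpenAI call, and the rules and intent names are illustrative:

```python
from functools import lru_cache

# Counter to show how often the (simulated) LLM is actually consulted.
calls = {"llm": 0}

# Cheap rule-based first pass (patterns are illustrative).
RULES = {"add a purchase": "add_transaction", "how many": "query_data"}

def rule_based(text: str):
    for pattern, intent in RULES.items():
        if pattern in text.lower():
            return intent
    return None  # no confident rule match: fall through to the LLM

@lru_cache(maxsize=1024)
def classify(text: str) -> str:
    """Rules first; cache ensures repeated messages never re-query the LLM."""
    intent = rule_based(text)
    if intent is not None:
        return intent
    calls["llm"] += 1  # stand-in for the real OpenAI API call
    return "general_info"

classify("How many USB drives?")     # handled by rules, no LLM call
classify("Note: check inventory")    # falls through to the "LLM"
classify("Note: check inventory")    # repeated message: served from cache
```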