---
title: Business_Chatbot
app_file: run_gui.py
sdk: gradio
sdk_version: 5.34.2
---

# LLM Chatbot with SQL Database and Vector Store

A research-grade chatbot system that extracts entities from user queries, stores transactions in a SQL database, and maintains semantic search capabilities through a vector store.

## Features

- **Entity Extraction**: automatically extracts products, quantities, suppliers, customers, and prices from natural language
- **Interactive Transaction Completion**: LLM-powered clarification for missing information
- **SQL Database**: stores structured sales and purchase data with relationships
- **Natural Language to SQL**: uses OpenAI GPT to convert plain-English queries to SQL
- **RAG (Retrieval-Augmented Generation)**: intelligent responses using an LLM with retrieved context
- **Vector Store**: enables semantic search over events and transactions
- **Query Validation**: ensures generated SQL queries are safe and valid

## Architecture

```
User Input: "Add a purchase of 20 USB drives from TechMart at €5 each"
    ↓
1. Entity Extraction → product: USB drives, quantity: 20, supplier: TechMart, unit price: €5
    ↓
2. SQL Generation → INSERT INTO purchases (supplier_id, product_id, quantity, unit_price, total_cost)
    ↓
3. Vector Storage → embed and store a semantic summary of the event
```
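The flow above can be sketched end to end in a few lines. The regex extraction and table layout here are deliberately simplified stand-ins for the real spaCy/LLM pipeline and schema; the function names are illustrative, not the actual module API:

```python
import re
import sqlite3

def extract_entities(text: str) -> dict:
    # Toy extraction matching the example sentence shape above;
    # the real system uses spaCy plus LLM-based extraction.
    m = re.search(r"(\d+)\s+(.+?)\s+from\s+(\w+)\s+at\s+€(\d+(?:\.\d+)?)", text)
    if not m:
        return {}
    qty, product, supplier, price = m.groups()
    return {"product": product, "quantity": int(qty),
            "supplier": supplier, "unit_price": float(price)}

def store_purchase(conn: sqlite3.Connection, e: dict) -> None:
    # Step 2: structured insert derived from the extracted entities.
    conn.execute(
        "INSERT INTO purchases (supplier, product, quantity, unit_price, total_cost) "
        "VALUES (?, ?, ?, ?, ?)",
        (e["supplier"], e["product"], e["quantity"],
         e["unit_price"], e["quantity"] * e["unit_price"]),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (supplier TEXT, product TEXT, "
             "quantity INTEGER, unit_price REAL, total_cost REAL)")
entities = extract_entities("Add a purchase of 20 USB drives from TechMart at €5 each")
store_purchase(conn, entities)
total = conn.execute("SELECT total_cost FROM purchases").fetchone()[0]
```

Step 3 (embedding a summary of the event into the vector store) would follow the insert; it is omitted here to keep the sketch self-contained.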

## Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Download the spaCy model:

   ```bash
   python -m spacy download en_core_web_sm
   ```

3. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY='your-openai-api-key-here'
   ```

4. Run the chatbot:

   ```bash
   python main.py
   ```

5. Test the system:

   ```bash
   python test_chatbot.py
   python test_nl_search.py                 # natural language search
   python test_intent_classifier.py         # intent classification
   python test_rag_search.py                # RAG functionality
   python test_interactive_transactions.py  # interactive transaction completion
   ```

## Usage Examples

### Adding Transactions (Interactive)

- "Add a purchase of 20 USB drives from TechMart at €5 each" (complete)
- "I bought some laptops" (will ask for: quantity, supplier, price)
- "Sold items to a customer" (will ask for: product, quantity, customer, price)
- Users can respond with specific details or say "N/A" for optional fields
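The clarification behavior above boils down to detecting missing required fields and turning them into questions. A minimal sketch (field names and question templates are illustrative, not the actual `transaction_clarifier` API):

```python
# Required fields per transaction type; the real system fills these via
# multi-turn conversation with the LLM.
REQUIRED_FIELDS = {
    "purchase": ["product", "quantity", "supplier", "unit_price"],
    "sale": ["product", "quantity", "customer", "unit_price"],
}
QUESTIONS = {
    "product": "Which product was involved?",
    "quantity": "How many units?",
    "supplier": "Who was the supplier?",
    "customer": "Who was the customer?",
    "unit_price": "What was the price per unit?",
}

def missing_fields(kind: str, extracted: dict) -> list:
    # Fields that extraction left empty and the user must still supply.
    return [f for f in REQUIRED_FIELDS[kind] if extracted.get(f) in (None, "")]

def clarifying_questions(kind: str, extracted: dict) -> list:
    return [QUESTIONS[f] for f in missing_fields(kind, extracted)]

# "I bought some laptops" -> only the product was extracted.
qs = clarifying_questions("purchase", {"product": "laptops"})
```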

### Querying Data (Natural Language to SQL)

- "How many USB drives did we purchase?"
- "What's the total value of all purchases?"
- "Show me all sales to John Smith"
- "Which suppliers have we bought from?"
- "What's our total spending on electronics?"
- "Show me the most expensive purchases"
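Before a generated query is executed, the Query Validation step ensures it is safe. A sketch of the allowlist idea, assuming read queries must be a single SELECT statement (the real validator lives in `nl_to_sql.py` and may differ):

```python
import re

# Keywords that must never appear in a generated read query.
FORBIDDEN = re.compile(r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|ATTACH|PRAGMA)\b", re.I)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:                      # reject multi-statement payloads
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return not FORBIDDEN.search(stripped)

ok = is_safe_select("SELECT SUM(total_cost) FROM purchases")
bad = is_safe_select("SELECT 1; DROP TABLE purchases")
```

Combined with parameterized inserts for the transaction path, this keeps LLM-generated SQL from mutating or destroying data.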

### Semantic Search (RAG-powered)

- "When is my meeting with George?"
- "What do we know about TechMart as a supplier?"
- "Tell me about recent meetings and discussions"
- "Show me customer feedback and satisfaction information"
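RAG answers are produced by retrieving relevant stored events and stitching them into the LLM prompt. A minimal sketch of that assembly, where `retrieve` uses toy lexical overlap in place of the real embedding search and the prompt template is illustrative:

```python
def retrieve(query: str, store: list, k: int = 2) -> list:
    # Toy ranking by shared words; the real system queries ChromaDB.
    q = set(query.lower().split())
    return sorted(store, key=lambda s: -len(q & set(s.lower().split())))[:k]

def build_prompt(query: str, snippets: list) -> str:
    # Retrieved context is prepended so the LLM answers from stored facts.
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

store = [
    "Meeting with George scheduled for Friday at 10am",
    "TechMart supplied 20 USB drives at €5 each",
    "Customer feedback: John Smith satisfied with delivery",
]
query = "When is my meeting with George?"
prompt = build_prompt(query, retrieve(query, store))
```

The assembled `prompt` would then be sent to the LLM, which grounds its answer in the retrieved snippets rather than its parametric memory.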

### General Information

- "Meeting with new supplier scheduled for next week"
- "Important: Check inventory levels before next order"

## Database Schema

- `suppliers`: company information
- `customers`: customer details
- `products`: product catalog
- `purchases`: purchase transactions
- `sales`: sales transactions
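The relationships between these tables can be sketched as follows. This is a simplified guess at the layout (the `purchases` columns match the Architecture example; the `sales` columns, including `total_revenue`, are assumed by analogy); `database/schema.sql` is the authoritative definition:

```python
import sqlite3

# Simplified five-table sketch: purchases and sales reference the
# suppliers/customers and products catalogs by foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES suppliers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity INTEGER, unit_price REAL, total_cost REAL
);
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id  INTEGER REFERENCES products(id),
    quantity INTEGER, unit_price REAL, total_revenue REAL
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```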

## Vector Store

Uses ChromaDB with sentence-transformers embeddings for semantic similarity search over:

- Transaction summaries
- General business events
- Meeting notes and reminders
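The core idea is ranking stored summaries by vector similarity to the query. A toy illustration with made-up 3-d vectors standing in for real sentence-transformers embeddings:

```python
import math

def cosine(a: list, b: list) -> float:
    # Cosine similarity: dot product over the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Pretend embeddings of stored event summaries (values are fabricated
# for the example; real vectors have hundreds of dimensions).
events = {
    "Meeting with George on Friday": [0.9, 0.1, 0.0],
    "Purchased 20 USB drives from TechMart": [0.1, 0.8, 0.3],
    "Reminder: check inventory levels": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "meeting with George?"
best = max(events, key=lambda e: cosine(query_vec, events[e]))
```

ChromaDB performs exactly this kind of nearest-neighbor lookup, just at scale and with learned embeddings.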

## File Structure

```
├── src/
│   ├── models.py                # Data models and schemas
│   ├── entity_extractor.py      # NLP entity extraction
│   ├── database_manager.py      # SQL database operations
│   ├── vector_store.py          # Semantic search functionality
│   ├── nl_to_sql.py             # OpenAI-powered natural language to SQL
│   ├── intent_classifier.py     # OpenAI-powered intent classification
│   ├── rag_handler.py           # RAG (Retrieval-Augmented Generation)
│   ├── transaction_clarifier.py # Interactive transaction completion
│   └── chatbot.py               # Main chatbot logic
├── database/
│   └── schema.sql               # Database schema
├── main.py                      # Interactive chatbot interface
├── test_chatbot.py              # Test suite
├── test_nl_search.py            # Natural language search tests
├── test_intent_classifier.py    # Intent classification tests
├── test_rag_search.py           # RAG functionality tests
├── test_interactive_transactions.py # Interactive transaction tests
├── .env.example                 # Environment variables template
└── requirements.txt             # Python dependencies
```

## Research Applications

This system demonstrates:

- Multi-modal data storage (structured + vector)
- LLM-powered natural language to SQL conversion
- RAG (Retrieval-Augmented Generation) for intelligent responses
- Interactive transaction completion with missing-information handling
- OpenAI-based intent classification
- Multi-turn conversation state management
- Semantic similarity search with embeddings
- Named entity recognition and extraction
- Query validation and SQL injection prevention
- Conversational business process automation

## Future Work

### Intent Classification Improvements

The current system relies on the OpenAI API for intent classification, which is highly accurate but has some limitations:

**Current limitations:**

- Requires internet connectivity and an API call for every message
- Depends on OpenAI service availability and pricing
- Adds latency that may matter for real-time applications
- Offers limited customization for domain-specific intents

**Potential improvements:**

1. **Fine-tuned classification models**
   - Train a smaller, specialized model on business transaction data
   - Use frameworks such as Hugging Face Transformers with custom datasets
   - Deploy locally for faster inference and offline capability
   - Examples: DistilBERT or RoBERTa fine-tuned on business intent data

2. **Local LLM integration**
   - Replace the OpenAI API with local models (Llama, Mistral, etc.)
   - Use frameworks such as Ollama, LangChain, or vLLM for local deployment
   - Maintain privacy while reducing external dependencies
   - Cost-effective for high-volume applications

3. **Intent embedding approaches**
   - Create vector embeddings for known intent patterns
   - Use similarity search instead of generative classification
   - Combine with few-shot learning for new intent types
   - More efficient for simple intent detection scenarios

4. **Hybrid approaches**
   - Combine rule-based filtering with LLM classification
   - Use confidence thresholds to decide when to query the LLM
   - Cache common patterns to reduce API calls
   - Escalate progressively from simple to complex classification

5. **Domain-specific enhancements**
   - Add business context and domain knowledge
   - Detect multiple intents in complex queries
   - Use conversation-history context for better classification
   - Support industry-specific terminology and patterns
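Items 3 and 4 above could start from something like the following toy sketch, where Jaccard word overlap stands in for real embedding similarity, `lru_cache` illustrates pattern caching, and the intent labels and example phrases are placeholders:

```python
from functools import lru_cache

# Illustrative intent examples; a real system would embed many more.
INTENT_EXAMPLES = {
    "add_transaction": ["add a purchase of usb drives", "i bought some laptops"],
    "query_data": ["how many usb drives did we purchase", "show me all sales"],
    "semantic_search": ["when is my meeting with george", "tell me about techmart"],
}

def similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a cheap stand-in for cosine similarity
    # between sentence embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

@lru_cache(maxsize=1024)  # cache common messages to avoid repeat work
def classify_intent(message: str, threshold: float = 0.2) -> str:
    best_intent, best_score = "unknown", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        score = max(similarity(message, ex) for ex in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    # Hybrid idea: below the confidence threshold, fall back to the LLM.
    return best_intent if best_score >= threshold else "needs_llm"

label = classify_intent("how many usb drives did we buy")
```

With real embeddings the `similarity` function becomes a vector dot product, and the threshold decides which messages still need the (slower, paid) LLM call.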