---
title: Business_Chatbot
app_file: run_gui.py
sdk: gradio
sdk_version: 5.34.2
---

# LLM Chatbot with SQL Database and Vector Store

A research-grade chatbot system that processes user queries to extract entities, stores transactions in a SQL database, and maintains semantic search capabilities through vector storage.

## Features

- **Entity Extraction**: Automatically extracts products, quantities, suppliers, customers, and prices from natural language
- **Interactive Transaction Completion**: LLM-powered clarification for missing information
- **SQL Database**: Stores structured sales and purchase data with relationships
- **Natural Language to SQL**: Uses OpenAI GPT to convert plain-English queries to SQL
- **RAG (Retrieval-Augmented Generation)**: Intelligent responses using an LLM with retrieved context
- **Vector Store**: Enables semantic search of events and transactions
- **Query Validation**: Ensures generated SQL queries are safe and valid

## Architecture

```
User Input: "Add a purchase of 20 USB drives from TechMart at €5 each"
    ↓
1. Entity Extraction → product: USB drives, quantity: 20, supplier: TechMart, unit price: €5
    ↓
2. SQL Generation → INSERT INTO purchases (supplier_id, product_id, quantity, unit_price, total_cost)
    ↓
3. Vector Storage → Embed and store semantic summary of the event
```

## Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Download the spaCy model:

   ```bash
   python -m spacy download en_core_web_sm
   ```

3. Set up your OpenAI API key:

   ```bash
   export OPENAI_API_KEY='your-openai-api-key-here'
   ```

4. Run the chatbot:

   ```bash
   python main.py
   ```
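The three-stage pipeline shown in the Architecture section can be sketched end to end with a minimal, self-contained example. This is an illustration, not the project code: it substitutes a regex for the spaCy/LLM entity extractor, uses a single in-memory SQLite table in place of the full schema, and only prepares (rather than embeds) the event summary.

```python
import re
import sqlite3

# Stage 1: entity extraction (toy regex stand-in for the spaCy/LLM extractor)
def extract_entities(text: str) -> dict:
    m = re.search(r"purchase of (\d+) (.+?) from (\w+) at €([\d.]+)", text)
    if not m:
        raise ValueError("could not parse transaction")
    qty, product, supplier, price = m.groups()
    return {
        "product": product,
        "quantity": int(qty),
        "supplier": supplier,
        "unit_price": float(price),
    }

# Stage 2: SQL insertion (flattened single table instead of the real schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (supplier TEXT, product TEXT, "
    "quantity INTEGER, unit_price REAL, total_cost REAL)"
)
entities = extract_entities("Add a purchase of 20 USB drives from TechMart at €5 each")
conn.execute(
    "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
    (
        entities["supplier"],
        entities["product"],
        entities["quantity"],
        entities["unit_price"],
        entities["quantity"] * entities["unit_price"],
    ),
)

# Stage 3 would embed a summary like this into the vector store
summary = (
    f"Purchased {entities['quantity']} {entities['product']} "
    f"from {entities['supplier']} at €{entities['unit_price']:.2f} each"
)
print(summary)  # Purchased 20 USB drives from TechMart at €5.00 each
```

Parameterized `?` placeholders in the `INSERT` mirror the project's stated goal of SQL-injection prevention: user-derived values are never interpolated into the query string.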
5. Test the system:

   ```bash
   python test_chatbot.py
   python test_nl_search.py                  # Test natural language search
   python test_intent_classifier.py          # Test intent classification
   python test_rag_search.py                 # Test RAG functionality
   python test_interactive_transactions.py   # Test interactive transaction completion
   ```

## Usage Examples

### Adding Transactions (Interactive)

- `"Add a purchase of 20 USB drives from TechMart at €5 each"` (complete)
- `"I bought some laptops"` (will ask for: quantity, supplier, price)
- `"Sold items to a customer"` (will ask for: product, quantity, customer, price)
- The user can respond with specific details or say "N/A" for optional fields

### Querying Data (Natural Language to SQL)

- `"How many USB drives did we purchase?"`
- `"What's the total value of all purchases?"`
- `"Show me all sales to John Smith"`
- `"Which suppliers have we bought from?"`
- `"What's our total spending on electronics?"`
- `"Show me the most expensive purchases"`

### Semantic Search (RAG-powered)

- `"When is my meeting with George?"`
- `"What do we know about TechMart as a supplier?"`
- `"Tell me about recent meetings and discussions"`
- `"Show me customer feedback and satisfaction information"`

### General Information

- `"Meeting with new supplier scheduled for next week"`
- `"Important: Check inventory levels before next order"`

## Database Schema

- **suppliers**: Company information
- **customers**: Customer details
- **products**: Product catalog
- **purchases**: Purchase transactions
- **sales**: Sales transactions

## Vector Store

Uses ChromaDB with sentence transformers for semantic similarity search of:

- Transaction summaries
- General business events
- Meeting notes and reminders

## Files Structure

```
├── src/
│   ├── models.py                      # Data models and schemas
│   ├── entity_extractor.py            # NLP entity extraction
│   ├── database_manager.py            # SQL database operations
│   ├── vector_store.py                # Semantic search functionality
│   ├── nl_to_sql.py                   # OpenAI-powered natural language to SQL
│   ├── intent_classifier.py           # OpenAI-powered intent classification
│   ├── rag_handler.py                 # RAG (Retrieval-Augmented Generation)
│   ├── transaction_clarifier.py       # Interactive transaction completion
│   └── chatbot.py                     # Main chatbot logic
├── database/
│   └── schema.sql                     # Database schema
├── main.py                            # Interactive chatbot interface
├── test_chatbot.py                    # Test suite
├── test_nl_search.py                  # Natural language search tests
├── test_intent_classifier.py          # Intent classification tests
├── test_rag_search.py                 # RAG functionality tests
├── test_interactive_transactions.py   # Interactive transaction tests
├── .env.example                       # Environment variables template
└── requirements.txt                   # Python dependencies
```

## Research Applications

This system demonstrates:

- Multi-modal data storage (structured + vector)
- LLM-powered natural-language-to-SQL conversion
- RAG (Retrieval-Augmented Generation) for intelligent responses
- Interactive transaction completion with missing-information handling
- OpenAI-based intent classification
- Multi-turn conversation state management
- Semantic similarity search with embeddings
- Named entity recognition and extraction
- Query validation and SQL injection prevention
- Conversational business process automation

## Future Work

### Intent Classification Improvements

The current system uses the OpenAI API for intent classification, which provides excellent accuracy but has some limitations:

**Current Limitations:**

- Requires internet connectivity and API calls for each message
- Dependent on OpenAI service availability and costs
- May add latency for real-time applications
- Limited customization for domain-specific intents

**Potential Improvements:**

1. **Fine-tuned Classification Models**
   - Train a smaller, specialized model on business transaction data
   - Use frameworks like Hugging Face Transformers with custom datasets
   - Deploy locally for faster inference and offline capability
   - Examples: DistilBERT or RoBERTa fine-tuned on business intent data

2.
   **Local LLM Integration**
   - Replace the OpenAI API with local models (Llama, Mistral, etc.)
   - Use frameworks like Ollama, LangChain, or vLLM for local deployment
   - Maintain privacy while reducing external dependencies
   - Cost-effective for high-volume applications

3. **Intent Embedding Approaches**
   - Create vector embeddings for known intent patterns
   - Use similarity search instead of generative classification
   - Combine with few-shot learning for new intent types
   - More efficient for simple intent-detection scenarios

4. **Hybrid Approaches**
   - Combine rule-based filtering with LLM classification
   - Use confidence thresholds to decide when to query the LLM
   - Cache common patterns to reduce API calls
   - Implement progressive enhancement from simple to complex classification

5. **Domain-Specific Enhancements**
   - Add business context and domain knowledge
   - Implement multi-intent detection for complex queries
   - Add conversation-history context for better classification
   - Support industry-specific terminology and patterns
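The embedding-similarity idea (improvement 3) combined with a confidence-threshold fallback (improvement 4) can be sketched in a few lines. This is a dependency-free illustration, not project code: it uses a toy bag-of-words cosine similarity in place of real sentence embeddings, and the intent names, example phrases, and threshold value are all assumptions.

```python
import math
from collections import Counter

# Toy "embedding": bag-of-words term counts. A real system would use
# sentence-transformer embeddings, as the vector store already does.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Known intent patterns, embedded once up front (illustrative labels)
INTENT_EXAMPLES = {
    "add_transaction": ["add a purchase of usb drives", "i bought some laptops"],
    "query_data": ["how many usb drives did we purchase", "show me all sales"],
    "semantic_search": ["when is my meeting with george", "tell me about recent meetings"],
}

def classify(message: str, threshold: float = 0.2):
    """Return (intent, score); below the threshold, defer to the LLM."""
    query = embed(message)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return "ask_llm", best_score  # hybrid fallback (improvement 4)
    return best_intent, best_score

intent, score = classify("how many laptops did we purchase?")
print(intent)  # query_data
```

Similarity search over fixed intent examples runs locally with no API call per message; only low-confidence messages would still be routed to the LLM, which is the cost and latency win the hybrid approach aims for.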