# Recipe Recommendation Chatbot - Backend API

Backend for an AI-powered recipe recommendation system built with FastAPI, featuring RAG (Retrieval-Augmented Generation) capabilities, conversational memory, and multi-provider LLM support.

## 🚀 Quick Start

### Prerequisites
- Python 3.9+
- pip or poetry
- API keys for your chosen hosted LLM provider (OpenAI, Google, or Anthropic); local Ollama models need no key

### Installation

1. **Clone and navigate to backend**
   ```bash
   git clone <repository-url>
   cd PLG4-Recipe-Recommendation-Chatbot/backend
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```
   > 💡 **Note**: The HuggingFace dependencies are commented out by default to keep the installation lightweight:
   > - `transformers` and `accelerate` - Uncomment if using HuggingFace models
   > - `sentence-transformers` (~800MB) - Uncomment for HuggingFace embeddings

3. **Configure environment**
   ```bash
   cp .env.example .env
   # Edit .env with your API keys and configuration
   ```

4. **Run the server**
   ```bash
   # Development mode with auto-reload
   uvicorn app:app --reload --host 127.0.0.1 --port 8080
   
   # Or production mode
   uvicorn app:app --host 127.0.0.1 --port 8080
   ```

5. **Test the API**
   ```bash
   curl http://localhost:8080/health
   ```

6. **HuggingFace Spaces deployment**
   ```bash
   sh deploy-to-hf.sh <remote>
   ```
   where `<remote>` points to the HuggingFace Spaces repository.

## 📁 Project Structure

```
backend/
├── app.py                 # FastAPI application entry point
├── requirements.txt       # Python dependencies
├── .env.example          # Environment configuration template
├── .gitignore            # Git ignore rules
│
├── config/               # Configuration modules
│   ├── __init__.py
│   ├── settings.py       # Application settings
│   ├── database.py       # Database configuration
│   └── logging_config.py # Logging setup
│
├── services/             # Core business logic
│   ├── __init__.py
│   ├── llm_service.py    # LLM and RAG pipeline
│   └── vector_store.py   # Vector database management
│
├── data/                 # Data storage
│   ├── recipes/          # Recipe JSON files
│   │   └── recipe.json   # Sample recipe data
│   └── chromadb_persist/ # ChromaDB persistence
│
├── logs/                 # Application logs
│   └── recipe_bot.log    # Main log file
│
├── docs/                 # Documentation
│   ├── model-selection-guide.md      # 🎯 Complete model selection & comparison guide
│   ├── model-quick-reference.md      # ⚡ Quick model switching commands  
│   ├── chromadb_refresh.md           # ChromaDB refresh guide
│   ├── opensource-llm-configuration.md  # Open source LLM setup guide
│   ├── logging_guide.md              # Logging documentation
│   ├── optimal_recipes_structure.md  # Recipe data structure guide
│   ├── sanitization_guide.md         # Input sanitization guide
│   └── unified-provider-configuration.md  # Unified provider approach guide
│
└── utils/                # Utility functions
    └── __init__.py
```

## ⚙️ Configuration

### Environment Variables

Copy `.env.example` to `.env` and configure the following:

> 🎯 **Unified Provider Approach**: The `LLM_PROVIDER` setting controls both LLM and embedding models, preventing configuration mismatches. See [`docs/unified-provider-configuration.md`](docs/unified-provider-configuration.md) for details.

#### **Server Configuration**
```bash
PORT=8000                 # Server port
HOST=0.0.0.0             # Server host
ENVIRONMENT=development   # Environment mode
DEBUG=true               # Debug mode
```

#### **Provider Configuration**
Choose one provider for both LLM and embeddings (unified approach):

> 🎯 **NEW: Complete Model Selection Guide**: For detailed comparisons of all models (OpenAI, Google, Anthropic, Ollama, HuggingFace) including latest 2025 models, performance metrics, costs, and scenario-based recommendations, see [`docs/model-selection-guide.md`](docs/model-selection-guide.md)

> ⚡ **Quick Reference**: For one-command model switching, see [`docs/model-quick-reference.md`](docs/model-quick-reference.md)

**OpenAI (Best Value & Latest Models)**
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5-nano             # 🎯 BEST VALUE: $1/month for 30K queries - Modern GPT-5 at nano price
# Alternatives:
# - gpt-4o-mini                     # Proven choice: $4/month for 30K queries
# - gpt-5                           # Premium: billed per token via the API (a ChatGPT Plus subscription does not cover API usage)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small # Used automatically
```

**Google Gemini (Best Free Tier)**
```bash
LLM_PROVIDER=google
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_MODEL=gemini-2.5-flash       # 🎯 RECOMMENDED: Excellent free tier, then $2/month
# Alternatives:
# - gemini-2.0-flash-lite           # Ultra budget: $0.90/month for 30K queries
# - gemini-2.5-pro                  # Premium: $25/month for 30K queries
GOOGLE_EMBEDDING_MODEL=models/embedding-001 # Used automatically
```

**Anthropic Claude (Best Quality-to-Cost)**
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-3-5-haiku-20241022  # 🎯 BUDGET WINNER: $4/month for 30K queries
# Alternatives:
# - claude-3-5-sonnet-20241022      # Production standard: $45/month for 30K queries
# - claude-3-opus-20240229          # Premium quality: $225/month for 30K queries
ANTHROPIC_EMBEDDING_MODEL=voyage-large-2 # Used automatically
```

**Ollama (Best for Privacy/Self-Hosting)**
```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b            # 🎯 YOUR CURRENT: 4.7GB download, 8GB RAM, excellent balance
# New alternatives: 
# - deepseek-r1:7b                  # Breakthrough reasoning: 4.7GB download, O1-level performance
# - codeqwen:7b                     # Structured data expert: 4.2GB download, excellent for recipes
# - gemma3:4b                       # Resource-efficient: 3.3GB download, 6GB RAM
# - mistral-nemo:12b                # Balanced performance: 7GB download, 12GB RAM
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Used automatically
```

**HuggingFace (Downloadable Models Only - APIs Unreliable)**
```bash
LLM_PROVIDER=ollama  # Use Ollama to run HuggingFace models locally
OLLAMA_MODEL=codeqwen:7b             # 🎯 RECOMMENDED: Download HF models via Ollama for reliability
# Other downloadable options:
# - mistral-nemo:12b                # Mistral's balanced model
# - nous-hermes2:10.7b              # Fine-tuned for instruction following
# - openhermes2.5-mistral:7b        # Community favorite
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Used automatically
```
> ⚠️ **Important Change**: HuggingFace APIs have proven unreliable for production, so the HuggingFace dependencies are no longer required. We now recommend running HuggingFace models locally via Ollama for more consistent reliability and performance.

> 📖 **Local Model Setup**: See [`docs/opensource-llm-configuration.md`](docs/opensource-llm-configuration.md) for GPU setup, model selection, and performance optimization with Ollama.

#### **Vector Store Configuration**
Choose between ChromaDB (local) and MongoDB Atlas (cloud):

**ChromaDB (Default)**
```bash
VECTOR_STORE_PROVIDER=chromadb
DB_COLLECTION_NAME=recipes
DB_PERSIST_DIRECTORY=./data/chromadb_persist
# Set to true to delete and recreate DB on startup (useful for adding new recipes)
DB_REFRESH_ON_START=false
```

**MongoDB Atlas**
```bash
VECTOR_STORE_PROVIDER=mongodb
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
MONGODB_DATABASE=recipe_bot
MONGODB_COLLECTION=recipes
```

#### **Embedding Configuration**
```bash
# Embedding provider automatically matches LLM_PROVIDER (unified approach)
# No separate configuration needed - handled automatically based on LLM_PROVIDER setting
```

> 💡 **Unified Provider**: The `LLM_PROVIDER` setting automatically configures both the LLM and embedding models, ensuring consistency and preventing mismatched configurations. See [`docs/model-selection-guide.md`](docs/model-selection-guide.md) for all available options.
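
To illustrate the unified approach, here is a minimal sketch of how a single `LLM_PROVIDER` value could select both models. The `resolve_models` helper and the `PROVIDER_ENV_KEYS` table are hypothetical and are not taken from `config/settings.py`; only the environment variable names come from the examples above.

```python
import os
from typing import Mapping, Optional

# Environment variable names as used in this README, keyed by LLM_PROVIDER value.
# The table itself is an illustration, not the project's actual settings code.
PROVIDER_ENV_KEYS = {
    "openai": ("OPENAI_MODEL", "OPENAI_EMBEDDING_MODEL"),
    "google": ("GOOGLE_MODEL", "GOOGLE_EMBEDDING_MODEL"),
    "anthropic": ("ANTHROPIC_MODEL", "ANTHROPIC_EMBEDDING_MODEL"),
    "ollama": ("OLLAMA_MODEL", "OLLAMA_EMBEDDING_MODEL"),
}


def resolve_models(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the LLM and embedding model names selected by LLM_PROVIDER."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in PROVIDER_ENV_KEYS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")

    llm_key, embedding_key = PROVIDER_ENV_KEYS[provider]
    missing = [key for key in (llm_key, embedding_key) if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings for {provider}: {', '.join(missing)}")

    # Both models come from the same provider, so they cannot be mismatched.
    return {
        "provider": provider,
        "llm_model": env[llm_key],
        "embedding_model": env[embedding_key],
    }
```

Calling `resolve_models()` with no argument reads the process environment. Passing a plain dictionary instead makes the lookup easy to exercise without touching `.env`.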

## 🛠️ API Endpoints

### Core Endpoints

#### **Health Check**
```bash
GET /health
```
Returns service health and configuration status.

#### **Chat with RAG**
```bash
POST /chat
Content-Type: application/json

{
  "message": "What chicken recipes do you have?"
}
```
Full conversational RAG pipeline with memory and vector retrieval.

#### **Simple Demo**
```bash
GET /demo?prompt=Tell me about Italian cuisine
```
Simple LLM completion without RAG for testing.

#### **Clear Memory**
```bash
POST /clear-memory
```
Clears conversation memory for a fresh start.

### Example Requests

**Chat Request:**
```bash
curl -X POST "http://localhost:8080/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "What are some quick breakfast recipes?"}'
```

**Demo Request:**
```bash
curl "http://localhost:8080/demo?prompt=What%20is%20your%20favorite%20pasta%20dish?"
```
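
The same chat request can be issued from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `send_chat` are illustrative, and the shape of the JSON reply is not documented in this README, so the reply is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches the uvicorn command in Quick Start


def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /chat request without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (its fields are not assumed here)."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat("What are some quick breakfast recipes?")` returns the decoded reply. Splitting request construction from sending keeps the payload inspectable without a live server.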

## 🏗️ Architecture

### Core Components

#### **LLM Service** (`services/llm_service.py`)
- **ConversationalRetrievalChain**: Main RAG pipeline with memory
- **Simple Chat Completion**: Direct LLM responses without RAG
- **Multi-provider Support**: OpenAI, Google, Anthropic, Ollama (including HuggingFace models run locally)
- **Conversation Memory**: Persistent chat history

#### **Vector Store Service** (`services/vector_store.py`)
- **ChromaDB Integration**: Local vector database
- **MongoDB Atlas Support**: Cloud vector search
- **Document Loading**: Automatic recipe data ingestion
- **Embedding Management**: Multi-provider embedding support

#### **Configuration System** (`config/`)
- **Settings Management**: Environment-based configuration
- **Database Configuration**: Vector store setup
- **Logging Configuration**: Structured logging with rotation

### Data Flow

1. **User Query** → FastAPI endpoint
2. **RAG Pipeline** → Vector similarity search
3. **Context Retrieval** → Top-k relevant recipes
4. **LLM Generation** → Context-aware response
5. **Memory Storage** → Conversation persistence
6. **Response** → JSON formatted reply
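
The six steps above can be sketched in a few lines of plain Python. This is a toy stand-in, not the project's `ConversationalRetrievalChain`: the bag-of-words `embed` function replaces a real embedding model, `llm` is any callable supplied by the caller, and the `"response"` key in the returned dictionary is an assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real deployment uses the provider's embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, recipes: List[str], k: int = 25) -> List[str]:
    """Steps 2-3: rank recipes by similarity to the query and keep the top k."""
    query_vector = embed(query)
    ranked = sorted(recipes, key=lambda r: cosine(query_vector, embed(r)), reverse=True)
    return ranked[:k]


def answer(
    query: str,
    recipes: List[str],
    memory: List[Tuple[str, str]],
    llm: Callable[[str], str],
    k: int = 25,
) -> dict:
    """Steps 1-6: retrieve context, call the LLM, store the turn, return a JSON-ready dict."""
    context = retrieve(query, recipes, k)
    history = [f"User: {q}\nBot: {a}" for q, a in memory]
    prompt = "\n".join(["Context:", *context, "History:", *history, f"Question: {query}"])
    reply = llm(prompt)              # step 4: context-aware generation
    memory.append((query, reply))    # step 5: conversation persistence
    return {"response": reply}       # step 6: JSON-formatted reply
```

Passing a stub such as `lambda prompt: "ok"` for `llm` makes it possible to exercise retrieval and memory handling without any API key.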

## 📊 Logging

Comprehensive logging system with:

- **File Rotation**: 10MB max size, 5 backups
- **Structured Format**: Timestamps, levels, source location
- **Emoji Indicators**: Visual status indicators
- **Error Tracking**: Full stack traces for debugging

**Log Levels:**
- 🚀 **INFO**: Normal operations
- ⚠️ **WARNING**: Non-critical issues
- ❌ **ERROR**: Failures with stack traces
- 🔧 **DEBUG**: Detailed operation steps

**Log Location:** `./logs/recipe_bot.log`
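
As a rough sketch of the rotation settings listed above, the standard library's `RotatingFileHandler` can be configured with a 10 MB limit and 5 backups. The logger name, the `build_logger` helper, and the exact format string are assumptions; the project's real setup lives in `config/logging_config.py` and may differ.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

# Timestamp, level, and source location, as described above (exact format is assumed).
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(filename)s:%(lineno)d | %(message)s"


def build_logger(log_path: str = "./logs/recipe_bot.log", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that rotates its file at 10 MB and keeps 5 backups."""
    logger = logging.getLogger("recipe_bot")  # logger name is an assumption
    logger.setLevel(level)
    if any(isinstance(h, RotatingFileHandler) for h in logger.handlers):
        return logger  # already configured; avoid attaching duplicate handlers

    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        path,
        maxBytes=10 * 1024 * 1024,  # 10MB max size
        backupCount=5,              # 5 backups
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```

Calling `build_logger().info("🚀 Server started")` writes one formatted line to `./logs/recipe_bot.log`, creating the `logs/` folder if needed.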

## 📁 Data Management

### Recipe Data
- **Location**: `./data/recipes/`
- **Format**: JSON files with structured recipe data
- **Schema**: title, ingredients, directions, tags
- **Auto-loading**: Automatic chunking and vectorization
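
A minimal sketch of loading recipe files that follow this schema is shown below. It assumes that `ingredients`, `directions`, and `tags` are lists of strings and that each file holds either one recipe object or a list of them; see `docs/optimal_recipes_structure.md` for the authoritative structure. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

REQUIRED_FIELDS = ("title", "ingredients", "directions", "tags")


def load_recipes(folder: Union[str, Path] = "./data/recipes") -> List[Dict]:
    """Read every *.json file in the folder and validate the schema fields."""
    recipes: List[Dict] = []
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        items = data if isinstance(data, list) else [data]
        for item in items:
            missing = [field for field in REQUIRED_FIELDS if field not in item]
            if missing:
                raise ValueError(f"{path.name}: missing fields {missing}")
            recipes.append(item)
    return recipes


def recipe_to_text(recipe: Dict) -> str:
    """Flatten one recipe into a single text document ready for embedding."""
    return "\n".join([
        f"Title: {recipe['title']}",
        "Ingredients: " + ", ".join(recipe["ingredients"]),
        "Directions: " + " ".join(recipe["directions"]),
        "Tags: " + ", ".join(recipe["tags"]),
    ])
```

Validating fields at load time surfaces a malformed file by name before any embedding calls are made.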

### Vector Storage
- **ChromaDB**: Local persistence in `./data/chromadb_persist/`
- **MongoDB**: Cloud-based vector search
- **Embeddings**: Configurable embedding models
- **Retrieval**: Top-k similarity search (k=25)

## 🔧 Development

### Running in Development
```bash
# Install dependencies
pip install -r requirements.txt

# Set up environment
cp .env.example .env
# Configure your API keys

# Run with auto-reload
uvicorn app:app --reload --host 127.0.0.1 --port 8080
```

### Testing Individual Components
```bash
# Test vector store
python -c "from services.vector_store import vector_store_service; print('Vector store initialized')"

# Test LLM service
python -c "from services.llm_service import llm_service; print('LLM service initialized')"
```

### Adding New Recipes
1. Add JSON files to `./data/recipes/`
2. Set `DB_REFRESH_ON_START=true` in `.env` file
3. Restart the application (ChromaDB will be recreated)
4. Set `DB_REFRESH_ON_START=false` to prevent repeated deletion
5. New recipes are now available for search

**Quick refresh:**
```bash
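# Enable refresh (assumes DB_REFRESH_ON_START is already present in .env)
# Note: on macOS/BSD sed, use `sed -i ''` instead of `sed -i`
sed -i 's/^DB_REFRESH_ON_START=.*/DB_REFRESH_ON_START=true/' .env
uvicorn app:app --reload --host 127.0.0.1 --port 8080
# After startup completes (from another terminal, or after stopping the server):
sed -i 's/^DB_REFRESH_ON_START=.*/DB_REFRESH_ON_START=false/' .env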
```
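
Conceptually, the refresh flag could be handled at startup as in the sketch below. This is an assumption about the behavior described above, not the code in `services/vector_store.py`; `env_flag` and `prepare_persist_directory` are hypothetical helpers.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def env_flag(name: str, env: Mapping[str, str], default: bool = False) -> bool:
    """Interpret common truthy strings such as "true" or "1" as True."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def prepare_persist_directory(env: Optional[Mapping[str, str]] = None) -> Path:
    """Delete the ChromaDB folder when DB_REFRESH_ON_START is true, then ensure it exists."""
    env = os.environ if env is None else env
    persist_dir = Path(env.get("DB_PERSIST_DIRECTORY", "./data/chromadb_persist"))
    if env_flag("DB_REFRESH_ON_START", env) and persist_dir.exists():
        shutil.rmtree(persist_dir)  # destructive: all stored vectors are dropped
    persist_dir.mkdir(parents=True, exist_ok=True)
    return persist_dir
```

Because the deletion is unconditional while the flag is set, leaving `DB_REFRESH_ON_START=true` in place would rebuild the database on every restart, which is why step 4 resets it.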

## 🚀 Production Deployment

### Environment Setup
```bash
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
```

### Docker Deployment
The backend is containerized and ready for deployment on platforms like Hugging Face Spaces.

### Security Features
- **Environment Variables**: Secure API key management
- **CORS Configuration**: Frontend integration protection  
- **Input Sanitization**: Context-appropriate validation for recipe queries
  - XSS protection through HTML encoding
  - Length validation (1-1000 characters)
  - Basic harmful pattern removal
  - Whitespace normalization
- **Pydantic Validation**: Type safety and automatic sanitization
- **Structured Error Handling**: Safe error responses without data leaks
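
The input sanitization steps listed above might look roughly like the following standard-library sketch. The pattern list and the `sanitize_message` name are illustrative assumptions; the project's actual rules are described in `docs/sanitization_guide.md` and may be enforced through Pydantic validators instead.

```python
import html
import re

MIN_LENGTH, MAX_LENGTH = 1, 1000

# Hypothetical examples of "basic harmful patterns"; the real list may differ.
HARMFUL_PATTERNS = [
    re.compile(r"<\s*script\b.*?>.*?<\s*/\s*script\s*>", re.IGNORECASE | re.DOTALL),
    re.compile(r"javascript\s*:", re.IGNORECASE),
]


def sanitize_message(raw: str) -> str:
    """Strip harmful patterns, normalize whitespace, check length, then HTML-encode."""
    text = raw
    for pattern in HARMFUL_PATTERNS:
        text = pattern.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()  # whitespace normalization
    if not MIN_LENGTH <= len(text) <= MAX_LENGTH:
        raise ValueError(f"Message must be {MIN_LENGTH}-{MAX_LENGTH} characters long")
    return html.escape(text)  # XSS protection through HTML encoding
```

Length is checked before encoding so that the limit applies to what the user typed rather than to the longer escaped form.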

## 🛠️ Troubleshooting

### Common Issues

**Vector store initialization fails**
- Check API keys for embedding provider
- Verify data folder contains recipe files
- Check ChromaDB permissions

**LLM service fails**
- Verify API key configuration
- Check provider-specific requirements
- Review logs for detailed error messages

**HuggingFace model import errors**
- HuggingFace APIs have proven unreliable for production use
- **Recommended**: Use Ollama to run HuggingFace models locally instead:
  ```bash
  # Install and run HuggingFace models via Ollama
  ollama pull codeqwen:7b
  ollama pull mistral-nemo:12b
  # Set LLM_PROVIDER=ollama in .env
  ```
- For legacy HuggingFace API setup, uncomment dependencies in `requirements.txt` (not recommended)
- For detailed model comparisons, see [`docs/model-selection-guide.md`](docs/model-selection-guide.md)

**Memory issues**
```bash
# Clear conversation memory
curl -X POST http://localhost:8080/clear-memory
```

### Debug Mode
Set `DEBUG=true` in `.env` for detailed logging and error traces.

### Log Analysis
Check `./logs/recipe_bot.log` for detailed operation logs with emoji indicators for quick status identification.

## 📚 Documentation

### Troubleshooting Guides
- **[Embedding Troubleshooting](./docs/embedding-troubleshooting.md)** - Quick fixes for common embedding dimension errors
- **[Embedding Compatibility Guide](./docs/embedding-compatibility-guide.md)** - Comprehensive guide to embedding models and dimensions
- **[Logging Guide](./docs/logging_guide.md)** - Understanding the logging system

### Technical Guides
- **[Architecture Documentation](./docs/architecture.md)** - System architecture overview
- **[API Documentation](./docs/api-documentation.md)** - Detailed API reference
- **[Deployment Guide](./docs/deployment.md)** - Production deployment instructions

### Common Issues
- **Dimension mismatch errors**: See [Embedding Troubleshooting](./docs/embedding-troubleshooting.md)
- **Model loading issues**: Check provider configuration in `.env`
- **Database connection problems**: Verify MongoDB/ChromaDB settings

## 📚 Dependencies

### Core Dependencies
- **FastAPI**: Modern web framework
- **uvicorn**: ASGI server
- **pydantic**: Data validation
- **python-dotenv**: Environment management

### AI/ML Dependencies
- **langchain**: LLM framework and chains
- **langchain-openai**: OpenAI integration
- **langchain-google-genai**: Google AI integration
- **sentence-transformers**: Embedding models
- **chromadb**: Vector database
- **pymongo**: MongoDB integration

### Optional Dependencies
- **langchain-huggingface**: HuggingFace integration
- **torch**: PyTorch for local models

## 📄 License

This project is part of the PLG4 Recipe Recommendation Chatbot system.

---

For more detailed documentation, check the `docs/` folder or visit the API documentation at `http://localhost:8080/docs` when running the server.