Jesse Johnson
New commit for backend deployment: 2025-09-25_13-24-03
c59d808
# Recipe Recommendation Chatbot - Backend API
Backend for an AI-powered recipe recommendation system built with FastAPI, featuring RAG (Retrieval-Augmented Generation), conversational memory, and multi-provider LLM support.
## πŸš€ Quick Start
### Prerequisites
- Python 3.9+
- pip or poetry
- API keys for your chosen LLM provider (OpenAI, Google, or Anthropic), or a local Ollama installation
### Installation
1. **Clone and navigate to backend**
```bash
git clone <repository-url>
cd PLG4-Recipe-Recommendation-Chatbot/backend
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
> πŸ’‘ **Note**: Some packages are commented out by default to keep the installation lightweight:
> - **HuggingFace dependencies** (`transformers`, `accelerate`) - Uncomment if using HuggingFace models
> - **sentence-transformers** (~800MB) - Uncomment for HuggingFace embeddings
3. **Configure environment**
```bash
cp .env.example .env
# Edit .env with your API keys and configuration
```
4. **Run the server**
```bash
# Development mode with auto-reload
uvicorn app:app --reload --host 127.0.0.1 --port 8080
# Or production mode
uvicorn app:app --host 127.0.0.1 --port 8080
```
5. **Test the API**
```bash
curl http://localhost:8080/health
```
6. **HuggingFace Spaces deployment**
```bash
sh deploy-to-hf.sh <remote>
```
where `<remote>` points to the HuggingFace Spaces repository.
## πŸ“ Project Structure
```
backend/
β”œβ”€β”€ app.py # FastAPI application entry point
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ .env.example # Environment configuration template
β”œβ”€β”€ .gitignore # Git ignore rules
β”‚
β”œβ”€β”€ config/ # Configuration modules
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ settings.py # Application settings
β”‚ β”œβ”€β”€ database.py # Database configuration
β”‚ └── logging_config.py # Logging setup
β”‚
β”œβ”€β”€ services/ # Core business logic
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ llm_service.py # LLM and RAG pipeline
β”‚ └── vector_store.py # Vector database management
β”‚
β”œβ”€β”€ data/ # Data storage
β”‚ β”œβ”€β”€ recipes/ # Recipe JSON files
β”‚ β”‚ └── recipe.json # Sample recipe data
β”‚ └── chromadb_persist/ # ChromaDB persistence
β”‚
β”œβ”€β”€ logs/ # Application logs
β”‚ └── recipe_bot.log # Main log file
β”‚
β”œβ”€β”€ docs/ # Documentation
β”‚ β”œβ”€β”€ model-selection-guide.md # 🎯 Complete model selection & comparison guide
β”‚ β”œβ”€β”€ model-quick-reference.md # ⚑ Quick model switching commands
β”‚ β”œβ”€β”€ chromadb_refresh.md # ChromaDB refresh guide
β”‚ β”œβ”€β”€ opensource-llm-configuration.md # Open source LLM setup guide
β”‚ β”œβ”€β”€ logging_guide.md # Logging documentation
β”‚ β”œβ”€β”€ optimal_recipes_structure.md # Recipe data structure guide
β”‚ β”œβ”€β”€ sanitization_guide.md # Input sanitization guide
β”‚ └── unified-provider-configuration.md # Unified provider approach guide
β”‚
└── utils/ # Utility functions
└── __init__.py
```
## βš™οΈ Configuration
### Environment Variables
Copy `.env.example` to `.env` and configure the following:
> 🎯 **Unified Provider Approach**: The `LLM_PROVIDER` setting controls both LLM and embedding models, preventing configuration mismatches. See [`docs/unified-provider-configuration.md`](docs/unified-provider-configuration.md) for details.
#### **Server Configuration**
```bash
PORT=8000 # Server port
HOST=0.0.0.0 # Server host
ENVIRONMENT=development # Environment mode
DEBUG=true # Debug mode
```
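These variables are consumed by `config/settings.py` (presumably loaded from `.env` via `python-dotenv`). A minimal stdlib-only sketch of how such values might be parsed, with `env_bool` as an assumed helper name:

```python
import os

# Assumed helper: mirrors what config/settings.py might do after
# python-dotenv has loaded .env into the process environment.
def env_bool(name: str, default: str = "false") -> bool:
    return os.getenv(name, default).strip().lower() in ("true", "1", "yes")

PORT = int(os.getenv("PORT", "8000"))
HOST = os.getenv("HOST", "0.0.0.0")
ENVIRONMENT = os.getenv("ENVIRONMENT", "development")
DEBUG = env_bool("DEBUG", "true")
```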
#### **Provider Configuration**
Choose one provider for both LLM and embeddings (unified approach):
> 🎯 **NEW: Complete Model Selection Guide**: For detailed comparisons of all models (OpenAI, Google, Anthropic, Ollama, HuggingFace) including latest 2025 models, performance metrics, costs, and scenario-based recommendations, see [`docs/model-selection-guide.md`](docs/model-selection-guide.md)
> ⚑ **Quick Reference**: For one-command model switching, see [`docs/model-quick-reference.md`](docs/model-quick-reference.md)
**OpenAI (Best Value & Latest Models)**
```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-5-nano # 🎯 BEST VALUE: $1/month for 30K queries - Modern GPT-5 at nano price
# Alternatives:
# - gpt-4o-mini # Proven choice: $4/month for 30K queries
# - gpt-5 # Premium: $20/month unlimited (Plus plan)
OPENAI_EMBEDDING_MODEL=text-embedding-3-small # Used automatically
```
**Google Gemini (Best Free Tier)**
```bash
LLM_PROVIDER=google
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_MODEL=gemini-2.5-flash # 🎯 RECOMMENDED: Excellent free tier, then $2/month
# Alternatives:
# - gemini-2.0-flash-lite # Ultra budget: $0.90/month for 30K queries
# - gemini-2.5-pro # Premium: $25/month for 30K queries
GOOGLE_EMBEDDING_MODEL=models/embedding-001 # Used automatically
```
**Anthropic Claude (Best Quality-to-Cost)**
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_MODEL=claude-3-5-haiku-20241022 # 🎯 BUDGET WINNER: $4/month for 30K queries
# Alternatives:
# - claude-3-5-sonnet-20241022 # Production standard: $45/month for 30K queries
# - claude-3-opus-20240229 # Premium quality: $225/month for 30K queries
ANTHROPIC_EMBEDDING_MODEL=voyage-large-2 # Used automatically
```
**Ollama (Best for Privacy/Self-Hosting)**
```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b # 🎯 RECOMMENDED: 4.7GB download, 8GB RAM, excellent balance
# New alternatives:
# - deepseek-r1:7b # Breakthrough reasoning: 4.7GB download, O1-level performance
# - codeqwen:7b # Structured data expert: 4.2GB download, excellent for recipes
# - gemma3:4b # Resource-efficient: 3.3GB download, 6GB RAM
# - mistral-nemo:12b # Balanced performance: 7GB download, 12GB RAM
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Used automatically
```
**HuggingFace (Downloadable Models Only - APIs Unreliable)**
```bash
LLM_PROVIDER=ollama # Use Ollama to run HuggingFace models locally
OLLAMA_MODEL=codeqwen:7b # 🎯 RECOMMENDED: Download HF models via Ollama for reliability
# Other downloadable options:
# - mistral-nemo:12b # Mistral's balanced model
# - nous-hermes2:10.7b # Fine-tuned for instruction following
# - openhermes2.5-mistral:7b # Community favorite
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Used automatically
```
> ⚠️ **Important Change**: HuggingFace's hosted inference APIs have proven unreliable for production, so the HuggingFace dependencies are no longer required. Instead, download HuggingFace models and run them locally via Ollama, which provides better reliability and performance.
> πŸ“– **Local Model Setup**: See [`docs/opensource-llm-configuration.md`](docs/opensource-llm-configuration.md) for GPU setup, model selection, and performance optimization with Ollama.
#### **Vector Store Configuration**
Choose between ChromaDB (local) or MongoDB Atlas:
**ChromaDB (Default)**
```bash
VECTOR_STORE_PROVIDER=chromadb
DB_COLLECTION_NAME=recipes
DB_PERSIST_DIRECTORY=./data/chromadb_persist
# Set to true to delete and recreate DB on startup (useful for adding new recipes)
DB_REFRESH_ON_START=false
```
**MongoDB Atlas**
```bash
VECTOR_STORE_PROVIDER=mongodb
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
MONGODB_DATABASE=recipe_bot
MONGODB_COLLECTION=recipes
```
#### **Embedding Configuration**
```bash
# Embedding provider automatically matches LLM_PROVIDER (unified approach)
# No separate configuration needed - handled automatically based on LLM_PROVIDER setting
```
> πŸ’‘ **Unified Provider**: The `LLM_PROVIDER` setting automatically configures both the LLM and embedding models, ensuring consistency and preventing mismatched configurations. See [`docs/model-selection-guide.md`](docs/model-selection-guide.md) for all available options.
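To illustrate the idea, here is a hypothetical factory (not the project's actual code) that maps a single `LLM_PROVIDER` value to the matching pair of models from the tables above, so the two can never drift apart:

```python
import os
from typing import Optional, Tuple

# Hypothetical defaults taken from the provider tables above.
PROVIDER_DEFAULTS = {
    "openai": ("gpt-5-nano", "text-embedding-3-small"),
    "google": ("gemini-2.5-flash", "models/embedding-001"),
    "anthropic": ("claude-3-5-haiku-20241022", "voyage-large-2"),
    "ollama": ("llama3.1:8b", "nomic-embed-text"),
}

def resolve_models(provider: Optional[str] = None) -> Tuple[str, str]:
    """Return (llm_model, embedding_model) for the configured provider."""
    provider = (provider or os.getenv("LLM_PROVIDER", "openai")).lower()
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
    return PROVIDER_DEFAULTS[provider]
```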
## πŸ› οΈ API Endpoints
### Core Endpoints
#### **Health Check**
```bash
GET /health
```
Returns service health and configuration status.
#### **Chat with RAG**
```bash
POST /chat
Content-Type: application/json

{
  "message": "What chicken recipes do you have?"
}
```
Full conversational RAG pipeline with memory and vector retrieval.
#### **Simple Demo**
```bash
GET /demo?prompt=Tell me about Italian cuisine
```
Simple LLM completion without RAG for testing.
#### **Clear Memory**
```bash
POST /clear-memory
```
Clears the conversation memory for a fresh start.
### Example Requests
**Chat Request:**
```bash
curl -X POST "http://localhost:8080/chat" \
  -H "Content-Type: application/json" \
  -d '{"message": "What are some quick breakfast recipes?"}'
```
**Demo Request:**
```bash
curl "http://localhost:8080/demo?prompt=What%20is%20your%20favorite%20pasta%20dish?"
```
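The same calls can be made from Python without extra dependencies; a small stdlib-only client sketch (the shape of the JSON reply is assumed):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"

def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the /chat endpoint."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(message: str) -> dict:
    """Send a chat message and return the parsed JSON reply."""
    with urllib.request.urlopen(build_chat_request(message)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```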
## πŸ—οΈ Architecture
### Core Components
#### **LLM Service** (`services/llm_service.py`)
- **ConversationalRetrievalChain**: Main RAG pipeline with memory
- **Simple Chat Completion**: Direct LLM responses without RAG
- **Multi-provider Support**: OpenAI, Google, Anthropic, and Ollama
- **Conversation Memory**: Persistent chat history
#### **Vector Store Service** (`services/vector_store.py`)
- **ChromaDB Integration**: Local vector database
- **MongoDB Atlas Support**: Cloud vector search
- **Document Loading**: Automatic recipe data ingestion
- **Embedding Management**: Multi-provider embedding support
#### **Configuration System** (`config/`)
- **Settings Management**: Environment-based configuration
- **Database Configuration**: Vector store setup
- **Logging Configuration**: Structured logging with rotation
### Data Flow
1. **User Query** β†’ FastAPI endpoint
2. **RAG Pipeline** β†’ Vector similarity search
3. **Context Retrieval** β†’ Top-k relevant recipes
4. **LLM Generation** β†’ Context-aware response
5. **Memory Storage** β†’ Conversation persistence
6. **Response** β†’ JSON formatted reply
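The flow above can be sketched with stand-in functions (the real pipeline lives in `services/llm_service.py`; `search` and `generate` here are placeholders, not project APIs):

```python
# Illustrative sketch of the request flow; steps are numbered as above.
def handle_query(message: str, memory: list, search, generate, k: int = 25) -> str:
    # 2-3. Vector similarity search returns the top-k relevant recipes.
    context = search(message, k)
    # 4. The LLM produces a context-aware answer from the retrieved context.
    answer = generate(message, context, memory)
    # 5. Conversation memory persists the exchange for follow-up turns.
    memory.append({"user": message, "assistant": answer})
    # 6. The endpoint returns the reply (serialized to JSON by FastAPI).
    return answer
```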
## πŸ“Š Logging
Comprehensive logging system with:
- **File Rotation**: 10MB max size, 5 backups
- **Structured Format**: Timestamps, levels, source location
- **Emoji Indicators**: Visual status indicators
- **Error Tracking**: Full stack traces for debugging
**Log Levels:**
- πŸš€ **INFO**: Normal operations
- ⚠️ **WARNING**: Non-critical issues
- ❌ **ERROR**: Failures with stack traces
- πŸ”§ **DEBUG**: Detailed operation steps
**Log Location:** `./logs/recipe_bot.log`
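The rotation policy above corresponds to a standard `RotatingFileHandler`; a sketch of what `config/logging_config.py` might set up (format string is illustrative):

```python
import logging
from logging.handlers import RotatingFileHandler

# Sketch of the rotation policy described above: 10 MB max size, 5 backups.
def build_logger(path: str = "./logs/recipe_bot.log") -> logging.Logger:
    handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=5)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s | %(levelname)s | %(name)s:%(lineno)d | %(message)s"
    ))
    logger = logging.getLogger("recipe_bot")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```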
## πŸ“ Data Management
### Recipe Data
- **Location**: `./data/recipes/`
- **Format**: JSON files with structured recipe data
- **Schema**: title, ingredients, directions, tags
- **Auto-loading**: Automatic chunking and vectorization
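For reference, a minimal recipe file matching that schema might look like this (field values are illustrative); files placed in `./data/recipes/` are picked up on startup or after a DB refresh:

```python
import json

# Illustrative recipe matching the schema above (values are made up).
sample_recipe = {
    "title": "Lemon Herb Chicken",
    "ingredients": ["2 chicken breasts", "1 lemon", "2 tbsp olive oil", "fresh thyme"],
    "directions": ["Marinate the chicken for 30 minutes.", "Grill 6-7 minutes per side."],
    "tags": ["chicken", "dinner", "quick"],
}

# Written to the working directory here; in the project it would go
# under ./data/recipes/ instead.
with open("sample_recipe.json", "w", encoding="utf-8") as f:
    json.dump(sample_recipe, f, indent=2)
```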
### Vector Storage
- **ChromaDB**: Local persistence in `./data/chromadb_persist/`
- **MongoDB**: Cloud-based vector search
- **Embeddings**: Configurable embedding models
- **Retrieval**: Top-k similarity search (k=25)
## πŸ”§ Development
### Running in Development
```bash
# Install dependencies
pip install -r requirements.txt
# Set up environment
cp .env.example .env
# Configure your API keys
# Run with auto-reload
uvicorn app:app --reload --host 127.0.0.1 --port 8080
```
### Testing Individual Components
```bash
# Test vector store
python -c "from services.vector_store import vector_store_service; print('Vector store initialized')"
# Test LLM service
python -c "from services.llm_service import llm_service; print('LLM service initialized')"
```
### Adding New Recipes
1. Add JSON files to `./data/recipes/`
2. Set `DB_REFRESH_ON_START=true` in `.env` file
3. Restart the application (ChromaDB will be recreated)
4. Set `DB_REFRESH_ON_START=false` to prevent repeated deletion
5. New recipes are now available for search
**Quick refresh:**
```bash
# Enable refresh, restart, then disable
echo "DB_REFRESH_ON_START=true" >> .env
uvicorn app:app --reload --host 127.0.0.1 --port 8080
# After startup completes:
sed -i 's/DB_REFRESH_ON_START=true/DB_REFRESH_ON_START=false/' .env
```
## πŸš€ Production Deployment
### Environment Setup
```bash
ENVIRONMENT=production
DEBUG=false
LOG_LEVEL=INFO
```
### Docker Deployment
The backend is containerized and ready for deployment on platforms like Hugging Face Spaces.
### Security Features
- **Environment Variables**: Secure API key management
- **CORS Configuration**: Frontend integration protection
- **Input Sanitization**: Context-appropriate validation for recipe queries
  - XSS protection through HTML encoding
  - Length validation (1-1000 characters)
  - Basic harmful pattern removal
  - Whitespace normalization
- **Pydantic Validation**: Type safety and automatic sanitization
- **Structured Error Handling**: Safe error responses without data leaks
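A simplified sketch of those sanitization steps (the function name and exact patterns are assumptions, not the project's actual implementation):

```python
import html
import re

def sanitize_query(text: str, max_len: int = 1000) -> str:
    # Whitespace normalization: collapse runs and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Basic harmful-pattern removal (script tags, as one example).
    text = re.sub(r"<\s*script[^>]*>.*?<\s*/\s*script\s*>", "", text, flags=re.I | re.S)
    # XSS protection via HTML encoding of the remaining text.
    text = html.escape(text)
    # Length validation: 1-1000 characters.
    if not (1 <= len(text) <= max_len):
        raise ValueError("query must be between 1 and 1000 characters")
    return text
```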
## πŸ› οΈ Troubleshooting
### Common Issues
**Vector store initialization fails**
- Check API keys for embedding provider
- Verify data folder contains recipe files
- Check ChromaDB permissions
**LLM service fails**
- Verify API key configuration
- Check provider-specific requirements
- Review logs for detailed error messages
**HuggingFace model import errors**
- HuggingFace APIs have proven unreliable for production use
- **Recommended**: Use Ollama to run HuggingFace models locally instead:
```bash
# Install and run HuggingFace models via Ollama
ollama pull codeqwen:7b
ollama pull mistral-nemo:12b
# Set LLM_PROVIDER=ollama in .env
```
- For legacy HuggingFace API setup, uncomment dependencies in `requirements.txt` (not recommended)
- For detailed model comparisons, see [`docs/model-selection-guide.md`](docs/model-selection-guide.md)
**Memory issues**
```bash
# Clear conversation memory
curl -X POST http://localhost:8080/clear-memory
```
### Debug Mode
Set `DEBUG=true` in `.env` for detailed logging and error traces.
### Log Analysis
Check `./logs/recipe_bot.log` for detailed operation logs with emoji indicators for quick status identification.
## πŸ“š Documentation
### Troubleshooting Guides
- **[Embedding Troubleshooting](./docs/embedding-troubleshooting.md)** - Quick fixes for common embedding dimension errors
- **[Embedding Compatibility Guide](./docs/embedding-compatibility-guide.md)** - Comprehensive guide to embedding models and dimensions
- **[Logging Guide](./docs/logging_guide.md)** - Understanding the logging system
### Technical Guides
- **[Architecture Documentation](./docs/architecture.md)** - System architecture overview
- **[API Documentation](./docs/api-documentation.md)** - Detailed API reference
- **[Deployment Guide](./docs/deployment.md)** - Production deployment instructions
### Common Issues
- **Dimension mismatch errors**: See [Embedding Troubleshooting](./docs/embedding-troubleshooting.md)
- **Model loading issues**: Check provider configuration in `.env`
- **Database connection problems**: Verify MongoDB/ChromaDB settings
## πŸ“š Dependencies
### Core Dependencies
- **FastAPI**: Modern web framework
- **uvicorn**: ASGI server
- **pydantic**: Data validation
- **python-dotenv**: Environment management
### AI/ML Dependencies
- **langchain**: LLM framework and chains
- **langchain-openai**: OpenAI integration
- **langchain-google-genai**: Google AI integration
- **sentence-transformers**: Embedding models
- **chromadb**: Vector database
- **pymongo**: MongoDB integration
### Optional Dependencies
- **langchain-huggingface**: HuggingFace integration
- **torch**: PyTorch for local models
## πŸ“„ License
This project is part of the PLG4 Recipe Recommendation Chatbot system.
---
For more detailed documentation, check the `docs/` folder or visit the API documentation at `http://localhost:8080/docs` when running the server.