# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Multi-model LLM chatbot built on the Hugging Face Inference API and Gradio. Users select from several pre-configured models and converse with them; changing the model automatically resets the conversation.
## Tech Stack
- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)
## Project Structure
```
├── app.py             # Main application
├── requirements.txt   # Python dependencies
├── README.md          # Spaces configuration + documentation
├── .env               # HF_TOKEN (git-ignored)
└── CLAUDE.md          # This file
```
## Development Commands

### Local Development
```bash
# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860
```
### Deployment to Hugging Face Spaces
**Method 1: Web UI**
- Create a Space at https://huggingface.co/spaces
- Select the Gradio SDK
- Upload `app.py`, `requirements.txt`, and `README.md`
- Add `HF_TOKEN` to Settings → Repository secrets
**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```
## Architecture

### Core Components

**`app.py` structure:**
- `MODELS` dict: model configurations (ID, display name, parameters)
- `chat_response()`: main inference function handling multiple model types
- `on_model_change()`: clears the chat when the model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface
**Model handling patterns:**
- **DialoGPT**: text continuation with conversation-history formatting
- **BlenderBot**: conversational API with single-turn context
- **Flan-T5**: instruction-based text generation with prompt engineering
- **Zephyr**: chat-completion API with message-history formatting
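A hedged sketch of how this branching might look inside `chat_response()`, assuming a module-level `client = InferenceClient(token=HF_TOKEN)` as shown under API Integration below. Names, prompt formats, and history shape are illustrative, not the repository's exact code:

```python
def chat_response(message, history, model_id):
    """Dispatch to the InferenceClient method that fits the model family."""
    if "DialoGPT" in model_id:
        # Text continuation: flatten (user, bot) history pairs into one prompt
        turns = [t for pair in history for t in pair] + [message]
        return client.text_generation("\n".join(turns), model=model_id)
    elif "blenderbot" in model_id:
        # Single-turn conversational task (deprecated in newer huggingface_hub;
        # return shape varies by hub version)
        return client.conversational(message, model=model_id)
    elif "flan-t5" in model_id:
        # Instruction-style prompt engineering
        return client.text_generation(f"Answer concisely: {message}", model=model_id)
    else:
        # Zephyr and other chat models: rebuild the message history
        messages = []
        for user_msg, bot_msg in history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": bot_msg})
        messages.append({"role": "user", "content": message})
        return client.chat_completion(messages, model=model_id).choices[0].message.content
```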
**State management:**
- A global `current_model` tracks the selected model
- A model change triggers a chat-history reset via Gradio event handlers
- Each model type uses the appropriate API method from `InferenceClient`
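A minimal sketch of the wiring, simplified to a bare `gr.Chatbot` (the real app composes a ChatInterface inside Blocks; the `MODELS` stub is for illustration only):

```python
import gradio as gr

MODELS = {"HuggingFaceH4/zephyr-7b-beta": {"name": "Zephyr 7B"}}  # stub for illustration
current_model = list(MODELS.keys())[0]  # global selection state

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    dropdown = gr.Dropdown(choices=list(MODELS.keys()), value=current_model, label="Model")
    chatbot = gr.Chatbot()

    def on_model_change(model_id):
        global current_model
        current_model = model_id
        return []  # returning an empty history clears the Chatbot

    dropdown.change(on_model_change, inputs=dropdown, outputs=chatbot)

demo.launch()
```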
### API Integration

**Hugging Face `InferenceClient` usage:**
```python
from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()   # DialoGPT, Flan-T5
client.conversational()    # BlenderBot (deprecated in newer huggingface_hub releases)
client.chat_completion()   # Zephyr (chat models)
```
**Rate limiting & error handling:**
- Free tier: roughly 100–300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate-limit detection in exception handling (see the sketch below)
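One way the exception handling could look; the string checks and messages are assumptions, not the repository's exact code:

```python
def safe_chat_response(message, history, model_id):
    try:
        return chat_response(message, history, model_id)
    except Exception as exc:
        err = str(exc).lower()
        if "rate limit" in err or "429" in err:
            return "Rate limit reached on the free tier. Please wait a bit and retry."
        if "timeout" in err or "503" in err or "loading" in err:
            return "The model is busy or still loading. Retry shortly or pick another model."
        return f"Unexpected error: {exc}"
```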
## Environment Setup

**Required environment variable:**
```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Obtaining `HF_TOKEN`:**
- Log in to https://huggingface.co
- Go to Settings → Access Tokens
- Create a new token with "Read" permissions
- Copy it to the `.env` file (local) or Space secrets (deployment)
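Locally, the token can be loaded from `.env`, for example with `python-dotenv` (assuming that package is installed):

```python
import os

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()  # reads .env from the working directory
HF_TOKEN = os.environ["HF_TOKEN"]  # raises KeyError early if the token is missing
client = InferenceClient(token=HF_TOKEN)
```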
## Adding New Models

1. Add the model to the `MODELS` dict in app.py:23-45:

    ```python
    "model-org/model-name": {
        "name": "Display Name",
        "max_length": 512,
        "temperature": 0.7,
    },
    ```
2. Update `chat_response()` if the model requires special handling (see the sketch after this list):
    - Check the model name in the conditional logic
    - Use the appropriate `InferenceClient` method
    - Format the prompt/messages according to the model's requirements
3. Verify free-tier compatibility:
    - Test model availability via the Inference API
    - Check rate limits and response times
    - Update the README.md model list
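For example, a new chat-style model could be wired up with a helper like this; the model ID, dict keys, and parameters are placeholders from the template above:

```python
def respond_new_model(client, message, model_id="model-org/model-name"):
    # Hypothetical handler for a chat-style model added to MODELS
    params = MODELS[model_id]
    result = client.chat_completion(
        [{"role": "user", "content": message}],
        model=model_id,
        max_tokens=params["max_length"],
        temperature=params["temperature"],
    )
    return result.choices[0].message.content
```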
## UI Customization

**Changing the language:**
- All UI strings are in Korean by default
- Modify the markdown strings and button labels in app.py:140-220
**Theme & styling:**
```python
gr.Blocks(theme=gr.themes.Soft())  # Change theme here
```
**Chat examples:**
- Modify the `examples` parameter of `ChatInterface` in app.py:187-192
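For instance (the example prompts here are placeholders; the real ones live in app.py and may be in Korean):

```python
chat = gr.ChatInterface(
    fn=chat_response,
    examples=[
        "Hello! How are you today?",
        "Explain transformers in one paragraph.",
        "Write a haiku about autumn.",
    ],
)
```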
## Common Issues

**"Rate limit exceeded":**
- Free-tier limitation; wait ~1 hour or upgrade to PRO ($9/month)

**Model timeout/unavailable:**
- High demand on the free tier; try a different model or retry later

**Space sleeping:**
- Spaces sleep after inactivity, so the first load may be slow
## Testing Locally

```bash
# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select a model from the dropdown
# 2. Send a test message
# 3. Verify response generation
# 4. Change the model and verify the chat resets
```
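A scripted smoke test is also possible with `gradio_client`, assuming the app exposes ChatInterface's default `/chat` endpoint (the argument list may differ if the app adds extra inputs):

```python
from gradio_client import Client

client = Client("http://localhost:7860")
# "/chat" is ChatInterface's default API name; adjust if the app overrides it
reply = client.predict("Hello!", api_name="/chat")
print(reply)
```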
## Deployment Notes

**README.md YAML header:**
- Required for Spaces configuration
- Specifies the SDK, Python version, and app file
- Auto-detected by Hugging Face
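A typical header looks like this; the values shown are illustrative, so keep the repository's actual ones:

```yaml
---
title: Multi-Model Chatbot
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
```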
**Environment variables in Spaces:**
- Set via Settings → Repository secrets
- The name must match exactly: `HF_TOKEN`
- Never commit tokens to the repository
**Free-tier constraints:**
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference