# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview

Multi-model LLM chatbot using the Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and converse with them; changing the model automatically resets the conversation.

## Tech Stack

- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)
## Project Structure

```
├── app.py             # Main application
├── requirements.txt   # Python dependencies
├── README.md          # Spaces configuration + documentation
├── .env               # HF_TOKEN (git-ignored)
└── CLAUDE.md          # This file
```
## Development Commands

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860
```
### Deployment to Hugging Face Spaces

**Method 1: Web UI**

1. Create a Space at https://huggingface.co/spaces
2. Select the Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` under Settings → Repository secrets

**Method 2: Git Push**

```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```
## Architecture

### Core Components

**`app.py` Structure**:

- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears the chat when the model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface

**Model Handling Patterns**:

- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting

**State Management**:

- Global `current_model` tracks the selected model
- A model change triggers a chat history reset via Gradio event handlers
- Each model type uses the appropriate API method from `InferenceClient`
### API Integration

**Hugging Face InferenceClient Usage**:

```python
client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()   # DialoGPT, Flan-T5
client.conversational()    # BlenderBot
client.chat_completion()   # Zephyr (chat models)
```
**Rate Limiting & Error Handling**:

- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
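A minimal sketch of the graceful-degradation pattern described above, assuming the exception text contains recognizable rate-limit or timeout markers; the helper name `friendly_error` and the matched substrings are hypothetical:

```python
# Map common Inference API failures to user-friendly chat messages.
def friendly_error(exc: Exception) -> str:
    text = str(exc).lower()
    if "rate limit" in text or "429" in text:
        return "Rate limit exceeded on the free tier. Please wait a bit and retry."
    if "timeout" in text or "timed out" in text:
        return "The model timed out. Try again, or switch to another model."
    return f"Unexpected error: {exc}"
```

`chat_response()` would wrap its `InferenceClient` call in `try/except` and return this string to the chat instead of letting the exception surface to the user.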
## Environment Setup

**Required Environment Variable**:

```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Obtaining HF_TOKEN**:

1. Log in to https://huggingface.co
2. Settings → Access Tokens
3. Create a new token with "Read" permissions
4. Copy it to the `.env` file (local) or Space secrets (deployment)
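Loading the token at startup might look like the sketch below. It assumes `python-dotenv` for local `.env` support (not listed in this doc's dependencies, so treat it as optional); on Spaces the secret arrives as a plain environment variable, so `os.getenv` alone suffices there:

```python
import os

try:
    from dotenv import load_dotenv  # optional local convenience; assumption
    load_dotenv()                   # reads .env into os.environ if present
except ImportError:
    pass  # fine on Spaces, where the secret is injected directly

HF_TOKEN = os.getenv("HF_TOKEN")
if not HF_TOKEN:
    print("Warning: HF_TOKEN is not set; add it to .env or Space secrets.")
```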
## Adding New Models

1. **Add to the `MODELS` dict** in [app.py:23-45](app.py#L23-L45):

   ```python
   "model-org/model-name": {
       "name": "Display Name",
       "max_length": 512,
       "temperature": 0.7,
   }
   ```

2. **Update `chat_response()`** if the model requires special handling:
   - Check the model name in the conditional logic
   - Use the appropriate `InferenceClient` method
   - Format the prompt/messages according to the model's requirements

3. **Verify free tier compatibility**:
   - Test model availability via the Inference API
   - Check rate limits and response times
   - Update the README.md model list
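The conditional routing in step 2 might look like the following sketch; the helper name `pick_method` and the substring checks are illustrative assumptions, mirroring the `InferenceClient` methods listed under API Integration:

```python
# Hypothetical dispatch: choose the InferenceClient method by model family.
def pick_method(model_id: str) -> str:
    lowered = model_id.lower()
    if "blenderbot" in lowered:
        return "conversational"      # single-turn conversational API
    if "zephyr" in lowered:
        return "chat_completion"     # chat models with message history
    return "text_generation"         # DialoGPT, Flan-T5, and new models by default
```

A new model that fits one of the existing patterns needs no code change beyond the `MODELS` entry; only models with a novel prompt format need a new branch.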
## UI Customization

**Changing Language**:

- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)

**Theme & Styling**:

```python
gr.Blocks(theme=gr.themes.Soft())  # Change theme here
```

**Chat Examples**:

- Modify the `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
## Common Issues

**"Rate limit exceeded"**:
- Free tier limitation; wait ~1 hour or upgrade to PRO ($9/month)

**Model timeout/unavailable**:
- High demand on the free tier; try a different model or retry later

**Space sleeping**:
- Spaces sleep after inactivity, so the first load may be slow
## Testing Locally

```bash
# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select a model from the dropdown
# 2. Send a test message
# 3. Verify response generation
# 4. Change the model and verify the chat resets
```
## Deployment Notes

**README.md YAML Header**:
- Required for Spaces configuration
- Specifies the SDK, Python version, and app file
- Auto-detected by Hugging Face
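A typical header looks like the sketch below. The `sdk` and `app_file` keys are what Spaces reads to launch the app; the title, emoji, colors, and version values here are placeholders, not this project's actual configuration:

```yaml
---
title: Multi-Model Chatbot
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.0.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
```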
**Environment Variables in Spaces**:
- Set via Settings → Repository secrets
- The name must match exactly: `HF_TOKEN`
- Never commit tokens to the repository

**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- Slower inference than paid hardware