# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Multi-model LLM chatbot using Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and have conversations with them. Model changes automatically reset the conversation.
## Tech Stack
- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)
## Project Structure
```
├── app.py            # Main application
├── requirements.txt  # Python dependencies
├── README.md         # Spaces configuration + documentation
├── .env              # HF_TOKEN (git-ignored)
└── CLAUDE.md         # This file
```
## Development Commands
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Run locally (requires HF_TOKEN in .env)
python app.py
# Access at http://localhost:7860
```
### Deployment to Hugging Face Spaces
**Method 1: Web UI**
1. Create Space at https://huggingface.co/spaces
2. Select Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` to Settings → Repository secrets
**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```
## Architecture
### Core Components
**`app.py` Structure**:
- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears chat when model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface
**Model Handling Patterns**:
- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting
**State Management**:
- Global `current_model` tracks selected model
- Model change triggers chat history reset via Gradio event handlers
- Each model type uses appropriate API method from `InferenceClient`
### API Integration
**Hugging Face InferenceClient Usage**:
```python
client = InferenceClient(token=HF_TOKEN)
# Different methods for different model types
client.text_generation() # DialoGPT, Flan-T5
client.conversational() # BlenderBot
client.chat_completion() # Zephyr (chat models)
```
**Rate Limiting & Error Handling**:
- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
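A sketch of that exception mapping; the helper name and the matched substrings are assumptions, not the API's exact error wording:

```python
def friendly_errors(generate, prompt):
    """Run an inference callable and map common failures to user-facing text."""
    try:
        return generate(prompt)
    except Exception as exc:
        detail = str(exc).lower()
        if "rate limit" in detail or "429" in detail:
            return "Rate limit reached. Please wait a moment and try again."
        if "timeout" in detail or "503" in detail:
            return "The model is busy or still loading. Please retry shortly."
        return f"Unexpected error: {exc}"
```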
## Environment Setup
**Required Environment Variable**:
```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Obtaining HF_TOKEN**:
1. Login to https://huggingface.co
2. Settings → Access Tokens
3. Create new token with "Read" permissions
4. Copy to `.env` file (local) or Space secrets (deployment)
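A small fail-fast check for the steps above (the helper name is hypothetical; `python-dotenv` or a shell `export` can populate the variable):

```python
import os

def require_hf_token(env=None):
    """Return the HF token from the environment, failing fast with guidance."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; see 'Obtaining HF_TOKEN'.")
    return token
```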
## Adding New Models
1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
```python
"model-org/model-name": {
"name": "Display Name",
"max_length": 512,
"temperature": 0.7,
}
```
2. **Update chat_response()** if model requires special handling:
- Check model name in conditional logic
- Use appropriate InferenceClient method
- Format prompt/messages according to model requirements
3. **Verify free tier compatibility**:
- Test model availability via Inference API
- Check rate limits and response times
- Update README.md model list
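Tying steps 1-2 together, a hedged sketch of the dispatch inside `chat_response()`; the model-ID checks and method choices follow the mapping above, but the exact argument shapes may differ in the real code:

```python
def dispatch_inference(client, model_id, message):
    """Pick the InferenceClient method by model family, per the mapping above."""
    lowered = model_id.lower()
    if "dialogpt" in lowered or "flan-t5" in lowered:
        return client.text_generation(message, model=model_id)
    if "blenderbot" in lowered:
        return client.conversational(message, model=model_id)
    # Chat models such as Zephyr use the chat-completion API; real responses
    # carry the text under .choices[0].message.content.
    return client.chat_completion(
        [{"role": "user", "content": message}], model=model_id
    )
```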
## UI Customization
**Changing Language**:
- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)
**Theme & Styling**:
```python
gr.Blocks(theme=gr.themes.Soft()) # Change theme here
```
**Chat Examples**:
- Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
## Common Issues
**"Rate limit exceeded"**:
- Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)
**Model timeout/unavailable**:
- High demand on free tier, try different model or retry later
**Space sleeping**:
- Spaces sleep after inactivity, first load may be slow
## Testing Locally
```bash
# Ensure .env exists with HF_TOKEN
python app.py
# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets
```
## Deployment Notes
**README.md YAML Header**:
- Required for Spaces configuration
- Specifies SDK, Python version, app file
- Auto-detected by Hugging Face
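A typical header looks like this (illustrative values; adjust `title`, `emoji`, and `sdk_version` to match the Space):

```yaml
---
title: Multi-Model LLM Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
```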
**Environment Variables in Spaces**:
- Set via Settings → Repository secrets
- Name must match exactly: `HF_TOKEN`
- Never commit tokens to repository
**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference