# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Multi-model LLM chatbot using the Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and have conversations with them. Changing the model automatically resets the conversation.

## Tech Stack

- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)

## Project Structure

```
β”œβ”€β”€ app.py              # Main application
β”œβ”€β”€ requirements.txt    # Python dependencies
β”œβ”€β”€ README.md           # Spaces configuration + documentation
β”œβ”€β”€ .env                # HF_TOKEN (git-ignored)
└── CLAUDE.md           # This file
```

## Development Commands

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860
```

### Deployment to Hugging Face Spaces

**Method 1: Web UI**
1. Create Space at https://huggingface.co/spaces
2. Select Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` to Settings β†’ Repository secrets

**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```

## Architecture

### Core Components

**`app.py` Structure**:
- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears chat when model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface

**Model Handling Patterns**:
- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting
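
The patterns above amount to a dispatch inside `chat_response()`. A minimal sketch, assuming illustrative function names, prompt formats, and parameters (not copied from `app.py`; the BlenderBot `conversational()` branch is omitted for brevity):

```python
def history_to_messages(history, new_message):
    """Convert Gradio [user, bot] pairs into an OpenAI-style message list."""
    messages = []
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": new_message})
    return messages

def history_to_prompt(history, new_message):
    """Flatten history into a plain prompt for text-continuation models."""
    lines = [f"User: {u}\nBot: {b}" for u, b in history]
    lines.append(f"User: {new_message}\nBot:")
    return "\n".join(lines)

def chat_response(message, history, model_id, client):
    """client is a huggingface_hub.InferenceClient instance."""
    if "zephyr" in model_id.lower():
        # Chat models: message-list API
        result = client.chat_completion(
            history_to_messages(history, message),
            model=model_id, max_tokens=512)
        return result.choices[0].message.content
    # DialoGPT / Flan-T5: plain text generation over a flattened prompt
    return client.text_generation(
        history_to_prompt(history, message),
        model=model_id, max_new_tokens=512)
```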

**State Management**:
- Global `current_model` tracks selected model
- Model change triggers chat history reset via Gradio event handlers
- Each model type uses appropriate API method from `InferenceClient`

### API Integration

**Hugging Face InferenceClient Usage**:
```python
client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()    # DialoGPT, Flan-T5
client.conversational()     # BlenderBot (deprecated in recent huggingface_hub releases; pin an older version if needed)
client.chat_completion()    # Zephyr (chat models)
```

**Rate Limiting & Error Handling**:
- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
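
One way to sketch the graceful-degradation mapping; the exact messages and string checks are assumptions, not copied from `app.py`:

```python
def friendly_error(exc: Exception) -> str:
    """Map raw API exceptions to user-friendly chat messages."""
    text = str(exc).lower()
    if "rate limit" in text or "429" in text:
        return "Rate limit reached on the free tier. Please retry in about an hour."
    if "timeout" in text or "timed out" in text:
        return "The model timed out. Try again, or switch to a lighter model."
    return f"Unexpected error: {exc}"
```

In `chat_response()`, the inference call is wrapped in `try/except` and `friendly_error(e)` is returned as the bot reply instead of raising.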

## Environment Setup

**Required Environment Variable**:
```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Obtaining HF_TOKEN**:
1. Login to https://huggingface.co
2. Settings β†’ Access Tokens
3. Create new token with "Read" permissions
4. Copy to `.env` file (local) or Space secrets (deployment)
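
Locally, `load_dotenv()` from python-dotenv (assumed to be in `requirements.txt`) fills `os.environ` from `.env`; in Spaces the secret is injected automatically. A hedged sketch of the lookup, with an illustrative helper name:

```python
import os

def get_hf_token(env=None):
    """Return HF_TOKEN from the environment, failing loudly when absent."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env (local) or Space secrets.")
    return token
```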

## Adding New Models

1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
```python
"model-org/model-name": {
    "name": "Display Name",
    "max_length": 512,
    "temperature": 0.7,
}
```

2. **Update chat_response()** if model requires special handling:
   - Check model name in conditional logic
   - Use appropriate InferenceClient method
   - Format prompt/messages according to model requirements

3. **Verify free tier compatibility**:
   - Test model availability via Inference API
   - Check rate limits and response times
   - Update README.md model list

## UI Customization

**Changing Language**:
- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)

**Theme & Styling**:
```python
gr.Blocks(theme=gr.themes.Soft())  # Change theme here
```

**Chat Examples**:
- Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)

## Common Issues

**"Rate limit exceeded"**:
- Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)

**Model timeout/unavailable**:
- High demand on free tier, try different model or retry later

**Space sleeping**:
- Spaces sleep after inactivity, first load may be slow

## Testing Locally

```bash
# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets
```

## Deployment Notes

**README.md YAML Header**:
- Required for Spaces configuration
- Specifies SDK, Python version, app file
- Auto-detected by Hugging Face

**Environment Variables in Spaces**:
- Set via Settings β†’ Repository secrets
- Name must match exactly: `HF_TOKEN`
- Never commit tokens to repository

**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference