# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Multi-model LLM chatbot built on the Hugging Face Inference API and Gradio. Users select from several pre-configured models and chat with them; changing the model automatically resets the conversation.
## Tech Stack
- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)
## Project Structure
```
├── app.py # Main application
├── requirements.txt # Python dependencies
├── README.md # Spaces configuration + documentation
├── .env # HF_TOKEN (git ignored)
└── CLAUDE.md # This file
```
## Development Commands
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Run locally (requires HF_TOKEN in .env)
python app.py
# Access at http://localhost:7860
```
### Deployment to Hugging Face Spaces
**Method 1: Web UI**
1. Create Space at https://huggingface.co/spaces
2. Select Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` to Settings → Repository secrets
**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```
## Architecture
### Core Components
**`app.py` Structure**:
- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears chat when model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface
**Model Handling Patterns**:
- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting
**State Management**:
- Global `current_model` tracks selected model
- Model change triggers chat history reset via Gradio event handlers
- Each model type uses appropriate API method from `InferenceClient`
### API Integration
**Hugging Face InferenceClient Usage**:
```python
client = InferenceClient(token=HF_TOKEN)
# Different methods for different model types
client.text_generation() # DialoGPT, Flan-T5
client.conversational() # BlenderBot
client.chat_completion() # Zephyr (chat models)
```
**Rate Limiting & Error Handling**:
- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
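The graceful-degradation pattern above can be sketched as a wrapper that maps inference failures to user-friendly strings; the exact exception types and message text in `app.py` may differ:

```python
# Hedged sketch of rate-limit/timeout detection; strings are illustrative.
def safe_chat(generate, prompt: str) -> str:
    """Call an inference function and convert failures to friendly messages."""
    try:
        return generate(prompt)
    except Exception as e:
        msg = str(e).lower()
        if "rate limit" in msg or "429" in msg:
            return "Rate limit reached. Please wait a while and try again."
        if "timeout" in msg or "503" in msg:
            return "The model is busy or loading. Please retry shortly."
        return f"Unexpected error: {e}"
```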
## Environment Setup
**Required Environment Variable**:
```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Obtaining HF_TOKEN**:
1. Login to https://huggingface.co
2. Settings → Access Tokens
3. Create new token with "Read" permissions
4. Copy to `.env` file (local) or Space secrets (deployment)
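Loading the token so the same code works locally (via `.env`) and on Spaces (via Repository secrets) can be sketched as follows; the `python-dotenv` import is an assumption about `requirements.txt`:

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, if installed
    load_dotenv()  # reads .env locally; a no-op if the file is absent
except ImportError:
    pass  # on Spaces, secrets arrive as plain environment variables

HF_TOKEN = os.environ.get("HF_TOKEN", "")
```

If `HF_TOKEN` ends up empty, inference calls will fail with authentication errors, so checking it at startup gives a clearer message.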
## Adding New Models
1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
```python
"model-org/model-name": {
"name": "Display Name",
"max_length": 512,
"temperature": 0.7,
}
```
2. **Update chat_response()** if model requires special handling:
- Check model name in conditional logic
- Use appropriate InferenceClient method
- Format prompt/messages according to model requirements
3. **Verify free tier compatibility**:
- Test model availability via Inference API
- Check rate limits and response times
- Update README.md model list
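The per-model conditional logic in step 2 can be sketched as a dispatch that picks an `InferenceClient` method name and payload per model family; the real conditionals and prompt formats in `app.py` may differ:

```python
# Hypothetical sketch of the dispatch inside chat_response(); names assumed.
def build_request(model_id: str, message: str, history: list[tuple[str, str]]):
    """Choose an InferenceClient method and payload for a model family."""
    mid = model_id.lower()
    if "zephyr" in mid:
        # Chat models take a role-tagged message list.
        messages = []
        for user_msg, bot_msg in history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": bot_msg})
        messages.append({"role": "user", "content": message})
        return "chat_completion", messages
    if "flan-t5" in mid:
        # Instruction-tuned models respond to a plain instruction prompt.
        return "text_generation", f"Answer the question: {message}"
    # Default: DialoGPT-style text continuation with history baked in.
    prompt = "".join(f"{u}\n{b}\n" for u, b in history) + message
    return "text_generation", prompt
```

A new model that fits an existing family then needs only a `MODELS` entry; only genuinely different APIs require a new branch.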
## UI Customization
**Changing Language**:
- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)
**Theme & Styling**:
```python
gr.Blocks(theme=gr.themes.Soft()) # Change theme here
```
**Chat Examples**:
- Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
## Common Issues
**"Rate limit exceeded"**:
- Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)
**Model timeout/unavailable**:
- High demand on free tier, try different model or retry later
**Space sleeping**:
- Spaces sleep after inactivity, first load may be slow
## Testing Locally
```bash
# Ensure .env exists with HF_TOKEN
python app.py
# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets
```
## Deployment Notes
**README.md YAML Header**:
- Required for Spaces configuration
- Specifies SDK, Python version, app file
- Auto-detected by Hugging Face
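A minimal header looks like the following; the keys (`title`, `emoji`, `colorFrom`, `colorTo`, `sdk`, `sdk_version`, `app_file`, `pinned`) are the standard Spaces configuration fields, while the values shown are illustrative for this project:

```yaml
---
title: Simple Chat
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
```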
**Environment Variables in Spaces**:
- Set via Settings → Repository secrets
- Name must match exactly: `HF_TOKEN`
- Never commit tokens to repository
**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference