# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Multi-model LLM chatbot using the Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and have conversations with them. Changing the model automatically resets the conversation.

## Tech Stack

- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)

## Project Structure

```
β”œβ”€β”€ app.py              # Main application
β”œβ”€β”€ requirements.txt    # Python dependencies
β”œβ”€β”€ README.md           # Spaces configuration + documentation
β”œβ”€β”€ .env                # HF_TOKEN (git-ignored)
└── CLAUDE.md           # This file
```

## Development Commands

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860
```

### Deployment to Hugging Face Spaces

**Method 1: Web UI**
1. Create Space at https://huggingface.co/spaces
2. Select Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` to Settings β†’ Repository secrets

**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```

## Architecture

### Core Components

**`app.py` Structure**:
- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears chat when model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface

**Model Handling Patterns**:
- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting
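
The patterns above amount to a dispatch inside `chat_response()`. A minimal sketch, assuming illustrative function names, prompt formats, and parameters (not copied from `app.py`; the BlenderBot `conversational()` branch is omitted for brevity):

```python
def history_to_messages(history, new_message):
    """Convert Gradio [user, bot] pairs into an OpenAI-style message list."""
    messages = []
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": new_message})
    return messages

def history_to_prompt(history, new_message):
    """Flatten history into a plain prompt for text-continuation models."""
    lines = [f"User: {u}\nBot: {b}" for u, b in history]
    lines.append(f"User: {new_message}\nBot:")
    return "\n".join(lines)

def chat_response(message, history, model_id, client):
    """client is a huggingface_hub.InferenceClient instance."""
    if "zephyr" in model_id.lower():
        # Chat models: message-list API
        result = client.chat_completion(
            history_to_messages(history, message),
            model=model_id, max_tokens=512)
        return result.choices[0].message.content
    # DialoGPT / Flan-T5: plain text generation over a flattened prompt
    return client.text_generation(
        history_to_prompt(history, message),
        model=model_id, max_new_tokens=512)
```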

**State Management**:
- Global `current_model` tracks selected model
- Model change triggers chat history reset via Gradio event handlers
- Each model type uses appropriate API method from `InferenceClient`

### API Integration

**Hugging Face InferenceClient Usage**:
```python
client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()    # DialoGPT, Flan-T5
client.conversational()     # BlenderBot (deprecated in recent huggingface_hub releases; pin an older version if needed)
client.chat_completion()    # Zephyr (chat models)
```

**Rate Limiting & Error Handling**:
- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
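
One way to sketch the graceful-degradation mapping; the exact messages and string checks are assumptions, not copied from `app.py`:

```python
def friendly_error(exc: Exception) -> str:
    """Map raw API exceptions to user-friendly chat messages."""
    text = str(exc).lower()
    if "rate limit" in text or "429" in text:
        return "Rate limit reached on the free tier. Please retry in about an hour."
    if "timeout" in text or "timed out" in text:
        return "The model timed out. Try again, or switch to a lighter model."
    return f"Unexpected error: {exc}"
```

In `chat_response()`, the inference call is wrapped in `try/except` and `friendly_error(e)` is returned as the bot reply instead of raising.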

## Environment Setup

**Required Environment Variable**:
```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

**Obtaining HF_TOKEN**:
1. Login to https://huggingface.co
2. Settings β†’ Access Tokens
3. Create new token with "Read" permissions
4. Copy to `.env` file (local) or Space secrets (deployment)
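
Locally, `load_dotenv()` from python-dotenv (assumed to be in `requirements.txt`) fills `os.environ` from `.env`; in Spaces the secret is injected automatically. A hedged sketch of the lookup, with an illustrative helper name:

```python
import os

def get_hf_token(env=None):
    """Return HF_TOKEN from the environment, failing loudly when absent."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env (local) or Space secrets.")
    return token
```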

## Adding New Models

1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
```python
"model-org/model-name": {
    "name": "Display Name",
    "max_length": 512,
    "temperature": 0.7,
}
```

2. **Update chat_response()** if model requires special handling:
   - Check model name in conditional logic
   - Use appropriate InferenceClient method
   - Format prompt/messages according to model requirements

3. **Verify free tier compatibility**:
   - Test model availability via Inference API
   - Check rate limits and response times
   - Update README.md model list

## UI Customization

**Changing Language**:
- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)

**Theme & Styling**:
```python
gr.Blocks(theme=gr.themes.Soft())  # Change theme here
```

**Chat Examples**:
- Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)

## Common Issues

**"Rate limit exceeded"**:
- Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)

**Model timeout/unavailable**:
- High demand on free tier, try different model or retry later

**Space sleeping**:
- Spaces sleep after inactivity, first load may be slow

## Testing Locally

```bash
# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets
```

## Deployment Notes

**README.md YAML Header**:
- Required for Spaces configuration
- Specifies SDK, Python version, app file
- Auto-detected by Hugging Face

**Environment Variables in Spaces**:
- Set via Settings β†’ Repository secrets
- Name must match exactly: `HF_TOKEN`
- Never commit tokens to repository

**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference