# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Multi-model LLM chatbot built on the Hugging Face Inference API and Gradio. Users select from several pre-configured models and chat with them; changing the model automatically resets the conversation.
## Tech Stack
- **Python**: 3.10+
- **Framework**: Gradio 5.x (ChatInterface + Blocks)
- **API**: Hugging Face Serverless Inference API (free tier)
- **Deployment**: Hugging Face Spaces (free CPU instance)
## Project Structure
```
├── app.py # Main application
├── requirements.txt # Python dependencies
├── README.md # Spaces configuration + documentation
├── .env # HF_TOKEN (git ignored)
└── CLAUDE.md # This file
```
## Development Commands
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Run locally (requires HF_TOKEN in .env)
python app.py
# Access at http://localhost:7860
```
### Deployment to Hugging Face Spaces
**Method 1: Web UI**
1. Create Space at https://huggingface.co/spaces
2. Select Gradio SDK
3. Upload `app.py`, `requirements.txt`, `README.md`
4. Add `HF_TOKEN` to Settings → Repository secrets
**Method 2: Git Push**
```bash
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```
## Architecture
### Core Components
**`app.py` Structure**:
- `MODELS` dict: Model configurations (ID, display name, parameters)
- `chat_response()`: Main inference function handling multiple model types
- `on_model_change()`: Clears chat when model selection changes
- Gradio Blocks: UI composition with model dropdown + ChatInterface
**Model Handling Patterns**:
- **DialoGPT**: Text continuation with conversation history formatting
- **BlenderBot**: Conversational API with single-turn context
- **Flan-T5**: Instruction-based text generation with prompt engineering
- **Zephyr**: Chat completion API with message history formatting
**State Management**:
- Global `current_model` tracks selected model
- Model change triggers chat history reset via Gradio event handlers
- Each model type uses appropriate API method from `InferenceClient`
### API Integration
**Hugging Face InferenceClient Usage**:
```python
client = InferenceClient(token=HF_TOKEN)
# Different methods for different model types
client.text_generation() # DialoGPT, Flan-T5
client.conversational() # BlenderBot
client.chat_completion() # Zephyr (chat models)
```
**Rate Limiting & Error Handling**:
- Free tier: ~100-300 requests/hour
- Graceful degradation with user-friendly error messages
- Timeout and rate limit detection in exception handling
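The graceful-degradation pattern above can be sketched as a wrapper that maps inference failures to user-friendly strings; the exact exception types and message text in `app.py` may differ:

```python
# Hedged sketch of rate-limit/timeout detection; strings are illustrative.
def safe_chat(generate, prompt: str) -> str:
    """Call an inference function and convert failures to friendly messages."""
    try:
        return generate(prompt)
    except Exception as e:
        msg = str(e).lower()
        if "rate limit" in msg or "429" in msg:
            return "Rate limit reached. Please wait a while and try again."
        if "timeout" in msg or "503" in msg:
            return "The model is busy or loading. Please retry shortly."
        return f"Unexpected error: {e}"
```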
## Environment Setup
**Required Environment Variable**:
```bash
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Obtaining HF_TOKEN**:
1. Login to https://huggingface.co
2. Settings → Access Tokens
3. Create new token with "Read" permissions
4. Copy to `.env` file (local) or Space secrets (deployment)
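Loading the token so the same code works locally (via `.env`) and on Spaces (via Repository secrets) can be sketched as follows; the `python-dotenv` import is an assumption about `requirements.txt`:

```python
import os

try:
    from dotenv import load_dotenv  # python-dotenv, if installed
    load_dotenv()  # reads .env locally; a no-op if the file is absent
except ImportError:
    pass  # on Spaces, secrets arrive as plain environment variables

HF_TOKEN = os.environ.get("HF_TOKEN", "")
```

If `HF_TOKEN` ends up empty, inference calls will fail with authentication errors, so checking it at startup gives a clearer message.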
## Adding New Models
1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
```python
"model-org/model-name": {
"name": "Display Name",
"max_length": 512,
"temperature": 0.7,
}
```
2. **Update chat_response()** if model requires special handling:
- Check model name in conditional logic
- Use appropriate InferenceClient method
- Format prompt/messages according to model requirements
3. **Verify free tier compatibility**:
- Test model availability via Inference API
- Check rate limits and response times
- Update README.md model list
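The per-model conditional logic in step 2 can be sketched as a dispatch that picks an `InferenceClient` method name and payload per model family; the real conditionals and prompt formats in `app.py` may differ:

```python
# Hypothetical sketch of the dispatch inside chat_response(); names assumed.
def build_request(model_id: str, message: str, history: list[tuple[str, str]]):
    """Choose an InferenceClient method and payload for a model family."""
    mid = model_id.lower()
    if "zephyr" in mid:
        # Chat models take a role-tagged message list.
        messages = []
        for user_msg, bot_msg in history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": bot_msg})
        messages.append({"role": "user", "content": message})
        return "chat_completion", messages
    if "flan-t5" in mid:
        # Instruction-tuned models respond to a plain instruction prompt.
        return "text_generation", f"Answer the question: {message}"
    # Default: DialoGPT-style text continuation with history baked in.
    prompt = "".join(f"{u}\n{b}\n" for u, b in history) + message
    return "text_generation", prompt
```

A new model that fits an existing family then needs only a `MODELS` entry; only genuinely different APIs require a new branch.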
## UI Customization
**Changing Language**:
- All UI strings are in Korean by default
- Modify markdown strings and button labels in [app.py:140-220](app.py#L140-L220)
**Theme & Styling**:
```python
gr.Blocks(theme=gr.themes.Soft()) # Change theme here
```
**Chat Examples**:
- Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
## Common Issues
**"Rate limit exceeded"**:
- Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)
**Model timeout/unavailable**:
- High demand on free tier, try different model or retry later
**Space sleeping**:
- Spaces sleep after inactivity, first load may be slow
## Testing Locally
```bash
# Ensure .env exists with HF_TOKEN
python app.py
# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets
```
## Deployment Notes
**README.md YAML Header**:
- Required for Spaces configuration
- Specifies SDK, Python version, app file
- Auto-detected by Hugging Face
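A minimal header looks like the following; the keys (`title`, `emoji`, `colorFrom`, `colorTo`, `sdk`, `sdk_version`, `app_file`, `pinned`) are the standard Spaces configuration fields, while the values shown are illustrative for this project:

```yaml
---
title: Simple Chat
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
---
```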
**Environment Variables in Spaces**:
- Set via Settings → Repository secrets
- Name must match exactly: `HF_TOKEN`
- Never commit tokens to repository
**Free Tier Constraints**:
- CPU only (no GPU)
- Auto-sleep after inactivity
- Rate limits on API calls
- May experience slower inference