# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Talker is a Gradio + FastAPI application deployed on HuggingFace Spaces that implements an AI chat agent with Open Floor Protocol (OFP) support. It uses Qwen3 (via transformers) for LLM inference, with ZeroGPU acceleration on HuggingFace Spaces.
## Running the Application

```bash
pip install -r requirements.txt
python app.py
# Access at http://localhost:7860
```
## Testing

```bash
# Test OFP endpoints against deployed space
python test_ofp_endpoint.py

# Validate deployment with curl
./validate_deployment.sh [BASE_URL]
# e.g. ./validate_deployment.sh https://bladeszasza-talker.hf.space
```
## Architecture

The app is a single `app.py` that wires together the Gradio UI, FastAPI routes, and the `src/` modules:

- `app.py` – Entry point. Creates `LLMClient` and `ChatAgent`, defines the `@spaces.GPU` inference function (`llm_generate_gpu`), builds the Gradio `gr.Blocks` UI, and registers the OFP API routes.
- `src/llm_client.py` – Wraps Qwen3 via HuggingFace `transformers`. The tokenizer loads eagerly; model weights load lazily on the first GPU call inside `generate_response_from_messages()`.
- `src/chat_agent.py` – Maintains conversation history and agent stats (`messages_processed`, `responses_sent`). Handles OFP envelope processing via `process_envelope()`.
- `src/ofp_client.py` – Sends OFP envelopes to external conveners/agents via HTTP POST.
- `src/models.py` – Dataclasses for `Envelope`, `DialogEvent`, `Event`, `Identification`.
- `config/config.yaml` – All runtime config: agent URIs, LLM model/params, UI settings.
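The dataclasses in `src/models.py` might look roughly like the following minimal sketch. The class names come from the list above; the field names and nesting are assumptions based on typical Open Floor Protocol envelopes, not the repo's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class Identification:
    # Who is speaking; speakerUri mirrors agent.speaker_uri in config.yaml
    speakerUri: str
    serviceUrl: Optional[str] = None

@dataclass
class Event:
    # OFP event type, e.g. "utterance" or "getManifests"
    eventType: str
    parameters: dict = field(default_factory=dict)

@dataclass
class DialogEvent:
    # Carries utterance features keyed by feature name (assumed shape)
    speakerUri: str
    features: dict = field(default_factory=dict)

@dataclass
class Envelope:
    sender: Identification
    events: List[Event] = field(default_factory=list)

# Example: a minimal utterance envelope serialized for an HTTP POST body
env = Envelope(
    sender=Identification(speakerUri="tag:talker.service,2025:agent-01"),
    events=[Event(eventType="utterance", parameters={"text": "hello"})],
)
payload = asdict(env)  # nested dataclasses become nested dicts
```

`asdict()` gives a JSON-ready dict, which is convenient when `src/ofp_client.py` POSTs envelopes to external conveners.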
## Critical Routing Detail

Gradio mounts a SvelteKit catch-all at `/` that intercepts any routes registered directly on `demo.app`. Custom FastAPI routes must use a prefix that Gradio doesn't claim.

The app uses `APIRouter(prefix="/ofp-api")` and calls `demo.app.include_router(ofp_router)` before `demo.launch()`. The external OFP endpoints are:

- `GET /ofp-api/manifest`
- `POST /ofp-api/ofp`

If `/ofp-api/` collides with Gradio internals, change it to `/xapi/` or another prefix and verify with:

```python
print([r.path for r in demo.app.routes])
```
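A quick way to sanity-check a candidate prefix against the paths that verification printout shows is a small helper like this. It is an illustrative sketch, not part of the app, and it only catches exact and nested-path collisions; Gradio's root catch-all can still shadow routes registered after mounting, which is why the prefix convention above matters.

```python
def prefix_is_free(prefix: str, route_paths: list) -> bool:
    """Return True if no existing route equals the prefix or nests under it."""
    prefix = prefix.rstrip("/")
    for path in route_paths:
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True

# Paths similar to what print([r.path for r in demo.app.routes]) might show
existing = ["/", "/config", "/theme.css", "/gradio_api/info"]
```

Usage: `prefix_is_free("/ofp-api", existing)` is `True`, while a prefix like `/gradio_api` would be rejected because Gradio already nests routes under it.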
## ZeroGPU Constraint

`@spaces.GPU` decorated functions must be defined at module level in `app.py` (not inside classes or nested functions) so the HuggingFace startup scanner can detect them. The model is always invoked through `llm_generate_gpu()`; never call `llm_client.generate_response_from_messages()` directly from app code, as it requires a GPU context.
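The module-level pattern looks roughly like this. The try/except fallback is a common convention for running Spaces apps locally without the `spaces` package; it is an assumption here, not something this repo is confirmed to do, and the body of `llm_generate_gpu` is stubbed so the sketch stays self-contained.

```python
try:
    import spaces  # available on HuggingFace Spaces with ZeroGPU hardware
except ImportError:
    # Local fallback: a no-op stand-in so the same app.py runs off-Space
    class spaces:
        @staticmethod
        def GPU(fn=None, **kwargs):
            if fn is None:          # used as @spaces.GPU(duration=...)
                return lambda f: f
            return fn               # used as bare @spaces.GPU

@spaces.GPU
def llm_generate_gpu(messages):
    # At module level so the HF startup scanner can find the decorator.
    # In app.py this would delegate to
    # llm_client.generate_response_from_messages(); stubbed here.
    return "echo: " + messages[-1]["content"]
```

The key point is that the decorated function sits at the top level of the module; routing all inference through it keeps GPU allocation in one place.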
## Configuration

Edit `config/config.yaml` to change the model or agent settings. Key fields:

```yaml
agent:
  speaker_uri: 'tag:talker.service,2025:agent-01'
  service_url: 'https://<space>.hf.space/ofp-api/ofp'
llm:
  model: 'Qwen/Qwen3-0.6B'  # 0.6B is fast on CPU; 4B is slower but better quality
  max_tokens: 16384
  temperature: 0.7
```
## Environment Variables

- `HF_TOKEN` – Required for private HuggingFace models; not needed for public Qwen3 models.
- `OPENAI_API_KEY` – Required if switching `llm.provider` to `openai`.
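Reading these in code might look like the following sketch (illustrative; how `app.py` actually consumes them is not shown in this file). Both variables are optional in the default public-Qwen3 setup, so missing values pass through as `None`.

```python
import os

# HF_TOKEN is only needed for private models; None means anonymous access,
# which transformers accepts for public checkpoints like Qwen/Qwen3-0.6B.
hf_token = os.environ.get("HF_TOKEN")

# OPENAI_API_KEY only matters when llm.provider is switched to openai.
openai_key = os.environ.get("OPENAI_API_KEY")
```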
## OFP Event Handling

The `/ofp-api/ofp` endpoint processes two event types in-line (in `app.py`, not via `ChatAgent.process_envelope`):

- `getManifests` – returns a `publishManifest` response immediately
- `utterance` – calls `llm_generate_gpu`, appends to `agent.conversation_history`, returns an OFP utterance envelope

`ChatAgent.process_envelope()` in `src/chat_agent.py` exists but is not called by the main app flow; the API endpoint handles events directly. `ChatAgent` is used for state tracking (`conversation_history`, `messages_processed`, `responses_sent`) and the debug panel.
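The in-line dispatch described above can be sketched as a pure function. This is a simplification with assumed envelope and manifest shapes; the real endpoint also updates `ChatAgent` state and builds full OFP envelopes with sender identification.

```python
def handle_ofp_event(envelope, generate, manifest):
    """Dispatch on the first event's type, mirroring the two in-line cases."""
    event = envelope["events"][0]
    if event["eventType"] == "getManifests":
        # Manifest requests are answered immediately, no LLM call needed
        return {"events": [{"eventType": "publishManifest",
                            "parameters": {"manifest": manifest}}]}
    if event["eventType"] == "utterance":
        # Utterances go through the GPU inference function
        reply = generate(event["parameters"]["text"])
        return {"events": [{"eventType": "utterance",
                            "parameters": {"text": reply}}]}
    raise ValueError("unsupported event type: " + event["eventType"])

# Usage with a stand-in generator instead of llm_generate_gpu
reply_env = handle_ofp_event(
    {"events": [{"eventType": "utterance", "parameters": {"text": "hi"}}]},
    generate=lambda text: text.upper(),
    manifest={"name": "Talker"},
)
```

Keeping the dispatch as a pure function like this would also make the two branches easy to unit-test without spinning up FastAPI or a GPU.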