# Copilot Instructions for Jacek AI

This file provides guidance for GitHub Copilot when working with the Jacek AI codebase, a bilingual (Polish/English) accessibility chatbot using RAG with LanceDB and OpenAI GPT-4.
## Build, Test, and Run Commands

### Running the Application
```bash
# Local development - starts Gradio UI at http://127.0.0.1:7860
python app.py

# Run all startup tests before deployment
python test_startup.py
```
### Environment Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Configure environment (required before first run)
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```
### Database Management

```bash
# Compact LanceDB (removes version history, reduces file count)
python compact_database.py

# Check document count
python -c "import lancedb; db = lancedb.connect('./lancedb'); print(len(db.open_table('a11y_expert')))"
```
### Testing

```bash
# Run full test suite (imports, config, vector store, embeddings, agent)
python test_startup.py

# All tests must pass before deploying to Hugging Face Spaces
```
## Architecture Overview

### Core Components

#### Agent System (`agent/`)

- `a11y_agent.py`: Main `A11yExpertAgent` class with streaming responses via OpenAI
- `prompts.py`: Language-specific system prompts (Polish/English) with strict language enforcement
- `tools.py`: RAG tools for knowledge base search (top-5 semantic results)
#### Vector Store (`database/`)

- `vector_store_client.py`: LanceDB client with lazy loading and automatic reconnection
- Database path: `./lancedb/a11y_expert.lance` (tracked with Git LFS)
- READ-ONLY in production (Hugging Face Spaces environment)
#### Embeddings (`models/`)

- `embeddings.py`: OpenAI embeddings client with disk caching (`./cache/embeddings`) and retry logic
- Model: `text-embedding-3-large` (3072 dimensions)
- Singleton pattern: use `get_embeddings_client()` for a shared instance
#### UI (`app.py`)

- Gradio ChatInterface with two-column layout (chat + notes from `notes.md`)
- Lazy agent initialization: the agent loads on the first user query, not at startup
- Streaming responses for better UX
#### Configuration (`config.py`)

- Pydantic settings with environment variable support
- All config loaded from the `.env` file (never hardcode secrets)
- Required: `OPENAI_API_KEY` (OpenAI API key for LLM and embeddings)
### Data Flow (RAG Pipeline)

1. User asks a question in the Gradio UI
2. Language detected from the query using `langdetect` (Polish or English)
3. Query embedded using the OpenAI embeddings API (with cache lookup)
4. Vector search in LanceDB, filtered by language: `where="language = 'pl'"` or `'en'`
5. Top 5 results formatted as context
6. Context + query + language-specific system prompt sent to GPT-4
7. Response streamed back to the UI token-by-token
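The pipeline above can be sketched end-to-end with a dependency-free illustration. Every function here is a stand-in (`detect_language` for `langdetect`, `embed` for the cached OpenAI call, `vector_search` for the LanceDB query); the names and signatures are illustrative, not the project's real API.

```python
# Dependency-free sketch of the RAG pipeline; each stub stands in for a
# real component (langdetect, OpenAI embeddings, LanceDB search).

def detect_language(query: str) -> str:
    """Stand-in for langdetect: crude check for Polish diacritics."""
    return "pl" if any(ch in "ąćęłńóśźż" for ch in query.lower()) else "en"

_embedding_cache: dict = {}

def embed(query: str) -> list:
    """Stand-in for the cached OpenAI embeddings call."""
    if query not in _embedding_cache:
        _embedding_cache[query] = [float(ord(ch)) for ch in query[:8]]  # fake vector
    return _embedding_cache[query]

def vector_search(vector: list, language: str, k: int = 5) -> list:
    """Stand-in for LanceDB search with a where="language = ..." filter."""
    fake_db = {"pl": ["dok 1", "dok 2"], "en": ["doc 1", "doc 2"]}
    return fake_db[language][:k]

def answer(query: str):
    lang = detect_language(query)
    context = vector_search(embed(query), lang)
    # The real agent now sends context + query + the language-specific
    # system prompt to GPT-4 and streams the response token-by-token.
    return lang, context

print(answer("Czym jest dostępność cyfrowa?"))
```

Note how the detected language drives both the search filter and (in the real agent) the system prompt, which is what keeps responses in one language.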
### Key Design Patterns

- **Lazy Initialization**: Agent and database connections initialize on first use, not at startup (faster deployment)
- **Singleton Pattern**: `get_embeddings_client()` returns a shared instance across the app
- **Language Detection**: Auto-detects the query language and adjusts both the prompt and the vector search filter
- **Stateless Agent**: No internal conversation history (Gradio handles history in the UI)
- **Conversation Context**: The last 4 messages are kept in context for follow-up questions
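The first two patterns can be shown with the standard library alone; `functools.lru_cache` is one common way to get a lazily built, cached singleton (the project's actual implementation may differ):

```python
from functools import lru_cache

class EmbeddingsClient:
    """Placeholder for the real OpenAI embeddings client in models/embeddings.py."""
    def __init__(self) -> None:
        self.ready = True  # the real client would set up API access here

@lru_cache(maxsize=1)
def get_embeddings_client() -> EmbeddingsClient:
    # Lazy + singleton: nothing is built at import time; the first call
    # constructs the client and every later call reuses the same instance.
    return EmbeddingsClient()
```

Calling `get_embeddings_client()` twice returns the same object, so the whole app shares one client without any module-level initialization.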
## Key Conventions

### Language Handling - CRITICAL

The agent has strict language enforcement in its system prompts:
- Polish queries get `SYSTEM_PROMPT_PL` with "CRITICAL: Answer ONLY in Polish"
- English queries get `SYSTEM_PROMPT_EN` with "CRITICAL: Answer ONLY in English"
- System prompts explicitly instruct the LLM to translate sources if needed
- Vector search is language-filtered: `where="language = 'pl'"` or `where="language = 'en'"`
**When modifying prompts:** Never remove or weaken the language enforcement instructions - they prevent language mixing, which confuses users.
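A minimal sketch of how detection, prompt choice, and the search filter stay in lockstep (the prompt strings and function name are abbreviated stand-ins for the real definitions in `agent/prompts.py`):

```python
# Abbreviated stand-ins for the real prompts in agent/prompts.py.
SYSTEM_PROMPT_PL = "... CRITICAL: Answer ONLY in Polish ..."
SYSTEM_PROMPT_EN = "... CRITICAL: Answer ONLY in English ..."

def select_prompt_and_filter(language: str):
    """Return the system prompt and LanceDB row filter for a detected language."""
    if language == "pl":
        return SYSTEM_PROMPT_PL, "language = 'pl'"
    return SYSTEM_PROMPT_EN, "language = 'en'"
```

Keeping the prompt and the filter in one decision point is what prevents, say, an English prompt being paired with Polish-only search results.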
### LanceDB Database - READ-ONLY in Production

- The database at `./lancedb/` is tracked with Git LFS (not generated at runtime)
- In Hugging Face Spaces the database is read-only (the filesystem is immutable)
- For local development: use `VectorStoreClient.add_documents()` to add data
- After local changes: run `compact_database.py` to reduce file count before committing
- Schema: `text`, `vector`, `source`, `language`, `doc_type`, `created_at`, `updated_at`
### Configuration Loading

All settings in `config.py` are loaded from environment variables:
```python
from config import get_settings

settings = get_settings()   # Singleton, cached
print(settings.llm_model)   # gpt-4o-mini (default)
```
**Never access environment variables directly** - always use `get_settings()`.
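The real `config.py` uses Pydantic settings; the stdlib stand-in below shows the same cached-singleton shape (the field names are assumptions drawn from the Environment Variables section, not the project's actual class):

```python
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for the Pydantic settings class in config.py."""
    openai_api_key: str
    llm_model: str = "gpt-4o-mini"

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # The environment is read exactly once; every later caller shares the result.
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],  # required, no default
        llm_model=os.getenv("LLM_MODEL", "gpt-4o-mini"),
    )
```

Because the result is cached, a missing `OPENAI_API_KEY` fails loudly on the first access rather than silently at some later point.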
## Hugging Face Spaces Deployment

Critical deployment requirements:
- `demo.queue()` must be called explicitly (see `app.py:238-243`)
- Do NOT use `atexit.register()` for cleanup (causes premature shutdown)
- LanceDB must be committed with Git LFS (the database is read-only on HF)
- API key stored as an HF Spaces Secret: `OPENAI_API_KEY`
- The `if __name__ == "__main__"` block handles both local and HF deployments
Testing before deployment:

```bash
python test_startup.py  # All tests must pass
```
## Logging

Use loguru for all logging (already configured):
```python
from loguru import logger

logger.info("Starting process...")
logger.success("✅ Completed successfully")
logger.error(f"❌ Failed: {error}")
```
Set `LOG_LEVEL=DEBUG` in `.env` for verbose output during development.
## Error Handling

- Always close resources in agent/client classes (implement a `close()` method)
- Use try/except with specific exception types
- Log the full traceback for debugging: `logger.error(traceback.format_exc())`
- For user-facing errors, provide clear Polish/English messages depending on the detected language
## Project Structure

```
JacekAI/
├── agent/                       # Core agent logic
│   ├── a11y_agent.py            # Main agent with RAG
│   ├── prompts.py               # Language-specific prompts (PL/EN)
│   └── tools.py                 # Knowledge base search tools
├── database/
│   └── vector_store_client.py   # LanceDB client
├── models/
│   └── embeddings.py            # OpenAI embeddings with caching
├── lancedb/                     # Vector database (Git LFS)
│   └── a11y_expert.lance/
├── cache/                       # Embeddings cache (gitignored)
├── app.py                       # Gradio UI with lazy initialization
├── config.py                    # Pydantic settings (environment variables)
├── test_startup.py              # Deployment readiness tests
├── compact_database.py          # Database compaction utility
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment template
└── notes.md                     # Optional notes displayed in UI sidebar
```
## Important Implementation Notes

### When Adding New Features to the Agent
- Modifying prompts → edit `agent/prompts.py`
- Adding new tools → add a function to `agent/tools.py`
- Changing RAG logic → modify `agent/a11y_agent.py`
- Test locally with `python app.py` and interact through the UI
### When Updating Dependencies

1. Edit `requirements.txt`
2. Run `pip install -r requirements.txt`
3. Test with `python test_startup.py`
4. Commit changes and test in HF Spaces
### When Debugging

- Set `LOG_LEVEL=DEBUG` in `.env` for verbose logging
- Agent initialization happens on the first query (check the logs for "A11yExpertAgent initialized")
- The embeddings cache is at `./cache/embeddings` (create the directory if missing)
- Vector search logs show the retrieved context from the database
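The disk cache mentioned above can be pictured with the sketch below; the actual key scheme and file format used by `models/embeddings.py` are assumptions here:

```python
import hashlib
import json
from pathlib import Path

def _cache_path(root: Path, model: str, text: str) -> Path:
    """Hash (model, text) into a stable filename under the cache root."""
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    return root / f"{key}.json"

def cached_embed(root: Path, model: str, text: str, embed_fn):
    """Return a cached vector if present; otherwise compute and store it."""
    root.mkdir(parents=True, exist_ok=True)  # create ./cache/embeddings if missing
    path = _cache_path(root, model, text)
    if path.exists():
        return json.loads(path.read_text())  # cache hit: no API call
    vector = embed_fn(text)  # the real code calls the OpenAI embeddings API here
    path.write_text(json.dumps(vector))
    return vector
```

Including the model name in the cache key matters: switching embedding models must not return stale vectors of the wrong dimensionality.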
### Common Pitfalls
- DO NOT modify the database in production (LanceDB is read-only on HF Spaces)
- DO NOT use `atexit.register()` in `app.py` (breaks HF Spaces deployment)
- DO NOT weaken language enforcement in prompts (causes confusing mixed-language responses)
- DO NOT access `os.environ` directly - always use `get_settings()`
- DO NOT initialize the agent at module level - use the lazy initialization pattern
- DO NOT forget to call `demo.queue()` before `demo.launch()` in Gradio
## Environment Variables

Required in the `.env` file:

- `OPENAI_API_KEY` - OpenAI API key for LLM and embeddings (**required**)
Optional (with defaults):

- `LLM_MODEL` - Language model (default: `gpt-4o-mini`)
- `LLM_BASE_URL` - API endpoint (default: GitHub Models endpoint)
- `EMBEDDING_MODEL` - Embedding model (default: `text-embedding-3-large`)
- `LANCEDB_URI` - Database path (default: `./lancedb`)
- `LANCEDB_TABLE` - Table name (default: `a11y_expert`)
- `LOG_LEVEL` - Logging verbosity (default: `INFO`)
- `SERVER_HOST` - Gradio host (default: `127.0.0.1`; use `0.0.0.0` for HF)
- `SERVER_PORT` - Gradio port (default: `7860`)
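Combined into a `.env`, the variables above might look like the sketch below. All values are illustrative placeholders; `LLM_BASE_URL` is omitted because its default endpoint URL is not spelled out here.

```shell
# Illustrative .env - start from .env.example and fill in real values
OPENAI_API_KEY=sk-your-key-here        # required
LLM_MODEL=gpt-4o-mini                  # optional; default shown
EMBEDDING_MODEL=text-embedding-3-large
LANCEDB_URI=./lancedb
LANCEDB_TABLE=a11y_expert
LOG_LEVEL=INFO
SERVER_HOST=127.0.0.1                  # use 0.0.0.0 on HF Spaces
SERVER_PORT=7860
```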
## Related Documentation

- `CLAUDE.md` - Detailed guidance for Claude Code (includes architectural details)
- `README.md` - User-facing documentation with setup instructions
- `HF_SPACES_GUIDE.md` - Hugging Face Spaces deployment guide
- `QUICK_REFERENCE.md` - Quick reference for common tasks