# Copilot Instructions for Jacek AI
This file provides guidance for GitHub Copilot when working with the Jacek AI codebase: a bilingual (Polish/English) accessibility chatbot built on RAG with LanceDB and the OpenAI API (default model `gpt-4o-mini`).
## Build, Test, and Run Commands
### Running the Application
```bash
# Local development - starts Gradio UI at http://127.0.0.1:7860
python app.py
# Run all startup tests before deployment
python test_startup.py
```
### Environment Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Configure environment (required before first run)
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```
### Database Management
```bash
# Compact LanceDB (removes version history, reduces file count)
python compact_database.py
# Check document count
python -c "import lancedb; db = lancedb.connect('./lancedb'); print(len(db.open_table('a11y_expert')))"
```
### Testing
```bash
# Run full test suite (imports, config, vector store, embeddings, agent)
python test_startup.py
# All tests must pass before deploying to Hugging Face Spaces
```
## Architecture Overview
### Core Components
**Agent System** (`agent/`)
- `a11y_agent.py`: Main `A11yExpertAgent` class with streaming responses via OpenAI
- `prompts.py`: Language-specific system prompts (Polish/English) with **strict language enforcement**
- `tools.py`: RAG tools for knowledge base search (top-5 semantic results)
**Vector Store** (`database/`)
- `vector_store_client.py`: LanceDB client with lazy loading and automatic reconnection
- Database path: `./lancedb/a11y_expert.lance` (tracked with Git LFS)
- **READ-ONLY in production** (Hugging Face Spaces environment)
**Embeddings** (`models/`)
- `embeddings.py`: OpenAI embeddings client with disk caching (`./cache/embeddings`) and retry logic
- Model: `text-embedding-3-large` (3072 dimensions)
- Singleton pattern: use `get_embeddings_client()` for shared instance
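The singleton accessor can be sketched as follows. `EmbeddingsClient` here is a stand-in: the real class (with OpenAI calls, disk caching, and retry logic) lives in `models/embeddings.py`.

```python
# Minimal sketch of the shared-instance accessor, assuming a stand-in client class.
from functools import lru_cache

class EmbeddingsClient:
    def __init__(self, model: str = "text-embedding-3-large"):
        self.model = model  # the real client also wires up the OpenAI API and disk cache

@lru_cache(maxsize=1)
def get_embeddings_client() -> EmbeddingsClient:
    """Return the shared client, creating it on the first call only."""
    return EmbeddingsClient()
```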
**UI** (`app.py`)
- Gradio ChatInterface with two-column layout (chat + notes from `notes.md`)
- **Lazy agent initialization** - agent loads on first user query, not at startup
- Streaming responses for better UX
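The lazy-initialization pattern can be sketched like this, with a stand-in `Agent` class replacing the real `A11yExpertAgent`: nothing heavy runs at import time, and the agent is built inside the chat handler on the first user query.

```python
# Sketch of lazy initialization: module import is cheap; agent setup is deferred.
_agent = None

class Agent:  # stand-in for the real A11yExpertAgent
    def answer(self, query: str) -> str:
        return f"answer to: {query}"

def get_agent() -> Agent:
    global _agent
    if _agent is None:
        _agent = Agent()  # expensive setup (DB connect, prompts) happens here, once
    return _agent

def chat_fn(message: str, history: list) -> str:
    # Gradio calls this per message; the first call triggers agent creation.
    return get_agent().answer(message)
```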
**Configuration** (`config.py`)
- Pydantic settings with environment variable support
- All config loaded from `.env` file (never hardcode secrets)
- Required: `OPENAI_API_KEY` (OpenAI API key for LLM and embeddings)
### Data Flow (RAG Pipeline)
1. User asks question in Gradio UI
2. Language detected from query using `langdetect` (Polish or English)
3. Query embedded using OpenAI embeddings API (with cache lookup)
4. Vector search in LanceDB (filtered by language: `where="language = 'pl'"` or `'en'`)
5. Top 5 results formatted as context
6. Context + query + language-specific system prompt sent to the LLM
7. Response streamed back to UI token-by-token
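The steps above can be sketched end-to-end. Every component here (detector, embedder, search, prompt assembly) is a stub standing in for the real `agent/` and `database/` code; the string returned is the input that would be streamed through the LLM in step 7.

```python
def detect_language(query: str) -> str:            # stands in for langdetect
    return "pl" if any(c in query for c in "ąęłżź") else "en"

def embed(query: str) -> list[float]:              # stands in for the embeddings client
    return [float(len(query))]

def search(vector: list[float], lang: str) -> list[str]:  # stands in for LanceDB
    return [f"context doc ({lang})"]

def build_llm_input(query: str) -> str:
    lang = detect_language(query)                  # step 2
    vector = embed(query)                          # step 3 (cache lookup in the real code)
    context = "\n".join(search(vector, lang)[:5])  # steps 4-5: top-5 results as context
    return f"[system prompt: {lang}]\n{context}\n{query}"  # step 6
```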
### Key Design Patterns
- **Lazy Initialization**: Agent and database connections initialize on first use, not at startup (faster deployment)
- **Singleton Pattern**: `get_embeddings_client()` returns shared instance across the app
- **Language Detection**: Auto-detects query language and adjusts both prompt and vector search filter
- **Stateless Agent**: The agent keeps no conversation history of its own; Gradio holds the history in the UI
- **Conversation Context**: The last 4 messages from the Gradio history are passed with each request so follow-up questions still have context
## Key Conventions
### Language Handling - CRITICAL
The agent has **strict language enforcement** in system prompts:
- Polish queries get `SYSTEM_PROMPT_PL` with "CRITICAL: Answer ONLY in Polish"
- English queries get `SYSTEM_PROMPT_EN` with "CRITICAL: Answer ONLY in English"
- System prompts explicitly instruct the LLM to translate sources if needed
- Vector search is language-filtered: `where="language = 'pl'"` or `where="language = 'en'"`
**When modifying prompts**: Never remove or weaken the language enforcement instructions - they prevent language mixing which confuses users.
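Tying the detected language to both the prompt and the search filter can be sketched as below; the prompt strings are placeholders, not the real contents of `prompts.py`.

```python
# Placeholder prompts; the real, full texts live in agent/prompts.py.
SYSTEM_PROMPT_PL = "CRITICAL: Answer ONLY in Polish. Translate sources if needed."
SYSTEM_PROMPT_EN = "CRITICAL: Answer ONLY in English. Translate sources if needed."

def select_prompt_and_filter(lang: str) -> tuple[str, str]:
    """Map a detected language code to (system prompt, LanceDB where-clause)."""
    if lang == "pl":
        return SYSTEM_PROMPT_PL, "language = 'pl'"
    return SYSTEM_PROMPT_EN, "language = 'en'"  # English is the fallback
```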
### LanceDB Database - READ-ONLY in Production
- Database at `./lancedb/` is tracked with Git LFS (not generated at runtime)
- In Hugging Face Spaces: database is read-only (filesystem is immutable)
- For local development: use `VectorStoreClient.add_documents()` to add data
- After local changes: run `compact_database.py` to reduce file count before committing
- Schema: `text`, `vector`, `source`, `language`, `doc_type`, `created_at`, `updated_at`
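A hypothetical row matching the schema above (all values are illustrative, not taken from the database):

```python
# Illustrative document row; field names follow the schema listed above.
row = {
    "text": "Icon-only buttons need an accessible name (e.g. aria-label).",
    "vector": [0.0] * 3072,  # text-embedding-3-large dimensionality
    "source": "wcag-notes.md",
    "language": "en",        # 'pl' or 'en'; used as the vector-search filter
    "doc_type": "note",
    "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-01-01T00:00:00Z",
}
```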
### Configuration Loading
All settings in `config.py` are loaded from environment variables:
```python
from config import get_settings
settings = get_settings() # Singleton, cached
print(settings.llm_model) # gpt-4o-mini (default)
```
Never access environment variables directly - always use `get_settings()`.
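The cached-settings pattern can be sketched with the stdlib alone. The real `config.py` uses Pydantic settings; the fields below are a small assumed subset.

```python
# Stdlib sketch of a cached settings accessor (the real code uses Pydantic).
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    llm_model: str

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read the environment once and cache the result for the whole app."""
    return Settings(
        openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        llm_model=os.environ.get("LLM_MODEL", "gpt-4o-mini"),
    )
```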
### Hugging Face Spaces Deployment
**Critical deployment requirements**:
1. `demo.queue()` must be called explicitly (see `app.py:238-243`)
2. Do **NOT** use `atexit.register()` for cleanup (causes premature shutdown)
3. LanceDB must be committed with Git LFS (database is read-only in HF)
4. API key stored as HF Spaces Secret: `OPENAI_API_KEY`
5. The `if __name__ == "__main__"` block handles both local and HF deployments
**Testing before deployment**:
```bash
python test_startup.py # All tests must pass
```
### Logging
Use loguru for all logging (already configured):
```python
from loguru import logger
logger.info("Starting process...")
logger.success("✅ Completed successfully")
logger.error(f"❌ Failed: {error}")
```
Set `LOG_LEVEL=DEBUG` in `.env` for verbose output during development.
### Error Handling
- Always close resources in agent/client classes (implement `close()` method)
- Use try/except with specific exception types
- Log full traceback for debugging: `logger.error(traceback.format_exc())`
- For user-facing errors, provide clear Polish/English messages depending on detected language
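These conventions can be sketched together; `VectorClient` and the error messages are illustrative stand-ins, and `print` substitutes for loguru's `logger.error`.

```python
# Sketch of the close() convention, specific exceptions, and traceback logging.
import traceback

class VectorClient:
    """Stand-in client showing the explicit close() convention."""
    def __init__(self):
        self.connection = object()  # stands in for a LanceDB connection

    def close(self) -> None:
        self.connection = None      # release the resource explicitly

def run_query(client: VectorClient, lang: str = "en") -> str:
    try:
        if client.connection is None:
            raise ConnectionError("client already closed")
        return "ok"
    except ConnectionError:
        print(traceback.format_exc())  # real code: logger.error(traceback.format_exc())
        return ("Przepraszam, wystąpił błąd." if lang == "pl"
                else "Sorry, an error occurred.")
```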
## Project Structure
```
JacekAI/
├── agent/                     # Core agent logic
│   ├── a11y_agent.py          # Main agent with RAG
│   ├── prompts.py             # Language-specific prompts (PL/EN)
│   └── tools.py               # Knowledge base search tools
├── database/
│   └── vector_store_client.py # LanceDB client
├── models/
│   └── embeddings.py          # OpenAI embeddings with caching
├── lancedb/                   # Vector database (Git LFS)
│   └── a11y_expert.lance/
├── cache/                     # Embeddings cache (gitignored)
├── app.py                     # Gradio UI with lazy initialization
├── config.py                  # Pydantic settings (environment variables)
├── test_startup.py            # Deployment readiness tests
├── compact_database.py        # Database compaction utility
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
└── notes.md                   # Optional notes displayed in UI sidebar
```
## Important Implementation Notes
### When Adding New Features to Agent
1. Modifying prompts → Edit `agent/prompts.py`
2. Adding new tools → Add function to `agent/tools.py`
3. Changing RAG logic → Modify `agent/a11y_agent.py`
4. Test locally with `python app.py` and interact through UI
### When Updating Dependencies
1. Edit `requirements.txt`
2. Run `pip install -r requirements.txt`
3. Test with `python test_startup.py`
4. Commit changes and test in HF Spaces
### When Debugging
- Set `LOG_LEVEL=DEBUG` in `.env` for verbose logging
- Agent initialization happens on first query (check logs for "A11yExpertAgent initialized")
- Embeddings cache is at `./cache/embeddings` (create directory if missing)
- Vector search logs show retrieved context from database
## Common Pitfalls
1. **DO NOT** modify the database in production (LanceDB is read-only on HF Spaces)
2. **DO NOT** use `atexit.register()` in `app.py` (breaks HF Spaces deployment)
3. **DO NOT** weaken language enforcement in prompts (causes confusing mixed-language responses)
4. **DO NOT** access `os.environ` directly - always use `get_settings()`
5. **DO NOT** initialize agent at module level - use lazy initialization pattern
6. **DO NOT** forget to call `demo.queue()` before `demo.launch()` in Gradio
## Environment Variables
Required in `.env` file:
- `OPENAI_API_KEY` - OpenAI API key for LLM and embeddings - **REQUIRED**
Optional (with defaults):
- `LLM_MODEL` - Language model (default: `gpt-4o-mini`)
- `LLM_BASE_URL` - API endpoint (default: GitHub Models endpoint)
- `EMBEDDING_MODEL` - Embedding model (default: `text-embedding-3-large`)
- `LANCEDB_URI` - Database path (default: `./lancedb`)
- `LANCEDB_TABLE` - Table name (default: `a11y_expert`)
- `LOG_LEVEL` - Logging verbosity (default: `INFO`)
- `SERVER_HOST` - Gradio host (default: `127.0.0.1`, use `0.0.0.0` for HF)
- `SERVER_PORT` - Gradio port (default: `7860`)
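An illustrative `.env` for local development (all values are examples; only `OPENAI_API_KEY` is required):

```bash
# .env — example values; only OPENAI_API_KEY is required
OPENAI_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-large
LANCEDB_URI=./lancedb
LANCEDB_TABLE=a11y_expert
LOG_LEVEL=INFO
SERVER_HOST=127.0.0.1
SERVER_PORT=7860
```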
## Related Documentation
- `CLAUDE.md` - Detailed guidance for Claude Code (includes architectural details)
- `README.md` - User-facing documentation with setup instructions
- `HF_SPACES_GUIDE.md` - Hugging Face Spaces deployment guide
- `QUICK_REFERENCE.md` - Quick reference for common tasks