# Copilot Instructions for Jacek AI
This file provides guidance for GitHub Copilot when working with the Jacek AI codebase: a bilingual (Polish/English) accessibility chatbot built on RAG with LanceDB and the OpenAI API (default model `gpt-4o-mini`).
## Build, Test, and Run Commands
### Running the Application
```bash
# Local development - starts Gradio UI at http://127.0.0.1:7860
python app.py
# Run all startup tests before deployment
python test_startup.py
```
### Environment Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Configure environment (required before first run)
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```
### Database Management
```bash
# Compact LanceDB (removes version history, reduces file count)
python compact_database.py
# Check document count
python -c "import lancedb; db = lancedb.connect('./lancedb'); print(len(db.open_table('a11y_expert')))"
```
### Testing
```bash
# Run full test suite (imports, config, vector store, embeddings, agent)
python test_startup.py
# All tests must pass before deploying to Hugging Face Spaces
```
## Architecture Overview
### Core Components
**Agent System** (`agent/`)
- `a11y_agent.py`: Main `A11yExpertAgent` class with streaming responses via OpenAI
- `prompts.py`: Language-specific system prompts (Polish/English) with **strict language enforcement**
- `tools.py`: RAG tools for knowledge base search (top-5 semantic results)
**Vector Store** (`database/`)
- `vector_store_client.py`: LanceDB client with lazy loading and automatic reconnection
- Database path: `./lancedb/a11y_expert.lance` (tracked with Git LFS)
- **READ-ONLY in production** (Hugging Face Spaces environment)
**Embeddings** (`models/`)
- `embeddings.py`: OpenAI embeddings client with disk caching (`./cache/embeddings`) and retry logic
- Model: `text-embedding-3-large` (3072 dimensions)
- Singleton pattern: use `get_embeddings_client()` for shared instance
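The singleton accessor can be sketched as follows. `EmbeddingsClient` here is a stand-in: the real class (with OpenAI calls, disk caching, and retry logic) lives in `models/embeddings.py`.

```python
# Minimal sketch of the shared-instance accessor, assuming a stand-in client class.
from functools import lru_cache

class EmbeddingsClient:
    def __init__(self, model: str = "text-embedding-3-large"):
        self.model = model  # the real client also wires up the OpenAI API and disk cache

@lru_cache(maxsize=1)
def get_embeddings_client() -> EmbeddingsClient:
    """Return the shared client, creating it on the first call only."""
    return EmbeddingsClient()
```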
**UI** (`app.py`)
- Gradio ChatInterface with two-column layout (chat + notes from `notes.md`)
- **Lazy agent initialization** - agent loads on first user query, not at startup
- Streaming responses for better UX
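The lazy-initialization pattern can be sketched like this, with a stand-in `Agent` class replacing the real `A11yExpertAgent`: nothing heavy runs at import time, and the agent is built inside the chat handler on the first user query.

```python
# Sketch of lazy initialization: module import is cheap; agent setup is deferred.
_agent = None

class Agent:  # stand-in for the real A11yExpertAgent
    def answer(self, query: str) -> str:
        return f"answer to: {query}"

def get_agent() -> Agent:
    global _agent
    if _agent is None:
        _agent = Agent()  # expensive setup (DB connect, prompts) happens here, once
    return _agent

def chat_fn(message: str, history: list) -> str:
    # Gradio calls this per message; the first call triggers agent creation.
    return get_agent().answer(message)
```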
**Configuration** (`config.py`)
- Pydantic settings with environment variable support
- All config loaded from `.env` file (never hardcode secrets)
- Required: `OPENAI_API_KEY` (OpenAI API key for LLM and embeddings)
### Data Flow (RAG Pipeline)
1. User asks question in Gradio UI
2. Language detected from query using `langdetect` (Polish or English)
3. Query embedded using OpenAI embeddings API (with cache lookup)
4. Vector search in LanceDB (filtered by language: `where="language = 'pl'"` or `'en'`)
5. Top 5 results formatted as context
6. Context + query + language-specific system prompt sent to the LLM
7. Response streamed back to UI token-by-token
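The steps above can be sketched end-to-end. Every component here (detector, embedder, search, prompt assembly) is a stub standing in for the real `agent/` and `database/` code; the string returned is the input that would be streamed through the LLM in step 7.

```python
def detect_language(query: str) -> str:            # stands in for langdetect
    return "pl" if any(c in query for c in "ąęłżź") else "en"

def embed(query: str) -> list[float]:              # stands in for the embeddings client
    return [float(len(query))]

def search(vector: list[float], lang: str) -> list[str]:  # stands in for LanceDB
    return [f"context doc ({lang})"]

def build_llm_input(query: str) -> str:
    lang = detect_language(query)                  # step 2
    vector = embed(query)                          # step 3 (cache lookup in the real code)
    context = "\n".join(search(vector, lang)[:5])  # steps 4-5: top-5 results as context
    return f"[system prompt: {lang}]\n{context}\n{query}"  # step 6
```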
### Key Design Patterns
- **Lazy Initialization**: Agent and database connections initialize on first use, not at startup (faster deployment)
- **Singleton Pattern**: `get_embeddings_client()` returns shared instance across the app
- **Language Detection**: Auto-detects query language and adjusts both prompt and vector search filter
- **Stateless Agent**: The agent keeps no conversation history of its own; Gradio holds the history in the UI
- **Conversation Context**: The last 4 messages from the Gradio history are passed with each request so follow-up questions still have context
## Key Conventions
### Language Handling - CRITICAL
The agent has **strict language enforcement** in system prompts:
- Polish queries get `SYSTEM_PROMPT_PL` with "CRITICAL: Answer ONLY in Polish"
- English queries get `SYSTEM_PROMPT_EN` with "CRITICAL: Answer ONLY in English"
- System prompts explicitly instruct the LLM to translate sources if needed
- Vector search is language-filtered: `where="language = 'pl'"` or `where="language = 'en'"`
**When modifying prompts**: Never remove or weaken the language enforcement instructions - they prevent language mixing which confuses users.
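Tying the detected language to both the prompt and the search filter can be sketched as below; the prompt strings are placeholders, not the real contents of `prompts.py`.

```python
# Placeholder prompts; the real, full texts live in agent/prompts.py.
SYSTEM_PROMPT_PL = "CRITICAL: Answer ONLY in Polish. Translate sources if needed."
SYSTEM_PROMPT_EN = "CRITICAL: Answer ONLY in English. Translate sources if needed."

def select_prompt_and_filter(lang: str) -> tuple[str, str]:
    """Map a detected language code to (system prompt, LanceDB where-clause)."""
    if lang == "pl":
        return SYSTEM_PROMPT_PL, "language = 'pl'"
    return SYSTEM_PROMPT_EN, "language = 'en'"  # English is the fallback
```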
### LanceDB Database - READ-ONLY in Production
- Database at `./lancedb/` is tracked with Git LFS (not generated at runtime)
- In Hugging Face Spaces: database is read-only (filesystem is immutable)
- For local development: use `VectorStoreClient.add_documents()` to add data
- After local changes: run `compact_database.py` to reduce file count before committing
- Schema: `text`, `vector`, `source`, `language`, `doc_type`, `created_at`, `updated_at`
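A hypothetical row matching the schema above (all values are illustrative, not taken from the database):

```python
# Illustrative document row; field names follow the schema listed above.
row = {
    "text": "Icon-only buttons need an accessible name (e.g. aria-label).",
    "vector": [0.0] * 3072,  # text-embedding-3-large dimensionality
    "source": "wcag-notes.md",
    "language": "en",        # 'pl' or 'en'; used as the vector-search filter
    "doc_type": "note",
    "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-01-01T00:00:00Z",
}
```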
### Configuration Loading
All settings in `config.py` are loaded from environment variables:
```python
from config import get_settings
settings = get_settings() # Singleton, cached
print(settings.llm_model) # gpt-4o-mini (default)
```
Never access environment variables directly - always use `get_settings()`.
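The cached-settings pattern can be sketched with the stdlib alone. The real `config.py` uses Pydantic settings; the fields below are a small assumed subset.

```python
# Stdlib sketch of a cached settings accessor (the real code uses Pydantic).
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    llm_model: str

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Read the environment once and cache the result for the whole app."""
    return Settings(
        openai_api_key=os.environ.get("OPENAI_API_KEY", ""),
        llm_model=os.environ.get("LLM_MODEL", "gpt-4o-mini"),
    )
```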
### Hugging Face Spaces Deployment
**Critical deployment requirements**:
1. `demo.queue()` must be called explicitly (see `app.py:238-243`)
2. Do **NOT** use `atexit.register()` for cleanup (causes premature shutdown)
3. LanceDB must be committed with Git LFS (database is read-only in HF)
4. API key stored as HF Spaces Secret: `OPENAI_API_KEY`
5. The `if __name__ == "__main__"` block handles both local and HF deployments
**Testing before deployment**:
```bash
python test_startup.py # All tests must pass
```
### Logging
Use loguru for all logging (already configured):
```python
from loguru import logger
logger.info("Starting process...")
logger.success("✅ Completed successfully")
logger.error(f"❌ Failed: {error}")
```
Set `LOG_LEVEL=DEBUG` in `.env` for verbose output during development.
### Error Handling
- Always close resources in agent/client classes (implement `close()` method)
- Use try/except with specific exception types
- Log full traceback for debugging: `logger.error(traceback.format_exc())`
- For user-facing errors, provide clear Polish/English messages depending on detected language
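These conventions can be sketched together; `VectorClient` and the error messages are illustrative stand-ins, and `print` substitutes for loguru's `logger.error`.

```python
# Sketch of the close() convention, specific exceptions, and traceback logging.
import traceback

class VectorClient:
    """Stand-in client showing the explicit close() convention."""
    def __init__(self):
        self.connection = object()  # stands in for a LanceDB connection

    def close(self) -> None:
        self.connection = None      # release the resource explicitly

def run_query(client: VectorClient, lang: str = "en") -> str:
    try:
        if client.connection is None:
            raise ConnectionError("client already closed")
        return "ok"
    except ConnectionError:
        print(traceback.format_exc())  # real code: logger.error(traceback.format_exc())
        return ("Przepraszam, wystąpił błąd." if lang == "pl"
                else "Sorry, an error occurred.")
```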
## Project Structure
```
JacekAI/
├── agent/                     # Core agent logic
│   ├── a11y_agent.py          # Main agent with RAG
│   ├── prompts.py             # Language-specific prompts (PL/EN)
│   └── tools.py               # Knowledge base search tools
├── database/
│   └── vector_store_client.py # LanceDB client
├── models/
│   └── embeddings.py          # OpenAI embeddings with caching
├── lancedb/                   # Vector database (Git LFS)
│   └── a11y_expert.lance/
├── cache/                     # Embeddings cache (gitignored)
├── app.py                     # Gradio UI with lazy initialization
├── config.py                  # Pydantic settings (environment variables)
├── test_startup.py            # Deployment readiness tests
├── compact_database.py        # Database compaction utility
├── requirements.txt           # Python dependencies
├── .env.example               # Environment template
└── notes.md                   # Optional notes displayed in UI sidebar
```
## Important Implementation Notes
### When Adding New Features to Agent
1. Modifying prompts → Edit `agent/prompts.py`
2. Adding new tools → Add function to `agent/tools.py`
3. Changing RAG logic → Modify `agent/a11y_agent.py`
4. Test locally with `python app.py` and interact through UI
### When Updating Dependencies
1. Edit `requirements.txt`
2. Run `pip install -r requirements.txt`
3. Test with `python test_startup.py`
4. Commit changes and test in HF Spaces
### When Debugging
- Set `LOG_LEVEL=DEBUG` in `.env` for verbose logging
- Agent initialization happens on first query (check logs for "A11yExpertAgent initialized")
- Embeddings cache is at `./cache/embeddings` (create directory if missing)
- Vector search logs show retrieved context from database
## Common Pitfalls
1. **DO NOT** modify the database in production (LanceDB is read-only on HF Spaces)
2. **DO NOT** use `atexit.register()` in `app.py` (breaks HF Spaces deployment)
3. **DO NOT** weaken language enforcement in prompts (causes confusing mixed-language responses)
4. **DO NOT** access `os.environ` directly - always use `get_settings()`
5. **DO NOT** initialize agent at module level - use lazy initialization pattern
6. **DO NOT** forget to call `demo.queue()` before `demo.launch()` in Gradio
## Environment Variables
Required in `.env` file:
- `OPENAI_API_KEY` - OpenAI API key for LLM and embeddings - **REQUIRED**
Optional (with defaults):
- `LLM_MODEL` - Language model (default: `gpt-4o-mini`)
- `LLM_BASE_URL` - API endpoint (default: GitHub Models endpoint)
- `EMBEDDING_MODEL` - Embedding model (default: `text-embedding-3-large`)
- `LANCEDB_URI` - Database path (default: `./lancedb`)
- `LANCEDB_TABLE` - Table name (default: `a11y_expert`)
- `LOG_LEVEL` - Logging verbosity (default: `INFO`)
- `SERVER_HOST` - Gradio host (default: `127.0.0.1`, use `0.0.0.0` for HF)
- `SERVER_PORT` - Gradio port (default: `7860`)
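An illustrative `.env` for local development (all values are examples; only `OPENAI_API_KEY` is required):

```bash
# .env — example values; only OPENAI_API_KEY is required
OPENAI_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-large
LANCEDB_URI=./lancedb
LANCEDB_TABLE=a11y_expert
LOG_LEVEL=INFO
SERVER_HOST=127.0.0.1
SERVER_PORT=7860
```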
## Related Documentation
- `CLAUDE.md` - Detailed guidance for Claude Code (includes architectural details)
- `README.md` - User-facing documentation with setup instructions
- `HF_SPACES_GUIDE.md` - Hugging Face Spaces deployment guide
- `QUICK_REFERENCE.md` - Quick reference for common tasks