# Cora - Visual Curating System

**AI-powered historical illustration generator for etymological applications**

Cora transforms etymological entries into historical illustrations by combining AI generation with a museum-artifact fallback (RAG).

---

## 🎯 Overview

Cora is a complete visual generation pipeline designed to enhance etymology applications with historically authentic illustrations. When AI generation fails (e.g., API payment limits), the system falls back to serving curated museum artifacts from the Smithsonian and Met Museum collections.

**Key Features:**
- 🎨 **Visual Curator**: LLM-powered prompt refinement for historical accuracy
- 🖼️ **Dual-Source Generation**: SDXL-Lightning primary + RAG fallback
- 🏛️ **Museum Integration**: Automated ingestion from Smithsonian & Met APIs
- 🔍 **Hybrid Search**: Semantic similarity + metadata filtering
- 🌐 **Etymology API**: Production-ready endpoint for integration
- 💾 **Persistent Archive**: ChromaDB-based vector store with CLIP embeddings

---

## 🏗️ Architecture

```
Etymology App (Frontend)
         ↓
Etymology API (Port 8000)
         ↓
┌───────────────────────────────────────────┐
│               CORA PIPELINE               │
├───────────────────────────────────────────┤
│  1. Curator (Prompt Refinement)           │
│         ↓                                 │
│  2. Engine (Image Generation)             │
│     ├─ Primary: SDXL-Lightning (HF API)  │
│     └─ Fallback: RAG (Museum Archives)   │
│         ↓                                 │
│  3. Vision (CLIP Embeddings)              │
│         ↓                                 │
│  4. Memory (ChromaDB Archival)            │
└───────────────────────────────────────────┘
```

---

## 🚀 Quick Start

### Prerequisites
- Python 3.8+
- Hugging Face API Token (for generation)
- Smithsonian API Key (for data ingestion)

### Installation

```bash
# Clone/Navigate to project (adjust the path to your checkout)
cd c:\Users\Administrador\cora

# Install dependencies
pip install -r requirements.txt

# Configure environment
# Edit .env file with your API keys:
HF_API_TOKEN=your_huggingface_token
SI_API_KEY=your_smithsonian_key
```

### Running the System

**Option 1: Full Stack (UI + API)**
```bash
# Terminal 1: Start API server
python api.py

# Terminal 2: Start Gradio UI
python ui.py
# Access UI at http://127.0.0.1:7861
```

**Option 2: Etymology API Only**
```bash
# Start etymology integration endpoint
python etymology_api.py

# Test the endpoint
python tests/test_etymology_api.py
```

---

## 📦 Core Components

### 1. **CoraCurator** (`cora_curator.py`)
LLM-powered prompt refinement for visual accuracy.

```python
from cora_curator import CoraCurator

curator = CoraCurator()
refined = curator.refine_prompt("mercenaries")
# → "Historical scene depicting Roman mercenaries in authentic armor..."
```

### 2. **CoraEngine** (`cora_engine.py`)
Image generation with automatic RAG fallback.

```python
from cora_engine import CoraEngine

engine = CoraEngine()
image = engine.generate_from_text("Roman soldier")
# Returns PIL Image (generated or museum artifact)
```

### 3. **CoraVision** (`cora_vision.py`)
CLIP-based visual embeddings + YOLO object detection.

```python
from cora_vision import CoraVision

vision = CoraVision()
embedding = vision.embed_image(pil_image)  # 768-dim vector
tags = vision.detect_tags(pil_image)       # ["person", "armor", "weapon"]
```

### 4. **CoraMemory** (`cora_memory.py`)
ChromaDB vector store with hybrid search.
```python
from cora_memory import CoraMemory

memory = CoraMemory()
memory.save(path, embedding, prompt, tags)
results = memory.search_hybrid(vector, k=5, tag_filter=["roman"])
```

---

## 📚 Data Sources

### Museum APIs
- **Smithsonian Open Access**: `loaders/smithsonian_loader.py`
- **Met Museum Collection**: `loaders/met_loader.py`

**Example Usage:**
```bash
# Load Roman artifacts from Met Museum
python loaders/met_loader.py

# Load from Smithsonian
python loaders/smithsonian_loader.py
```

**Indexed Artifacts:** 16+ historical items (armor, sculptures, reliefs, engravings)

---

## 🔧 API Reference

See [docs/README_ETYMOLOGY_API.md](docs/README_ETYMOLOGY_API.md) for complete API documentation.

**Quick Example:**
```javascript
fetch('http://localhost:8000/api/v1/generate_illustration', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    word: "gladiator",
    etymology_context: "From Latin 'gladius' (sword)",
    style: "historical_illustration"
  })
})
```

---

## 🧪 Testing

```bash
# Test generation parameters
python tests/test_gen_params.py

# Test etymology API integration
python tests/test_etymology_api.py

# Verify system components
python tests/verify_system.py
```

---

## 📁 Project Structure

```
cora/
├── api.py                      # Main API server (UI backend)
├── etymology_api.py            # Etymology app integration endpoint
├── ui.py                       # Gradio interface
│
├── cora_curator.py             # Prompt refinement (LLM)
├── cora_engine.py              # Image generation + RAG
├── cora_vision.py              # CLIP embeddings + YOLO
├── cora_memory.py              # ChromaDB vector store
│
├── loaders/
│   ├── smithsonian_loader.py   # Smithsonian API ingestion
│   └── met_loader.py           # Met Museum API ingestion
│
├── scripts/
│   └── load_roman_artifacts.py # Example: batch artifact loading
│
├── tests/
│   ├── test_etymology_api.py
│   ├── test_gen_params.py
│   ├── verify_system.py
│   └── ...                     # Other test scripts
│
├── archive_images/             # Downloaded museum artifacts (gitignored)
├── archive_db/                 # ChromaDB persistent storage (gitignored)
│
├── docs/
│   ├── README.md               # Project overview (this file)
│   ├── ARCHITECTURE.md         # System design details
│   ├── SETUP.md                # Installation guide
│   └── README_ETYMOLOGY_API.md # API integration guide
│
├── requirements.txt
├── .env                        # API keys (gitignored)
└── .gitignore
```

---

## 🎨 Visual Style

**Target Aesthetic:** Historical Illustration / Strategy Game Art

**Prompt Engineering:**
The system guides all prompts toward two narrative modes:
- **Daily Life**: Authentic period scenes (markets, workshops, households)
- **Epic Dimension**: Heroic/mythological moments (battles, ceremonies, divine encounters)

**Technical Parameters (SDXL-Lightning):**
- `guidance_scale = 0.0` (no CFG)
- `num_inference_steps = 4` (ultra-fast)
- Resolution: 1024x1024

---

## 🔍 Search & Retrieval

**Hybrid Search Strategy:**
1. **Semantic Search**: CLIP embeddings for visual similarity
2. **Metadata Filtering**: Cultural tags ("roman", "greek", "medieval")
3. **Auto-Detection**: API extracts keywords from queries

**Example:**
```python
# Query: "roman armor"
# → Auto-detects "roman" keyword
# → Filters results by tag:roman
# → Returns only Roman artifacts (not French baroque)
```

---

## 🛡️ Error Handling

**Graceful Degradation:**
1. Primary generation (SDXL-Lightning) → 402 Payment Error
2. RAG Fallback → Search archive for relevant artifact
3. Serve museum image instead of failing

**Zero Downtime:** As long as the archive is populated, a failed generation is answered with an archived artifact instead of an error response.
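
The degradation path above can be sketched roughly as follows. This is a minimal illustration, not the actual `CoraEngine` code: `GenerationError`, `generate_with_fallback`, the two callables, and the `archive_images/roman_armor.jpg` path are all hypothetical names introduced for the example, and the toy keyword lookup stands in for the real embedding-based archive search.

```python
from typing import Callable, Optional


class GenerationError(Exception):
    """Hypothetical error for a failed primary generation (e.g. HTTP 402)."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"generation failed with status {status_code}")
        self.status_code = status_code


def generate_with_fallback(
    prompt: str,
    generate: Callable[[str], str],
    search_archive: Callable[[str], Optional[str]],
) -> dict:
    """Try the primary generator; on failure, serve an archived artifact.

    `generate` and `search_archive` are stand-ins for the real engine and
    archive lookup; both return an image path here for simplicity.
    """
    try:
        return {"source": "generated", "image": generate(prompt)}
    except GenerationError as err:
        artifact = search_archive(prompt)
        if artifact is None:
            # Empty archive or no match: there is nothing left to fall back to.
            raise RuntimeError("generation failed and archive has no match") from err
        return {"source": "archive", "image": artifact, "reason": err.status_code}


def failing_generator(prompt: str) -> str:
    # Simulates the payment-limit failure from step 1.
    raise GenerationError(402)


def demo_archive(prompt: str) -> Optional[str]:
    # Toy archive keyed by a cultural keyword (assumed data, for illustration).
    archive = {"roman": "archive_images/roman_armor.jpg"}
    for keyword, path in archive.items():
        if keyword in prompt.lower():
            return path
    return None


result = generate_with_fallback("Roman soldier", failing_generator, demo_archive)
print(result["source"], result["image"])
```

Note that the sketch still raises when the archive has no match, which is why the fallback guarantee depends on the archive being populated.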

---

## 🚧 Known Issues

- **API Crashes**: Port 8000 conflicts occasionally require a restart
- **HF Rate Limits**: Free tier is subject to usage quotas
- **Museum APIs**: Smithsonian requires an API key; Met is fully open

---

## 📝 License & Attribution

**Museum Sources:**
- Smithsonian Open Access (CC0)
- Met Museum Open Access (Public Domain)

**AI Models:**
- SDXL-Lightning (ByteDance; built on Stability AI's SDXL)
- CLIP-ViT-L-14 (OpenAI)
- YOLOv8 (Ultralytics)

---

## 🤝 Contributing

This project is part of a larger etymology application. For integration questions, see `docs/README_ETYMOLOGY_API.md`.

---

**Built with the philosophy of blending synthetic creation with authentic historical preservation.**