# Cora - Visual Curating System

**AI-powered historical illustration generator for etymological applications**

Cora turns etymological entries into historical illustrations using a hybrid approach: AI image generation first, with a retrieval-based (RAG) fallback to museum artifacts.

---

## 🎯 Overview

Cora is a complete visual generation pipeline designed to enhance etymology applications with historically authentic illustrations. When AI generation fails (e.g., the API hits a payment limit), the system falls back to serving curated museum artifacts from the Smithsonian and Met Museum collections.

**Key Features:**
- 🎨 **Visual Curator**: LLM-powered prompt refinement for historical accuracy
- 🖼️ **Dual-Source Generation**: SDXL-Lightning primary + RAG fallback
- 🏛️ **Museum Integration**: Automated ingestion from Smithsonian & Met APIs
- 🔍 **Hybrid Search**: Semantic similarity + metadata filtering
- 🌐 **Etymology API**: Production-ready endpoint for integration
- 💾 **Persistent Archive**: ChromaDB-based vector store with CLIP embeddings

---

## 🏗️ Architecture

```
Etymology App (Frontend)
    ↓
Etymology API (Port 8000)
    ↓
┌─────────────────────────────────────────┐
│  CORA PIPELINE                          │
├─────────────────────────────────────────┤
│  1. Curator (Prompt Refinement)         │
│     ↓                                    │
│  2. Engine (Image Generation)           │
│     ├─ Primary: SDXL-Lightning (HF API) │
│     └─ Fallback: RAG (Museum Archives)  │
│     ↓                                    │
│  3. Vision (CLIP Embeddings)            │
│     ↓                                    │
│  4. Memory (ChromaDB Archival)          │
└─────────────────────────────────────────┘
```
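The four stages in the diagram can be read as a simple chain of calls. The sketch below is illustrative only: `run_pipeline`, `PipelineResult`, and the lambda stand-ins are hypothetical names, not Cora's actual classes, and the real stages call an LLM, an image model, CLIP, and ChromaDB.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PipelineResult:
    refined_prompt: str
    image: object            # a PIL Image in the real system; any object here
    embedding: List[float]
    archived: bool


def run_pipeline(
    word: str,
    refine: Callable[[str], str],
    generate: Callable[[str], object],
    embed: Callable[[object], Sequence[float]],
    archive: Callable[[object, Sequence[float], str], None],
) -> PipelineResult:
    """Run the four stages in order: Curator -> Engine -> Vision -> Memory."""
    refined = refine(word)                 # 1. Curator
    image = generate(refined)              # 2. Engine (primary or fallback)
    embedding = list(embed(image))         # 3. Vision
    archive(image, embedding, refined)     # 4. Memory
    return PipelineResult(refined, image, embedding, archived=True)


# Toy stand-ins (assumptions) so the sketch runs without models or API access.
store = []
result = run_pipeline(
    "gladiator",
    refine=lambda w: f"Historical illustration of a {w}",
    generate=lambda prompt: {"prompt": prompt},
    embed=lambda image: [0.0, 1.0],
    archive=lambda image, emb, prompt: store.append((image, emb, prompt)),
)
print(result.refined_prompt)
```

Passing each stage in as a callable keeps the ordering visible and makes it easy to swap a stub for a real component when testing.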

---

## 🚀 Quick Start

### Prerequisites
- Python 3.8+
- Hugging Face API Token (for generation)
- Smithsonian API Key (for data ingestion)

### Installation

```bash
# Navigate to the project directory (adjust the path to your checkout)
cd c:\Users\Administrador\cora

# Install dependencies
pip install -r requirements.txt

# Configure environment
# Edit .env file with your API keys:
HF_API_TOKEN=your_huggingface_token
SI_API_KEY=your_smithsonian_key
```
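A quick way to confirm the two keys are visible before starting the servers is sketched below. It is a minimal, standard-library-only reader for `KEY=VALUE` lines; the helper names `load_env_file` and `missing_keys` are hypothetical, and the project itself may load `.env` differently (for example with a dotenv library).

```python
import os


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(config: dict, required=("HF_API_TOKEN", "SI_API_KEY")) -> list:
    """Return the required keys that are absent or empty."""
    return [name for name in required if not config.get(name)]


# Values from the real environment take precedence over the .env file.
config = {**load_env_file(), **os.environ}
for name in missing_keys(config):
    print(f"Warning: {name} is not set")
```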

### Running the System

**Option 1: Full Stack (UI + API)**
```bash
# Terminal 1: Start API server
python api.py

# Terminal 2: Start Gradio UI
python ui.py

# Access UI at http://127.0.0.1:7861
```

**Option 2: Etymology API Only**
```bash
# Start etymology integration endpoint
python etymology_api.py

# Test the endpoint (in a second terminal)
python tests/test_etymology_api.py
```

---

## 📦 Core Components

### 1. **CoraCurator** (`cora_curator.py`)
LLM-powered prompt refinement for visual accuracy.

```python
from cora_curator import CoraCurator

curator = CoraCurator()
refined = curator.refine_prompt("mercenaries")
# → "Historical scene depicting Roman mercenaries in authentic armor..."
```

### 2. **CoraEngine** (`cora_engine.py`)
Image generation with automatic RAG fallback.

```python
from cora_engine import CoraEngine

engine = CoraEngine()
image = engine.generate_from_text("Roman soldier")
# Returns PIL Image (generated or museum artifact)
```

### 3. **CoraVision** (`cora_vision.py`)
CLIP-based visual embeddings + YOLO object detection.

```python
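from PIL import Image
from cora_vision import CoraVision

vision = CoraVision()
pil_image = Image.open("example.jpg")  # hypothetical path; any PIL image works
embedding = vision.embed_image(pil_image)  # 768-dim vector
tags = vision.detect_tags(pil_image)  # e.g. ["person", "armor", "weapon"]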
```

### 4. **CoraMemory** (`cora_memory.py`)
ChromaDB vector store with hybrid search.

```python
from cora_memory import CoraMemory

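memory = CoraMemory()
# path, embedding, prompt, and tags come from the Engine and Vision steps
memory.save(path, embedding, prompt, tags)
# vector is the query embedding; tag_filter restricts results by tag
results = memory.search_hybrid(vector, k=5, tag_filter=["roman"])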
```

---

## 📚 Data Sources

### Museum APIs
- **Smithsonian Open Access**: `loaders/smithsonian_loader.py`
- **Met Museum Collection**: `loaders/met_loader.py`

**Example Usage:**
```bash
# Load Roman artifacts from Met Museum
python loaders/met_loader.py

# Load from Smithsonian
python loaders/smithsonian_loader.py
```

**Indexed Artifacts:** 16+ historical items (armor, sculptures, reliefs, engravings)

---

## 🔧 API Reference

See [docs/README_ETYMOLOGY_API.md](docs/README_ETYMOLOGY_API.md) for complete API documentation.

**Quick Example:**
```javascript
fetch('http://localhost:8000/api/v1/generate_illustration', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    word: "gladiator",
    etymology_context: "From Latin 'gladius' (sword)",
    style: "historical_illustration"
  })
})
```
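The same request can be built from Python with only the standard library. This is a sketch: the URL and field names are copied from the JavaScript example above, `build_request` is a hypothetical helper, and the response is left unparsed because its format is documented in the API guide rather than here.

```python
import json
import urllib.request

# URL and field names copied from the JavaScript example above.
API_URL = "http://localhost:8000/api/v1/generate_illustration"


def build_request(word: str, etymology_context: str,
                  style: str = "historical_illustration") -> urllib.request.Request:
    """Build (but do not send) the POST request for one illustration."""
    payload = {"word": word, "etymology_context": etymology_context, "style": style}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("gladiator", "From Latin 'gladius' (sword)")

# Sending requires the etymology API to be running locally:
# with urllib.request.urlopen(request, timeout=60) as response:
#     body = response.read()
```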

---

## 🧪 Testing

```bash
# Test generation parameters
python tests/test_gen_params.py

# Test etymology API integration
python tests/test_etymology_api.py

# Verify system components
python tests/verify_system.py
```

---

## 📁 Project Structure

```
cora/
├── api.py                      # Main API server (UI backend)
├── etymology_api.py            # Etymology app integration endpoint
├── ui.py                       # Gradio interface
│
├── cora_curator.py             # Prompt refinement (LLM)
├── cora_engine.py              # Image generation + RAG
├── cora_vision.py              # CLIP embeddings + YOLO
├── cora_memory.py              # ChromaDB vector store
│
├── loaders/
│   ├── smithsonian_loader.py   # Smithsonian API ingestion
│   └── met_loader.py           # Met Museum API ingestion
│
├── scripts/
│   └── load_roman_artifacts.py # Example: batch artifact loading
│
├── tests/
│   ├── test_etymology_api.py
│   ├── test_gen_params.py
│   ├── verify_system.py
│   └── ...                     # Other test scripts
│
├── archive_images/             # Downloaded museum artifacts (gitignored)
├── archive_db/                 # ChromaDB persistent storage (gitignored)
│
├── docs/
│   ├── README.md               # Project overview (this file)
│   ├── ARCHITECTURE.md         # System design details
│   ├── SETUP.md                # Installation guide
│   └── README_ETYMOLOGY_API.md # API integration guide
│
├── requirements.txt
├── .env                        # API keys (gitignored)
└── .gitignore
```

---

## 🎨 Visual Style

**Target Aesthetic:** Historical Illustration / Strategy Game Art

**Prompt Engineering:** The system guides all prompts toward two narrative modes:
- **Daily Life**: Authentic period scenes (markets, workshops, households)
- **Epic Dimension**: Heroic/mythological moments (battles, ceremonies, divine encounters)

**Technical Parameters (SDXL-Lightning):**
- `guidance_scale = 0.0` (no CFG)
- `num_inference_steps = 4` (ultra-fast)
- Resolution: 1024x1024
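
One way to keep these settings in a single place is sketched below. `GenerationParams` and `build_payload` are hypothetical names, and the `inputs`/`parameters` request shape with `width`/`height` keys is an assumption about the hosted inference endpoint, not something taken from Cora's code.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationParams:
    guidance_scale: float = 0.0      # classifier-free guidance disabled
    num_inference_steps: int = 4     # few-step distilled sampling
    width: int = 1024
    height: int = 1024


def build_payload(prompt: str, params: GenerationParams = GenerationParams()) -> dict:
    """Assumed request shape: prompt under "inputs", settings under "parameters"."""
    return {"inputs": prompt, "parameters": asdict(params)}


payload = build_payload("Roman soldier, historical illustration")
print(payload["parameters"])
```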

---

## 🔍 Search & Retrieval

**Hybrid Search Strategy:**
1. **Semantic Search**: CLIP embeddings for visual similarity
2. **Metadata Filtering**: Cultural tags ("roman", "greek", "medieval")
3. **Auto-Detection**: The API extracts cultural keywords from the query and applies them as tag filters

**Example:**
```python
# Query: "roman armor"
# → Auto-detects "roman" keyword
# → Filters results by tag:roman
# → Returns only Roman artifacts (not French baroque)
```
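
The filter-then-rank behaviour described above can be sketched in plain Python. This is a toy stand-in, not the ChromaDB-backed implementation: `detect_tag`, `cosine`, `hybrid_search_sketch`, the two-dimensional vectors, and the archive entries are all made up for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

KNOWN_TAGS = ("roman", "greek", "medieval")  # cultural tags named in this README


def detect_tag(query: str) -> Optional[str]:
    """Return the first known cultural tag that appears as a word in the query."""
    words = query.lower().split()
    return next((tag for tag in KNOWN_TAGS if tag in words), None)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search_sketch(query: str, query_vec: Sequence[float],
                         items: List[Dict], k: int = 5) -> List[Dict]:
    """Filter by the detected tag first, then rank survivors by similarity."""
    tag = detect_tag(query)
    pool = [it for it in items if tag is None or tag in it["tags"]]
    ranked = sorted(pool, key=lambda it: cosine(query_vec, it["vector"]), reverse=True)
    return ranked[:k]


# Made-up archive entries with toy 2-D vectors (real CLIP vectors are much larger).
archive = [
    {"name": "Roman cuirass", "tags": ["roman", "armor"], "vector": [0.9, 0.1]},
    {"name": "French baroque armor", "tags": ["french", "armor"], "vector": [1.0, 0.0]},
    {"name": "Roman relief", "tags": ["roman", "relief"], "vector": [0.2, 0.8]},
]
hits = hybrid_search_sketch("roman armor", [1.0, 0.0], archive, k=2)
print([hit["name"] for hit in hits])  # ['Roman cuirass', 'Roman relief']
```

Note that the French baroque piece is the closest vector to the query, yet it is excluded because the tag filter runs before similarity ranking.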

---

## 🛡️ Error Handling

**Graceful Degradation:**
1. Primary generation (SDXL-Lightning) fails, e.g., with an HTTP 402 Payment Required error
2. The RAG fallback searches the archive for a relevant artifact
3. The museum image is served in place of the generated one

**Zero Downtime:** As long as the archive is populated, a failed generation yields a museum image rather than an error.
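
A minimal sketch of this degradation path follows. `GenerationError`, `generate_with_fallback`, `failing_generator`, and the example file name are hypothetical; the real engine's exception types and return values may differ.

```python
class GenerationError(Exception):
    """Hypothetical error raised when the primary generator fails."""


def generate_with_fallback(prompt, generate, search_archive):
    """Try primary generation; on failure, serve the best archive match."""
    try:
        return {"source": "generated", "image": generate(prompt)}
    except GenerationError as error:
        matches = search_archive(prompt)
        if not matches:
            # An empty archive leaves nothing to fall back on.
            raise RuntimeError("generation failed and archive is empty") from error
        return {"source": "archive", "image": matches[0]}


def failing_generator(prompt):
    """Simulates the primary model rejecting the request."""
    raise GenerationError("402 Payment Required")


result = generate_with_fallback(
    "Roman soldier",
    generate=failing_generator,
    search_archive=lambda prompt: ["archive_images/roman_cuirass.jpg"],  # made-up file
)
print(result["source"])  # archive
```

The empty-archive branch shows why the guarantee is conditional: with nothing indexed, the fallback has no image to serve.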

---

## 🚧 Known Issues

- **API Crashes**: Conflicts on port 8000 occasionally require restarting the API server
- **HF Rate Limits**: Free tier subject to usage quotas
- **Museum APIs**: Smithsonian requires API key; Met is fully open

---

## 📝 License & Attribution

**Museum Sources:**
- Smithsonian Open Access (CC0)
- Met Museum Open Access (Public Domain)

**AI Models:**
- SDXL-Lightning (ByteDance; distilled from Stability AI's SDXL)
- CLIP-ViT-L-14 (OpenAI)
- YOLOv8 (Ultralytics)

---

## 🤝 Contributing

This project is part of a larger etymology application. For integration questions, see `docs/README_ETYMOLOGY_API.md`.

---

**Built with the philosophy of blending synthetic creation with authentic historical preservation.**