Architecture Overview
System Design Philosophy
Cora is built on three core principles:
- Graceful Degradation: Never fail completely; always serve a visual result
- RAG over Fine-Tuning: Use museum archives to provide context without costly training
- Hybrid Intelligence: Combine AI generation with curated historical data
Component Architecture
Layer 1: Interface
- UI (Gradio): `ui.py`, the testing/demo interface
- Etymology API (FastAPI): `etymology_api.py`, the production integration endpoint
Layer 2: Generation Pipeline
CoraCurator → CoraEngine → CoraVision → CoraMemory
   (LLM)        (SDXL)       (CLIP)      (ChromaDB)
Layer 3: Data Sources
- Primary: Hugging Face Inference API (SDXL-Lightning)
- Fallback: Museum Archives (Smithsonian + Met)
Data Flow
Generation Request Flow
1. User Request
↓
2. Curator: Refine prompt with LLM
↓
3. Engine: Attempt SDXL generation
├─ Success → Continue to step 4
└─ 402 Error → RAG Fallback
↓
Search Memory by embedding
↓
Return museum artifact
↓
4. Vision: Generate embedding + tags
↓
5. Memory: Archive for future retrieval
↓
6. Response: Image URL + metadata
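The six steps above can be sketched as follows. The component interfaces (`curator.refine`, `engine.generate`, and so on) are illustrative stand-ins for the real classes, and `PaymentRequiredError` is a hypothetical wrapper for the HTTP 402 case:

```python
class PaymentRequiredError(Exception):
    """Raised when the HF Inference API returns HTTP 402 (free-tier quota hit)."""

def generate_illustration(prompt, curator, engine, vision, memory):
    refined = curator.refine(prompt)                 # 2. LLM prompt refinement
    try:
        image, source = engine.generate(refined), "generated"  # 3. SDXL attempt
    except PaymentRequiredError:
        # RAG fallback: embed the refined prompt, return the nearest artifact
        query_vec = vision.embed_text(refined)
        image, source = memory.search(query_vec, k=1)[0], "archive"
    embedding, tags = vision.analyze(image)          # 4. embedding + tags
    memory.archive(image, refined, embedding, tags)  # 5. index for retrieval
    return {"image": image, "source": source}        # 6. response payload
```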
Ingestion Flow (Museums)
1. Loader (smithsonian_loader.py or met_loader.py)
↓
2. API Query → Download images
↓
3. Vision: Generate embedding + detect tags
↓
4. Memory: Index with metadata
↓
5. Persistent storage in ChromaDB
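A minimal sketch of this ingestion loop, assuming each loader yields `(image_path, metadata)` pairs; all method names here are illustrative, not the actual loader API:

```python
def ingest(loader, vision, memory):
    """Run steps 1-5 of the ingestion flow for one museum loader."""
    count = 0
    for image_path, meta in loader.fetch():        # 1-2. query API, download images
        embedding, tags = vision.analyze(image_path)   # 3. embedding + detected tags
        memory.index(image_path, embedding, tags, meta)  # 4-5. persist in ChromaDB
        count += 1
    return count
```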
Search Strategy
Hybrid Search Algorithm
Input: Query text (e.g., "roman armor")
Process:
- Text → Vector: CLIP text encoder
- Keyword Detection: Extract cultural markers ("roman", "greek", etc.)
- Over-Retrieve: Fetch 3x candidates via semantic search
- Filter: Apply tag constraints (must contain "roman")
- Rank: Return top-k filtered results
Advantage: Prevents irrelevant matches (e.g., "roman" in "Roman Catholic art")
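The over-retrieve-then-filter strategy can be sketched like this; the marker list, `embed_text`, and `semantic_search` are placeholder assumptions, not the real implementation:

```python
CULTURAL_MARKERS = {"roman", "greek", "egyptian", "norse"}  # illustrative subset

def hybrid_search(query, embed_text, semantic_search, k=5):
    # Keyword detection: pull cultural markers out of the raw query text.
    required = {w for w in query.lower().split() if w in CULTURAL_MARKERS}
    # Over-retrieve: fetch 3x candidates via semantic (vector) search.
    candidates = semantic_search(embed_text(query), n=3 * k)
    # Filter: every detected marker must appear in the candidate's tags.
    if required:
        candidates = [c for c in candidates if required <= set(c["tags"])]
    # Rank: return the top-k survivors.
    return candidates[:k]
```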
Model Details
CoraCurator (LLM)
- Model: `meta-llama/Llama-3.2-3B-Instruct`
- Purpose: Prompt refinement
- System Instruction: Guide toward "Daily Life" or "Epic Dimension" scenes
- Context: Etymology → Visual description
CoraEngine (Image Gen)
- Primary Model: `ByteDance/SDXL-Lightning`
- Params: `guidance_scale=0.0`, `steps=4`
- Style: Historical Illustration / Strategy Game Art
- Fallback: RAG → Museum artifacts
CoraVision (Embeddings)
- CLIP Model: `sentence-transformers/clip-ViT-L-14`
- Output: 768-dimensional vectors
- YOLO: `yolov8n.pt` for object detection/tagging
CoraMemory (Vector DB)
- Database: ChromaDB (persistent, local)
- Storage: `./archive_db`
- Metadata Schema:
  - `path`: Local file path
  - `prompt`: Original search query
  - `tags`: Comma-separated (e.g., "roman,armor,met_museum_open_access")
  - `timestamp`: ISO format
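A record conforming to this schema might be built as below before being passed as one of the `metadatas` entries to a ChromaDB `collection.add(...)` call; the helper name is hypothetical:

```python
from datetime import datetime, timezone

def make_metadata(path, prompt, tags):
    """Build one metadata record matching the CoraMemory schema."""
    return {
        "path": path,                          # local file path
        "prompt": prompt,                      # original search query
        "tags": ",".join(tags),                # comma-separated string
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO format
    }
```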
API Design
Etymology API Endpoints
POST /api/v1/generate_illustration
Purpose: Single endpoint for full pipeline
Design Decisions:
- Returns both `image_url` and `image_base64` (flexibility)
- Includes `source` field ("generated" vs "archive")
- Auto-archives all results for future retrieval
- CORS-enabled for cross-origin integration
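An illustrative helper showing the response shape these decisions imply; the function name and values are placeholders, not the actual handler:

```python
def build_response(image_url, image_b64, source):
    """Assemble the generate_illustration payload described above."""
    assert source in ("generated", "archive")  # the two documented origins
    return {
        "image_url": image_url,      # for clients that fetch by URL
        "image_base64": image_b64,   # for clients that inline the image
        "source": source,            # "generated" (SDXL) or "archive" (RAG)
    }
```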
GET /api/v1/search_archive
Purpose: Direct access to historical artifacts
Use Case: Browse mode in etymology app
GET /health
Purpose: Monitor component status
Returns:
{
  "status": "healthy",
  "components": {
    "engine": true,
    "curator": true,
    "vision": true,
    "memory": true
  }
}
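The aggregation behind this payload can be sketched as below, assuming each component exposes a readiness probe (`is_ready` is a hypothetical method name):

```python
def health(components):
    """Report overall and per-component status for the /health endpoint."""
    ok = {name: comp.is_ready() for name, comp in components.items()}
    status = "healthy" if all(ok.values()) else "degraded"
    return {"status": status, "components": ok}
```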
Scaling Considerations
Current Constraints
- Single Instance: No load balancing
- Local Storage: ChromaDB in-process
- API Limits: HF free tier (402 errors common)
Future Optimizations
Archive Curator (Priority): Intelligent system to manage and curate the museum archive
- Auto-Tagging: Enhance metadata with historical period, culture, object type
- Quality Scoring: Rate artifact relevance for different etymology contexts
- Deduplication: Detect and merge similar artifacts
- Smart Indexing: Organize by historical timeline, geography, theme
- Active Curation: Suggest best artifacts for specific words/contexts
- Gap Analysis: Identify missing periods/cultures and trigger targeted ingestion
Caching: Hash etymology text → serve cached images
Queue System: Celery for async generation
CDN: Serve `archive_images/` via CloudFront or similar
Model Hosting: Self-host SDXL on a GPU server to avoid 402 errors
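The caching idea above can be sketched as follows; the in-memory dict stands in for whatever cache store is eventually used:

```python
import hashlib

def cache_key(etymology_text):
    """Hash etymology text into a stable cache key."""
    # Normalize whitespace/case so trivially different inputs share a key.
    normalized = " ".join(etymology_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

_cache = {}

def get_or_generate(etymology_text, generate):
    """Serve a cached image if available; otherwise run the full pipeline."""
    key = cache_key(etymology_text)
    if key not in _cache:
        _cache[key] = generate(etymology_text)  # miss: generate and remember
    return _cache[key]
```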
Security Notes
API Keys
- Stored in `.env` (gitignored)
- Never exposed in responses or logs
CORS
- Currently set to `allow_origins=["*"]` for development
- Production: Restrict to the etymology app domain
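Tightening CORS for production might look like the configuration sketch below; the domain is a placeholder, not the real etymology app origin:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://etymology.example.com"],  # placeholder domain
    allow_methods=["GET", "POST"],   # only the verbs the API actually uses
    allow_headers=["*"],
)
```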
Static Files
- `archive_images/` served directly via FastAPI
- No authentication (museum artifacts are public domain)
- Consider rate limiting for public deployments