Setup Guide

Prerequisites

Required Software

  • Python: 3.8 or higher
  • pip: Latest version (python -m pip install --upgrade pip)
  • Git: For version control (optional)
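
The Python requirement can be checked from the interpreter itself. A minimal sketch, assuming only the 3.8 minimum stated above; the helper name `meets_minimum` is illustrative and not part of Cora:

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    current = tuple(sys.version_info if version is None else version)
    return current[:2] >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```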

Required API Keys

1. Hugging Face API Token

Purpose: Image generation via SDXL-Lightning

Get your token:

  1. Create account at huggingface.co
  2. Go to Settings → Access Tokens
  3. Create new token with "Read" permissions
  4. Copy the token (starts with hf_...)

2. Smithsonian API Key

Purpose: Museum artifact ingestion

Get your key:

  1. Visit Smithsonian Open Access
  2. Request API key (free, instant approval)
  3. Copy the API key from email

Installation Steps

1. Clone or Download Project

# Change into the project directory (adjust the path to where you cloned or unpacked it)
cd c:\Users\Administrador\cora

2. Create Virtual Environment (Recommended)

python -m venv venv

# Activate
# Windows
venv\Scripts\activate

# Linux/Mac
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

Expected install time: 5-10 minutes (includes PyTorch)

4. Configure Environment Variables

Create a .env file in the project root:

# .env
HF_API_TOKEN=hf_your_hugging_face_token_here
SI_API_KEY=your_smithsonian_key_here

Important: Never commit .env to version control!
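
Before starting the servers, it can help to confirm that the .env file parses and contains the expected keys. The sketch below uses only the standard library; the function names `parse_env` and `check_env` are hypothetical, and Cora itself may load the file differently (for example through a dotenv package).

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    token = values.get("HF_API_TOKEN", "")
    if not token:
        problems.append("HF_API_TOKEN is missing")
    elif not token.startswith("hf_"):
        problems.append("HF_API_TOKEN does not start with 'hf_'")
    if not values.get("SI_API_KEY"):
        problems.append("SI_API_KEY is missing (needed only for Smithsonian ingestion)")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file in the current directory")
    else:
        for message in check_env(parse_env(env_path.read_text())) or ["Looks fine"]:
            print(message)
```

Run it from the project root. It only checks the shape of the values, not whether the token is actually valid.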

5. Verify Installation

python tests/verify_system.py

Expected output:

✅ CoraVision initialized
✅ CoraMemory initialized
✅ HF_API_TOKEN found
✅ System ready

First Run

Option A: Full UI (Testing)

# Terminal 1: Start API
python api.py
# Wait for: "Uvicorn running on http://0.0.0.0:8000"

# Terminal 2: Start UI
python ui.py
# Wait for: "Running on local URL: http://127.0.0.1:7861"

# Open browser to http://127.0.0.1:7861

Option B: Etymology API (Integration)

python etymology_api.py
# API ready at http://localhost:8000

Populate Archive (Optional but Recommended)

Load Museum Artifacts

# Load Roman artifacts from Met Museum
python loaders/met_loader.py

# Load from Smithsonian
python loaders/smithsonian_loader.py

What this does:

  • Downloads historical images from museum APIs
  • Generates CLIP embeddings
  • Indexes into ChromaDB (./archive_db)
  • Enables RAG fallback for generation failures

Time: ~2-3 minutes per loader
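
To illustrate the indexing idea in the list above, here is a toy in-memory version of "embed, index, retrieve nearest". It is a sketch only: `ToyArchive` is a hypothetical stand-in and not the ChromaDB API, and the short hand-written vectors stand in for real CLIP embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyArchive:
    """In-memory stand-in for a vector store (illustration only)."""

    def __init__(self):
        self.items = []  # list of (artifact_id, embedding) pairs

    def index(self, artifact_id, embedding):
        self.items.append((artifact_id, embedding))

    def nearest(self, query, k=1):
        """Return the ids of the k items most similar to the query vector."""
        ranked = sorted(self.items, key=lambda item: cosine(query, item[1]), reverse=True)
        return [artifact_id for artifact_id, _ in ranked[:k]]


archive = ToyArchive()
# Made-up 3-d vectors; real image embeddings have hundreds of dimensions.
archive.index("met_helmet", [0.9, 0.1, 0.0])
archive.index("si_manuscript", [0.0, 0.2, 0.9])
print(archive.nearest([1.0, 0.0, 0.1]))  # prints ['met_helmet']
```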

Custom Loading

Create your own loader script. The import below assumes the script is saved in loaders/, next to met_loader.py:

from met_loader import MetLoader

loader = MetLoader()
loader.search_and_index("Viking weapons", limit=5)
loader.search_and_index("Medieval manuscripts", limit=5)

Troubleshooting

Issue: ModuleNotFoundError

Solution: Ensure virtual environment is activated and dependencies installed

pip install -r requirements.txt

Issue: HF_API_TOKEN not found

Solution: Check .env file exists in project root with correct token

Issue: Port 8000 already in use

Solution: Find and kill existing process

# Windows
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Linux/Mac
lsof -ti:8000 | xargs kill -9
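
A platform-independent way to confirm whether something is listening on the port is a short socket probe. This is a minimal sketch using only the standard library; the helper name `port_in_use` is illustrative. It does not identify or stop the process, so the commands above are still needed for that.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 8000 in use:", port_in_use(8000))
```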

Issue: API returns 402 Payment Required

Solution: This is expected with HF free tier. The RAG fallback will activate:

  1. Ensure archive is populated (python loaders/met_loader.py)
  2. System will automatically serve museum artifacts
  3. No action needed from you
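
The fallback described in these steps can be pictured as a simple try/except around generation. The sketch below is not Cora's actual implementation: `PaymentRequired`, `generate_or_fallback`, and the two callables passed in are hypothetical names used only to show the control flow.

```python
class PaymentRequired(Exception):
    """Hypothetical error raised when the image API answers HTTP 402."""


def generate_or_fallback(prompt, generate, search_archive):
    """Try generation first; on a 402, return the best archived match instead."""
    try:
        return {"source": "generated", "image": generate(prompt)}
    except PaymentRequired:
        matches = search_archive(prompt)
        if not matches:
            raise RuntimeError(
                "Generation failed and the archive is empty; run a loader first"
            )
        return {"source": "archive", "image": matches[0]}


def always_402(prompt):
    """Stub generator that simulates the free-tier 402 response."""
    raise PaymentRequired()


result = generate_or_fallback(
    "Roman soldier", always_402, lambda prompt: ["met_12345_abc.jpg"]
)
print(result["source"])  # prints archive
```

The empty-archive branch shows why populating the archive first matters: without indexed artifacts there is nothing to fall back to.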

Issue: ChromaDB errors

Solution: Delete and recreate database

# Linux/Mac (in Windows Command Prompt use: rmdir /s /q archive_db)
rm -rf archive_db
python
>>> from cora_memory import CoraMemory
>>> mem = CoraMemory()  # Creates fresh DB

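The delete step can also be done from Python, which behaves the same on Windows, Linux, and Mac. A minimal sketch, assuming the default ./archive_db location mentioned earlier; `reset_archive` is a hypothetical helper and not part of cora_memory:

```python
import shutil
from pathlib import Path


def reset_archive(db_dir="archive_db"):
    """Delete the database directory if present; return True if it was removed."""
    path = Path(db_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    print("Removed archive_db" if reset_archive() else "No archive_db directory found")
```

Afterwards, instantiate CoraMemory as shown above to create a fresh database.
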
Issue: CUDA out of memory

Solution: Vision models run on CPU by default. If you enabled GPU, switch back to CPU:

# In cora_vision.py, ensure:
device = "cpu"  # Not "cuda"
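
If you would rather choose the device at runtime than hard-code it, a guarded helper is one option. This is a sketch under assumptions: `pick_device` is a hypothetical function, and how cora_vision.py consumes the `device` value is not shown in this guide. `torch.cuda.is_available()` is the standard PyTorch check.

```python
def pick_device(prefer_gpu=False):
    """Return "cuda" only when requested and actually available, else "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device(prefer_gpu=False)
print(device)  # prints cpu
```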

Directory Structure After Setup

cora/
├── .env                    # Your API keys (DO NOT COMMIT)
├── .gitignore
├── requirements.txt
│
├── venv/                   # Virtual environment (if created)
│
├── api.py
├── etymology_api.py
├── ui.py
│
├── cora_curator.py
├── cora_engine.py
├── cora_memory.py
├── cora_vision.py
│
├── loaders/
│   ├── smithsonian_loader.py
│   └── met_loader.py
│
├── scripts/
│   └── load_roman_artifacts.py
│
├── tests/
│   ├── test_etymology_api.py
│   ├── verify_system.py
│   └── ...
│
├── archive_db/             # ChromaDB storage (auto-created)
│   └── chroma.sqlite3
│
├── archive_images/         # Downloaded museum artifacts
│   ├── met_12345_abc.jpg
│   └── si_67890_def.jpg
│
└── docs/
    ├── README.md
    ├── ARCHITECTURE.md
    ├── SETUP.md (this file)
    └── README_ETYMOLOGY_API.md

Next Steps

  1. Test Generation: Try the UI → "Generate" tab → Enter "Roman soldier"
  2. Test Archive: UI → "Archive" tab → Search "romans"
  3. Test API: Run python tests/test_etymology_api.py
  4. Integrate: See docs/README_ETYMOLOGY_API.md for etymology app integration

Environment Variables Reference

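| Variable     | Required  | Purpose                          | Example       |
|--------------|-----------|----------------------------------|---------------|
| HF_API_TOKEN | Yes       | Hugging Face API access          | hf_abcd...xyz |
| SI_API_KEY   | Optional* | Smithsonian data ingestion       | abc123...     |
| PORT         | No        | Override API port (default 8000) | 8080          |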

*Required only for museum data ingestion, not for generation.
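
As an illustration of the PORT override, the sketch below reads the variable with the documented default of 8000 and falls back to that default when the value is missing or not a valid port. How api.py actually reads PORT is not shown here, so `read_port` is a hypothetical helper.

```python
import os


def read_port(environ=None, default=8000):
    """Return the port from PORT, or the default when unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        return default
    return port if 1 <= port <= 65535 else default


if __name__ == "__main__":
    print("API port:", read_port())
```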


Updating

# Pull latest changes (if using Git)
git pull

# Update dependencies
pip install -r requirements.txt --upgrade

# Restart servers (stop them, then rerun the commands from First Run)

Uninstall

# Deactivate virtual environment
deactivate

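# Remove project directory (Windows Command Prompt; on Linux/Mac use rm -rf with your own path)
rmdir /s /q c:\Users\Administrador\cora

# Or just delete venv and the archive database (Linux/Mac; on Windows use rmdir /s /q for each directory)
rm -rf venv archive_db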