# Setup Guide
## Prerequisites
### Required Software
- **Python**: 3.8 or higher
- **pip**: Latest version (`python -m pip install --upgrade pip`)
- **Git**: For version control (optional)
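To check the Python requirement before going further, you can run a quick standalone check. This is a minimal sketch that is not shipped with the project. It only compares the running interpreter against the 3.8 minimum listed above.

```python
import sys

# Minimum interpreter version from the prerequisites list.
MIN_PYTHON = (3, 8)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_ok() else "too old"
    print(f"Python {found}: {status} (need {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+)")
```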
### Required API Keys
#### 1. Hugging Face API Token
**Purpose**: Image generation via SDXL-Lightning
**Get your token:**
1. Create account at [huggingface.co](https://huggingface.co)
2. Go to Settings → Access Tokens
3. Create new token with "Read" permissions
4. Copy the token (starts with `hf_...`)
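A common setup mistake is pasting a truncated token or the wrong value. The hypothetical helper below only checks the token's shape (the `hf_` prefix noted above). It does not contact Hugging Face, so a revoked but well-formed token still passes.

```python
import os


def looks_like_hf_token(token):
    """Shape check only: a non-empty string that starts with "hf_".

    This does not validate the token against Hugging Face.
    """
    return isinstance(token, str) and token.startswith("hf_") and len(token) > len("hf_")


if __name__ == "__main__":
    # Reads HF_API_TOKEN from the process environment, if it is set there.
    token = os.environ.get("HF_API_TOKEN", "")
    print("HF_API_TOKEN looks well-formed" if looks_like_hf_token(token)
          else "HF_API_TOKEN missing or malformed")
```

For a real check against the Hub, `huggingface_hub.whoami(token=...)` returns your account details when the token is accepted and raises an error otherwise.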
#### 2. Smithsonian API Key
**Purpose**: Museum artifact ingestion
**Get your key:**
1. Visit [Smithsonian Open Access](https://api.si.edu/openaccess)
2. Request API key (free, instant approval)
3. Copy the API key from email
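Once the key arrives, you can test it by opening a search request in a browser. The sketch below only builds the request URL and sends nothing. The endpoint path and the parameter names (`q`, `start`, `rows`, `api_key`) are assumptions based on the public Open Access API; check them against its documentation. The sketch does not reflect how `smithsonian_loader.py` calls the API.

```python
from urllib.parse import urlencode

# Assumed endpoint: Smithsonian Open Access API v1.0 search route.
# Verify the current path and parameters at https://api.si.edu/openaccess
SI_SEARCH_URL = "https://api.si.edu/openaccess/api/v1.0/search"


def si_search_url(query, api_key, rows=10, start=0):
    """Build a search request URL (no network access happens here)."""
    params = {"q": query, "start": start, "rows": rows, "api_key": api_key}
    return f"{SI_SEARCH_URL}?{urlencode(params)}"
```

For example, `si_search_url("roman coin", "YOUR_KEY", rows=5)` produces a URL whose response should be JSON if the key is valid.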
---
## Installation Steps
### 1. Clone or Download Project
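Clone the repository with Git, or download and extract it. Then change into the project root (the folder containing `requirements.txt`):
```bash
cd path/to/cora
```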
### 2. Create Virtual Environment (Recommended)
```bash
python -m venv venv
# Activate
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
```
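After activating, you can confirm that `python` now points at the environment's interpreter. This small sketch relies on standard `venv` behavior: inside a virtual environment, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base installation.

```python
import sys


def in_virtualenv():
    """True when running inside a venv created with `python -m venv`."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print("Virtual environment active" if in_virtualenv()
          else "No virtual environment active")
```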
### 3. Install Dependencies
```bash
pip install -r requirements.txt
```
**Expected install time**: 5-10 minutes (includes PyTorch)
### 4. Configure Environment Variables
Create a `.env` file in the project root:
```bash
# .env
HF_API_TOKEN=hf_your_hugging_face_token_here
SI_API_KEY=your_smithsonian_key_here
```
**Important**: Never commit `.env` to version control!
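To reduce the risk of committing secrets, you can check that `.gitignore` lists `.env`. The helper below is a rough, hypothetical check that only recognizes the exact entries `.env` or `/.env`. Wildcard patterns are not evaluated, so `git check-ignore .env` remains the authoritative test.

```python
from pathlib import Path


def env_is_ignored(project_root="."):
    """Rough check that project_root/.gitignore has an exact .env entry."""
    gitignore = Path(project_root) / ".gitignore"
    if not gitignore.is_file():
        return False
    entries = {line.strip() for line in gitignore.read_text(encoding="utf-8").splitlines()}
    # Only literal entries are matched; patterns like ".env*" are not expanded.
    return bool(entries & {".env", "/.env"})
```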
### 5. Verify Installation
```bash
python tests/verify_system.py
```
Expected output:
```
✅ CoraVision initialized
✅ CoraMemory initialized
✅ HF_API_TOKEN found
✅ System ready
```
---
## First Run
### Option A: Full UI (Testing)
```bash
# Terminal 1: Start API
python api.py
# Wait for: "Uvicorn running on http://0.0.0.0:8000"
# Terminal 2: Start UI
python ui.py
# Wait for: "Running on local URL: http://127.0.0.1:7861"
# Open browser to http://127.0.0.1:7861
```
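If you script this two-terminal startup, start the UI only after the API is listening. The hypothetical helper below polls a TCP port until a connection succeeds. A successful connection only proves the port is open. It does not prove the app has finished loading its models.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll (host, port) until a TCP connection succeeds or timeout expires.

    Returns True once something is listening, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("127.0.0.1", 8000)` before launching `ui.py`.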
### Option B: Etymology API (Integration)
```bash
python etymology_api.py
# API ready at http://localhost:8000 (same port as api.py, so stop Option A's API first)
```
---
## Populate Archive (Optional but Recommended)
### Load Museum Artifacts
```bash
# Load Roman artifacts from Met Museum
python loaders/met_loader.py
# Load from Smithsonian
python loaders/smithsonian_loader.py
```
**What this does:**
- Downloads historical images from museum APIs
- Generates CLIP embeddings
- Indexes into ChromaDB (`./archive_db`)
- Enables RAG fallback for generation failures
**Time**: ~2-3 minutes per loader
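The last bullet refers to retrieval by embedding similarity. The toy sketch below illustrates nearest-neighbour lookup in pure Python, with hand-made 3-dimensional vectors standing in for CLIP embeddings. It is not the code in `cora_memory.py`. A vector store such as ChromaDB performs this kind of search with an index instead of a linear scan, and this guide does not specify which distance metric the project's collection uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_artifacts(query_vec, archive, k=1):
    """Return ids of the k archive entries most similar to query_vec.

    `archive` maps an artifact id to its embedding vector.
    """
    ranked = sorted(archive, key=lambda item_id: cosine(query_vec, archive[item_id]),
                    reverse=True)
    return ranked[:k]
```

With `archive = {"met_1": [1, 0, 0], "si_2": [0, 1, 0], "met_3": [0.9, 0.1, 0]}`, the query `[1, 0, 0]` with `k=2` ranks `met_1` first and `met_3` second.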
### Custom Loading
Create your own loader script in the project root:
```python
# Run from the project root so the `loaders` directory is importable
from loaders.met_loader import MetLoader

loader = MetLoader()
loader.search_and_index("Viking weapons", limit=5)
loader.search_and_index("Medieval manuscripts", limit=5)
```
---
## Troubleshooting
### Issue: `ModuleNotFoundError`
**Solution**: Make sure the virtual environment is activated and the dependencies are installed:
```bash
pip install -r requirements.txt
```
### Issue: `HF_API_TOKEN not found`
**Solution**: Check that a `.env` file exists in the project root and contains a valid `HF_API_TOKEN` entry.
### Issue: Port 8000 already in use
**Solution**: Find and stop the process that is already using port 8000:
```bash
# Windows
netstat -ano | findstr :8000
taskkill /PID <PID> /F
# Linux/Mac
lsof -ti:8000 | xargs kill -9
```
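Instead of killing the other process, you can run the API on a different port through the `PORT` variable listed in the Environment Variables Reference, assuming the server you start reads it. The hypothetical helpers below check whether a port is taken and ask the OS for a free one. Another program could still claim the returned port before the server binds it.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

For example, if `port_in_use(8000)` is true, put `PORT=<value of find_free_port()>` in `.env`.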
### Issue: API returns 402 Payment Required
**Solution**: This is expected on the Hugging Face free tier. The RAG fallback activates instead:
1. Ensure the archive is populated (`python loaders/met_loader.py`)
2. System will automatically serve museum artifacts
3. No action needed from you
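Step 1 matters because the fallback can only serve what is already in the archive. The sketch below shows the general shape of that behavior under stated assumptions. It is not the project's code: `PaymentRequired`, `generate`, and `search_archive` are hypothetical stand-ins for the real generation and archive-search code.

```python
class PaymentRequired(Exception):
    """Hypothetical error raised when the image API answers HTTP 402."""


def generate_or_fallback(prompt, generate, search_archive):
    """Try image generation; on HTTP 402, serve the closest archived artifact."""
    try:
        return {"source": "generated", "image": generate(prompt)}
    except PaymentRequired:
        matches = search_archive(prompt)
        if not matches:
            # Empty archive: nothing to fall back to, so re-raise the 402 error.
            raise
        return {"source": "archive", "image": matches[0]}
```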
### Issue: ChromaDB errors
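**Solution**: Delete the database directory and recreate it
```bash
# Linux/Mac
rm -rf archive_db
# Windows
rmdir /s /q archive_db
# Recreate an empty database (CoraMemory creates it on initialization)
python -c "from cora_memory import CoraMemory; CoraMemory()"
```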
### Issue: CUDA out of memory
**Solution**: Vision models run on CPU by default. If you switched them to GPU, set the device back to CPU:
```python
# In cora_vision.py, ensure:
device = "cpu" # Not "cuda"
```
---
## Directory Structure After Setup
```
cora/
├── .env # Your API keys (DO NOT COMMIT)
├── .gitignore
├── requirements.txt
├── venv/ # Virtual environment (if created)
├── api.py
├── etymology_api.py
├── ui.py
├── cora_curator.py
├── cora_engine.py
├── cora_memory.py
├── cora_vision.py
├── loaders/
│ ├── smithsonian_loader.py
│ └── met_loader.py
├── scripts/
│ └── load_roman_artifacts.py
├── tests/
│ ├── test_etymology_api.py
│ ├── verify_system.py
│ └── ...
├── archive_db/ # ChromaDB storage (auto-created)
│ └── chroma.sqlite3
├── archive_images/ # Downloaded museum artifacts
│ ├── met_12345_abc.jpg
│ └── si_67890_def.jpg
├── docs/
│ ├── README.md
│ ├── ARCHITECTURE.md
│ ├── SETUP.md (this file)
│ └── README_ETYMOLOGY_API.md
```
---
## Next Steps
1. **Test Generation**: Try the UI → "Generate" tab → Enter "Roman soldier"
2. **Test Archive**: UI → "Archive" tab → Search "romans"
3. **Test API**: Run `python tests/test_etymology_api.py`
4. **Integrate**: See `docs/README_ETYMOLOGY_API.md` for etymology app integration
---
## Environment Variables Reference
| Variable | Required | Purpose | Example |
|----------|----------|---------|---------|
| `HF_API_TOKEN` | Yes | Hugging Face API access | `hf_abcd...xyz` |
| `SI_API_KEY` | Optional* | Smithsonian data ingestion | `abc123...` |
| `PORT` | No | Override API port (default 8000) | `8080` |
*Required only for museum data ingestion, not for generation.
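A script can resolve these variables with the defaults from the table. The sketch below assumes the values are already in the process environment (for example, loaded from `.env` by a library such as python-dotenv). It mirrors the table and is not necessarily how `api.py` reads its configuration.

```python
import os


def load_settings(environ=None):
    """Resolve the documented variables, applying the table's defaults.

    Raises RuntimeError if the required HF_API_TOKEN is missing.
    """
    env = os.environ if environ is None else environ
    token = env.get("HF_API_TOKEN")
    if not token:
        raise RuntimeError("HF_API_TOKEN is not set; add it to .env")
    return {
        "hf_api_token": token,
        "si_api_key": env.get("SI_API_KEY"),  # optional: only ingestion needs it
        "port": int(env.get("PORT", "8000")),  # default API port
    }
```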
---
## Updating
```bash
# Pull latest changes (if using Git)
git pull
# Update dependencies
pip install -r requirements.txt --upgrade
# Restart servers
```
---
## Uninstall
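```bash
# Deactivate virtual environment
deactivate
# Remove the project directory (run this from outside it)
# Linux/Mac
rm -rf path/to/cora
# Windows
rmdir /s /q path\to\cora
# Or remove only the virtual environment and the local archive database
rm -rf venv archive_db
```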
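Because `rm -rf` does not exist in the Windows command prompt, a cross-platform alternative is to delete the directories from Python. The helper below is a hypothetical sketch. Deactivate the environment first, and run it with an interpreter outside `venv`, since Windows may refuse to delete files that are in use. Passing `names=("archive_db",)` also covers the ChromaDB reset described in Troubleshooting.

```python
import shutil
from pathlib import Path


def remove_dirs(root, names=("venv", "archive_db")):
    """Delete the named subdirectories of root, skipping any that are absent.

    Returns the names that were actually removed.
    """
    removed = []
    for name in names:
        target = Path(root) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```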