	# Setup Guide

	## Prerequisites

	### Required Software
	- Python: 3.8 or higher
	- pip: Latest version (`python -m pip install --upgrade pip`)
	- Git: For version control (optional)
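To confirm that the interpreter meets the 3.8 minimum before installing anything, you can run a quick check with the same `python` you will use for the project. This is a minimal standard-library sketch. The helper name `meets_minimum` is illustrative and not part of CORA.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in this guide

def meets_minimum(version=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of `version` is at least `minimum`."""
    return tuple(version[:2]) >= minimum

if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status} (need {MIN_VERSION[0]}.{MIN_VERSION[1]}+)")
```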

	### Required API Keys

	#### 1. Hugging Face API Token
	Purpose: Image generation via SDXL-Lightning

	Get your token:
	1. Create an account at [huggingface.co](https://huggingface.co)
	2. Go to Settings → Access Tokens
	3. Create a new token with "Read" permissions
	4. Copy the token (starts with `hf_...`)

	#### 2. Smithsonian API Key
	Purpose: Museum artifact ingestion

	Get your key:
	1. Visit [Smithsonian Open Access](https://api.si.edu/openaccess)
	2. Request an API key (free, instant approval)
	3. Copy the API key from the email you receive

	---

	## Installation Steps

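	### 1. Clone or Download Project
	```bash
	# Clone the Space (or download and extract it, then cd into that folder)
	git clone https://huggingface.co/spaces/tokgae/cora
	cd cora
	```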

	### 2. Create Virtual Environment (Recommended)
	```bash
	python -m venv venv

	# Activate
	# Windows
	venv\Scripts\activate

	# Linux/Mac
	source venv/bin/activate
	```
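Many `ModuleNotFoundError` reports come from running a different interpreter than the one inside `venv`. The sketch below uses only the standard library to report whether the current interpreter runs inside a virtual environment. It relies on the documented behaviour of `python -m venv`: inside a venv, `sys.prefix` differs from `sys.base_prefix`. The helper name `in_virtualenv` is hypothetical.

```python
import sys

def in_virtualenv():
    """True when running inside a virtual environment created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "venv NOT active")
    print("interpreter:", sys.executable)
```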

	### 3. Install Dependencies
	```bash
	pip install -r requirements.txt
	```

	Expected install time: 5-10 minutes (includes PyTorch)

	### 4. Configure Environment Variables
	Create a `.env` file in the project root:

	```bash
	# .env
	HF_API_TOKEN=hf_your_hugging_face_token_here
	SI_API_KEY=your_smithsonian_key_here
	```

	Important: Never commit `.env` to version control. Make sure it is listed in `.gitignore`.
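To check the file before starting any server, you can use the sketch below. It parses `.env` with the standard library only and reports whether `HF_API_TOKEN` is present and starts with `hf_`. It is an illustration, not how CORA itself loads configuration (the project may use a library such as python-dotenv, but that is an assumption). The parser handles only simple `KEY=value` lines. It strips one pair of surrounding quotes but does not support escapes, inline comments, or multi-line values. `parse_env` and `check_env` are hypothetical names.

```python
from pathlib import Path

def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values

def check_env(values):
    """Return a list of problems found in the parsed .env values."""
    problems = []
    token = values.get("HF_API_TOKEN", "")
    if not token:
        problems.append("HF_API_TOKEN is missing")
    elif not token.startswith("hf_"):
        problems.append("HF_API_TOKEN does not start with 'hf_'")
    return problems

if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file in", Path.cwd())
    else:
        issues = check_env(parse_env(env_file.read_text(encoding="utf-8")))
        print("\n".join(issues) if issues else ".env looks OK")
```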

	### 5. Verify Installation
	```bash
	python tests/verify_system.py
	```

	Expected output:
	```
	✅ CoraVision initialized
	✅ CoraMemory initialized
	✅ HF_API_TOKEN found
	✅ System ready
	```

	---

	## First Run

	### Option A: Full UI (Testing)
	```bash
	# Terminal 1: Start API
	python api.py
	# Wait for: "Uvicorn running on http://0.0.0.0:8000"

	# Terminal 2: Start UI
	python ui.py
	# Wait for: "Running on local URL: http://127.0.0.1:7861"

	# Open browser to http://127.0.0.1:7861
	```
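The guide starts the API before the UI. If you script these two steps, it helps to wait until port 8000 actually accepts connections before launching `ui.py`. Below is a minimal standard-library sketch of that polling loop. The helper `wait_for_port` is hypothetical. It only confirms that something is listening on the port, not that the API is healthy.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) while nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8000)` returns `True` once `api.py` from Terminal 1 is listening, or `False` after 30 seconds.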

	### Option B: Etymology API (Integration)
	```bash
	python etymology_api.py
	# API ready at http://localhost:8000
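	```

	Note: `api.py` and `etymology_api.py` both listen on port 8000 by default, so stop one before starting the other.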

	---

	## Populate Archive (Optional but Recommended)

	### Load Museum Artifacts
	```bash
	# Load Roman artifacts from the Met (Metropolitan Museum of Art)
	python loaders/met_loader.py

	# Load from the Smithsonian (requires SI_API_KEY)
	python loaders/smithsonian_loader.py
	```

	What this does:
	- Downloads historical images from museum APIs
	- Generates CLIP embeddings
	- Indexes into ChromaDB (`./archive_db`)
	- Enables RAG fallback for generation failures

	Time: ~2-3 minutes per loader
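The retrieval step behind the RAG fallback is essentially a nearest-neighbour search. A query is embedded into the same vector space as the archived images (CLIP embeds both text and images into a shared space), and the closest stored items are returned. The sketch below shows that idea in plain Python, using cosine similarity over tiny hand-made vectors. The three-dimensional vectors and item IDs are invented for illustration. The real system uses CLIP embeddings stored in ChromaDB, and this guide does not document its distance metric or query API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k item IDs whose vectors are most similar to query_vec."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Toy archive: IDs mimic the archive_images/ file names; the vectors are made up.
archive = {
    "met_12345_abc": [0.9, 0.1, 0.0],  # e.g. a Roman helmet
    "si_67890_def": [0.1, 0.8, 0.3],   # e.g. a Viking sword
    "met_22222_xyz": [0.8, 0.2, 0.1],  # e.g. a Roman coin
}

print(top_k([1.0, 0.0, 0.0], archive))  # prints ['met_12345_abc', 'met_22222_xyz']
```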

	### Custom Loading
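	Create your own loader script in the project root, for example `my_loader.py`:

	```python
	# Run from the project root so the loaders/ folder is importable
	from loaders.met_loader import MetLoader

	loader = MetLoader()
	# Search the Met collection for each topic and index up to `limit` results
	loader.search_and_index("Viking weapons", limit=5)
	loader.search_and_index("Medieval manuscripts", limit=5)
	```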
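A slightly more robust script keeps going when one query fails (for example, a network timeout on a museum API) and reports which queries need retrying. In this sketch the loader is passed in, so it works with any object that exposes the `search_and_index(query, limit=...)` call used above. This guide does not document what that method returns or which exceptions it raises, so the sketch catches `Exception`. `load_queries` and `FakeLoader` are hypothetical. `FakeLoader` is a stand-in used only to demonstrate the behaviour.

```python
def load_queries(loader, queries, limit=5):
    """Run search_and_index for each query and return the queries that failed."""
    failed = []
    for query in queries:
        try:
            loader.search_and_index(query, limit=limit)
            print(f"indexed: {query}")
        except Exception as exc:  # assumption: the real loader's error types are unknown
            print(f"failed: {query} ({exc})")
            failed.append(query)
    return failed

class FakeLoader:
    """Hypothetical stand-in for MetLoader that rejects one query to show retry info."""

    def __init__(self):
        self.seen = []

    def search_and_index(self, query, limit=5):
        if query == "bad query":
            raise RuntimeError("simulated API error")
        self.seen.append((query, limit))

if __name__ == "__main__":
    # With the real loader, pass a MetLoader instance instead of FakeLoader.
    print(load_queries(FakeLoader(), ["Viking weapons", "bad query", "Medieval manuscripts"]))
```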

	---

	## Troubleshooting

	### Issue: `ModuleNotFoundError`
	Solution: Ensure the virtual environment is activated and the dependencies are installed
	```bash
	pip install -r requirements.txt
	```

	### Issue: `HF_API_TOKEN not found`
	Solution: Check that a `.env` file exists in the project root and contains the correct token

	### Issue: Port 8000 already in use
	Solution: Find and stop the process holding the port
	```bash
	# Windows (cmd): the PID is the last column of the netstat output
	netstat -ano | findstr :8000
	taskkill /PID <PID> /F

	# Linux/Mac
	lsof -ti:8000 | xargs kill -9
	```
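Instead of stopping the other process, you can also pick a different port (the Environment Variables Reference lists a `PORT` override for the API). The cross-platform sketch below uses only the standard library. It treats a port as free when a TCP connection to it on localhost is not accepted, and scans upward from 8000. `is_port_free` and `first_free_port` are hypothetical helpers. Whether each CORA server honours `PORT` is not covered by this guide.

```python
import socket

def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a server accepted the connection.
        return sock.connect_ex((host, port)) != 0

def first_free_port(start=8000, attempts=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first port with no listener."""
    for port in range(start, min(start + attempts, 65536)):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```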

	### Issue: API returns 402 Payment Required
	Solution: This is expected on the Hugging Face free tier. The RAG fallback activates automatically:
	1. Make sure the archive is populated (`python loaders/met_loader.py`)
	2. The system then serves matching museum artifacts instead of generated images
	3. No further action is needed
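The behaviour described above follows a try-then-fallback pattern. Here is a minimal sketch. `PaymentRequired`, `generate`, and `search_archive` are hypothetical stand-ins for however CORA actually detects the 402 response and queries ChromaDB, which this guide does not document.

```python
class PaymentRequired(Exception):
    """Hypothetical error raised when the image API answers HTTP 402."""

def generate_or_fallback(prompt, generate, search_archive):
    """Try image generation; on HTTP 402, serve archived museum artifacts instead."""
    try:
        return {"source": "generated", "result": generate(prompt)}
    except PaymentRequired:
        # Quota exhausted: fall back to retrieval from the local archive.
        matches = search_archive(prompt)
        if not matches:
            raise RuntimeError("generation unavailable and archive is empty; run the loaders first")
        return {"source": "archive", "result": matches}
```

This is also why step 1 matters: with an empty archive, there is nothing to fall back to.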

	### Issue: ChromaDB errors
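	Solution: Stop any running servers, then delete and recreate the database. It stays empty until you re-run the loaders.
	```bash
	# Linux/Mac (Windows cmd: rmdir /s /q archive_db)
	rm -rf archive_db
	python -c "from cora_memory import CoraMemory; CoraMemory()"  # creates a fresh DB
	```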
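On Windows, `rm -rf` is not available in cmd. The standard-library sketch below removes the database folder on any platform. Stop `api.py`, `ui.py`, and `etymology_api.py` first, because Windows cannot delete files that another process still holds open. `reset_archive_db` is a hypothetical helper, and the default path `archive_db` assumes you run it from the project root.

```python
import shutil
from pathlib import Path

def reset_archive_db(path="archive_db"):
    """Delete the ChromaDB folder if present; return True if something was removed."""
    db_dir = Path(path)
    if not db_dir.exists():
        return False
    shutil.rmtree(db_dir)  # recursive delete, works on Windows, Linux and macOS
    return True
```

After calling `reset_archive_db()`, recreate the database with `CoraMemory()` as shown above and re-run the loaders.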

	### Issue: CUDA out of memory
	Solution: The vision models run on CPU by default. If you switched them to GPU, set the device back to CPU:
	```python
	# In cora_vision.py, make sure the device is set to CPU:
	device = "cpu"  # not "cuda"
	```

	---

	## Directory Structure After Setup

	```
	cora/
	├── .env                  # Your API keys (DO NOT COMMIT)
	├── .gitignore
	├── requirements.txt
	│
	├── venv/                 # Virtual environment (if created)
	│
	├── api.py
	├── etymology_api.py
	├── ui.py
	│
	├── cora_curator.py
	├── cora_engine.py
	├── cora_memory.py
	├── cora_vision.py
	│
	├── loaders/
	│   ├── smithsonian_loader.py
	│   └── met_loader.py
	│
	├── scripts/
	│   └── load_roman_artifacts.py
	│
	├── tests/
	│   ├── test_etymology_api.py
	│   ├── verify_system.py
	│   └── ...
	│
	├── archive_db/           # ChromaDB storage (auto-created)
	│   └── chroma.sqlite3
	│
	├── archive_images/       # Downloaded museum artifacts
	│   ├── met_12345_abc.jpg
	│   └── si_67890_def.jpg
	│
	└── docs/
	    ├── README.md
	    ├── ARCHITECTURE.md
	    ├── SETUP.md          # This file
	    └── README_ETYMOLOGY_API.md
	```

	---

	## Next Steps

	1. Test Generation: Try the UI → "Generate" tab → Enter "Roman soldier"
	2. Test Archive: UI → "Archive" tab → Search "romans"
	3. Test API: Run `python tests/test_etymology_api.py`
	4. Integrate: See `docs/README_ETYMOLOGY_API.md` for etymology app integration

	---

	## Environment Variables Reference

	| Variable | Required | Purpose | Example |
	|----------|----------|---------|---------|
	| `HF_API_TOKEN` | Yes | Hugging Face API access | `hf_abcd...xyz` |
	| `SI_API_KEY` | Optional* | Smithsonian data ingestion | `abc123...` |
	| `PORT` | No | Override API port (default 8000) | `8080` |

	*Required only for Smithsonian data ingestion, not for generation.
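The rules in this table can be expressed as a small config reader. The sketch below treats `HF_API_TOKEN` as required, `SI_API_KEY` as optional, and `PORT` as an integer that defaults to 8000. It is an illustration under those assumptions, not the project's own configuration code. The function name `load_config` and the returned dictionary keys are hypothetical.

```python
import os

def load_config(env=None):
    """Resolve settings from environment variables following the reference table."""
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN")
    if not token:
        raise RuntimeError("HF_API_TOKEN is required")
    return {
        "hf_api_token": token,
        "si_api_key": env.get("SI_API_KEY"),   # None unless Smithsonian ingestion is used
        "port": int(env.get("PORT", "8000")),  # default API port
    }
```

Passing a plain dict instead of `os.environ` makes the function easy to test without touching the real environment.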

	---

	## Updating

	```bash
	# Pull latest changes (if using Git)
	git pull

	# Update dependencies
	pip install -r requirements.txt --upgrade

	# Then restart any running servers (api.py, ui.py, etymology_api.py)
	```

	---

	## Uninstall

	```bash
	# Deactivate the virtual environment
	deactivate

	# Remove the project directory (run from its parent; Windows cmd: rmdir /s /q cora)
	rm -rf cora

	# Or delete only the virtual environment and the ChromaDB archive
	rm -rf venv archive_db
	```