# Codette Model Downloads
All production models and adapters are available on **HuggingFace**: https://huggingface.co/Raiff1982
## Quick Download
### Option 1: Auto-Download (Recommended)
```bash
pip install huggingface-hub
# Download directly
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
# Download adapters
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
```
### Option 2: Manual Download
1. Visit: https://huggingface.co/Raiff1982
2. Select model repository
3. Click "Files and versions"
4. Download `.gguf` files to `models/base/`
5. Download adapters to `adapters/`
### Option 3: Using Git-LFS
```bash
git clone https://huggingface.co/Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4
cd Meta-Llama-3.1-8B-Instruct-Q4
git lfs pull
```
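If Git LFS is not installed when cloning, the repository contains small text pointer files instead of the actual weights, and `git lfs pull` must be run to fetch them. A quick check (a sketch; the header it looks for is the standard Git LFS pointer-file format):

```shell
#!/usr/bin/env bash
# Detect Git LFS pointer files that were never replaced by real weights.
# A pointer file is a tiny text file starting with "version https://git-lfs".
check_gguf() {
  local f="$1"
  if head -c 30 "$f" | grep -q "version https://git-lfs"; then
    echo "POINTER: $f (run 'git lfs pull')"
  else
    echo "OK: $f"
  fi
}

for f in *.gguf; do
  [ -e "$f" ] || continue
  check_gguf "$f"
done
```

A pointer file is only a few hundred bytes, so an unexpectedly tiny `.gguf` is another giveaway.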
## Available Models
All models are distributed in GGUF format (optimized for llama.cpp and compatible runtimes):
| Model | Size | Location | Type |
|-------|------|----------|------|
| **Llama 3.1 8B Q4** | 4.6 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 | Default (recommended) |
| **Llama 3.1 8B F16** | 3.4 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-F16 | High quality |
| **Llama 3.2 1B Q8** | 1.3 GB | Raiff1982/Llama-3.2-1B-Instruct-Q8 | Lightweight/CPU |
| **Codette Adapters** | 224 MB | Raiff1982/Codette-Adapters | 8 LoRA adapters |
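The sizes in the table can be summed to estimate the full download (a quick `awk` sketch; the figures are taken from the table above):

```shell
# Sum the model sizes listed above (in GB) to estimate total download size.
awk 'BEGIN {
  total = 4.6 + 3.4 + 1.3 + 0.224   # Q4 + F16 + 1B Q8 + adapters (224 MB)
  printf "Total download: ~%.1f GB\n", total
}'
```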
## Setup Instructions
### Step 1: Clone Repository
```bash
git clone https://github.com/Raiff1982/Codette-Reasoning.git
cd Codette-Reasoning
```
### Step 2: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 3: Download Models
```bash
# Quick method using huggingface-cli
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
```
### Step 4: Verify Setup
```bash
ls -lh models/base/ # Should list the downloaded GGUF files (2 if you followed Step 3)
ls adapters/*.gguf # Should show 8 adapters
```
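The two `ls` checks above can be wrapped in a small script that warns when files are missing (a sketch; the expected adapter count assumes the downloads in Step 3 completed):

```shell
#!/usr/bin/env bash
# Verify that the expected model and adapter files are present.
set -u

count_gguf() {
  # Count .gguf files in a directory (0 if the directory is missing or empty).
  local dir="$1"
  find "$dir" -maxdepth 1 -name '*.gguf' 2>/dev/null | wc -l | tr -d ' '
}

base_count=$(count_gguf models/base)
adapter_count=$(count_gguf adapters)

echo "Base models: $base_count .gguf file(s) in models/base/"
echo "Adapters:    $adapter_count .gguf file(s) in adapters/"

if [ "$adapter_count" -lt 8 ]; then
  echo "WARNING: expected 8 adapters, found $adapter_count" >&2
fi
```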
### Step 5: Start Server
```bash
python inference/codette_server.py
# Visit http://localhost:7860
```
## HuggingFace Profile
**All models hosted at**: https://huggingface.co/Raiff1982
Models include:
- Complete documentation
- Model cards with specifications
- License information
- Version history
## Offline Setup
If you have models downloaded locally:
```bash
# Just copy files to correct location
cp /path/to/models/*.gguf models/base/
cp /path/to/adapters/*.gguf adapters/
```
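Because GGUF files are large, a truncated copy can fail silently. Recording checksums after the copy makes later transfers verifiable (a sketch using `sha256sum`; compare against checksums published on the model pages where available):

```shell
#!/usr/bin/env bash
# Record and verify SHA-256 checksums of copied GGUF files.

record_checksums() {
  # Write SHA-256 sums of all .gguf files under $1 to the file $2.
  local dir="$1" out="$2"
  find "$dir" -name '*.gguf' -exec sha256sum {} + > "$out"
}

verify_checksums() {
  # Re-check files against a previously recorded checksum list.
  sha256sum -c "$1"
}
```

For example, run `record_checksums models/base checksums.sha256` right after copying, then `verify_checksums checksums.sha256` after any later move or transfer.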
## Troubleshooting Downloads
### Issue: "Connection timeout"
```bash
# Resume a partial or interrupted download instead of restarting it
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/ \
--resume-download
```
### Issue: "Disk space full"
Each model needs:
- Llama 3.1 8B Q4: 4.6 GB
- Llama 3.1 8B F16: 3.4 GB
- Llama 3.2 1B: 1.3 GB
- Adapters: 224 MB
- **Total: ~10 GB minimum**
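A pre-flight check can catch a full disk before a multi-gigabyte download starts (a sketch; the 10 GB figure comes from the totals above):

```shell
#!/usr/bin/env bash
# Check that the target filesystem has enough free space before downloading.
required_gb=10

# `df -Pk` prints available space in 1 KB blocks (POSIX portable format).
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))

echo "Available: ${avail_gb} GB, required: ${required_gb} GB"
if [ "$avail_gb" -lt "$required_gb" ]; then
  echo "ERROR: not enough disk space for the full model set" >&2
fi
```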
### Issue: "HuggingFace token required"
```bash
huggingface-cli login
# Paste token from: https://huggingface.co/settings/tokens
```
## Bandwidth & Speed
**Typical download times**:
- Llama 3.1 8B Q4: 5-15 minutes (100 Mbps connection)
- Llama 3.2 1B: 2-5 minutes
- Adapters: 1-2 minutes
- **Total: 8-22 minutes** (first-time setup)
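These estimates follow directly from file size and line speed: minutes = (size in GB × 8000 Mb/GB) ÷ speed in Mbps ÷ 60. A sketch for estimating other connection speeds:

```shell
#!/usr/bin/env bash
# Estimate download time from file size (GB) and connection speed (Mbps).
estimate_minutes() {
  local size_gb="$1" speed_mbps="$2"
  awk -v s="$size_gb" -v m="$speed_mbps" \
    'BEGIN { printf "%.1f", (s * 8000) / m / 60 }'
}

echo "8B Q4 at 100 Mbps: $(estimate_minutes 4.6 100) minutes"
echo "8B Q4 at  25 Mbps: $(estimate_minutes 4.6 25) minutes"
```

Real downloads land above the theoretical figure once server throughput and protocol overhead are factored in, which is consistent with the 5-15 minute range quoted above.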
## Attribution
Models:
- **Llama**: Meta AI (released under the Llama Community License)
- **GGUF Quantization**: Ollama/ggerganov
- **Adapters**: Jonathan Harrison (Raiff1982)
License: See individual model cards on HuggingFace
---
**Once downloaded**, follow `DEPLOYMENT.md` for production setup.
For questions, visit: https://huggingface.co/Raiff1982