# Codette Model Downloads
All production models and adapters are available on **HuggingFace**: https://huggingface.co/Raiff1982
## Quick Download
### Option 1: Auto-Download (Recommended)
```bash
pip install huggingface-hub
# Download directly
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
# Download adapters
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
```
### Option 2: Manual Download
1. Visit: https://huggingface.co/Raiff1982
2. Select model repository
3. Click "Files and versions"
4. Download `.gguf` files to `models/base/`
5. Download adapters to `adapters/`
### Option 3: Using Git-LFS
```bash
git clone https://huggingface.co/Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4
cd Meta-Llama-3.1-8B-Instruct-Q4
git lfs pull
```
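If Git LFS is not installed when cloning, the repository contains small text pointer files instead of the actual weights, and `git lfs pull` must be run to fetch them. A quick check (a sketch; the header it looks for is the standard Git LFS pointer-file format):

```shell
#!/usr/bin/env bash
# Detect Git LFS pointer files that were never replaced by real weights.
# A pointer file is a tiny text file starting with "version https://git-lfs".
check_gguf() {
  local f="$1"
  if head -c 30 "$f" | grep -q "version https://git-lfs"; then
    echo "POINTER: $f (run 'git lfs pull')"
  else
    echo "OK: $f"
  fi
}

for f in *.gguf; do
  [ -e "$f" ] || continue
  check_gguf "$f"
done
```

A pointer file is only a few hundred bytes, so an unexpectedly tiny `.gguf` is another giveaway.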
## Available Models
All models are distributed in GGUF format (optimized for llama.cpp and compatible runtimes):
| Model | Size | Location | Type |
|-------|------|----------|------|
| **Llama 3.1 8B Q4** | 4.6 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 | Default (recommended) |
| **Llama 3.1 8B F16** | 3.4 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-F16 | High quality |
| **Llama 3.2 1B Q8** | 1.3 GB | Raiff1982/Llama-3.2-1B-Instruct-Q8 | Lightweight/CPU |
| **Codette Adapters** | 224 MB | Raiff1982/Codette-Adapters | 8 LoRA adapters |
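The sizes in the table can be summed to estimate the full download (a quick `awk` sketch; the figures are taken from the table above):

```shell
# Sum the model sizes listed above (in GB) to estimate total download size.
awk 'BEGIN {
  total = 4.6 + 3.4 + 1.3 + 0.224   # Q4 + F16 + 1B Q8 + adapters (224 MB)
  printf "Total download: ~%.1f GB\n", total
}'
```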
## Setup Instructions
### Step 1: Clone Repository
```bash
git clone https://github.com/Raiff1982/Codette-Reasoning.git
cd Codette-Reasoning
```
### Step 2: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 3: Download Models
```bash
# Quick method using huggingface-cli
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
--local-dir models/base/
huggingface-cli download Raiff1982/Codette-Adapters \
--local-dir adapters/
```
### Step 4: Verify Setup
```bash
ls -lh models/base/ # Should list the downloaded GGUF files (2 if you followed Step 3)
ls adapters/*.gguf # Should show 8 adapters
```
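The two `ls` checks above can be wrapped in a small script that warns when files are missing (a sketch; the expected adapter count assumes the downloads in Step 3 completed):

```shell
#!/usr/bin/env bash
# Verify that the expected model and adapter files are present.
set -u

count_gguf() {
  # Count .gguf files in a directory (0 if the directory is missing or empty).
  local dir="$1"
  find "$dir" -maxdepth 1 -name '*.gguf' 2>/dev/null | wc -l | tr -d ' '
}

base_count=$(count_gguf models/base)
adapter_count=$(count_gguf adapters)

echo "Base models: $base_count .gguf file(s) in models/base/"
echo "Adapters:    $adapter_count .gguf file(s) in adapters/"

if [ "$adapter_count" -lt 8 ]; then
  echo "WARNING: expected 8 adapters, found $adapter_count" >&2
fi
```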
### Step 5: Start Server
```bash
python inference/codette_server.py
# Visit http://localhost:7860
```
## HuggingFace Profile
**All models hosted at**: https://huggingface.co/Raiff1982
Models include:
- Complete documentation
- Model cards with specifications
- License information
- Version history
## Offline Setup
If you have models downloaded locally:
```bash
# Just copy files to correct location
cp /path/to/models/*.gguf models/base/
cp /path/to/adapters/*.gguf adapters/
```
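Because GGUF files are large, a truncated copy can fail silently. Recording checksums after the copy makes later transfers verifiable (a sketch using `sha256sum`; compare against checksums published on the model pages where available):

```shell
#!/usr/bin/env bash
# Record and verify SHA-256 checksums of copied GGUF files.

record_checksums() {
  # Write SHA-256 sums of all .gguf files under $1 to the file $2.
  local dir="$1" out="$2"
  find "$dir" -name '*.gguf' -exec sha256sum {} + > "$out"
}

verify_checksums() {
  # Re-check files against a previously recorded checksum list.
  sha256sum -c "$1"
}
```

For example, run `record_checksums models/base checksums.sha256` right after copying, then `verify_checksums checksums.sha256` after any later move or transfer.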
## Troubleshooting Downloads
### Issue: "Connection timeout"
```bash
# Resume a partial or interrupted download instead of restarting it
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
--local-dir models/base/ \
--resume-download
```
### Issue: "Disk space full"
Each model needs:
- Llama 3.1 8B Q4: 4.6 GB
- Llama 3.1 8B F16: 3.4 GB
- Llama 3.2 1B: 1.3 GB
- Adapters: 224 MB
- **Total: ~10 GB minimum**
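A pre-flight check can catch a full disk before a multi-gigabyte download starts (a sketch; the 10 GB figure comes from the totals above):

```shell
#!/usr/bin/env bash
# Check that the target filesystem has enough free space before downloading.
required_gb=10

# `df -Pk` prints available space in 1 KB blocks (POSIX portable format).
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))

echo "Available: ${avail_gb} GB, required: ${required_gb} GB"
if [ "$avail_gb" -lt "$required_gb" ]; then
  echo "ERROR: not enough disk space for the full model set" >&2
fi
```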
### Issue: "HuggingFace token required"
```bash
huggingface-cli login
# Paste token from: https://huggingface.co/settings/tokens
```
## Bandwidth & Speed
**Typical download times**:
- Llama 3.1 8B Q4: 5-15 minutes (100 Mbps connection)
- Llama 3.2 1B: 2-5 minutes
- Adapters: 1-2 minutes
- **Total: 8-22 minutes** (first-time setup)
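These estimates follow directly from file size and line speed: minutes = (size in GB × 8000 Mb/GB) ÷ speed in Mbps ÷ 60. A sketch for estimating other connection speeds:

```shell
#!/usr/bin/env bash
# Estimate download time from file size (GB) and connection speed (Mbps).
estimate_minutes() {
  local size_gb="$1" speed_mbps="$2"
  awk -v s="$size_gb" -v m="$speed_mbps" \
    'BEGIN { printf "%.1f", (s * 8000) / m / 60 }'
}

echo "8B Q4 at 100 Mbps: $(estimate_minutes 4.6 100) minutes"
echo "8B Q4 at  25 Mbps: $(estimate_minutes 4.6 25) minutes"
```

Real downloads land above the theoretical figure once server throughput and protocol overhead are factored in, which is consistent with the 5-15 minute range quoted above.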
## Attribution
Models:
- **Llama**: Meta AI (released under the Llama Community License)
- **GGUF Quantization**: Ollama/ggerganov
- **Adapters**: Jonathan Harrison (Raiff1982)
License: See individual model cards on HuggingFace
---
**Once downloaded**, follow `DEPLOYMENT.md` for production setup.
For questions, visit: https://huggingface.co/Raiff1982