# Codette Model Downloads

All production models and adapters are available on **HuggingFace**: https://huggingface.co/Raiff1982

## Quick Download

### Option 1: Auto-Download (Recommended)

```bash
pip install huggingface-hub

# Download base models directly
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
  --local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
  --local-dir models/base/

# Download adapters
huggingface-cli download Raiff1982/Codette-Adapters \
  --local-dir adapters/
```

### Option 2: Manual Download

1. Visit: https://huggingface.co/Raiff1982
2. Select a model repository
3. Click "Files and versions"
4. Download `.gguf` files to `models/base/`
5. Download adapters to `adapters/`

### Option 3: Using Git LFS

```bash
git clone https://huggingface.co/Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4
cd Meta-Llama-3.1-8B-Instruct-Q4
git lfs pull
```

## Available Models

All models are in GGUF format (optimized for llama.cpp and compatible runtimes):

| Model | Size | Location | Type |
|-------|------|----------|------|
| **Llama 3.1 8B Q4** | 4.6 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 | Default (recommended) |
| **Llama 3.1 8B F16** | 3.4 GB | Raiff1982/Meta-Llama-3.1-8B-Instruct-F16 | High quality |
| **Llama 3.2 1B Q8** | 1.3 GB | Raiff1982/Llama-3.2-1B-Instruct-Q8 | Lightweight/CPU |
| **Codette Adapters** | 224 MB | Raiff1982/Codette-Adapters | 8 LoRA adapters |

## Setup Instructions

### Step 1: Clone the Repository

```bash
git clone https://github.com/Raiff1982/Codette-Reasoning.git
cd Codette-Reasoning
```

### Step 2: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 3: Download Models

```bash
# Quick method using huggingface-cli
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
  --local-dir models/base/
huggingface-cli download Raiff1982/Llama-3.2-1B-Instruct-Q8 \
  --local-dir models/base/
huggingface-cli download Raiff1982/Codette-Adapters \
  --local-dir adapters/
```

### Step 4: Verify Setup

```bash
ls -lh models/base/   # Should list one .gguf file per downloaded base model
ls adapters/*.gguf    # Should show 8 adapters
```

### Step 5: Start the Server

```bash
python inference/codette_server.py
# Visit http://localhost:7860
```

## HuggingFace Profile

**All models are hosted at**: https://huggingface.co/Raiff1982

Each model repository includes:

- Complete documentation
- Model cards with specifications
- License information
- Version history

## Offline Setup

If you already have the models downloaded locally:

```bash
# Copy the files into the expected locations
cp /path/to/models/*.gguf models/base/
cp /path/to/adapters/*.gguf adapters/
```

## Troubleshooting Downloads

### Issue: "Connection timeout"

```bash
# Resume an interrupted download instead of restarting it
huggingface-cli download Raiff1982/Meta-Llama-3.1-8B-Instruct-Q4 \
  --local-dir models/base/ \
  --resume-download
```

### Issue: "Disk space full"

Each model needs:

- Llama 3.1 8B Q4: 4.6 GB
- Llama 3.1 8B F16: 3.4 GB
- Llama 3.2 1B: 1.3 GB
- Adapters: ~1 GB
- **Total: ~10 GB minimum**

### Issue: "HuggingFace token required"

```bash
huggingface-cli login
# Paste a token from: https://huggingface.co/settings/tokens
```

## Bandwidth & Speed

**Typical download times** (100 Mbps connection):

- Llama 3.1 8B Q4: 5-15 minutes
- Llama 3.2 1B: 2-5 minutes
- Adapters: 1-2 minutes
- **Total: 8-22 minutes** (first-time setup)

## Attribution

Models:

- **Llama**: Meta AI (Llama Community License)
- **GGUF format & quantization**: llama.cpp project (ggerganov)
- **Adapters**: Jonathan Harrison (Raiff1982)

License: See the individual model cards on HuggingFace.

---

**Once downloaded**, follow `DEPLOYMENT.md` for production setup.

For questions, visit: https://huggingface.co/Raiff1982
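The Step 4 verification can also be scripted so missing files are reported explicitly. A minimal sketch, assuming the `models/base/` and `adapters/` layout used above (the counts are taken from this document: 8 adapter files, plus however many base models you chose to download):

```bash
#!/usr/bin/env bash
# Count downloaded GGUF files; paths follow the repository layout above.
base_count=$(find models/base -maxdepth 1 -name '*.gguf' 2>/dev/null | wc -l)
adapter_count=$(find adapters -maxdepth 1 -name '*.gguf' 2>/dev/null | wc -l)

echo "Base models: ${base_count}"
echo "Adapters:    ${adapter_count}"

if [ "$base_count" -lt 1 ]; then
  echo "Warning: no base models found in models/base/" >&2
fi
if [ "$adapter_count" -ne 8 ]; then
  echo "Warning: expected 8 adapter files, found ${adapter_count}" >&2
fi
```

Run it from the repository root; warnings go to stderr so the script is easy to use in setup automation.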
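The ~10 GB disk requirement listed under "Disk space full" can be checked before starting a download. A minimal sketch using POSIX `df -P` with kilobyte blocks (the 10 GB threshold is the total from this document, not a hard limit):

```bash
# Free space, in whole GB, on the filesystem holding the current directory.
free_gb=$(df -Pk . | awk 'NR==2 { printf "%d", $4 / 1024 / 1024 }')

if [ "$free_gb" -lt 10 ]; then
  echo "Only ${free_gb} GB free; ~10 GB is recommended for a full download." >&2
else
  echo "${free_gb} GB free: enough for a full model download."
fi
```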