---
license: apache-2.0
base_model:
- Qwen/Qwen3-8B
- allura-org/remnant-qwen3-8b
tags:
- merge
- qwen3
- creative-writing
- roleplay
- gguf
- llama-cpp
model_type: qwen3
---

# RemnantInstruct-8B-GGUF

GGUF quantizations of RemnantInstruct-8B, a SLERP merge combining instruction following with creative writing capabilities.

## Model Details

**Base Models:**
- [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) - Strong instruction following and reasoning
- [allura-org/remnant-qwen3-8b](https://huggingface.co/allura-org/remnant-qwen3-8b) - Enhanced creative writing and roleplay

**Merge Method:** SLERP (Spherical Linear Interpolation)

The merge uses a complementary interpolation strategy:
- Self-attention layers: gradual blend from base to creative (0 -> 0.5 -> 0.3 -> 0.7 -> 1)
- MLP layers: inverse blend (1 -> 0.5 -> 0.7 -> 0.3 -> 0)
- All other tensors: 50/50 blend (default)

This approach preserves the base model's instruction following while incorporating the creative-writing capabilities of the remnant fine-tune.

## Quantizations

| Quant | Size | Description |
|-------|------|-------------|
| Q4_K_M | 4.7 GB | Balanced quality and size (recommended) |
| Q5_K_M | 5.5 GB | Better quality, slightly larger |
| Q8_0 | 8.2 GB | Highest-quality quantization |

## Usage

### llama.cpp

```bash
./llama-cli -m RemnantInstruct-8B-Q4_K_M.gguf -p "Write a story about..." -n 512
```

### Ollama

```bash
ollama run anthonym21/remnantinstruct-8b
```

### LM Studio

Download any GGUF file and load it directly in LM Studio.

## Merge Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen3-8B
        layer_range: [0, 36]
      - model: allura-org/remnant-qwen3-8b
        layer_range: [0, 36]
merge_method: slerp
base_model: Qwen/Qwen3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

## License

Apache 2.0 (inherited from Qwen3-8B)
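
## How SLERP Works

For readers curious about the merge method itself: SLERP interpolates along the arc between two weight tensors rather than along a straight line, which keeps the interpolated weights at a comparable norm. The sketch below is a minimal NumPy illustration of the idea, not the exact code used by merge tooling such as mergekit; the function name and the fallback threshold are illustrative choices.

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between flattened weight vectors a and b.

    t=0 returns a, t=1 returns b; intermediate t follows the arc between them.
    """
    a_norm = a / (np.linalg.norm(a) + eps)
    b_norm = b / (np.linalg.norm(b) + eps)
    # Angle between the two weight directions
    omega = np.arccos(np.clip(np.dot(a_norm, b_norm), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b
```

With a per-layer schedule like `[0, 0.5, 0.3, 0.7, 1]`, each layer's `t` is taken from that curve, so early self-attention layers stay close to Qwen3-8B and later ones lean toward the remnant fine-tune.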