How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
Use Docker
docker model run hf.co/alesierraalta/codepause-phase7-qwen25-coder-7b-gguf:F16
Quick Links

CodePause Phase 7 โ€” Qwen2.5-Coder-7B ThinkAnywhere GGUF

This is a GGUF export of the CodePause Phase 7 model: a Qwen2.5-Coder-7B-Instruct based model fine-tuned with QLoRA for structured code reasoning using <thinkanywhere> blocks.

Intended use

Use in LM Studio or llama.cpp-compatible runtimes for code generation experiments.

Recommended prompt style:

You MUST answer using exactly this format:

<thinkanywhere>
Briefly explain the algorithm, edge cases, and complexity.
</thinkanywhere>

```python
# final code only

Task: Write a Python function longest_unique_substring(s).


## Training summary

- Base model: `Qwen/Qwen2.5-Coder-7B-Instruct`
- Method: QLoRA 4-bit NF4
- Dataset: CodePause Dataset v7
- Dataset size: 150 examples
- Mix: 70% examples with structured reasoning, 30% plain code
- Epochs: 3
- Final artifact: F16 GGUF

## Known limitations

- The model can generate correct code, but `<thinkanywhere>` tag adherence may still require strong prompt formatting.
- This F16 GGUF is large (~15.2GB). Quantized Q4_K_M export is recommended for faster local inference.

## Local loading

Load the `.gguf` file in LM Studio using a Qwen/ChatML-compatible prompt template.
Downloads last month
106
GGUF
Model size
8B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for alesierraalta/codepause-phase7-qwen25-coder-7b-gguf

Base model

Qwen/Qwen2.5-7B
Quantized
(193)
this model