How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
Use Docker
docker model run hf.co/WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF:
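Once `llama-server` is running from one of the install options above, other programs can talk to it over HTTP. The sketch below assumes the server's defaults (127.0.0.1, port 8080) and uses its OpenAI-style `/v1/completions` endpoint, because GPT-2 is a plain completion model rather than a chat model. `LLAMA_URL`, `RUN_QUERY` and `query_completion` are illustrative names, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Assumes llama-server is already running locally on its default address;
# override LLAMA_URL if you passed --host/--port.
LLAMA_URL=${LLAMA_URL:-http://127.0.0.1:8080}

# Inside single quotes the "\n" sequences stay literal backslash-n,
# which JSON then decodes as newlines in the prompt string.
payload='{"prompt": "List edge cases, then write the code.\nTask: Implement an LRU cache in Python.\n\nAnswer:\n", "max_tokens": 256}'

# Hypothetical helper: POST the payload to the OpenAI-compatible endpoint.
query_completion() {
  curl -s "$LLAMA_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$payload"
}

# Only send the request when explicitly asked to (server must be up).
if [ "${RUN_QUERY:-0}" = "1" ]; then
  query_completion
fi
```

Running the script with `RUN_QUERY=1` sends the request; following the OpenAI completions format, the generated text should appear in the `choices[0].text` field of the JSON reply.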

language:
  - en
pipeline_tag: text-generation
tags:
  - gguf
  - llama.cpp
  - gpt2
  - quantized
  - text-generation
  - code
  - coding
  - reasoning
  - lightweight
  - withinusai
license: other
license_name: withinusai-custom-license
license_link: LICENSE
base_model: WithinUsAI/GPT2.5.2-high-reasoning-codex-0.4B
base_model_relation: quantized
metrics:
  - pass@1
  - accuracy
  - exact_match
model-index:
  - name: WithinUsAI/GPT2.5.2-high-reasoning-codex-0.4B-GGUF
    results: []

WithinUsAI/GPT2.5.2-high-reasoning-codex-0.4B-GGUF

GGUF quantizations of the GPT-2 Medium → “GPT-5.2 twin target” finetune.
Pick your quant, run it locally, move fast. ⚡🧠

What this repo contains

This repository provides GGUF quantizations for local inference (llama.cpp ecosystem) of:

  • WithinUsAI/GPT2.5.2-high-reasoning-codex-0.4B (source Transformers model)

Model details

  • Architecture: gpt2
  • Size class: ~0.4B parameters
  • Source model: WithinUsAI/GPT2.5.2-high-reasoning-codex-0.4B
  • Base model foundation credit: openai-community/gpt2-medium
  • Relation: quantized distribution of the source model

Available quantizations

| Quant  | Bits   | Size   |
|--------|--------|--------|
| Q4_K_M | 4-bit  | 242 MB |
| Q5_K_M | 5-bit  | 274 MB |
| F16    | 16-bit | 714 MB |
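As a rough sanity check on the quantization table, the file sizes can be converted into effective bits per weight. This sketch assumes the sizes are decimal megabytes and infers the parameter count from the F16 file at 2 bytes per weight, so F16 comes out at 16 bits by construction; `bpw` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumptions: sizes are decimal MB (10^6 bytes), and the parameter count is
# inferred from the F16 file at 2 bytes per weight, ignoring the small amount
# of metadata and tokenizer data each GGUF file also carries.
f16_mb=714
params=$(awk -v mb="$f16_mb" 'BEGIN { printf "%.0f", mb * 1e6 / 2 }')

# Effective bits per weight = file size in bits / parameter count.
bpw() {
  awk -v mb="$1" -v p="$params" 'BEGIN { printf "%.2f", mb * 1e6 * 8 / p }'
}

echo "approx. parameters: $params"
for entry in Q4_K_M:242 Q5_K_M:274 F16:714; do
  quant=${entry%%:*}   # text before the colon
  mb=${entry##*:}      # text after the colon
  echo "$quant: $(bpw "$mb") bits/weight"
done
```

With the table's numbers this gives about 357M parameters, consistent with the ~0.4B size class, and roughly 5.4 and 6.1 bits/weight for Q4_K_M and Q5_K_M. Those sit above the nominal 4 and 5 bits, most likely because K-quant formats store per-block scales and some tensors are kept at higher precision.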

Which one should you choose?

  • Q4_K_M: best default for CPUs (small + fast)
  • Q5_K_M: slightly higher quality, still compact
  • F16: maximum fidelity (largest)
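The guidance above can be turned into a small helper when scripting downloads. This is a hypothetical sketch: `pick_quant` and its preference names are made up for illustration, and it assumes llama.cpp's `-hf <user>/<model>[:quant]` syntax matches these tags against the GGUF file names in this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper mapping a preference to one of the quant tags above.
pick_quant() {
  case "$1" in
    small|cpu) echo "Q4_K_M" ;;   # best default for CPUs (small + fast)
    balanced)  echo "Q5_K_M" ;;   # slightly higher quality, still compact
    max)       echo "F16" ;;      # maximum fidelity, largest file
    *)         echo "unknown preference: $1" >&2; return 1 ;;
  esac
}

repo="WithinUsAI/GPT2.5.2-High.Reasoning.Codex-0.4B-GGUF"
tag=$(pick_quant balanced)

# Print (rather than run) the command, using the <repo>:<quant> form of -hf.
echo "llama-server -hf ${repo}:${tag}"
```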

Prompting tips

  • “List edge cases first, then implement.”
  • “Explain root cause → propose fix → provide patch.”
  • “State invariants + complexity.”
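One way to assemble a prompt that follows these tips in a script is sketched below. `make_prompt` is a hypothetical helper; `printf` inserts real newlines, so the prompt does not depend on how a given tool treats a literal `\n`. MODEL.gguf is a placeholder for whichever quant file you downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wraps a task in the "edge cases first" and
# "invariants + complexity" patterns, ending with "Answer:" so the model
# continues from there. The task is passed as an argument, not as part of
# the format string, so "%" characters in it are safe.
make_prompt() {
  local task=$1
  printf 'List edge cases first, then implement.\nState invariants + complexity.\nTask: %s\n\nAnswer:\n' "$task"
}

# $(...) strips the trailing newline, so the prompt ends with "Answer:".
prompt=$(make_prompt "Implement binary search over a sorted list in Python.")
printf '%s\n' "$prompt"

# To run it (requires llama-cli and a downloaded quant):
# ./llama-cli -m MODEL.gguf -p "$prompt" -n 256
```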

Example usage (llama.cpp)

Replace MODEL.gguf with the quant file you downloaded:

./llama-cli -m MODEL.gguf \
  -p "You are a senior engineer. List edge cases, then write the code.\nTask: Implement an LRU cache in Python.\n\nAnswer:\n" \
  -n 256