EAE-7B GGUF

GGUF quantized versions of infernet/eae-7b for use with llama.cpp, Ollama, LM Studio, and other compatible inference engines.

Model Details

  • Base Model: Qwen/Qwen2.5-7B
  • Training: LoRA fine-tuned with EAE (Epistemic AI Engine) reasoning methodology
  • Original Adapter: infernet/eae-7b
  • Context Length: 131072 tokens

Available Quantizations

| Filename | Quant Type | Size | GSM8K Accuracy | Description |
|---|---|---|---|---|
| eae-7b-f16.gguf | F16 | 15 GB | 83% | Full precision, best quality |
| eae-7b-Q5_K_M.gguf | Q5_K_M | 5.4 GB | 74% | High quality |
| eae-7b-Q4_K_M.gguf | Q4_K_M | 4.6 GB | 70% | Recommended - best balance |
| eae-7b-Q4_0.gguf | Q4_0 | 4.4 GB | 68% | Smallest, fastest |

Recommendations

  • Best Quality: F16 (requires ~16GB VRAM)
  • Balanced: Q4_K_M (recommended for most users)
  • Low VRAM / CPU: Q4_0
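
If you only need one of the files, the huggingface_hub Python library can fetch a single quant instead of cloning the whole repository. A minimal sketch, assuming this repo is published as infernet/eae-7b-GGUF (swap the filename for the quant you picked):

from huggingface_hub import hf_hub_download

# Download just the Q4_K_M quant into the local HF cache
model_path = hf_hub_download(
    repo_id="infernet/eae-7b-GGUF",   # assumed repo ID, adjust if it differs
    filename="eae-7b-Q4_K_M.gguf",
)
print(model_path)  # path to hand to llama.cpp, Ollama, or llama-cpp-python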

Benchmark Results

Tested on 100 GSM8K math problems:

| Model | Accuracy | Time/Problem |
|---|---|---|
| F16 | 83% | 5.96s |
| Q5_K_M | 74% | 3.69s |
| Q4_K_M | 70% | 3.64s |
| Q4_0 | 68% | 3.31s |

Tested on an NVIDIA H100 80GB with llama.cpp.

Usage

Important: This model uses the Qwen2 (ChatML) chat format. Using other prompt templates (such as the Alpaca ### Instruction: format) will result in poor output.

llama.cpp

./llama-cli -m eae-7b-Q4_K_M.gguf \
  -p "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 25 * 48?<|im_end|>
<|im_start|>assistant
" \
  -n 800 -ngl 99
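
If you prefer an OpenAI-compatible HTTP endpoint over the interactive CLI, llama.cpp's llama-server can serve the same file. A minimal sketch (port and context size are arbitrary choices here; the server applies the chat template embedded in the GGUF metadata, so verify it matches ChatML if the output looks off):

./llama-server -m eae-7b-Q4_K_M.gguf -ngl 99 -c 8192 --port 8080

# in another shell, query the chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 25 * 48?"}
    ],
    "max_tokens": 800
  }'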

Ollama

Create a Modelfile:

FROM ./eae-7b-Q4_K_M.gguf

TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
SYSTEM "You are a helpful assistant that solves problems step by step using structured reasoning."

Then:

ollama create eae-7b -f Modelfile
ollama run eae-7b
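
Ollama also exposes a local HTTP API once the model is created, which is handy for scripting; a minimal sketch against the standard /api/chat endpoint:

curl http://localhost:11434/api/chat -d '{
  "model": "eae-7b",
  "messages": [
    {"role": "user", "content": "What is 25 * 48?"}
  ],
  "stream": false
}'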

LM Studio

Download any GGUF file and load it directly in LM Studio. Make sure to select Qwen2 / ChatML as the prompt template.

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(model_path="eae-7b-Q4_K_M.gguf", n_gpu_layers=-1)  # -1 offloads all layers to the GPU

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Solve step by step: If a train travels 120 miles in 2 hours, how far will it travel in 5 hours at the same speed?"}
    ],
    max_tokens=800
)
print(response["choices"][0]["message"]["content"])
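
For long EAE traces it can be nicer to stream tokens as they are generated. llama-cpp-python supports this through the same call with stream=True; a minimal sketch continuing from the example above:

# Stream the reasoning trace token by token
stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 25 * 48?"}
    ],
    max_tokens=800,
    stream=True,
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)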

EAE Reasoning Format

This model was trained with the EAE (Epistemic AI Engine) methodology and produces structured reasoning with:

  • OBSERVE: Identifies known facts (K_i), beliefs (B_i), and unknowns (I_i)
  • DECIDE: Selects approach based on available information
  • ACT: Executes step-by-step solution
  • VERIFY: Validates results and updates beliefs
  • COMPOUND: Extracts transferable insights

Example output:

<problem>
What is 25 * 48?
</problem>

<reasoning>
## OBSERVE
K_i (Known):
- First factor: 25 (source: problem statement)
- Second factor: 48 (source: problem statement)

## DECIDE
Selected: Break down multiplication using distributive property
Rationale: 25 × 48 = 25 × (50 - 2) = 1250 - 50 = 1200

## ACT
Step 1: 25 × 50 = 1250
Step 2: 25 × 2 = 50
Step 3: 1250 - 50 = 1200

## VERIFY
Check: 25 × 48 = 1200 ✓
</reasoning>
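
If you post-process these traces in code, the phase headers make them easy to split. A minimal sketch (the header names come from the format above; the helper itself is hypothetical, not part of any published EAE tooling):

import re

def split_eae_phases(text: str) -> dict:
    """Split an EAE trace into its phase sections.

    Assumes phases are marked with '## PHASE' headers inside <reasoning> tags,
    as in the example above; phases the model skipped are simply absent.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    body = match.group(1) if match else text
    phases = {}
    # Capture each '## NAME' header and the text up to the next header
    for name, content in re.findall(
        r"^## (OBSERVE|DECIDE|ACT|VERIFY|COMPOUND)\n(.*?)(?=^## |\Z)",
        body, re.DOTALL | re.MULTILINE,
    ):
        phases[name] = content.strip()
    return phases

For example, split_eae_phases(response_text).get("VERIFY") returns just the verification block.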

Original Model

For fine-tuning or training, use the original LoRA adapter at infernet/eae-7b.

License

Apache 2.0 (same as the base Qwen2.5-7B model)
