How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf OmnipotentFool/Aurvion:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf OmnipotentFool/Aurvion:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf OmnipotentFool/Aurvion:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf OmnipotentFool/Aurvion:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf OmnipotentFool/Aurvion:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf OmnipotentFool/Aurvion:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf OmnipotentFool/Aurvion:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf OmnipotentFool/Aurvion:Q4_K_M
Use Docker
docker model run hf.co/OmnipotentFool/Aurvion:Q4_K_M
Quick Links

CRAFT-Phi3-Mini

CRAFT (Curriculum-guided Reinforced Adaptive Fine-Tuning) is a reasoning-enhanced version of Phi-3-Mini, trained to address three specific failure modes of reinforcement learning applied to small language models: training instability, unreliable reward signals, and outcome-blind learning.

Built for Samsung EnnovateX 2026, Problem Statement 06.

Model Details

  • Base model: microsoft/Phi-3-mini-4k-instruct
  • Training method: SFT warmup + GRPO with three custom components
  • Format: GGUF, 4-bit quantized (Q4_K_M)
  • Size: ~2.2GB
  • License: Apache 2.0

How CRAFT Was Trained

  1. Capability Probe — measured base model difficulty per-question before training
  2. SFT Warmup — QLoRA fine-tuning on GSM8K + AQuA-RAT
  3. CRAFT RL Loop — GRPO with:
    • Deterministic execution verifier (Python-based math reward, NLI-based logic reward)
    • Contrastive step-level preference learning (self-supervised, no human labels)
    • Live adaptive curriculum (dynamic difficulty + KL control)
  4. Deployment — quantized to 4-bit GGUF for on-device inference

Benchmark Results

Benchmark Baseline (Phi-3-Mini) CRAFT Improvement
GSM8K 48% 62% 69.05%
StrategyQA 42% 71% 70.27%
MMLU 37% 63% 29.17%

Evaluated using lm-evaluation-harness.

Usage

pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="CRAFT_Q4_K_M.gguf", n_ctx=2048)
output = llm("Solve step by step: What is 15% of 240?", max_tokens=256)
print(output["choices"][0]["text"])

Intended Use

On-device reasoning for resource-constrained environments — laptops, edge devices, and offline applications requiring multi-step mathematical and logical reasoning without cloud dependency.

Limitations

[Be honest here — e.g., "Performance gains are most pronounced on arithmetic reasoning tasks; gains on broader knowledge benchmarks (MMLU) are comparatively smaller, reflecting the training data composition."]

Citation / Acknowledgment

Built for Samsung EnnovateX 2026 Hackathon, Problem Statement 06. Base model: Microsoft Phi-3-Mini.

Repository

Full source code, training pipeline, and documentation: GitHub link

Downloads last month
14
GGUF
Model size
4B params
Architecture
phi3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for OmnipotentFool/Aurvion

Quantized
(172)
this model

Datasets used to train OmnipotentFool/Aurvion