Alpie-Core: 4-bit Quantized Reasoning Model
[Space reserved for blog paper, technical report links, and company logo]
1. Introduction
Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, demonstrating that aggressive quantization can surpass full-precision baselines in reasoning, mathematics, and coding. By combining quantization-aware training with synthetic, STEM-rich datasets, Alpie-Core delivers frontier-level reasoning while remaining practical for real-world deployment at scale.
2. Model Summary
- Base Architecture: DeepSeek-R1-Distill-Qwen-32B
- Parameters: 32 billion (quantized to 4-bit)
- Training Method: Supervised Fine-Tuning (SFT) using LoRA/QLoRA techniques
- Quantization: 4-bit NF4 with double quantization
- Context Length: 65,536 tokens
- Max Output Length: 16,384 tokens
- License: Apache 2.0
- Memory Footprint: ~8GB (75% reduction from full-precision)
3. Model Features
- Supports Streaming – Real-time token-level responses
- OpenAI-Compatible API – Seamless integration with OpenAI client libraries (see the sketch after this list)
- 65K Context Length – Handles very large inputs and conversations
- 16,384 Max Output Length – Enables extremely long generations
- 4-Bit Quantization – Memory-efficient and optimized for deployment
- High Throughput Inference – Powered by vLLM for efficient large-scale serving
- Low Latency Inference – Fast response times optimized for production
- Customizable Safety & Moderation Filters – Built-in guardrails for safer outputs
- Supports Function Calling / Tool Use – Enables structured outputs and external API integration
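Because the model is served behind an OpenAI-compatible API with streaming, a client call might look like the sketch below. The base URL, API key, and registered model name are placeholders for your own deployment, not a published endpoint.

```python
from openai import OpenAI

# Placeholder endpoint and key: point these at your own OpenAI-compatible server (e.g., vLLM)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="169Pi/Alpie-core",  # model name as registered on your server (assumed)
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```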
4. Key Highlights
- Frontier Performance in 4-bit: 81.28% MMLU, 92.75% GSM8K, 57.8% SWE-Bench Verified
- Global Ranking: 3rd place on Humanity's Last Exam leaderboard
- Cost Advantage: 70-88% lower inference cost vs GPT-4/Claude/DeepSeek
- Environmental Impact: 64% lower carbon footprint per inference
- STEM + Coding Excellence: Outperforms full-precision peers in mathematics and programming
- Enhanced Content Access: Provides factual responses to geopolitically sensitive topics
5. Benchmark Results
| Benchmark | Alpie-Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
|---|---|---|---|---|---|---|---|
| MMLU (5-shot) | 81.28% | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | 92.75% | 81.6% | 88.3% | 83.5% | — | 82.2% | 80.73% |
| BBH (3-shot) | 85.12% | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | — |
| MMLU-Pro (5-shot) | 64.78% | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | 75.20% | 65.0% | 72.6% | 68.4% | — | 65.6% | 69.64% |
| HumanEval (pass@1) | 57.23% | 43.3% | 53.0% | 54.9% | — | 48.8% | — |
SWE-Bench Verified Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | Alpie Core | 57.8 | Alpie |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
| 3 | o3-mini (high) | 49.3 | Below Alpie |
| 4 | DeepSeek R1 | 49.2 | Below Alpie |
| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
| 6 | o1 | 48.9 | Below Alpie |
| 7 | Devstral | 46.8 | Below Alpie |
Humanity's Last Exam Leaderboard Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | GPT 4.5 Preview | 5.8 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| 3 | Alpie Core 32B (4-bit) | 5.41 | Alpie |
| 4 | Llama 4 Maverick | 5.34 | Below Alpie |
| 5 | GPT 4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |
Additional Benchmarks
| Benchmark | Alpie-Core (32B-4bit) | Category |
|---|---|---|
| AIME | 47.34% | Advanced Mathematics |
| GPQA (Diamond) | 40.91% | Graduate-level QA |
| TruthfulQA (MC2) | 60.05% | Truthfulness |
| HellaSwag | 84.66% | Commonsense |
| PIQA | 83.24% | Physical Reasoning |
| ARC Challenge | 67.58% | Science QA |
| CommonSenseQA | 87.06% | Commonsense |
| AGIEval | 64.98% | General Intelligence |
| Winogrande | 79.53% | Commonsense Reasoning |
6. Training Details
- Hardware: 8× NVIDIA A100-80GB GPUs
- Training Duration: 408 hours
- Fine-tuning Method: LoRA/QLoRA with the following configuration (sketched in code below):
  - LoRA Rank: 8
  - LoRA Alpha: 8
  - LoRA Dropout: 0.05
- Quantization: 4-bit NF4 + double quantization + FP16 compute
- Dataset Domains: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context and law, multilingual (Hindi and Hinglish)
- Synthetic Data Advantage: +15-20% performance boost in STEM & coding domains
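For concreteness, the stated hyperparameters map onto a peft/bitsandbytes configuration roughly as follows; the target modules and task type are assumptions, since the report lists only rank, alpha, dropout, and the quantization scheme.

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig
import torch

# 4-bit NF4 + double quantization + FP16 compute, as stated above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA hyperparameters from this section; target_modules is an assumption
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```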
7. Environmental Impact
- Carbon Footprint (training): 298-835 kg CO₂e
8. Use Cases
Scientific Research Excellence
- 98% on the SciQ science question-answering benchmark
- Advanced physics, chemistry, and mathematical sciences
- Literature review automation and hypothesis generation
- Experimental design optimization
Advanced Coding and Software Engineering
- 57.8% SWE-Bench Verified score (6.2 points above the nearest competitor evaluated)
- Automated bug detection and GitHub issue resolution
- Competitive programming and algorithm design
- Enterprise software development and architecture design
Indian Cultural and Religious Expertise
- Comprehensive understanding of Hindu philosophy, Buddhist traditions
- Regional diversity and cultural knowledge across Indian states
- Legal and constitutional framework understanding
- Educational support for Indian competitive exams (JEE, NEET, UPSC, SSC)
9. Safety and Limitations
Enhanced Content Access
Unlike the base DeepSeek model, Alpie-Core provides factual, balanced responses to geopolitically sensitive questions, such as Taiwan's status and the sovereignty of Arunachal Pradesh.
Current Limitations
- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
Mitigations
- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts
10. How to Use
Non-Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Sample inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Response:\n", response)
```
Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Initialize the streamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample streaming inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
```
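Both examples above load the base model in FP16, which needs far more memory than the ~8GB 4-bit footprint quoted in the model summary. Below is a minimal sketch of loading the base model in 4-bit NF4 via bitsandbytes before attaching the adapter; it assumes the bitsandbytes package is installed, and the released quantized artifacts may use a different loading path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
import torch

peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# NF4 with double quantization and FP16 compute, mirroring the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Quantize the base model at load time, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id).eval()
```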
Deployment Options
- Transformers: Python, PyTorch integration
- vLLM: High-throughput inference (see the sketch after this list)
- LMDeploy/Ollama/TensorRT-LLM: Production deployments
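For the vLLM route, an offline-inference sketch is below. Whether the LoRA adapter repository loads directly in vLLM depends on your vLLM version and the published checkpoint format, so the model identifier here is an assumption.

```python
from vllm import LLM, SamplingParams

# Assumes a checkpoint loadable by vLLM under this identifier; adjust as needed
llm = LLM(model="169Pi/Alpie-core", max_model_len=65536)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

# Batch generation over one or more prompts
outputs = llm.generate(["Explain the Collatz conjecture."], params)
print(outputs[0].outputs[0].text)
```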
11. Citation
```bibtex
@misc{alpie2025core,
  title  = {Alpie-Core: A 4-bit Quantized Reasoning Model Surpassing Full-Precision Benchmarks},
  author = {Alpie AI},
  year   = {2025},
  url    = {https://huggingface.co/alpie/Alpie-Core-4bit}
}
```
12. License
Apache 2.0 – Free for research and commercial use
For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.