
Alpie-Core: 4-bit Quantized Reasoning Model


[Space reserved for blog post, technical report links, and company logo]


1. Introduction

Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, demonstrating that aggressive quantization can surpass full-precision baselines in reasoning, mathematics, and coding. By combining quantization-aware training with synthetic STEM-rich datasets, Alpie-Core achieves frontier-level reasoning while remaining practical for real-world deployment at scale.

2. Model Summary

  • Base Architecture: DeepSeek-R1-Distill-Qwen-32B
  • Parameters: 32 billion (quantized to 4-bit)
  • Training Method: Supervised Fine-Tuning (SFT) using LoRA/QLoRA techniques
  • Quantization: 4-bit NF4 with double quantization
  • Context Length: 65,536 tokens
  • Max Output Length: 16,384 tokens
  • License: Apache 2.0
  • Memory Footprint: ~8GB (75% reduction from full-precision)

3. Model Features

  1. Supports Streaming – Real-time token-level responses
  2. OpenAI-Compatible API – Seamless integration with OpenAI client libraries (see the client sketch after this list)
  3. 65K Context Length – Handles very large inputs and conversations
  4. 16,384 Max Output Length – Enables extremely long generations
  5. 4-Bit Quantization – Memory-efficient and optimized for deployment
  6. High Throughput Inference – Powered by vLLM for efficient large-scale serving
  7. Low Latency Inference – Fast response times optimized for production
  8. Customizable Safety & Moderation Filters – Built-in guardrails for safer outputs
  9. Supports Function Calling / Tool Use – Enables structured outputs and external API integration
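
Because the API is OpenAI-compatible, any standard OpenAI client library can stream from a hosted deployment. The sketch below illustrates this; the `base_url`, `api_key`, and model id are placeholders for your own endpoint, not published values.

```python
# Minimal streaming sketch against an OpenAI-compatible endpoint.
# base_url, api_key, and model are placeholders -- substitute the
# values for your own Alpie-Core deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-alpie-endpoint/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                     # placeholder credential
)

stream = client.chat.completions.create(
    model="alpie-core",  # hypothetical model id on your server
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    stream=True,
)

# Print tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Function calling / tool use is exercised through the same client via its `tools` parameter.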

4. Key Highlights

  • Frontier Performance in 4-bit: 81.28% MMLU, 92.75% GSM8K, 57.8% SWE-Bench Verified
  • Global Ranking: 3rd place on Humanity's Last Exam leaderboard
  • Cost Advantage: 70-88% lower inference cost vs GPT-4/Claude/DeepSeek
  • Environmental Impact: 64% lower carbon footprint per inference
  • STEM + Coding Excellence: Outperforms full-precision peers in mathematics and programming
  • Enhanced Content Access: Provides factual responses to geopolitically sensitive topics

5. Benchmark Results

| Benchmark | Alpie-Core (32B, 4-bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
|---|---|---|---|---|---|---|---|
| MMLU (5-shot) | 81.28% | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | 92.75% | 81.6% | 88.3% | 83.5% | — | 82.2% | 80.73% |
| BBH (3-shot) | 85.12% | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | — |
| MMLU-Pro (5-shot) | 64.78% | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | 75.20% | 65.0% | 72.6% | 68.4% | — | 65.6% | 69.64% |
| HumanEval (pass@1) | 57.23% | 43.3% | 53.0% | 54.9% | — | 48.8% | — |

SWE-Bench Verified Performance

| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | Alpie Core | 57.8 | — |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
| 3 | o3-mini (high) | 49.3 | Below Alpie |
| 4 | DeepSeek R1 | 49.2 | Below Alpie |
| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
| 6 | o1 | 48.9 | Below Alpie |
| 7 | Devstral | 46.8 | Below Alpie |

Humanity's Last Exam Leaderboard Performance

| Rank | Model | Accuracy (%) | Performance vs Alpie |
|---|---|---|---|
| 1 | GPT-4.5 Preview | 5.80 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| 3 | Alpie Core 32B (4-bit) | 5.41 | — |
| 4 | Llama 4 Maverick | 5.34 | Below Alpie |
| 5 | GPT-4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |

Additional Benchmarks

| Benchmark | Alpie-Core (32B, 4-bit) | Category |
|---|---|---|
| AIME | 47.34% | Advanced Mathematics |
| GPQA (Diamond) | 40.91% | Graduate-level QA |
| TruthfulQA (MC2) | 60.05% | Truthfulness |
| HellaSwag | 84.66% | Commonsense |
| PIQA | 83.24% | Physical Reasoning |
| ARC Challenge | 67.58% | Science QA |
| CommonSenseQA | 87.06% | Commonsense |
| AGIEval | 64.98% | General Intelligence |
| Winogrande | 79.53% | Commonsense Reasoning |

6. Training Details

  • Hardware: 8× NVIDIA A100-80GB GPUs
  • Training Duration: 408 hours
  • Fine-tuning Method: LoRA/QLoRA with the following configuration (mapped to a code sketch after this list):
    • LoRA Alpha: 8
    • LoRA Dropout: 0.05
    • LoRA Rank: 8
  • Quantization: 4-bit NF4 + Double Quantization + FP16 compute
  • Dataset Domains: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context + law, multilingual (Hindi and Hinglish)
  • Synthetic Data Advantage: +15-20% performance boost in STEM & coding domains
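
For reference, the quantization and LoRA hyperparameters listed above map directly onto the standard transformers/peft configuration objects. The sketch below expresses only those settings; the datasets and full training pipeline are out of scope.

```python
# Sketch of the quantization and LoRA settings listed above.
# Only the hyperparameters named in this section come from the model
# card; everything else (trainer, data) is intentionally omitted.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization
    bnb_4bit_quant_type="nf4",             # NF4 quantization type
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute
)

lora_config = LoraConfig(
    r=8,                    # LoRA rank
    lora_alpha=8,           # LoRA alpha
    lora_dropout=0.05,      # LoRA dropout
    task_type="CAUSAL_LM",
)
```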

7. Environmental Impact

Carbon Footprint: 298-835 kg CO₂e (training)

8. Use Cases

Scientific Research Excellence

  • 98% accuracy on the SciQ benchmark
  • Advanced physics, chemistry, and mathematical sciences
  • Literature review automation and hypothesis generation
  • Experimental design optimization

Advanced Coding and Software Engineering

  • 57.8% SWE-Bench Verified score (6.2 points above the nearest model in the comparison above)
  • Automated bug detection and GitHub issue resolution
  • Competitive programming and algorithm design
  • Enterprise software development and architecture design

Indian Cultural and Religious Expertise

  • Comprehensive understanding of Hindu philosophy, Buddhist traditions
  • Regional diversity and cultural knowledge across Indian states
  • Legal and constitutional framework understanding
  • Educational support for Indian competitive exams (JEE, NEET, UPSC, SSC)

9. Safety and Limitations

Enhanced Content Access

Unlike the base DeepSeek model, Alpie-Core provides factual, balanced responses to geopolitically sensitive questions, covering topics such as Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive geopolitical issues for a global audience.

Current Limitations

  • Multilingual reasoning in Hindi/Hinglish shows room for improvement
  • Fixed knowledge cutoff without real-time information retrieval
  • Occasional struggles with complex multi-hop mathematical reasoning
  • Potential hallucinations in factual question-answering

Mitigations

  • Safety classifiers and output filtering systems
  • Model-assisted safety pipeline using RLHF
  • Comprehensive adversarial testing by domain experts

10. How to Use

Non-Streaming Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Sample inference
prompt = "Solve the Riemann Hypothesis and provide a final answer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Response:\n", response)
```

Streaming Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Initialize the streamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample streaming inference
prompt = "Solve the Riemann Hypothesis and provide a final answer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )
```
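
Both snippets above load the base model in FP16, which requires far more memory than the ~8GB 4-bit footprint quoted in the model summary. To load the base weights in 4-bit NF4 instead, pass a quantization config at load time; a minimal sketch, assuming `bitsandbytes` is installed:

```python
# Optional: load the base model in 4-bit NF4 to match the quantized
# footprint described above. Assumes bitsandbytes is installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig

peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter exactly as in the examples above.
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()
```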

Deployment Options

  • Transformers: Python, PyTorch integration
  • vLLM: High-throughput inference (offline sketch after this list)
  • LMDeploy/Ollama/TensorRT-LLM: Production deployments
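
For vLLM specifically, the sketch below shows minimal offline inference. It assumes the LoRA adapter has first been merged into the base model (for example with PeftModel's `merge_and_unload()`) and saved locally; the `./alpie-core-merged` path is a placeholder, and exact flags may vary across vLLM versions.

```python
# Minimal vLLM offline-inference sketch. Assumes a merged checkpoint
# saved at ./alpie-core-merged (placeholder path).
from vllm import LLM, SamplingParams

llm = LLM(model="./alpie-core-merged", max_model_len=65536)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1000)
outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)

for output in outputs:
    print(output.outputs[0].text)
```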

11. Citation

```bibtex
@misc{alpie2025core,
  title     = {Alpie-Core: A 4-bit Quantized Reasoning Model Surpassing Full-Precision Benchmarks},
  author    = {Alpie AI},
  year      = {2025},
  url       = {https://huggingface.co/alpie/Alpie-Core-4bit}
}
```

12. License

Apache 2.0 – Free for research and commercial use


For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.