---
tags:
- text-generation
- reasoning
- coding
- mathematics
- quantization
- 4-bit model
- state-of-the-art
license: apache-2.0
datasets:
- synthetic
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
- hi
library_name: transformers
pipeline_tag: text-generation
---
# Alpie Core: 4-bit Quantized Reasoning Model
<p align="center">
<a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
<a href="https://huggingface.co/169Pi"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-169Pi%20AI-yellow" alt="Hugging Face"></a>
<a href="https://pypi.org/project/pi169/0.1/"><img src="https://img.shields.io/badge/PyPI-pi169-blue" alt="PyPI"></a>
<a href="https://www.linkedin.com/company/169pi/"><img src="https://img.shields.io/badge/LinkedIn-169Pi%20AI-blue" alt="LinkedIn"></a>
<a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>
## TL;DR
- **32B reasoning model**, trained & served at **4-bit quantization**
- **Competitive with GPT-4o / Claude 3.5 Sonnet** on reasoning & coding benchmarks
- **65K context length** for long-document reasoning
- **Open source** (Apache 2.0) - fully permissive for commercial use
- Available via **Ollama**, **Hugging Face**, and **hosted API** with 5M free tokens
📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**
---
## How to Use Alpie Core
### Option 1: Local Inference with Ollama (Recommended for Quick Start)
```bash
# Pull the model (20GB)
ollama pull 169pi/alpie-core
# Run inference
ollama run 169pi/alpie-core
```
**Requirements**: 20GB RAM/VRAM minimum
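Once `ollama run` works, the same model can also be called from code. The sketch below assumes Ollama's local REST API on its default port (`http://localhost:11434`) and its `/api/generate` endpoint, which, when streaming, returns one JSON object per line carrying a `response` text fragment and a `done` flag. The helper names are illustrative and not part of any 169Pi tooling.

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def iter_ollama_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between JSON objects
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final object signals the end of generation


def generate(prompt: str, model: str = "169pi/alpie-core") -> str:
    """Send one prompt to a local Ollama server and collect the streamed reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:  # iterating the response yields lines
        return "".join(iter_ollama_text(resp))


# Usage (needs `ollama serve` running and the model pulled):
# print(generate("What is the integral of x^2?"))
```

Streaming keeps memory flat for long reasoning outputs; setting `"stream": False` instead returns a single JSON object whose `response` field holds the full text.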
### Option 2: Hosted Inference via 169Pi API
Get started instantly with our **hosted API** - no setup required!
**Get your first free API key** including **5 million tokens** to test real workloads
- **OpenAI-compatible** - drop-in replacement for OpenAI SDK
- Supports **streaming**, **async**, and **long-context reasoning**
- Production-ready with low latency
**[Get your API key at 169pi.ai](https://169pi.ai/)**
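Because the API is described as OpenAI-compatible, streamed completions presumably arrive in the OpenAI chat-completions format: server-sent-event lines of the form `data: {...}` whose `choices[0].delta.content` holds the next text fragment, terminated by `data: [DONE]`. The parser below is a minimal sketch under that assumption; the endpoint URL and authentication header are not covered here and should be taken from the 169Pi documentation.

```python
import json
from typing import Iterable, Iterator


def build_chat_request(prompt: str, model: str = "alpie-core", stream: bool = True) -> dict:
    """Request body in the OpenAI chat-completions shape (assumed to be accepted by the API)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent-event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # a chunk may carry only the role and no content
                yield text
```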
### Option 3: Programmatic Access with Python SDK
```bash
# Install the official SDK
pip install pi169
# Set your API key
export ALPIE_API_KEY="your_key_here"
# Use via CLI
pi169 "Explain quantum entanglement"
```
Or use it in Python:
```python
import os

from pi169 import AlpieClient

# Read the key exported above instead of hard-coding it
client = AlpieClient(api_key=os.environ["ALPIE_API_KEY"])
response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Solve this coding problem..."}],
    stream=True,  # tokens arrive incrementally; consume `response` as described in the SDK docs
)
```
**SDK Features**: Streaming, async/await, OpenAI compatibility, type-safe interface
### Option 4: Load Directly with Transformers (Advanced)
```python
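from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch
# Load LoRA adapter configuration
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)
# Load base model + LoRA weights
# Note: float16 weights for a 32B model need about 64 GB of GPU memory.
# For a 4-bit NF4 base model (the setup used in training), import BitsAndBytesConfig from
# transformers and pass quantization_config=BitsAndBytesConfig(load_in_4bit=True,
# bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.float16)
# instead of torch_dtype (requires the bitsandbytes package).
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
# Inference
prompt = "Solve: What is the integral of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))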
```
---
## Why Alpie Core?
**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide at this scale.** Trained on just 8 Hopper GPUs using LoRA and QLoRA 4-bit quantization with synthetic STEM-rich datasets, it shows that aggressive quantization can match, and on several benchmarks surpass, full-precision baselines.
With a dramatically reduced memory footprint, Alpie Core delivers competitive, frontier-level reasoning performance and outperforms several top proprietary models on the benchmarks reported below. It achieves:
- **81.28% on MMLU** (5-shot)
- **92.75% on GSM8K** (8-shot)
- **57.8% on SWE-Bench Verified** (ranked #1 globally)
This demonstrates that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

---
## Model Summary
- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65K tokens
- **Max Output Length**: 16,384 tokens
- **Training Data**: Synthetic (STEM, reasoning, coding) + curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0
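NF4 (4-bit NormalFloat, introduced with QLoRA) stores each weight as one of 16 levels placed at quantiles of a standard normal distribution, rescaled per block by the block's absolute maximum. The sketch below rebuilds that 16-level codebook with the standard library and applies simple blockwise absmax quantization. It is an illustration only: the offset `0.9677083` follows the construction used in bitsandbytes, and double quantization (which additionally quantizes the per-block scales) is not shown.

```python
from statistics import NormalDist


def _linspace(start: float, stop: float, n: int) -> list[float]:
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def nf4_codebook(offset: float = 0.9677083) -> list[float]:
    """16 NF4 levels: 7 negative, an exact zero, and 8 positive, normalized to [-1, 1]."""
    normal = NormalDist()
    positive = [normal.inv_cdf(p) for p in _linspace(offset, 0.5, 9)[:-1]]
    negative = [-normal.inv_cdf(p) for p in _linspace(offset, 0.5, 8)[:-1]]
    levels = sorted(negative + [0.0] + positive)
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]


def quantize_block(values: list[float], codebook: list[float]) -> tuple[list[int], float]:
    """Map each value to the index of the nearest level after absmax scaling."""
    absmax = max(abs(v) for v in values) or 1.0  # avoid dividing by zero for all-zero blocks
    codes = [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - v / absmax))
        for v in values
    ]
    return codes, absmax


def dequantize_block(codes: list[int], absmax: float, codebook: list[float]) -> list[float]:
    """Recover approximate values from 4-bit codes and the stored block scale."""
    return [codebook[c] * absmax for c in codes]
```

Each weight then costs 4 bits for its code, plus one scale per block; placing more levels near zero suits the roughly normal distribution of trained weights.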
---
## Approach
**Alpie Core** underwent extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized:
1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations
3. **Limitations and Knowledge Boundaries** – transparently communicating uncertainty
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity
6. **Confidentiality and Responsible Use** – preventing leakage of private data or internal reasoning traces
This approach enables Alpie Core to deliver reliable, aligned, and context-aware responses across a broad range of use cases, generalizing across both global and Indian contexts.
---
## Model Features
1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Structured outputs and external API integration
10. **Instruction Following** – Optimized for reasoning and chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge tasks
---
## Key Highlights
1. **First 4-bit Reasoning Model from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms 70B+ models on most reported reasoning, math, and coding benchmarks
3. **STEM & Coding Strength**: Strong results on GSM8K, MATH-500, HumanEval, and SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs
5. **Extended Context Length**: 65K tokens for research papers, multi-document reasoning
6. **Environmental Benefits**: Estimated training footprint of ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use
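The 16 GB figure follows from simple arithmetic on the weights: 32 billion parameters at 4 bits each is 16 × 10⁹ bytes, versus 64 × 10⁹ bytes at 16-bit precision. The helper below (an illustration, not a measured profile) makes the comparison explicit. It counts weights only, so the KV cache for long contexts, activations, and quantization scales come on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Weights only: KV cache, activations, and quantization scales are not included.
for label, bits in [("FP16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(32e9, bits):.0f} GB")
```

Passing `bits_per_param=4.127` roughly accounts for the per-block scales of NF4 with double quantization (about 0.127 extra bits per parameter according to the QLoRA paper), giving about 16.5 GB.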
---
## Benchmark Results

### Core Benchmarks
| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|-------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
### SWE-Bench Verified Performance (#1 Globally)
| Rank | Model | Accuracy (%) | vs Alpie (pp) |
|------|-------|-------------|----------|
| **1** | **Alpie Core** | **57.8** | **—** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | -6.2 |
| 3 | o3-mini (high) | 49.3 | -8.5 |
| 4 | DeepSeek R1 | 49.2 | -8.6 |
| 5 | Claude 3.5 Sonnet | 49.0 | -8.8 |
| 6 | o1 | 48.9 | -8.9 |
| 7 | Devstral | 46.8 | -11.0 |
### Humanity's Last Exam Leaderboard (#3 Globally)
| Rank | Model | Accuracy (%) | vs Alpie (pp) |
|------|-------|-------------|----------|
| 1 | GPT-4.5 Preview | 5.80 | +0.39 |
| 2 | Claude Sonnet 4 | 5.42 | +0.01 |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **—** |
| 4 | Llama 4 Maverick | 5.34 | -0.07 |
| 5 | GPT-4.1 | 4.97 | -0.44 |
| 6 | Kimi K2 Instruct | 4.68 | -0.73 |
| 7 | DeepSeek V3 | 4.55 | -0.86 |

### Additional Benchmarks
| Benchmark | Alpie Core | Category |
|-----------|-----------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |
| HellaSwag | **84.66%** | Commonsense |
| PIQA | **83.24%** | Physical Reasoning |
| ARC Challenge | **67.58%** | Science QA |
| CommonSenseQA | **87.06%** | Commonsense |
| AGIEval | **64.98%** | General Intelligence |
| Winogrande | **79.53%** | Commonsense Reasoning |
| MATH-500 | **70.00%** | Advanced Mathematics |

---
## Training Details
- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Fine-tuning Method**: LoRA/QLoRA
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, competitive exams, Indian context + law, multilingual (Hindi/Hinglish)
- **Synthetic Data Advantage**: +15-20% performance boost in STEM & coding
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
- **Total Training Time**: 408 hours
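With LoRA, each adapted weight matrix W stays frozen and the layer computes y = Wx + (α/r)·B(Ax), where A (r × d_in) and B (d_out × r) are the only trained matrices. With the settings above (α = 16, r = 16) the scaling factor α/r is exactly 1. The pure-Python sketch below shows the forward pass and the trainable-parameter count per matrix; the 5120 × 5120 shape is illustrative, and LoRA dropout (applied to the adapter input during training) is omitted.

```python
def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha: float, rank: int) -> list[float]:
    """y = W x + (alpha / rank) * B (A x); W is frozen, only A and B are trained."""
    scale = alpha / rank
    base = matvec(W, x)               # frozen weights (4-bit quantized under QLoRA)
    update = matvec(B, matvec(A, x))  # low-rank path: d_in -> rank -> d_out
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """A has rank * d_in entries and B has d_out * rank entries."""
    return rank * (d_in + d_out)


# Illustrative square projection with the card's rank of 16
full = 5120 * 5120
lora = lora_trainable_params(5120, 5120, rank=16)
print(f"LoRA trains {lora:,} of {full:,} weights ({lora / full:.3%})")
```

For this shape the adapter holds 163,840 trainable weights, 0.625% of the full matrix, which is why fine-tuning a 32B model fits on a single 8-GPU node.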
---
## Environmental Impact

We estimated the carbon footprint of training Alpie Core on 8× NVIDIA H100-80GB GPUs:
**Formula**: CO₂e (kg) = Grid CO₂ Factor (kg CO₂e/kWh) × Runtime (h) × Power per GPU (kW) × Number of GPUs
**Training Parameters**:
- Grid CO₂ Factor (Azure): 0.364 kg CO₂e/kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB
**Results**:
- **Realistic mode** (250W avg per GPU): **~298 kg CO₂e**
- **Conservative mode** (700W TDP per GPU): **~835 kg CO₂e**
*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*
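The estimate can be reproduced directly from the training parameters listed above. This helper simply multiplies them out, converting watts to kilowatts; the function name is illustrative.

```python
def training_co2e_kg(grid_kg_per_kwh: float, runtime_h: float,
                     watts_per_gpu: float, num_gpus: int) -> float:
    energy_kwh = runtime_h * (watts_per_gpu / 1000) * num_gpus  # total GPU energy
    return grid_kg_per_kwh * energy_kwh


for mode, watts in [("Realistic", 250), ("Conservative", 700)]:
    print(f"{mode}: {training_co2e_kg(0.364, 408, watts, 8):.0f} kg CO2e")
```

With the listed inputs this gives roughly 297 kg and 832 kg CO₂e (816 kWh and 2,284.8 kWh of GPU energy), within about 0.5% of the reported figures; the small gap may come from rounding or inputs not listed here. Only GPU power enters the formula, so host and cooling overheads are not counted.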
---
## Use Cases
Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian context**
1. **STEM Education**: Advanced problem-solving in science, technology, engineering, mathematics
2. **Mathematical Reasoning**: Multi-step logical and quantitative reasoning
3. **Software Development**: Code generation, debugging, algorithmic problem-solving
4. **Indian Context**: Competitive exam assistance (JEE, NEET, UPSC), Hindi/Hinglish support
5. **Research & Legal**: 65K context for academic papers, legal documents, long-form analysis
---
## Safety and Limitations
### Enhanced Content Access
Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, such as Taiwan's status and the sovereignty of Arunachal Pradesh.
### Current Limitations
- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
- Should not be used for medical/legal advice without expert oversight
### Mitigations
- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts
---
## Python SDK Quick Start
```bash
# Install
pip install pi169
# Set API key
export ALPIE_API_KEY="your_key_here"
# CLI usage
pi169 "Explain 4-bit quantization"
```
### SDK Features
- **CLI Integration** for quick interactions
- **Streaming & Non-Streaming** completions
- **Async/Await Support** for concurrent requests
- **Type-safe Interface** with dataclasses
- **Robust Error Handling**
- **OpenAI-Compatible**: Drop-in replacement
[Full SDK documentation on PyPI](https://pypi.org/project/pi169/0.1/)
---
## Advanced Usage Examples
### Streaming Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()
# TextStreamer prints tokens to stdout as soon as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "Explain the P vs NP problem"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
```
### Deployment Options
- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference server
- **Ollama**: Easy local deployment (20GB model size)
- **169Pi API**: Production-ready hosted inference
---
## Citation
```bibtex
@misc{169pi2025alpiecore,
title = {Alpie-Core: A 4-Bit Quantized Reasoning Model from India that Outperforms Full-Precision Models},
author = {169Pi AI},
year = {2025},
url = {https://huggingface.co/169Pi/Alpie-Core}
}
```
---
## Community & Contributions
Released under Apache 2.0 - we welcome the community to build, extend, and improve!
1. **Issues & Discussions**: Report bugs or suggest features on Hugging Face
2. **Contributions**: Pull requests welcome for improvements
3. **Share Results**: Post your fine-tuning experiments and benchmarks
4. **Collaborate**: Join us in shaping the future of efficient AI
---
## License
**Apache 2.0 License** – Permissive for research and commercial use
---
## Acknowledgements
Thanks to **DeepSeek** for the original model foundation. We also acknowledge:
- **Hugging Face** ecosystem (Transformers, PEFT, vLLM, bitsandbytes)
- Open-source datasets (MMLU, GSM8K, SWE-Bench, etc.)
- Cloud infrastructure providers
- The broader AI research community
---
## Contact
**Technical Support**: support@169pi.com
---
*Alpie Core represents a milestone for open-source AI from India, demonstrating that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organizations worldwide to build more efficient, inclusive, and impactful AI.*
**Get started today with 5 million free tokens at [169pi.ai](https://169pi.ai/)**