---
tags:
- text-generation
- reasoning
- coding
- mathematics
- quantization
- 4-bit model
- state-of-the-art
license: apache-2.0
datasets:
- synthetic
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
- hi
library_name: transformers
pipeline_tag: text-generation
---

# Alpie Core: 4-bit Quantized Reasoning Model

<p align="center">
  <a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
  <a href="https://huggingface.co/169Pi"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-169Pi%20AI-yellow" alt="Hugging Face"></a>
  <a href="https://pypi.org/project/pi169/0.1/"><img src="https://img.shields.io/badge/PyPI-pi169-blue" alt="PyPI"></a>
  <a href="https://www.linkedin.com/company/169pi/"><img src="https://img.shields.io/badge/LinkedIn-169Pi%20AI-blue" alt="LinkedIn"></a>
  <a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>

## TL;DR

- **32B reasoning model**, trained & served at **4-bit quantization**
- **Competitive with GPT-4o / Claude 3.5 Sonnet** on reasoning & coding benchmarks
- **65K context length** for long-document reasoning
- **Open source** (Apache 2.0) - fully permissive for commercial use
- Available via **Ollama**, **Hugging Face**, and **hosted API** with 5M free tokens

📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**

---

## How to Use Alpie Core

### Option 1: Local Inference with Ollama (Recommended for Quick Start)

```bash
# Pull the model (20GB)
ollama pull 169pi/alpie-core

# Run inference
ollama run 169pi/alpie-core
```

**Requirements**: 20GB RAM/VRAM minimum
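
Once the model is pulled, it can also be queried from Python through Ollama's local REST endpoint. The following is a minimal sketch, assuming an Ollama server is running at its default address (`http://localhost:11434`) and that the model tag matches the pull command above; `build_payload` and `generate` are illustrative helper names, not part of any SDK.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Tag taken from the `ollama pull` command above.
MODEL_TAG = "169pi/alpie-core"


def build_payload(prompt: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, url: str = OLLAMA_URL, timeout: float = 600.0) -> str:
    """Send the prompt to a locally running Ollama server and return its text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(generate("What is the integral of x^2?"))` should print the model's answer; the long timeout is a guess to leave room for reasoning-style outputs.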

### Option 2: Hosted Inference via 169Pi API

Get started instantly with our **hosted API** - no setup required!

**Get your first free API key**, which includes **5 million tokens** to test real workloads.

- **OpenAI-compatible** - drop-in replacement for OpenAI SDK
- Supports **streaming**, **async**, and **long-context reasoning**
- Production-ready with low latency

**[Get your API key at 169pi.ai](https://169pi.ai/)**
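
Because the API is described as OpenAI-compatible, the standard `openai` Python client should be able to talk to it once it is pointed at the right endpoint. The sketch below is illustrative only: `ALPIE_BASE_URL` is a placeholder (this README does not state the base URL, so take it from the 169Pi API documentation), and `build_chat_request` / `stream_chat` are hypothetical helper names.

```python
import os

# ASSUMPTION: placeholder value. The real base URL is not given in this README.
ALPIE_BASE_URL = "https://api.example.invalid/v1"


def build_chat_request(prompt: str, stream: bool = True) -> dict:
    """Build keyword arguments for an OpenAI-style chat completion call."""
    return {
        "model": "alpie-core",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def stream_chat(prompt: str) -> str:
    """Stream a reply via the OpenAI client and return the concatenated text."""
    from openai import OpenAI  # pip install openai

    client = OpenAI(base_url=ALPIE_BASE_URL, api_key=os.environ["ALPIE_API_KEY"])
    pieces = []
    for chunk in client.chat.completions.create(**build_chat_request(prompt)):
        if not chunk.choices:
            continue  # some servers send chunks without choices (e.g. usage)
        delta = chunk.choices[0].delta.content
        if delta:
            pieces.append(delta)
            print(delta, end="", flush=True)
    return "".join(pieces)
```

Keeping the request construction in its own function makes it easy to reuse the same payload for non-streaming calls by passing `stream=False`.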

### Option 3: Programmatic Access with Python SDK

```bash
# Install the official SDK
pip install pi169

# Set your API key
export ALPIE_API_KEY="your_key_here"

# Use via CLI
pi169 "Explain quantum entanglement"
```

Or use it from Python:

```python
import os

from pi169 import AlpieClient

# Read the key exported above instead of hard-coding it
client = AlpieClient(api_key=os.environ["ALPIE_API_KEY"])
response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Solve this coding problem..."}],
    stream=True
)
```

**SDK Features**: Streaming, async/await, OpenAI compatibility, type-safe interface

### Option 4: Load Directly with Transformers (Advanced)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load LoRA adapter configuration
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load base model + LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Inference
prompt = "Solve: What is the integral of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
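
The snippet above loads the base model in float16, which for 32B parameters needs roughly 64 GB for the weights alone. To load it in 4-bit instead, a `BitsAndBytesConfig` can be passed to `from_pretrained`. This is a minimal sketch, assuming `bitsandbytes` is installed and that the NF4 + double-quantization + FP16-compute settings listed under Training Details are also the ones you want at inference time; `load_alpie_4bit` is an illustrative helper name.

```python
# NF4 settings mirroring the quantization described in this model card.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def load_alpie_4bit(peft_model_id: str = "169Pi/Alpie-Core"):
    """Load the base model in 4-bit NF4 and attach the LoRA adapter."""
    # Heavy imports are kept local so the settings above can be inspected
    # without torch/transformers installed.
    import torch
    from peft import PeftConfig, PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16, **NF4_SETTINGS
    )
    config = PeftConfig.from_pretrained(peft_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        quantization_config=quant_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_id)
    model.eval()
    return model, tokenizer
```

The returned `model` and `tokenizer` can be used with the same `generate` call shown above.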

---

## Why Alpie Core?

**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide at this scale.** Trained on just 8 Hopper GPUs using LoRA and QLoRA 4-bit quantization with synthetic STEM-rich datasets, it proves that aggressive quantization can match and even surpass full-precision baselines.

With a dramatically reduced memory footprint, Alpie Core delivers competitive reasoning performance and, on the benchmarks reported below, outperforms several larger and proprietary models. It achieves:

- **81.28% on MMLU** (5-shot)
- **92.75% on GSM8K** (8-shot)  
- **57.8% on SWE-Bench Verified** (ranked #1 globally)

This demonstrates that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

![Bench](https://cdn-uploads.huggingface.co/production/uploads/66e2f8a815879154e1f9e023/i2SOWOOHdsTx5RajIkyrE.png)

---

## Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65K tokens
- **Max Output Length**: 16,384 tokens
- **Training Data**: Synthetic (STEM, reasoning, coding) + curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0
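
As a rough back-of-the-envelope check on what 4-bit storage means for a 32B-parameter model, the weight memory can be estimated as parameters × bits per parameter ÷ 8. The sketch below counts weights only; quantization constants, the LoRA adapter, the KV cache, and activations all add to this, which is why practical requirements are higher than the raw figure.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 32e9  # 32 billion parameters

fp16_gb = weight_memory_gb(PARAMS, 16)  # full half-precision weights
nf4_gb = weight_memory_gb(PARAMS, 4)    # 4-bit quantized weights

print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"4-bit weights: {nf4_gb:.0f} GB")
print(f"Reduction: {fp16_gb / nf4_gb:.0f}x")
```

This gives about 64 GB for FP16 versus about 16 GB for 4-bit weights, a 4× reduction.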

---

## Approach

**Alpie Core** underwent extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized:

1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations
3. **Limitations and Knowledge Boundaries** – transparently communicating uncertainty
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity
6. **Confidentiality and Responsible Use** – preventing leakage of private data or internal reasoning traces

This approach enables Alpie Core to deliver reliable, aligned, and context-aware responses while maintaining safety across a broad range of use cases, generalizing across global and Indian contexts.

---

## Model Features

1.  **Supports Streaming** – Real-time token-level responses
2.  **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3.  **65K Context Length** – Handles very large inputs and conversations
4.  **16,384 Max Output Length** – Enables extremely long generations
5.  **4-Bit Quantization** – Memory-efficient and optimized for deployment
6.  **High Throughput Inference** – Powered by vLLM for efficient large-scale serving
7.  **Low Latency Inference** – Fast response times optimized for production
8.  **Customizable Safety & Moderation** – Built-in guardrails for safer outputs
9.  **Supports Function Calling / Tool Use** – Structured outputs and external API integration
10. **Instruction Following** – Optimized for reasoning and chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge tasks

---

## Key Highlights

1. **One of the First 4-bit Reasoning Models from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms or matches 70B+ models across reasoning, math, and coding
3. **STEM & Coding Strength**: Excellent on GSM8K, MATH-500, HumanEval, SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs
5. **Extended Context Length**: 65K tokens for research papers, multi-document reasoning
6. **Environmental Benefits**: Estimated training footprint of ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use

---

## Benchmark Results

![Combined Benchmark](combined_benchmark.png)

### Core Benchmarks

| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|-------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |

### SWE-Bench Verified Performance (#1 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (points) |
|------|-------|-------------|----------|
| **1** | **Alpie Core** | **57.8** | **—** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | -6.2 |
| 3 | o3-mini (high) | 49.3 | -8.5 |
| 4 | DeepSeek R1 | 49.2 | -8.6 |
| 5 | Claude 3.5 Sonnet | 49.0 | -8.8 |
| 6 | o1 | 48.9 | -8.9 |
| 7 | Devstral | 46.8 | -11.0 |

### Humanity's Last Exam Leaderboard (#3 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (points) |
|------|-------|-------------|----------|
| 1 | GPT 4.5 Preview | 5.8 | +0.39 |
| 2 | Claude Sonnet 4 | 5.42 | +0.01 |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **—** |
| 4 | Llama 4 Maverick | 5.34 | -0.07 |
| 5 | GPT 4.1 | 4.97 | -0.44 |
| 6 | Kimi K2 Instruct | 4.68 | -0.73 |
| 7 | DeepSeek V3 | 4.55 | -0.86 |

![Humanity's Last Exam](HLE.png)

### Additional Benchmarks

| Benchmark | Alpie Core | Category |
|-----------|-----------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |
| HellaSwag | **84.66%** | Commonsense |
| PIQA | **83.24%** | Physical Reasoning |
| ARC Challenge | **67.58%** | Science QA |
| CommonSenseQA | **87.06%** | Commonsense |
| AGIEval | **64.98%** | General Intelligence |
| Winogrande | **79.53%** | Commonsense Reasoning |
| MATH-500 | **70.00%** | Advanced Mathematics |

![AIME Benchmark](AIME.png)

---

## Training Details

- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Fine-tuning Method**: LoRA/QLoRA
  - LoRA Alpha: 16
  - LoRA Dropout: 0.05
  - LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, competitive exams, Indian context + law, multilingual (Hindi/Hinglish)
- **Synthetic Data Advantage**: +15-20% performance boost in STEM & coding
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
- **Total Training Time**: 408 hours
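
For readers who want to reproduce a similar setup, the LoRA hyperparameters above map directly onto PEFT's `LoraConfig`. This is a minimal sketch, not the actual training configuration: the rank, alpha, and dropout come from the list above, while `target_modules` (the attention projections of a Qwen2-style model) and `bias="none"` are assumptions, since the card does not say which layers were adapted.

```python
# Values taken from the Training Details list above.
LORA_HYPERPARAMS = {"r": 16, "lora_alpha": 16, "lora_dropout": 0.05}


def lora_scaling(hparams: dict = LORA_HYPERPARAMS) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return hparams["lora_alpha"] / hparams["r"]


def build_lora_config(target_modules=("q_proj", "k_proj", "v_proj", "o_proj")):
    """Build a PEFT LoraConfig; target_modules is an ASSUMPTION (see lead-in)."""
    from peft import LoraConfig  # imported lazily; requires `pip install peft`

    return LoraConfig(
        task_type="CAUSAL_LM",
        target_modules=list(target_modules),
        bias="none",
        **LORA_HYPERPARAMS,
    )


print(f"LoRA scaling (alpha / r): {lora_scaling()}")
```

With alpha equal to the rank, the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra amplification.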

---

## Environmental Impact

![Carbon Footprint](carbon_footprint.png)

We estimated the carbon footprint of training Alpie Core on 8× NVIDIA H100-80GB GPUs:

**Formula**: CO₂e (kg) = Grid CO₂ Factor (kg CO₂e/kWh) × Runtime (h) × Power per GPU (kW) × Number of GPUs

**Training Parameters**:
- Grid CO₂ Factor (Azure): 0.364 kg CO₂e/kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

**Results**:
- **Realistic mode** (250W avg per GPU): **~298 kg CO₂e**
- **Conservative mode** (700W TDP per GPU): **~835 kg CO₂e**

*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*
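
The estimate can be re-derived from the parameters listed above. The sketch below applies the formula as stated, converting per-GPU power from watts to kilowatts. With these exact inputs it gives roughly 297 kg and 832 kg, close to the reported ~298 kg and ~835 kg; the small gap may come from rounding or slightly different inputs in the original calculation.

```python
def training_co2e_kg(
    grid_factor_kg_per_kwh: float,
    runtime_hours: float,
    gpu_power_watts: float,
    num_gpus: int,
) -> float:
    """CO2e (kg) = grid factor x runtime x power per GPU (kW) x number of GPUs."""
    energy_kwh = runtime_hours * num_gpus * gpu_power_watts / 1000.0
    return grid_factor_kg_per_kwh * energy_kwh


GRID_FACTOR = 0.364   # kg CO2e per kWh (Azure figure quoted above)
RUNTIME_HOURS = 408
NUM_GPUS = 8

realistic = training_co2e_kg(GRID_FACTOR, RUNTIME_HOURS, 250, NUM_GPUS)
conservative = training_co2e_kg(GRID_FACTOR, RUNTIME_HOURS, 700, NUM_GPUS)

print(f"Realistic (250 W/GPU):    {realistic:.0f} kg CO2e")
print(f"Conservative (700 W/GPU): {conservative:.0f} kg CO2e")
```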

---

## Use Cases

Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian context**

1. **STEM Education**: Advanced problem-solving in science, technology, engineering, mathematics
2. **Mathematical Reasoning**: Multi-step logical and quantitative reasoning
3. **Software Development**: Code generation, debugging, algorithmic problem-solving
4. **Indian Context**: Competitive exam assistance (JEE, NEET, UPSC), Hindi/Hinglish support
5. **Research & Legal**: 65K context for academic papers, legal documents, long-form analysis

---

## Safety and Limitations

### Enhanced Content Access

Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive issues.

### Current Limitations

- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
- Should not be used for medical/legal advice without expert oversight

### Mitigations

- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

---

## Python SDK Quick Start

```bash
# Install
pip install pi169

# Set API key
export ALPIE_API_KEY="your_key_here"

# CLI usage
pi169 "Explain 4-bit quantization"
```

### SDK Features

- **CLI Integration** for quick interactions
- **Streaming & Non-Streaming** completions
- **Async/Await Support** for concurrent requests
- **Type-safe Interface** with dataclasses
- **Robust Error Handling**
- **OpenAI-Compatible**: Drop-in replacement

[Full SDK documentation on PyPI](https://pypi.org/project/pi169/0.1/)

---

## Advanced Usage Examples

### Streaming Inference with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Explain the P vs NP problem"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )
```

### Deployment Options

- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference server
- **Ollama**: Easy local deployment (20GB model size)
- **169Pi API**: Production-ready hosted inference

---

## Citation

```bibtex
@misc{169pi2025alpiecore,
  title     = {Alpie-Core: A 4-Bit Quantized Reasoning Model from India that Outperforms Full-Precision Models},
  author    = {169Pi AI},
  year      = {2025},
  url       = {https://huggingface.co/169Pi/Alpie-Core}
}
```

---

## Community & Contributions

Released under Apache 2.0 - we welcome the community to build, extend, and improve!

1. **Issues & Discussions**: Report bugs or suggest features on Hugging Face
2. **Contributions**: Pull requests welcome for improvements
3. **Share Results**: Post your fine-tuning experiments and benchmarks
4. **Collaborate**: Join us in shaping the future of efficient AI

---

## License

**Apache 2.0 License** – Permissive for research and commercial use

---

## Acknowledgements

Thanks to **DeepSeek** for the original model foundation. We also acknowledge:

- **Hugging Face** ecosystem (Transformers, PEFT, bitsandbytes) and the **vLLM** project
- Open-source datasets and benchmarks (MMLU, GSM8K, SWE-Bench, etc.)
- Cloud infrastructure providers
- The broader AI research community

---

## Contact

**Technical Support**: support@169pi.com

---

*Alpie Core represents a milestone for open-source AI from India, demonstrating that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organizations worldwide to build more efficient, inclusive, and impactful AI.*

**Get started today with 5 million free tokens at [169pi.ai](https://169pi.ai/)**