File size: 12,290 Bytes
861a266
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53c3c73
 
 
861a266
 
 
 
53c3c73
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2eae6d7
53c3c73
31db0f5
 
 
 
0634d89
730abb3
0634d89
730abb3
0634d89
730abb3
0634d89
730abb3
0634d89
730abb3
0634d89
31db0f5
 
 
 
53c3c73
 
 
 
 
 
 
 
 
 
 
31db0f5
53c3c73
f3dd06b
 
 
 
 
 
 
 
 
 
 
53c3c73
31db0f5
53c3c73
 
 
 
f3dd06b
 
53c3c73
f3dd06b
 
53c3c73
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31db0f5
53c3c73
58c0997
53c3c73
 
 
 
 
 
 
 
 
31db0f5
53c3c73
64a951c
831a704
b93a5f6
831a704
64a951c
831a704
64a951c
831a704
b93a5f6
831a704
 
 
 
 
 
 
 
 
 
53c3c73
31db0f5
53c3c73
f3dd06b
 
 
 
 
 
 
 
 
861a266
53c3c73
31db0f5
53c3c73
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31db0f5
53c3c73
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
31db0f5
53c3c73
 
 
 
 
 
 
 
 
 
31db0f5
53c3c73
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
---
tags:
- text-generation
- reasoning
- coding
- mathematics
- quantization
license: apache-2.0
datasets:
- synthetic
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
- hi
library_name: transformers
pipeline_tag: text-generation
---
# Alpie-Core: 4-bit Quantized Reasoning Model

---
<p align="center">
  <img src="./Frame%202018777151.png" alt="Alpie-Core Architecture" width="700"/>
</p>
*[Space reserved for blog paper, technical report links]*
---

## 1. Introduction

Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, proving that aggressive quantization can surpass full-precision baselines in reasoning, mathematics, and coding. By combining cutting-edge quantization-aware training with synthetic STEM-rich datasets, Alpie-Core achieves frontier-level reasoning while being practical for real-world deployment at scale.

## 2. Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA techniques
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65,536 tokens
- **Max Output Length**: 16,384 tokens
- **License**: Apache 2.0


## 3. Approach

**Alpie-Core** has undergone extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized adherence to rigorous safety and usability standards, including:

1)**User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound.

2)**Security and Ethical Guidelines** – filtering unsafe or harmful generations during and after training.

3)**Limitations, Disclaimers, and Knowledge Boundaries** – transparently communicating uncertainty and scope.

4)**Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails.

5)**Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity.

6)**Confidentiality and Responsible Use** – preventing leakage of private training data, proprietary prompts, or internal reasoning traces.

This SFT approach enables Alpie-Core to deliver reliable, aligned, and context-aware responses while maintaining safety across a broad range of use cases.

## 4. Model Features

1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation Filters** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Enables structured outputs and external API integration

## 5. Key Highlights

1. **Frontier Performance in 4-bit**: 81.28% MMLU, 92.75% GSM8K, 57.8% SWE-Bench Verified

2) **STEM + Coding Excellence**: Outperforms full-precision peers in mathematics and programming

3) **Enhanced Content Access**: Provides factual responses to geopolitically sensitive topics

4) **Quantization Efficiency**: A 4-bit quantized variant achieves competitive performance retention compared to full-precision models, demonstrating that aggressive quantization can preserve task accuracy while substantially reducing hardware requirements.

5) **Benchmark Competitiveness**: Across more than ten standard evaluation benchmarks, the model demonstrates performance on par with or exceeding that of larger 70B+ parameter systems, highlighting the effectiveness of our training and optimization strategies.

6) **Environmental Benefits**: Through quantization and efficiency-focused design, the model requires significantly fewer computational resources. This translates into lower energy consumption and reduced carbon footprint relative to full-precision deployments.

## 6. Benchmark Results

| Benchmark | Alpie-Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|----------------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | = |

### SWE-Bench Verified Performance

| Rank | Model | Accuracy (%) | Performance vs Alpie |
|------|-------|-------------|---------------------|
| **1** | **Alpie Core** | **57.8** | **Alpie** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
| 3 | o1 | 48.9 | Below Alpie |
| 4 | o3-mini (high) | 49.3 | Below Alpie |
| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
| 6 | DeepSeek R1 | 49.2 | Below Alpie |
| 7 | Devstral | 46.8 | Below Alpie |

### Humanity's Last Exam Leaderboard Performance

| Rank | Model | Accuracy (%) | Performance vs Alpie |
|------|-------|-------------|---------------------|
| 1 | GPT 4.5 Preview | 5.8 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **Alpie** |
| 4 | Llama 4 Maverik | 5.34 | Below Alpie |
| 5 | GPT 4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |

### Additional Benchmarks

| Benchmark | Alpie-Core (32B-4bit) | Category |
|-----------|----------------------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |
| HellaSwag | **84.66%** | Commonsense |
| PIQA | **83.24%** | Physical Reasoning |
| ARC Challenge | **67.58%** | Science QA |
| CommonSenseQA | **87.06%** | Commonsense |
| AGIEval | **64.98%** | General Intelligence |
| Winogrande | **79.53%** | Commonsense Reasoning |

## 7. Training Details

- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Training Duration**: 408 hours
- **Fine-tuning Method**: LoRA/QLoRA with the following configuration:
  - LoRA Alpha: 8
  - LoRA Dropout: 0.05
  - LoRA Rank: 8
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context + law, multilingual (Hindi and Hinglish)
- **Synthetic Data Advantage**: +15-20% performance boost in STEM & coding domains

## 8. Environmental Impact

**Carbon Footprint**: We estimated the environmental impact of training Alpie-Core (32B) on 8× NVIDIA H100-80GB GPUs by calculating carbon emissions from GPU energy consumption. The calculation follows the formula:
CO₂e (kg) = Grid CO₂ Factor (kg/kWh) × Runtime (hours) × Power per GPU (kW) × Number of GPUs

Training Parameters:
Grid CO₂ Factor (Azure average): 0.364 kg CO₂e per kWh 
Runtime: 408 hours
GPUs: 8× H100-80GB
We report results under two assumption modes:

Realistic mode (average training draw ≈ 250 W per GPU = 0.25 kWh/hr): 0.364 × 408 × 0.25 × 8 ≈ 298 kg CO₂e


Conservative mode (near TDP ≈ 700 W per GPU = 0.70 kWh/hr): 0.364 × 408 × 0.70 × 8 ≈ 835 kg CO₂e


Total training footprint ranges from ~298 kg CO₂e (realistic) to ~835 kg CO₂e (conservative worst-case)




## 9. Use Cases

Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian context**

1)**STEM**: Excels at solving advanced problems in science, technology, engineering, and mathematics with high accuracy.

2)**Complex Mathematical Reasoning**: Handles multi-step logical and quantitative reasoning tasks with strong reliability.

3)**Coding**: Supports software development, debugging, and algorithmic problem-solving across multiple programming languages.

4)**Indian Context**: Provides culturally aware insights, competitive exam assistance (JEE, NEET, UPSC), and multilingual support in Hindi/Hinglish.


## 10. Safety and Limitations

### Enhanced Content Access
Unlike the base DeepSeek model, Alpie-Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility and factual accuracy on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive geopolitical issues.

### Current Limitations
- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering

### Mitigations
- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

## 11. How to Use

### Non-Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Sample inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Response:\n", response)
```

### Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Ensure evaluation mode
model.eval()

# Initialize streamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample streaming inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )
```

### Deployment Options
- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference
- **LMDeploy/Ollama/TensorRT-LLM**: Production deployments

## 12. Citation

```bibtex
@misc{alpie2025core,
  title     = {Alpie-Core: A 4-bit Quantized Reasoning Model Surpassing Full-Precision Benchmarks},
  author    = {Alpie AI},
  year      = {2025},
  url       = {https://huggingface.co/alpie/Alpie-Core-4bit}
}
```

## 13. License

Apache 2.0 – Free for research and commercial use

---

*For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.*