library_name: transformers
pipeline_tag: text-generation
---

# Alpie Core: 4-bit Quantized Reasoning Model

<p align="center">
<a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
<a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>

## TL;DR

- **32B reasoning model**, trained & served at **4-bit quantization**
- **Competitive with GPT-4o / Claude 3.5 Sonnet** on reasoning & coding benchmarks
- **65K context length** for long-document reasoning
- **Open source** (Apache 2.0) - fully permissive for commercial use
- Available via **Ollama**, **Hugging Face**, and **hosted API** with 5M free tokens

📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**

---

## How to Use Alpie Core

### Option 1: Local Inference with Ollama (Recommended for Quick Start)

```bash
# Pull the model (20GB)
ollama pull 169pi/alpie-core

# Run inference
ollama run 169pi/alpie-core
```

**Requirements**: 20GB RAM/VRAM minimum

### Option 2: Hosted Inference via 169Pi API

Get started instantly with our **hosted API** - no setup required!

**Get your first free API key** including **5 million tokens** to test real workloads.

- **OpenAI-compatible** - drop-in replacement for OpenAI SDK
- Supports **streaming**, **async**, and **long-context reasoning**
- Production-ready with low latency

**[Get your API key at 169pi.ai](https://169pi.ai/)**
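
Since the hosted endpoint is OpenAI-compatible, any HTTP client can talk to it with a standard chat-completions request body. The sketch below is stdlib-only; the base URL is a placeholder assumption (check your 169Pi dashboard for the real endpoint), and only the payload shape follows from OpenAI compatibility:

```python
import json
from urllib import request

# Placeholder assumption -- substitute the real base URL from your dashboard.
BASE_URL = "https://api.169pi.ai/v1"

def build_chat_payload(prompt: str, stream: bool = False) -> dict:
    """Build a standard OpenAI-style chat-completions request body."""
    return {
        "model": "alpie-core",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def post_chat(api_key: str, payload: dict) -> request.Request:
    """Prepare the HTTP request; pass the result to urlopen to send it."""
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = build_chat_payload("Explain 4-bit quantization")
print(json.dumps(payload, indent=2))
```

The builders are kept side-effect-free so the payload can be inspected before `urllib.request.urlopen(...)` actually sends it.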

### Option 3: Programmatic Access with Python SDK

```bash
# Install the official SDK
pip install pi169

# Set your API key
export ALPIE_API_KEY="your_key_here"

# Use via CLI
pi169 "Explain quantum entanglement"
```

```python
# Or use in Python
from pi169 import AlpieClient

client = AlpieClient(api_key="your_key_here")
response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Solve this coding problem..."}],
    stream=True
)
```

**SDK Features**: Streaming, async/await, OpenAI compatibility, type-safe interface

### Option 4: Load Directly with Transformers (Advanced)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load LoRA adapter configuration
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load base model + LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Inference
prompt = "Solve: What is the integral of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Why Alpie Core?

**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide at this scale.** Trained on just 8 Hopper GPUs using LoRA and QLoRA 4-bit quantization with synthetic STEM-rich datasets, it proves that aggressive quantization can match and even surpass full-precision baselines.

With a dramatically reduced memory footprint, Alpie Core delivers competitive, frontier-level reasoning performance, even beating top proprietary models. It achieves:

- **81.28% on MMLU** (5-shot)
- **92.75% on GSM8K** (8-shot)
- **57.8% on SWE-Bench Verified** (ranked #1 globally)

This demonstrates that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

---

## Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65k tokens
- **Max Output Length**: 16,384 tokens
- **Training Data**: Synthetic (STEM, reasoning, coding) + curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0
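
To see what 4-bit NF4 storage means in practice, here is a toy, pure-Python sketch of blockwise absmax quantization to a 16-level codebook. The uniform levels and tiny block size are illustrative simplifications: real NF4 uses a codebook derived from the normal distribution, and double quantization additionally compresses the per-block scales:

```python
# Toy blockwise 4-bit quantization in the spirit of NF4: each block of
# weights is scaled by its absolute maximum, then each value is snapped
# to the nearest of 16 code levels, so a weight needs only a 4-bit index
# plus a shared per-block scale.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]  # 16 evenly spaced levels in [-1, 1]

def quantize_block(block):
    """Return (scale, 4-bit codes) for one block of float weights."""
    scale = max(abs(w) for w in block) or 1.0
    codes = [min(range(16), key=lambda i: abs(w / scale - LEVELS[i]))
             for w in block]
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate float weights from scale + codes."""
    return [scale * LEVELS[c] for c in codes]

weights = [0.31, -0.12, 0.05, -0.44]
scale, codes = quantize_block(weights)
approx = dequantize_block(scale, codes)
print(codes, approx)
```

Each reconstructed weight lands within half a level spacing (scale/15 here) of the original, which is why carefully chosen codebooks lose so little accuracy.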

---

## Approach

**Alpie Core** underwent extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized:

1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations
3. **Limitations and Knowledge Boundaries** – transparently communicating uncertainty
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity
6. **Confidentiality and Responsible Use** – preventing leakage of private data or internal reasoning traces

This approach enables Alpie Core to deliver reliable, aligned, and context-aware responses while maintaining safety across a broad range of use cases, generalizing across global and Indian contexts.

---

## Model Features

1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Structured outputs and external API integration
10. **Instruction Following** – Optimized for reasoning and chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge tasks

---

## Key Highlights

1. **First 4-bit Reasoning Model from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms or matches 70B+ models across reasoning, math, and coding
3. **STEM & Coding Strength**: Excellent on GSM8K, MATH-500, HumanEval, SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs
5. **Extended Context Length**: 65K tokens for research papers, multi-document reasoning
6. **Environmental Benefits**: ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use

---

## Benchmark Results

### Core Benchmarks

| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|-------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |

### SWE-Bench Verified Performance (#1 Globally)

| Rank | Model | Accuracy (%) | vs Alpie |
|------|-------|-------------|----------|
| **1** | **Alpie Core** | **57.8** | **—** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | -6.2% |
| 3 | o1 | 48.9 | -8.9% |
| 4 | o3-mini (high) | 49.3 | -8.5% |
| 5 | Claude 3.5 Sonnet | 49.0 | -8.8% |
| 6 | DeepSeek R1 | 49.2 | -8.6% |
| 7 | Devstral | 46.8 | -11.0% |

### Humanity's Last Exam Leaderboard (#3 Globally)

| Rank | Model | Accuracy (%) | vs Alpie |
|------|-------|-------------|----------|
| 1 | GPT 4.5 Preview | 5.8 | +0.39% |
| 2 | Claude Sonnet 4 | 5.42 | +0.01% |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **—** |
| 4 | Llama 4 Maverick | 5.34 | -0.07% |
| 5 | GPT 4.1 | 4.97 | -0.44% |
| 6 | Kimi K2 Instruct | 4.68 | -0.73% |
| 7 | DeepSeek V3 | 4.55 | -0.86% |

### Additional Benchmarks

| Benchmark | Alpie Core | Category |
|-----------|-----------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |

---

## Training Details

- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Fine-tuning Method**: LoRA/QLoRA
  - LoRA Alpha: 16
  - LoRA Dropout: 0.05
  - LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, competitive exams, Indian context + law, multilingual (Hindi/Hinglish)
- **Synthetic Data Advantage**: +15-20% performance boost in STEM & coding
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
- **Total Training Time**: 408 hours
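
The LoRA hyperparameters above translate into a very small trainable-parameter budget. A sketch of the bookkeeping for a single weight matrix (the 4096×4096 shape is illustrative, not the model's actual hidden size):

```python
# LoRA replaces a full-rank update to W (d_out x d_in) with two thin
# matrices: A (rank x d_in) and B (d_out x rank), applied as
#   W' = W + (alpha / rank) * (B @ A)
d_out, d_in = 4096, 4096          # illustrative layer dimensions
rank, alpha = 16, 16              # values from the training details above

full_params = d_out * d_in                    # trainable params, full fine-tune
lora_params = rank * d_in + d_out * rank      # trainable params, LoRA
scaling = alpha / rank                        # update scale; 1.0 here

reduction = full_params / lora_params
print(full_params, lora_params, reduction)    # 16777216 131072 128.0
```

With rank 16 the adapter trains roughly 1/128th of the weights per matrix, which is what makes fine-tuning a 32B model on 8 GPUs tractable.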

---

## Environmental Impact

We estimated the carbon footprint of training Alpie Core on 8× NVIDIA H100-80GB GPUs:

**Formula**: CO₂e (kg) = Grid CO₂ Factor × Runtime × Power per GPU × Number of GPUs

**Training Parameters**:
- Grid CO₂ Factor (Azure): 0.364 kg CO₂e/kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

**Results**:
- **Realistic mode** (250W avg per GPU): **~298 kg CO₂e**
- **Conservative mode** (700W TDP per GPU): **~835 kg CO₂e**
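
Plugging the training parameters into the formula reproduces both estimates (small differences from the quoted ~298/~835 kg come from rounding):

```python
# CO2e (kg) = grid factor (kg/kWh) x runtime (h) x power per GPU (kW) x GPUs
grid_factor = 0.364   # kg CO2e per kWh (Azure)
runtime_h = 408
num_gpus = 8

realistic = grid_factor * runtime_h * 0.250 * num_gpus     # 250 W average draw
conservative = grid_factor * runtime_h * 0.700 * num_gpus  # 700 W TDP

print(round(realistic), round(conservative))  # 297 832
```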

*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*

---

## Use Cases

Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian context**

1. **STEM Education**: Advanced problem-solving in science, technology, engineering, mathematics
2. **Mathematical Reasoning**: Multi-step logical and quantitative reasoning
3. **Software Development**: Code generation, debugging, algorithmic problem-solving
4. **Indian Context**: Competitive exam assistance (JEE, NEET, UPSC), Hindi/Hinglish support
5. **Research & Legal**: 65K context for academic papers, legal documents, long-form analysis

---

## Safety and Limitations

### Enhanced Content Access

Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive issues.

### Current Limitations

- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
- Should not be used for medical/legal advice without expert oversight

### Mitigations

- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

---

## Python SDK Quick Start

```bash
# Install
pip install pi169

# Set API key
export ALPIE_API_KEY="your_key_here"

# CLI usage
pi169 "Explain 4-bit quantization"
```

### SDK Features

- **CLI Integration** for quick interactions
- **Streaming & Non-Streaming** completions
- **Async/Await Support** for concurrent requests
- **Type-safe Interface** with dataclasses
- **Robust Error Handling**
- **OpenAI-Compatible**: Drop-in replacement

[Full SDK documentation on PyPI](https://pypi.org/project/pi169/0.1/)

---

## Advanced Usage Examples

### Streaming Inference with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Explain the P vs NP problem"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    model.generate(**inputs, streamer=streamer, max_new_tokens=1000)
```

### Deployment Options

- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference server
- **Ollama**: Easy local deployment (20GB model size)
- **169Pi API**: Production-ready hosted inference

---

## Citation

```bibtex
@misc{169pi2025alpiecore,
}
```

---

## Community & Contributions

Released under Apache 2.0 - we welcome the community to build, extend, and improve!

1. **Issues & Discussions**: Report bugs or suggest features on Hugging Face
2. **Contributions**: Pull requests welcome for improvements
3. **Share Results**: Post your fine-tuning experiments and benchmarks
4. **Collaborate**: Join us in shaping the future of efficient AI

---

## License

**Apache 2.0 License** – Permissive for research and commercial use

---

## Acknowledgements

Thanks to **DeepSeek** for the original model foundation. We also acknowledge:

- **Hugging Face** ecosystem (Transformers, PEFT, vLLM, bitsandbytes)
- Open-source datasets (MMLU, GSM8K, SWE-Bench, etc.)
- Cloud infrastructure providers
- The broader AI research community

---

## Contact

**Technical Support**: support@169pi.com

---

*Alpie Core represents a milestone for open-source AI from India, demonstrating that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organizations worldwide to build more efficient, inclusive, and impactful AI.*

**Get started today with 5 million free tokens at [169pi.ai](https://169pi.ai/)**
|