deepanshupillm committed · Commit e37f6af (verified) · Parent: 8676821

Update README.md

Files changed (1): README.md (+251 -193)

README.md (updated version follows):
library_name: transformers
pipeline_tag: text-generation
---

# Alpie Core: 4-bit Quantized Reasoning Model

<p align="center">
<a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
<a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>

## TL;DR

- **32B reasoning model**, trained and served at **4-bit quantization**
- **Competitive with GPT-4o / Claude 3.5 Sonnet** on reasoning and coding benchmarks
- **65K context length** for long-document reasoning
- **Open source** (Apache 2.0) – fully permissive for commercial use
- Available via **Ollama**, **Hugging Face**, and a **hosted API** with 5M free tokens

📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**

---

## How to Use Alpie Core

### Option 1: Local Inference with Ollama (Recommended for Quick Start)

```bash
# Pull the model (20GB)
ollama pull 169pi/alpie-core

# Run inference
ollama run 169pi/alpie-core
```

**Requirements**: 20GB RAM/VRAM minimum
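
For programmatic local use, the official `ollama` Python client can call the same pulled model; a minimal sketch (the model tag mirrors the `ollama pull` command above):

```python
# pip install ollama  -- official Ollama Python client
import ollama

# Chat with the locally pulled model
response = ollama.chat(
    model="169pi/alpie-core",
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}],
)
print(response["message"]["content"])
```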

### Option 2: Hosted Inference via 169Pi API

Get started instantly with our **hosted API**, no setup required. Your first API key is **free** and includes **5 million tokens** for testing real workloads.

- **OpenAI-compatible** – drop-in replacement for the OpenAI SDK
- Supports **streaming**, **async**, and **long-context reasoning**
- Production-ready with low latency

**[Get your API key at 169pi.ai](https://169pi.ai/)**
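
Because the endpoint is OpenAI-compatible, the standard `openai` client should also work; a minimal sketch, assuming a hypothetical base URL (use the endpoint shown in your 169Pi dashboard):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.169pi.ai/v1",  # hypothetical endpoint, for illustration only
    api_key="your_key_here",
)

response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
)
print(response.choices[0].message.content)
```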

### Option 3: Programmatic Access with Python SDK

```bash
# Install the official SDK
pip install pi169

# Set your API key
export ALPIE_API_KEY="your_key_here"

# Use via CLI
pi169 "Explain quantum entanglement"
```

Or use it in Python:

```python
from pi169 import AlpieClient

client = AlpieClient(api_key="your_key_here")
response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Solve this coding problem..."}],
    stream=True
)
```

**SDK Features**: Streaming, async/await, OpenAI compatibility, type-safe interface

### Option 4: Load Directly with Transformers (Advanced)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to locate the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load base model + LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# Inference
prompt = "Solve: What is the integral of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
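
The example above loads the base model in FP16. To mirror the 4-bit NF4 setup listed under Training Details, the base model can instead be loaded through a bitsandbytes quantization config; a sketch assuming that scheme (NF4, double quantization, FP16 compute), not necessarily the authors' exact pipeline:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 + double quantization + FP16 compute, as described in Training Details
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",  # base architecture from Model Summary
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the LoRA adapter exactly as in the FP16 example above
```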

---

## Why Alpie Core?

**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide at this scale.** Trained on just 8 Hopper GPUs using LoRA for parameter-efficient fine-tuning, QLoRA 4-bit quantization, and synthetic STEM-rich dataset distillation, it shows that aggressive quantization can match and even surpass full-precision baselines.

With a dramatically reduced memory footprint, Alpie Core delivers competitive, frontier-level reasoning performance, even beating some top proprietary models. It achieves:

- **81.28% on MMLU** (5-shot)
- **92.75% on GSM8K** (8-shot)
- **57.8% on SWE-Bench Verified** (ranked #1 globally)

This demonstrates that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

![Bench](https://cdn-uploads.huggingface.co/production/uploads/66e2f8a815879154e1f9e023/i2SOWOOHdsTx5RajIkyrE.png)

---

## Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65K tokens
- **Max Output Length**: 16,384 tokens
- **Training Data**: Synthetic (STEM, reasoning, coding) + curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0

---

## Approach

**Alpie Core** underwent extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. Training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized:

1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations
3. **Limitations and Knowledge Boundaries** – transparently communicating uncertainty
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity
6. **Confidentiality and Responsible Use** – preventing leakage of private data or internal reasoning traces

This approach enables Alpie Core to deliver reliable, aligned, and context-aware responses while maintaining safety across a broad range of use cases, generalizing across both global and Indian contexts.

---

## Model Features

1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High-Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low-Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Structured outputs and external API integration (see the sketch below)
10. **Instruction Following** – Optimized for reasoning and chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge tasks
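
A minimal tool-use sketch via the OpenAI-compatible API; the base URL is hypothetical and the tool schema is purely illustrative (check the API docs for the exact contract):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.169pi.ai/v1", api_key="your_key_here")  # hypothetical endpoint

# OpenAI-style tool schema; the function name and fields are illustrative
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "What's the weather in Mumbai?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```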

---

## Key Highlights

1. **First 4-bit Reasoning Model from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms or matches 70B+ models across reasoning, math, and coding
3. **STEM & Coding Strength**: Excellent on GSM8K, MATH-500, HumanEval, and SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs
5. **Extended Context Length**: 65K tokens for research papers and multi-document reasoning
6. **Environmental Benefits**: ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use

---

## Benchmark Results

![Combined Benchmark](combined_benchmark.png)

### Core Benchmarks

| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|-------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
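
The report does not state which evaluation harness produced these scores; a common way to approximate few-shot results like the MMLU row is EleutherAI's lm-evaluation-harness. The invocation below is illustrative only (it loads the base model with the LoRA adapter via the harness's `peft` model argument):

```bash
pip install lm-eval

# Illustrative 5-shot MMLU run; scores may differ from the table above
lm_eval --model hf \
  --model_args pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B,peft=169Pi/Alpie-Core,dtype=float16 \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto
```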

### SWE-Bench Verified Performance (#1 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (pts) |
|------|-------|--------------|----------------|
| **1** | **Alpie Core** | **57.8** | **–** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | -6.2 |
| 3 | o3-mini (high) | 49.3 | -8.5 |
| 4 | DeepSeek R1 | 49.2 | -8.6 |
| 5 | Claude 3.5 Sonnet | 49.0 | -8.8 |
| 6 | o1 | 48.9 | -8.9 |
| 7 | Devstral | 46.8 | -11.0 |

### Humanity's Last Exam Leaderboard (#3 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (pts) |
|------|-------|--------------|----------------|
| 1 | GPT 4.5 Preview | 5.8 | +0.39 |
| 2 | Claude Sonnet 4 | 5.42 | +0.01 |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **–** |
| 4 | Llama 4 Maverick | 5.34 | -0.07 |
| 5 | GPT 4.1 | 4.97 | -0.44 |
| 6 | Kimi K2 Instruct | 4.68 | -0.73 |
| 7 | DeepSeek V3 | 4.55 | -0.86 |

![Humanity's Last Exam](HLE.png)

### Additional Benchmarks

| Benchmark | Alpie Core | Category |
|-----------|-----------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |

![AIME Benchmark](AIME.png)

---

## Training Details

- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Fine-tuning Method**: LoRA/QLoRA (a configuration sketch follows this list)
  - LoRA Alpha: 16
  - LoRA Dropout: 0.05
  - LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, competitive exams, Indian context + law, multilingual (Hindi/Hinglish)
- **Synthetic Data Advantage**: +15–20% performance boost in STEM & coding
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
- **Total Training Time**: 408 hours
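
A sketch of the corresponding PEFT configuration, using the hyperparameters above; the target modules are an assumption (typical attention projections for Qwen-style architectures), not a published detail:

```python
from peft import LoraConfig

# r / alpha / dropout mirror the Training Details section; target_modules is assumed
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption, not confirmed
    task_type="CAUSAL_LM",
)
```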

---

## Environmental Impact

![Carbon Footprint](carbon_footprint.png)

We estimated the carbon footprint of training Alpie Core on 8× NVIDIA H100-80GB GPUs:

**Formula**: CO₂e (kg) = Grid CO₂ Factor (kg/kWh) × Runtime (hours) × Power per GPU (kW) × Number of GPUs

**Training Parameters**:
- Grid CO₂ Factor (Azure average): 0.364 kg CO₂e/kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

**Results**:
- **Realistic mode** (average draw ≈ 250 W per GPU): 0.364 × 408 × 0.25 × 8 ≈ **298 kg CO₂e**
- **Conservative mode** (near TDP ≈ 700 W per GPU): 0.364 × 408 × 0.70 × 8 ≈ **835 kg CO₂e**
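
The two bounds can be reproduced directly from the formula; a quick sketch (exact products come out slightly under the rounded figures above):

```python
# CO2e (kg) = grid factor (kg/kWh) x runtime (h) x power per GPU (kW) x GPU count
GRID_FACTOR = 0.364   # kg CO2e per kWh (Azure average)
RUNTIME_H = 408       # training runtime in hours
NUM_GPUS = 8

for label, gpu_kw in [("realistic (250 W)", 0.25), ("conservative (700 W)", 0.70)]:
    co2e_kg = GRID_FACTOR * RUNTIME_H * gpu_kw * NUM_GPUS
    print(f"{label}: ~{co2e_kg:.0f} kg CO2e")
# realistic (250 W): ~297 kg CO2e
# conservative (700 W): ~832 kg CO2e
```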

*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*

---

## Use Cases

Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian-context** tasks:

1. **STEM Education**: Advanced problem-solving in science, technology, engineering, and mathematics
2. **Mathematical Reasoning**: Multi-step logical and quantitative reasoning
3. **Software Development**: Code generation, debugging, algorithmic problem-solving
4. **Indian Context**: Competitive exam assistance (JEE, NEET, UPSC), Hindi/Hinglish support
5. **Research & Legal**: 65K context for academic papers, legal documents, long-form analysis

---

## Safety and Limitations

### Enhanced Content Access

Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive issues.

### Current Limitations

- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
- Should not be used for medical/legal advice without expert oversight

### Mitigations

- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

---

## Python SDK Quick Start

```bash
# Install
pip install pi169

# Set API key
export ALPIE_API_KEY="your_key_here"

# CLI usage
pi169 "Explain 4-bit quantization"
```

### SDK Features

- **CLI Integration** for quick interactions
- **Streaming & Non-Streaming** completions
- **Async/Await Support** for concurrent requests
- **Type-safe Interface** with dataclasses and type hints
- **Robust Error Handling** with typed exceptions
- **OpenAI-Compatible**: Drop-in replacement for the OpenAI SDK

[Full SDK documentation on PyPI](https://pypi.org/project/pi169/0.1/)

---

## Advanced Usage Examples

### Streaming Inference with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to locate the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# TextStreamer prints tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Explain the P vs NP problem"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    model.generate(**inputs, streamer=streamer, max_new_tokens=1000)
```

### Deployment Options

- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference server (launch sketch below)
- **Ollama**: Easy local deployment (20GB model size)
- **169Pi API**: Production-ready hosted inference
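
A hypothetical vLLM launch for this repo layout (a LoRA adapter on top of DeepSeek-R1-Distill-Qwen-32B); flags and paths are illustrative, so adjust them to your environment:

```bash
# Serve the base model with the Alpie-Core LoRA adapter attached (illustrative)
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --enable-lora \
  --lora-modules alpie-core=169Pi/Alpie-Core \
  --max-model-len 65536
```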

---

## Citation

```bibtex
@misc{169pi2025alpiecore,
}
```

---

## Community & Contributions

Released under Apache 2.0 – we welcome the community to build, extend, and improve!

1. **Issues & Discussions**: Report bugs or suggest features on the Hugging Face model page
2. **Contributions**: Pull requests are welcome for improvements
3. **Share Results**: Post your fine-tuning experiments and benchmarks
4. **Collaborate**: Join us in shaping the future of efficient AI

---

## License

**Apache 2.0 License** – Permissive for research and commercial use

---

## Acknowledgements

Thanks to **DeepSeek** for the original model foundation. We also acknowledge:

- The **Hugging Face** ecosystem (Transformers, PEFT, vLLM, bitsandbytes)
- Open-source datasets (MMLU, GSM8K, SWE-Bench, etc.)
- Cloud infrastructure providers
- The broader AI research community

---

## Contact

**Technical Support**: support@169pi.com

---

*Alpie Core represents a milestone for open-source AI from India, demonstrating that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organizations worldwide to build more efficient, inclusive, and impactful AI.*

**Get started today with 5 million free tokens at [169pi.ai](https://169pi.ai/)**