deepanshupillm committed · Commit e37f6af (verified) · Parent: 8676821

Update README.md

Files changed (1): README.md (+251 -193)

README.md (updated version follows):
library_name: transformers
pipeline_tag: text-generation
---

# Alpie Core: 4-bit Quantized Reasoning Model

<p align="center">
<a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
<a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>

## TL;DR

- **32B reasoning model**, trained and served at **4-bit quantization**
- **Competitive with GPT-4o / Claude 3.5 Sonnet** on reasoning and coding benchmarks
- **65K context length** for long-document reasoning
- **Open source** (Apache 2.0) – fully permissive for commercial use
- Available via **Ollama**, **Hugging Face**, and a **hosted API** with 5M free tokens

📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**

---

## How to Use Alpie Core

### Option 1: Local Inference with Ollama (Recommended for Quick Start)

```bash
# Pull the model (20GB)
ollama pull 169pi/alpie-core

# Run inference
ollama run 169pi/alpie-core
```

**Requirements**: 20GB RAM/VRAM minimum
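
For programmatic local use, the official `ollama` Python client can call the same pulled model; a minimal sketch (the model tag mirrors the `ollama pull` command above):

```python
# pip install ollama  -- official Ollama Python client
import ollama

# Chat with the locally pulled model
response = ollama.chat(
    model="169pi/alpie-core",
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}],
)
print(response["message"]["content"])
```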

### Option 2: Hosted Inference via 169Pi API

Get started instantly with our **hosted API**, no setup required. Your first API key is **free** and includes **5 million tokens** for testing real workloads.

- **OpenAI-compatible** – drop-in replacement for the OpenAI SDK
- Supports **streaming**, **async**, and **long-context reasoning**
- Production-ready with low latency

**[Get your API key at 169pi.ai](https://169pi.ai/)**
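
Because the endpoint is OpenAI-compatible, the standard `openai` client should also work; a minimal sketch, assuming a hypothetical base URL (use the endpoint shown in your 169Pi dashboard):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.169pi.ai/v1",  # hypothetical endpoint, for illustration only
    api_key="your_key_here",
)

response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
)
print(response.choices[0].message.content)
```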

### Option 3: Programmatic Access with Python SDK

```bash
# Install the official SDK
pip install pi169

# Set your API key
export ALPIE_API_KEY="your_key_here"

# Use via CLI
pi169 "Explain quantum entanglement"
```

Or use it in Python:

```python
from pi169 import AlpieClient

client = AlpieClient(api_key="your_key_here")
response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "Solve this coding problem..."}],
    stream=True
)
```

**SDK Features**: Streaming, async/await, OpenAI compatibility, type-safe interface

### Option 4: Load Directly with Transformers (Advanced)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to locate the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load base model + LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# Inference
prompt = "Solve: What is the integral of x^2?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
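
The example above loads the base model in FP16. To mirror the 4-bit NF4 setup listed under Training Details, the base model can instead be loaded through a bitsandbytes quantization config; a sketch assuming that scheme (NF4, double quantization, FP16 compute), not necessarily the authors' exact pipeline:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 + double quantization + FP16 compute, as described in Training Details
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",  # base architecture from Model Summary
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the LoRA adapter exactly as in the FP16 example above
```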

---

## Why Alpie Core?

**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide at this scale.** Trained on just 8 Hopper GPUs using LoRA for parameter-efficient fine-tuning, QLoRA 4-bit quantization, and synthetic STEM-rich dataset distillation, it shows that aggressive quantization can match and even surpass full-precision baselines.

With a dramatically reduced memory footprint, Alpie Core delivers competitive, frontier-level reasoning performance, even beating some top proprietary models. It achieves:

- **81.28% on MMLU** (5-shot)
- **92.75% on GSM8K** (8-shot)
- **57.8% on SWE-Bench Verified** (ranked #1 globally)

This demonstrates that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

![Bench](https://cdn-uploads.huggingface.co/production/uploads/66e2f8a815879154e1f9e023/i2SOWOOHdsTx5RajIkyrE.png)

---

## Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65K tokens
- **Max Output Length**: 16,384 tokens
- **Training Data**: Synthetic (STEM, reasoning, coding) + curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0

---

## Approach

**Alpie Core** underwent extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. Training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimized with high-quality LLM-generated responses. The fine-tuning process emphasized:

1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations
3. **Limitations and Knowledge Boundaries** – transparently communicating uncertainty
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity
6. **Confidentiality and Responsible Use** – preventing leakage of private data or internal reasoning traces

This approach enables Alpie Core to deliver reliable, aligned, and context-aware responses while maintaining safety across a broad range of use cases, generalizing across both global and Indian contexts.

---

## Model Features

1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High-Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low-Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Structured outputs and external API integration (see the sketch below)
10. **Instruction Following** – Optimized for reasoning and chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge tasks
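
A minimal tool-use sketch via the OpenAI-compatible API; the base URL is hypothetical and the tool schema is purely illustrative (check the API docs for the exact contract):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.169pi.ai/v1", api_key="your_key_here")  # hypothetical endpoint

# OpenAI-style tool schema; the function name and fields are illustrative
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="alpie-core",
    messages=[{"role": "user", "content": "What's the weather in Mumbai?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```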

---

## Key Highlights

1. **First 4-bit Reasoning Model from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms or matches 70B+ models across reasoning, math, and coding
3. **STEM & Coding Strength**: Excellent on GSM8K, MATH-500, HumanEval, and SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs
5. **Extended Context Length**: 65K tokens for research papers and multi-document reasoning
6. **Environmental Benefits**: ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use

---

## Benchmark Results

![Combined Benchmark](combined_benchmark.png)

### Core Benchmarks

| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|-------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
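
The report does not state which evaluation harness produced these scores; a common way to approximate few-shot results like the MMLU row is EleutherAI's lm-evaluation-harness. The invocation below is illustrative only (it loads the base model with the LoRA adapter via the harness's `peft` model argument):

```bash
pip install lm-eval

# Illustrative 5-shot MMLU run; scores may differ from the table above
lm_eval --model hf \
  --model_args pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B,peft=169Pi/Alpie-Core,dtype=float16 \
  --tasks mmlu \
  --num_fewshot 5 \
  --batch_size auto
```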

### SWE-Bench Verified Performance (#1 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (pts) |
|------|-------|--------------|----------------|
| **1** | **Alpie Core** | **57.8** | **–** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | -6.2 |
| 3 | o3-mini (high) | 49.3 | -8.5 |
| 4 | DeepSeek R1 | 49.2 | -8.6 |
| 5 | Claude 3.5 Sonnet | 49.0 | -8.8 |
| 6 | o1 | 48.9 | -8.9 |
| 7 | Devstral | 46.8 | -11.0 |

### Humanity's Last Exam Leaderboard (#3 Globally)

| Rank | Model | Accuracy (%) | vs Alpie (pts) |
|------|-------|--------------|----------------|
| 1 | GPT 4.5 Preview | 5.8 | +0.39 |
| 2 | Claude Sonnet 4 | 5.42 | +0.01 |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **–** |
| 4 | Llama 4 Maverick | 5.34 | -0.07 |
| 5 | GPT 4.1 | 4.97 | -0.44 |
| 6 | Kimi K2 Instruct | 4.68 | -0.73 |
| 7 | DeepSeek V3 | 4.55 | -0.86 |

![Humanity's Last Exam](HLE.png)

### Additional Benchmarks

| Benchmark | Alpie Core | Category |
|-----------|-----------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |

![AIME Benchmark](AIME.png)

---

## Training Details

- **Hardware**: 8× NVIDIA H100-80GB GPUs
- **Fine-tuning Method**: LoRA/QLoRA (a configuration sketch follows this list)
  - LoRA Alpha: 16
  - LoRA Dropout: 0.05
  - LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, competitive exams, Indian context + law, multilingual (Hindi/Hinglish)
- **Synthetic Data Advantage**: +15–20% performance boost in STEM & coding
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
- **Total Training Time**: 408 hours
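
A sketch of the corresponding PEFT configuration, using the hyperparameters above; the target modules are an assumption (typical attention projections for Qwen-style architectures), not a published detail:

```python
from peft import LoraConfig

# r / alpha / dropout mirror the Training Details section; target_modules is assumed
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption, not confirmed
    task_type="CAUSAL_LM",
)
```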

---

## Environmental Impact

![Carbon Footprint](carbon_footprint.png)

We estimated the carbon footprint of training Alpie Core on 8× NVIDIA H100-80GB GPUs:

**Formula**: CO₂e (kg) = Grid CO₂ Factor (kg/kWh) × Runtime (hours) × Power per GPU (kW) × Number of GPUs

**Training Parameters**:
- Grid CO₂ Factor (Azure average): 0.364 kg CO₂e/kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

**Results**:
- **Realistic mode** (average draw ≈ 250 W per GPU): 0.364 × 408 × 0.25 × 8 ≈ **298 kg CO₂e**
- **Conservative mode** (near TDP ≈ 700 W per GPU): 0.364 × 408 × 0.70 × 8 ≈ **835 kg CO₂e**
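
The two bounds can be reproduced directly from the formula; a quick sketch (exact products come out slightly under the rounded figures above):

```python
# CO2e (kg) = grid factor (kg/kWh) x runtime (h) x power per GPU (kW) x GPU count
GRID_FACTOR = 0.364   # kg CO2e per kWh (Azure average)
RUNTIME_H = 408       # training runtime in hours
NUM_GPUS = 8

for label, gpu_kw in [("realistic (250 W)", 0.25), ("conservative (700 W)", 0.70)]:
    co2e_kg = GRID_FACTOR * RUNTIME_H * gpu_kw * NUM_GPUS
    print(f"{label}: ~{co2e_kg:.0f} kg CO2e")
# realistic (250 W): ~297 kg CO2e
# conservative (700 W): ~832 kg CO2e
```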

*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*

---

## Use Cases

Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian-context** tasks:

1. **STEM Education**: Advanced problem-solving in science, technology, engineering, and mathematics
2. **Mathematical Reasoning**: Multi-step logical and quantitative reasoning
3. **Software Development**: Code generation, debugging, algorithmic problem-solving
4. **Indian Context**: Competitive exam assistance (JEE, NEET, UPSC), Hindi/Hinglish support
5. **Research & Legal**: 65K context for academic papers, legal documents, long-form analysis

---

## Safety and Limitations

### Enhanced Content Access

Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive issues.

### Current Limitations

- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question-answering
- Should not be used for medical/legal advice without expert oversight

### Mitigations

- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

---

## Python SDK Quick Start

```bash
# Install
pip install pi169

# Set API key
export ALPIE_API_KEY="your_key_here"

# CLI usage
pi169 "Explain 4-bit quantization"
```

### SDK Features

- **CLI Integration** for quick interactions
- **Streaming & Non-Streaming** completions
- **Async/Await Support** for concurrent requests
- **Type-safe Interface** with dataclasses and type hints
- **Robust Error Handling** with typed exceptions
- **OpenAI-Compatible**: Drop-in replacement for the OpenAI SDK

[Full SDK documentation on PyPI](https://pypi.org/project/pi169/0.1/)

---

## Advanced Usage Examples

### Streaming Inference with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to locate the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# TextStreamer prints tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Explain the P vs NP problem"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Streaming Response:")
with torch.no_grad():
    model.generate(**inputs, streamer=streamer, max_new_tokens=1000)
```

### Deployment Options

- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference server (launch sketch below)
- **Ollama**: Easy local deployment (20GB model size)
- **169Pi API**: Production-ready hosted inference
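
A hypothetical vLLM launch for this repo layout (a LoRA adapter on top of DeepSeek-R1-Distill-Qwen-32B); flags and paths are illustrative, so adjust them to your environment:

```bash
# Serve the base model with the Alpie-Core LoRA adapter attached (illustrative)
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --enable-lora \
  --lora-modules alpie-core=169Pi/Alpie-Core \
  --max-model-len 65536
```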

---

## Citation

```bibtex
@misc{169pi2025alpiecore,
}
```

---

## Community & Contributions

Released under Apache 2.0 – we welcome the community to build, extend, and improve!

1. **Issues & Discussions**: Report bugs or suggest features on the Hugging Face model page
2. **Contributions**: Pull requests are welcome for improvements
3. **Share Results**: Post your fine-tuning experiments and benchmarks
4. **Collaborate**: Join us in shaping the future of efficient AI

---

## License

**Apache 2.0 License** – Permissive for research and commercial use

---

## Acknowledgements

Thanks to **DeepSeek** for the original model foundation. We also acknowledge:

- The **Hugging Face** ecosystem (Transformers, PEFT, vLLM, bitsandbytes)
- Open-source datasets (MMLU, GSM8K, SWE-Bench, etc.)
- Cloud infrastructure providers
- The broader AI research community

---

## Contact

**Technical Support**: support@169pi.com

---

*Alpie Core represents a milestone for open-source AI from India, demonstrating that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organizations worldwide to build more efficient, inclusive, and impactful AI.*

**Get started today with 5 million free tokens at [169pi.ai](https://169pi.ai/)**