# Alpie-Core: 4-bit Quantized Reasoning Model

---

*[Space reserved for blog post, technical report links, and company logo]*

---

## 1. Introduction

Alpie-Core is one of the world's first fine-tuned 4-bit reasoning models, demonstrating that aggressive quantization can surpass full-precision baselines in reasoning, mathematics, and coding. By combining quantization-aware training with synthetic STEM-rich datasets, Alpie-Core achieves frontier-level reasoning while remaining practical for real-world deployment at scale.

## 2. Model Summary

- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA
- **Quantization**: 4-bit NF4 with double quantization
- **Context Length**: 65,536 tokens
- **Max Output Length**: 16,384 tokens
- **License**: Apache 2.0
- **Memory Footprint**: ~8 GB (a 75% reduction from full precision)

## 3. Model Features

1. **Streaming Support** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384-Token Max Output** – Enables very long generations
5. **4-Bit Quantization** – Memory-efficient and optimized for deployment
6. **High-Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low-Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation Filters** – Built-in guardrails for safer outputs
9. **Function Calling / Tool Use** – Enables structured outputs and external API integration

## 4. Key Highlights

- **Frontier Performance in 4-bit**: 81.28% MMLU, 92.75% GSM8K, 57.8% SWE-Bench Verified
- **Global Ranking**: 3rd place on the Humanity's Last Exam leaderboard
- **Cost Advantage**: 70–88% lower inference cost vs GPT-4/Claude/DeepSeek
- **Environmental Impact**: 64% lower carbon footprint per inference
- **STEM + Coding Excellence**: Outperforms full-precision peers in mathematics and programming
- **Enhanced Content Access**: Provides factual responses on geopolitically sensitive topics

## 5. Benchmark Results

| Benchmark | Alpie-Core (32B, 4-bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Category |
|-----------|------------------------|--------------------|-------------|----------------|---------------|----------------|----------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | General Knowledge |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | Mathematical Reasoning |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | Complex Reasoning |
| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | Advanced Reasoning |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | Code Generation |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | Code Generation |
| SWE-Bench Verified | **57.8%** | - | - | - | - | - | Software Engineering |
| AIME | **47.34%** | - | - | - | - | - | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | - | - | - | - | - | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | - | - | - | - | - | Truthfulness |
| HellaSwag | **84.66%** | - | - | - | - | - | Commonsense |
| PIQA | **83.24%** | - | - | - | - | - | Physical Reasoning |
| ARC Challenge | **67.58%** | - | - | - | - | - | Science QA |
| CommonSenseQA | **87.06%** | - | - | - | - | - | Commonsense |
| AGIEval | **64.98%** | - | - | - | - | - | General Intelligence |
| Winogrande | **79.53%** | - | - | - | - | - | Commonsense Reasoning |

### Humanity's Last Exam Leaderboard Performance

| Rank | Model | Accuracy (%) | Performance vs Alpie |
|------|-------|--------------|----------------------|
| 1 | GPT-4.5 Preview | 5.8 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **Alpie** |
| 4 | Llama 4 Maverick | 5.34 | Below Alpie |
| 5 | GPT-4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |

## 6. Training Details

- **Hardware**: 8× NVIDIA A100-80GB GPUs
- **Training Duration**: 408 hours
- **Fine-Tuning Method**: LoRA/QLoRA with the following configuration:
  - LoRA Rank: 8
  - LoRA Alpha: 8
  - LoRA Dropout: 0.05
- **Quantization**: 4-bit NF4 + double quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context and law, multilingual (Hindi and Hinglish)
- **Synthetic Data Advantage**: +15–20% performance boost in STEM and coding domains

## 7. Environmental Impact

**Carbon Footprint**: 298–835 kg CO₂e (training)

## 8. Use Cases

### Scientific Research Excellence
- 98% performance on the SciQ benchmark
- Advanced physics, chemistry, and mathematical sciences
- Literature-review automation and hypothesis generation
- Experimental design optimization

### Advanced Coding and Software Engineering
- 57.8% SWE-Bench Verified score (12% above the nearest competitor)
- Automated bug detection and GitHub issue resolution
- Competitive programming and algorithm design
- Enterprise software development and architecture design

### Indian Cultural and Religious Expertise
- Comprehensive understanding of Hindu philosophy and Buddhist traditions
- Regional diversity and cultural knowledge across Indian states
- Legal and constitutional framework understanding
- Educational support for Indian competitive exams (JEE, NEET, UPSC, SSC)

## 9. Safety and Limitations

### Enhanced Content Access
Unlike the base DeepSeek model, Alpie-Core aims to provide factual, balanced responses to geopolitically sensitive questions, such as Taiwan's status and the sovereignty of Arunachal Pradesh.

### Current Limitations
- Multilingual reasoning in Hindi/Hinglish has room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question answering

### Mitigations
- Safety classifiers and output-filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts

## 10. How to Use

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "alpie/Alpie-Core-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Build a chat prompt; add_generation_prompt appends the assistant turn,
# and return_dict=True yields the input_ids/attention_mask mapping that
# model.generate expects when unpacked with **.
messages = [{"role": "user", "content": "Solve 2x^2 + 3x + 5 = 0"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Deployment Options
- **Transformers**: Python/PyTorch integration
- **vLLM**: High-throughput inference
- **LMDeploy / Ollama / TensorRT-LLM**: Production deployments

## 11. Citation

```bibtex
@misc{alpie2025core,
  title  = {Alpie-Core: A 4-bit Quantized Reasoning Model Surpassing Full-Precision Benchmarks},
  author = {Alpie AI},
  year   = {2025},
  url    = {https://huggingface.co/alpie/Alpie-Core-4bit}
}
```

## 12. License

Apache 2.0 – free for research and commercial use.

---

*For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.*