deepanshupillm committed on
Commit 3dd23e5 · verified · 1 Parent(s): 88c3eda

Update README.md

Files changed (1):
  1. README.md +0 -55
README.md CHANGED
@@ -1,55 +0,0 @@
- # 169Pi/Alpie-core
-
- ## Model Summary
- `169Pi/Alpie-core` is a 32B-parameter causal language model.
- It is the **world’s first large-scale 4-bit LoRA-trained model**, optimized over **three distinct training phases** for reasoning, knowledge integration, and benchmark performance.
- The model specializes in **mathematics, coding, science, competitive exams, Indian context, and law**.
-
- ---
-
- ## Model Details
- **Base Model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`
- **Architecture:** 32B-parameter causal LM (chat-optimized)
- **Quantization:** 4-bit NF4 with double quantization enabled
- **Precision for Inference:** 4-bit NF4
- **Frameworks:** PEFT, LoRA, bitsandbytes, PyTorch
- **Max Context Length:** 65k tokens
- **Deployment Framework:** vLLM
- **License:** *(to be filled)*
-
- ---
-
- ## Hyperparameters
- **Epochs per phase:** 2
- **Batch Size:** 256
- **Gradient Accumulation Steps:** 4
- **Learning Rate:** `1e-5` (initially `2e-5`, reduced to avoid early over-generalization)
- **Scheduler:** Cosine
- **Optimizer:** AdamW (`adamw_torch`)
- **LoRA Rank (r):** 16
- **LoRA Alpha:** 8
- **LoRA Dropout:** 0.1
- **Target Modules:** `q_proj`, `v_proj`
-
- ---
-
- ## Intended Use
- **Primary:** Educational tutoring, competitive exam preparation, coding assistance, legal reasoning, general knowledge Q&A.
- **Secondary:** Research support, problem-solving in science and mathematics.
-
- ---
-
- ## Limitations & Warnings
- May produce inaccurate or outdated information about very recent events.
- Not suitable for legal or medical advice without expert review.
- Performance may vary outside the trained domains.
-
- ---
-
- ## Citation
- If you use this model in your research, please cite: