Phase-Technologies committed on
Commit da8557d · verified · 1 parent: 73a8f60

Update README.md

Files changed (1)
  1. README.md +55 -86
README.md CHANGED
@@ -14,122 +14,91 @@ tags:
  - custom-finetune
  - lor-merged
  base_model: unsloth/Qwen2.5-Math-1.5B
  ---
 
- # Qwen2.5-Math-1.5B: Generalized & Merged (Phase-Technologies)
-
- An ultra-lightweight, high-speed reasoning model heavily fine-tuned for graduate-level mathematical proofs while retaining full conversational generalization and fundamental safety alignments.
-
- This model was fine-tuned using **Unsloth** on a standard Google Colab T4 GPU, utilizing Low-Rank Adaptation (LoRA) and subsequently merged into a full 16-bit standalone model for seamless deployment.
-
- ## 🚀 Model Overview
-
- Standard math-specific LLMs often suffer from **catastrophic forgetting**: when prompted with basic conversational queries, they either hallucinate lengthy pseudo-proofs or fail entirely.
-
- This model was engineered to solve that problem. It bridges the gap between hyper-specialized mathematical reasoning and general instruction-following by utilizing a carefully balanced, dual-distribution training dataset.
-
- * **Primary Capability:** Capable of outputting highly detailed, step-by-step proofs for advanced algebraic, topological, and geometric problems.
- * **Secondary Capability:** Capable of standard, conversational instruction-following without format-forced hallucinations.
- * **Hardware Optimized:** Designed to run at maximum throughput on low-VRAM environments (like the 16GB T4 GPU) using 4-bit quantization and Triton kernels.
-
- ## 🏗 Architecture & Training Methodology
-
- ### Base Model
- * **Original Architecture:** [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)
- * **Parameters:** 1.5 Billion
- * **Precision:** Merged 16-bit (Fine-tuned in 4-bit via Unsloth)
-
- ### The Dataset: Dual-Distribution Blending
- To achieve generalization, the model was fine-tuned on a 50/50 blend of two distinct datasets, batched and streamed via high-throughput Parquet files:
- 1. **Xerv-AI/GRAD (1.93k rows):** A synthetic dataset containing exceptionally long (average 8,000 characters) graduate and research-level mathematical proofs formatted in strict LaTeX.
- 2. **yahma/alpaca-cleaned (2k rows):** A subset of the standard Alpaca dataset used to teach the model how to answer basic queries, roleplay, and recognize when *not* to use complex math.
-
- ### Training Configuration
- The fine-tuning process was executed via Supervised Fine-Tuning (SFT), applying LoRA adapters to the attention and MLP projection layers.
- * **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- * **LoRA Rank (r):** 16
- * **LoRA Alpha:** 16
- * **Optimizer:** `adamw_8bit`
- * **Learning Rate:** 2e-4
- * **Effective Batch Size:** 8 (Batch size 2 with 4 Gradient Accumulation steps)
-
- ## 🛡️ Safety & Alignment
-
- Despite being fine-tuned on unfiltered mathematical and conversational data, **this model is intended to retain its original safety alignment**. Because the LoRA adapters hold only 1-2% as many parameters as the base model, the merged weights stay close to the original Qwen2.5 weights.
-
- * **NSFW/18+ Prompts:** The model is expected to refuse to generate explicit, illegal, or harmful content, relying on the RLHF and DPO safety guardrails instilled during its original post-training alignment.
-
- ## 💻 Usage & Inference
-
- The model is highly responsive to a strict Instruction/Response template. For best results, use a `repetition_penalty` of roughly 1.15 to prevent the model from infinitely looping through math steps on simpler problems.
-
- ### Installation
  ```bash
- pip install unsloth transformers accelerate
-
  ```
- ### Python Inference Script
  ```python
  from unsloth import FastLanguageModel
  import torch
-
- # 1. Configuration
- repo_name = "Phase-Technologies/qwen2.5-math-1.5b-generalized-merged"
  max_seq_length = 2048
-
- # 2. Load the fully merged model
- # Loading in 4-bit is highly recommended for low-VRAM GPUs (like T4)
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = repo_name,
      max_seq_length = max_seq_length,
      dtype = None,
      load_in_4bit = True,
  )
-
- # 3. Switch to highly optimized inference mode
  FastLanguageModel.for_inference(model)
-
- # 4. Define the universal prompt template
  universal_prompt = """### Instruction:
  {}
-
  ### Response:
  {}"""
-
- # 5. Prepare your query
- query = "Provide a step-by-step proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]."
-
  inputs = tokenizer(
      [universal_prompt.format(query, "")],
      return_tensors = "pt"
  ).to("cuda")
-
- print("Generating response...")
-
- # 6. Generate the output
  outputs = model.generate(
      **inputs,
      max_new_tokens = 1024,
-     max_length = None, # Bypasses Hugging Face length warnings
      use_cache = True,
-     repetition_penalty = 1.15, # Critical: prevents infinite generation loops
      pad_token_id = tokenizer.eos_token_id
  )
-
- # 7. Decode and print the result
  response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
  print(f"\n{'='*50}\nOutput:\n{'='*50}")
  print(response.split("### Response:\n")[-1])
-
- ```
- ## 📊 Limitations & Biases
- * **Language:** The model is optimized exclusively for English.
- * **Arithmetic Hallucinations:** While highly capable of symbolic logic and structured proofs, 1.5B parameter models can occasionally suffer from minor arithmetic errors (e.g., simple subtraction mistakes) deep within long proofs.
- * **Prompt Sensitivity:** The model performs best when math queries explicitly ask for a "proof" or "step-by-step" breakdown in the instruction block.
- ## 🤝 Acknowledgements
- * **Alibaba Cloud:** For the phenomenal base Qwen2.5-Math architecture.
- * **Unsloth AI:** For the Triton-optimized training kernels that made compiling this model possible on consumer hardware.
- * **Xerv-AI:** For the GRAD synthetic dataset powering the advanced reasoning capabilities.
- ```
-
  ```
  - custom-finetune
  - lor-merged
  base_model: unsloth/Qwen2.5-Math-1.5B
+ datasets:
+ - Xerv-AI/GRAD
  ---
 
+ ## 🌌 Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM
+ **Ada** is an ultra-lightweight, high-speed reasoning Small Language Model (SLM) derived from the **Qwen2.5-Math-1.5B** architecture. Built to bridge the gap between hyper-specialized graduate-level mathematical proofs and standard conversational utility, Ada is designed to mitigate the "catastrophic forgetting" problem often found in math-heavy fine-tunes.
+ Whether you need a step-by-step calculus breakdown, a topological proof in LaTeX, or just a simple conversational assistant for daily tasks, Ada aims to deliver strong performance for a 1.5-billion-parameter model.
+
+ ### 🚀 Model Overview
+ Standard math-specific LLMs frequently suffer from domain overfitting. When prompted with basic conversational queries, they either hallucinate lengthy pseudo-proofs or fail entirely to understand the user's intent. **Xerv-AI/Ada** was built to address this by training on a carefully balanced, dual-distribution dataset, allowing it to act as both a rigorous STEM assistant and a general-purpose chat model.
+
+ | Specification | Details |
+ | :--- | :--- |
+ | **Model Name** | Xerv-AI/Ada |
+ | **Base Architecture** | unsloth/Qwen2.5-Math-1.5B |
+ | **Parameter Count** | 1.5 Billion |
+ | **Primary Capabilities** | Graduate-level STEM reasoning, logical deduction, and mathematical proofs. |
+ | **Secondary Capabilities** | General conversational instruction-following, roleplay, and basic coding. |
+ | **Training Framework** | QLoRA via Unsloth (Triton kernels). |
+ | **Precision** | Merged 16-bit (Fine-tuned in 4-bit). |
+ | **License** | Apache-2.0 |
+
+ ### 🔬 Core Capabilities & Strengths
+ * **Balanced Generalization:** Ada seamlessly transitions between casual conversation and intense analytical problem-solving without format-forced hallucinations.
+ * **Advanced STEM Reasoning:** Fully optimized to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
+ * **Hardware Optimized for Edge Deployment:** Designed to run at maximum inference throughput on low-VRAM consumer hardware (such as a single 16GB NVIDIA T4 GPU, Mac M-series chips, or edge devices) using 4-bit quantization.
+ * **Impeccable Formatting:** Native understanding of structural formatting, easily outputting highly readable markdown and structured logic steps.
+
+ ### 🏗 Architecture & Training Methodology
+ Ada was trained with Supervised Fine-Tuning (SFT), applying LoRA adapters to the attention and MLP projection layers of the base model. Training ran with **Unsloth** on a standard Google Colab NVIDIA T4 GPU, and the Low-Rank Adaptation (LoRA) weights were then merged into a standalone 16-bit Hugging Face model.
+ * **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+ * **LoRA Rank (r):** 16
+ * **LoRA Alpha:** 16
+ * **Optimizer:** `adamw_8bit`
+ * **Learning Rate:** 2e-4
+ * **Effective Batch Size:** 8 (Batch size 2 with 4 Gradient Accumulation steps)
+
+ ### 📚 The Dataset: Dual-Distribution Blending
+ To achieve generalization and prevent catastrophic forgetting, Ada was fine-tuned on a strict 50/50 blend of two distinct datasets, batched and streamed via high-throughput Parquet files:
+
+ | Dataset | Sample Size | Description & Purpose |
+ | :--- | :--- | :--- |
+ | **Xerv-AI/GRAD** | ~1.93k rows | A proprietary synthetic dataset containing exceptionally long (average 8,000 characters) graduate and research-level mathematical proofs. This instills deep reasoning and strict formatting. |
+ | **yahma/alpaca-cleaned** | ~2.00k rows | A refined subset of the standard Alpaca dataset. This teaches the model conversational flow, roleplay, basic Q&A, and crucially, *when not to use complex math*. |
+
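A rough picture of what a 50/50 blend like the one described above could look like in code. This is only a sketch: the stand-in rows, the `instruction`/`output` field names, the assumed Alpaca-style training template, and the helpers `render` and `blend_evenly` are all hypothetical and are not taken from the author's pipeline.

```python
import random

# Assumed Alpaca-style training template (not confirmed by the README).
TEMPLATE = "### Instruction:\n{}\n\n### Response:\n{}"


def render(row):
    """Render one row (hypothetical 'instruction'/'output' fields) as training text."""
    return TEMPLATE.format(row["instruction"], row["output"])


def blend_evenly(math_rows, chat_rows, seed=0):
    """Return a shuffled list holding the same number of rows from each source."""
    n = min(len(math_rows), len(chat_rows))  # truncate the larger source
    rng = random.Random(seed)
    mixed = [render(r) for r in rng.sample(math_rows, n)]
    mixed += [render(r) for r in rng.sample(chat_rows, n)]
    rng.shuffle(mixed)
    return mixed


# Tiny stand-in rows; the real sources have roughly 1.93k and 2k rows.
math_rows = [{"instruction": f"Prove lemma {i}.", "output": f"Proof {i}."} for i in range(3)]
chat_rows = [{"instruction": f"Say hello {i}.", "output": f"Hello {i}!"} for i in range(5)]

blended = blend_evenly(math_rows, chat_rows)
print(len(blended))
```

Truncating to the smaller source keeps the ratio exactly even; with the row counts quoted in the table, that would drop only a few dozen Alpaca rows.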
+ ### 💻 Usage & Python Inference Guide
+ The model is highly responsive to the standard **Alpaca Instruction/Response template**.
+ **Important Inference Note:** For best results, use a `repetition_penalty` of roughly **1.15**. This helps prevent the model from looping through mathematical steps on very simple arithmetic queries.
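For intuition about why a value slightly above 1 discourages loops, here is a minimal pure-Python sketch of the rescaling rule that Hugging Face's repetition penalty applies to tokens that have already appeared. The function name and the toy logits are illustrative assumptions; the real implementation operates on tensors inside `generate`.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.15):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink; negative scores are pushed further down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of four tokens; tokens 0 and 1 were already generated.
logits = [2.3, -1.0, 0.5, 4.6]
adjusted = apply_repetition_penalty(logits, seen_token_ids=[0, 1, 0])
print(adjusted)
```

Tokens that have not appeared yet keep their scores, so a penalty of 1.15 nudges the model away from repeating itself without changing which fresh tokens it prefers.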
+ **1. Installation Requirements**
  ```bash
+ pip install unsloth transformers accelerate torch
  ```
+ **2. Fast Inference Script**
  ```python
  from unsloth import FastLanguageModel
  import torch
+ # Configuration
+ repo_name = "Xerv-AI/Ada"
  max_seq_length = 2048
+ # Load the model and tokenizer (4-bit recommended for low-VRAM)
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = repo_name,
      max_seq_length = max_seq_length,
      dtype = None,
      load_in_4bit = True,
  )
+ # Enable optimized inference mode
  FastLanguageModel.for_inference(model)
+ # Define the universal prompt template
  universal_prompt = """### Instruction:
  {}
  ### Response:
  {}"""
+ # Prepare your query
+ query = "Provide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]."
  inputs = tokenizer(
      [universal_prompt.format(query, "")],
      return_tensors = "pt"
  ).to("cuda")
+ print("Generating analytical response...")
+ # Generate the output
  outputs = model.generate(
      **inputs,
      max_new_tokens = 1024,
+     max_length = None,
      use_cache = True,
+     repetition_penalty = 1.15, # Critical: prevents generation loops
      pad_token_id = tokenizer.eos_token_id
  )
+ # Decode and print the result
  response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
  print(f"\n{'='*50}\nOutput:\n{'='*50}")
  print(response.split("### Response:\n")[-1])
  ```
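The script's final `split(...)[-1]` keeps whatever follows the last `### Response:` marker, so if the model goes on to invent a second instruction/response pair, the invented answer is what gets printed. A slightly more defensive variant might look like the sketch below; `build_prompt` and `extract_response` are hypothetical helper names, and the template spacing is an assumption.

```python
PROMPT = "### Instruction:\n{}\n\n### Response:\n{}"  # assumed template spacing
RESPONSE_MARKER = "### Response:\n"
INSTRUCTION_MARKER = "### Instruction:"


def build_prompt(query):
    """Fill the instruction slot and leave the response slot empty."""
    return PROMPT.format(query, "")


def extract_response(decoded):
    """Return the first answer in `decoded`, dropping any invented follow-up turn."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        tail = decoded  # marker missing: fall back to the whole text
    answer, _, _ = tail.partition(INSTRUCTION_MARKER)
    return answer.strip()


# Simulated decoder output in which the model hallucinated an extra turn.
decoded = build_prompt("What is 2 + 2?") + (
    "4\n\n### Instruction:\nNext task\n\n### Response:\nignored"
)
print(extract_response(decoded))
```

The trade-off is that a query which itself contains the literal marker text would be cut short, so this is best treated as a convenience for simple prompts.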
+ ### 🛡️ Safety & Alignment Guardrails
+ Despite being fine-tuned on raw mathematical logic and conversational instruction data, Ada is intended to retain the base model's safety alignment. Because the LoRA adapters hold only 1% to 2% as many parameters as the base model (and were subsequently merged), the merged weights stay close to the original Qwen2.5 weights.
+ * **Content Moderation:** The model is expected to refuse to generate explicit, illegal, or harmful content, relying on the RLHF and DPO safety guardrails instilled during Alibaba's original post-training alignment.
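The 1% to 2% figure quoted above can be sanity-checked with a little arithmetic over the listed target modules at rank 16. The layer shapes below are assumptions taken from the publicly available Qwen2.5-1.5B configuration (they are not stated in this README), and the total parameter count is approximate.

```python
# Assumed Qwen2.5-1.5B shapes (from the published config, not from this README).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 256  # 2 key/value heads x 128 head dimension
RANK = 16
BASE_PARAMS = 1.54e9  # approximate total parameter count (assumption)

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
trainable = per_layer * NUM_LAYERS
print(f"{trainable:,} adapter parameters, {trainable / BASE_PARAMS:.2%} of the base")
```

Under these assumptions the adapter size falls inside the quoted range. Note that this only measures how small the update is; it says nothing by itself about which behaviors survive fine-tuning.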
+ ### ⚠️ Limitations & Known Biases
+ While Ada punches well above its 1.5B weight class, it is important to acknowledge the limitations inherent to Small Language Models:
+ * **Arithmetic Hallucinations:** Ada is exceptionally capable at symbolic logic, structural breakdowns, and mathematical theory. However, like many SLMs, it can occasionally suffer from minor arithmetic errors (e.g., basic addition/subtraction mistakes) deep within multi-page proofs. Always verify raw calculations.
+ * **Language Constraint:** The model is optimized exclusively for **English** text and standard mathematical notation.
+ * **Prompt Sensitivity:** Ada performs at its absolute peak when math queries explicitly ask for a "proof," "step-by-step breakdown," or "logical analysis" within the instruction block.
+ * **World Knowledge:** It lacks the broad, encyclopedic trivia knowledge found in massive 70B+ parameter models.
+ ### 🤝 Acknowledgements
+ * **Alibaba Cloud:** For the phenomenal, state-of-the-art base Qwen2.5-Math architecture.
+ * **Unsloth AI:** For the Triton-optimized training kernels that made compiling and fine-tuning this model possible and highly efficient on consumer hardware.
+ * **Xerv-AI:** For the curation of the GRAD synthetic dataset powering the advanced reasoning capabilities.