Text Generation
Safetensors
PyTorch
English
qwen2
unsloth
qwen
qwen2.5
math
reasoning
alpaca
custom-finetune
lora-merged
How to use Xerv-AI/Ada with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xerv-AI/Ada to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xerv-AI/Ada to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Xerv-AI/Ada to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Xerv-AI/Ada",
    max_seq_length=2048,
)
Update README.md
README.md
CHANGED
|
@@ -14,122 +14,91 @@ tags:
|
|
| 14 |
- custom-finetune
|
| 15 |
- lora-merged
|
| 16 |
base_model: unsloth/Qwen2.5-Math-1.5B
|
|
|
|
|
|
|
| 17 |
---
|
| 18 |
|
| 19 |
-
#
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
## Architecture & Training Methodology
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
2. **yahma/alpaca-cleaned (2k rows):** A subset of the standard Alpaca dataset used to teach the model how to answer basic queries, roleplay, and recognize when *not* to use complex math.
|
| 46 |
-
|
| 47 |
-
### Training Configuration
|
| 48 |
-
The fine-tuning process was executed via Supervised Fine-Tuning (SFT) with LoRA adapters on the attention and MLP projection layers.
|
| 49 |
-
* **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
|
| 50 |
-
* **LoRA Rank (r):** 16
|
| 51 |
-
* **LoRA Alpha:** 16
|
| 52 |
-
* **Optimizer:** `adamw_8bit`
|
| 53 |
-
* **Learning Rate:** 2e-4
|
| 54 |
-
* **Effective Batch Size:** 8 (Batch size 2 with 4 Gradient Accumulation steps)
|
| 55 |
-
|
| 56 |
-
## Safety & Alignment
|
| 57 |
-
|
| 58 |
-
Despite being fine-tuned on unfiltered mathematical and conversational data, **this model retains its original safety alignment**. Because only 1-2% of the parameters were updated via LoRA (and later merged), the base Qwen2.5 weights responsible for safety remain largely intact.
|
| 59 |
-
|
| 60 |
-
* **NSFW/18+ Prompts:** The model will actively refuse to generate explicit, illegal, or harmful content, relying on the RLHF and DPO safety guardrails instilled during its original post-training phase.
|
| 61 |
-
|
| 62 |
-
## Usage & Inference
|
| 63 |
-
|
| 64 |
-
The model is highly responsive to a strict Instruction/Response template. For best results, use a `repetition_penalty` of roughly 1.15 to prevent the model from infinitely looping through math steps on simpler problems.
|
| 65 |
-
|
| 66 |
-
### Installation
|
| 67 |
```bash
|
| 68 |
-
pip install unsloth transformers accelerate
|
| 69 |
-
|
| 70 |
```
|
| 71 |
-
|
| 72 |
```python
|
| 73 |
from unsloth import FastLanguageModel
|
| 74 |
import torch
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
repo_name = "Phase-Technologies/qwen2.5-math-1.5b-generalized-merged"
|
| 78 |
max_seq_length = 2048
|
| 79 |
-
|
| 80 |
-
# 2. Load the fully merged model
|
| 81 |
-
# Loading in 4-bit is highly recommended for low-VRAM GPUs (like T4)
|
| 82 |
model, tokenizer = FastLanguageModel.from_pretrained(
|
| 83 |
model_name = repo_name,
|
| 84 |
max_seq_length = max_seq_length,
|
| 85 |
dtype = None,
|
| 86 |
load_in_4bit = True,
|
| 87 |
)
|
| 88 |
-
|
| 89 |
-
# 3. Switch to highly optimized inference mode
|
| 90 |
FastLanguageModel.for_inference(model)
|
| 91 |
-
|
| 92 |
-
# 4. Define the universal prompt template
|
| 93 |
universal_prompt = """### Instruction:
|
| 94 |
{}
|
| 95 |
-
|
| 96 |
### Response:
|
| 97 |
{}"""
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
query = "Provide a step-by-step proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]."
|
| 101 |
-
|
| 102 |
inputs = tokenizer(
|
| 103 |
[universal_prompt.format(query, "")],
|
| 104 |
return_tensors = "pt"
|
| 105 |
).to("cuda")
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
# 6. Generate the output
|
| 110 |
outputs = model.generate(
|
| 111 |
**inputs,
|
| 112 |
max_new_tokens = 1024,
|
| 113 |
-
max_length = None,
|
| 114 |
use_cache = True,
|
| 115 |
-
repetition_penalty = 1.15, # Critical: prevents
|
| 116 |
pad_token_id = tokenizer.eos_token_id
|
| 117 |
)
|
| 118 |
-
|
| 119 |
-
# 7. Decode and print the result
|
| 120 |
response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
|
| 121 |
print(f"\n{'='*50}\nOutput:\n{'='*50}")
|
| 122 |
print(response.split("### Response:\n")[-1])
|
| 123 |
-
|
| 124 |
-
```
|
| 125 |
-
## Limitations & Biases
|
| 126 |
-
* **Language:** The model is optimized exclusively for English.
|
| 127 |
-
* **Arithmetic Hallucinations:** While highly capable of symbolic logic and structured proofs, 1.5B parameter models can occasionally suffer from minor arithmetic errors (e.g., simple subtraction mistakes) deep within long proofs.
|
| 128 |
-
* **Prompt Sensitivity:** The model performs best when math queries explicitly ask for a "proof" or "step-by-step" breakdown in the instruction block.
|
| 129 |
-
## Acknowledgements
|
| 130 |
-
* **Alibaba Cloud:** For the phenomenal base Qwen2.5-Math architecture.
|
| 131 |
-
* **Unsloth AI:** For the Triton-optimized training kernels that made compiling this model possible on consumer hardware.
|
| 132 |
-
* **Xerv-AI:** For the GRAD synthetic dataset powering the advanced reasoning capabilities.
|
| 133 |
-
```
|
| 134 |
-
|
| 135 |
```
|
| 14 |
- custom-finetune
|
| 15 |
- lora-merged
|
| 16 |
base_model: unsloth/Qwen2.5-Math-1.5B
|
| 17 |
+
datasets:
|
| 18 |
+
- Xerv-AI/GRAD
|
| 19 |
---
|
| 20 |
|
| 21 |
+
## Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM
|
| 22 |
+
**Ada** is a lightweight, fast reasoning Small Language Model (SLM) derived from **Qwen2.5-Math-1.5B**. Engineered to bridge the gap between graduate-level mathematical proofs and standard conversational utility, Ada is designed to mitigate the "catastrophic forgetting" often seen in math-heavy fine-tunes.
|
| 23 |
+
Whether you need a step-by-step calculus breakdown, a topological proof in LaTeX, or just a simple conversational assistant for daily tasks, Ada aims to deliver strong performance for a 1.5-billion-parameter model.
|
| 24 |
+
|
| 25 |
+
### Model Overview
|
| 26 |
+
Standard math-specific LLMs frequently suffer from domain overfitting. When prompted with basic conversational queries, they either hallucinate lengthy pseudo-proofs or fail entirely to understand the user's intent. **Xerv-AI/Ada** was meticulously engineered to resolve this by utilizing a carefully balanced, dual-distribution training dataset, allowing it to act as both a rigorous STEM assistant and a general-purpose chat model.
|
| 27 |
+
|
| 28 |
+
| Specification | Details |
|
| 29 |
+
| :--- | :--- |
|
| 30 |
+
| **Model Name** | Xerv-AI/Ada |
|
| 31 |
+
| **Base Architecture** | unsloth/Qwen2.5-Math-1.5B |
|
| 32 |
+
| **Parameter Count** | 1.5 Billion |
|
| 33 |
+
| **Primary Capabilities** | Graduate-level STEM reasoning, logical deduction, and mathematical proofs. |
|
| 34 |
+
| **Secondary Capabilities** | General conversational instruction-following, roleplay, and basic coding. |
|
| 35 |
+
| **Training Framework** | QLoRA via Unsloth (Triton kernels). |
|
| 36 |
+
| **Precision** | Merged 16-bit (Fine-tuned in 4-bit). |
|
| 37 |
+
| **License** | Apache-2.0 |

### Core Capabilities & Strengths

* **Balanced Generalization:** Ada seamlessly transitions between casual conversation and intense analytical problem-solving without format-forced hallucinations.
* **Advanced STEM Reasoning:** Fully optimized to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
* **Hardware Optimized for Edge Deployment:** Designed to run at maximum inference throughput on low-VRAM consumer hardware (such as a single 16GB NVIDIA T4 GPU, Mac M-series chips, or edge devices) using 4-bit quantization.
* **Impeccable Formatting:** Native understanding of structural formatting, easily outputting highly readable markdown and structured logic steps.

### Architecture & Training Methodology

Ada was trained using Supervised Fine-Tuning (SFT) with LoRA adapters on the attention and MLP projection layers of the base model. Utilizing **Unsloth** on a standard Google Colab NVIDIA T4 GPU, the training leveraged Low-Rank Adaptation (LoRA) to maximize efficiency, after which the adapters were merged into a standalone 16-bit Hugging Face model.

* **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
* **LoRA Rank (r):** 16
* **LoRA Alpha:** 16
* **Optimizer:** `adamw_8bit`
* **Learning Rate:** 2e-4
* **Effective Batch Size:** 8 (Batch size 2 with 4 Gradient Accumulation steps)

### The Dataset: Dual-Distribution Blending

To achieve generalization and prevent catastrophic forgetting, Ada was fine-tuned on a roughly 50/50 blend of two distinct datasets, batched and streamed via high-throughput Parquet files:

|
| 38 |
+
| Dataset | Sample Size | Description & Purpose |
|
| 39 |
+
| :--- | :--- | :--- |
|
| 40 |
+
| **Xerv-AI/GRAD** | ~1.93k rows | A proprietary synthetic dataset containing exceptionally long (average 8,000 characters) graduate and research-level mathematical proofs. This instills deep reasoning and strict formatting. |
|
| 41 |
+
| **yahma/alpaca-cleaned** | ~2.00k rows | A refined subset of the standard Alpaca dataset. This teaches the model conversational flow, roleplay, basic Q&A, and crucially, *when not to use complex math*. |
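
The blending step described in the table can be sketched in plain Python. The README does not show the actual data pipeline, so this is only an illustration: `blend_datasets`, the seed, and the placeholder rows are hypothetical stand-ins sized like the two datasets above.

```python
import random


def blend_datasets(math_rows, chat_rows, seed=3407):
    """Concatenate two lists of examples and shuffle them deterministically."""
    blended = list(math_rows) + list(chat_rows)
    random.Random(seed).shuffle(blended)
    return blended


# Hypothetical placeholder rows sized like the table (~1.93k and ~2.00k rows).
math_rows = [{"source": "GRAD", "id": i} for i in range(1930)]
chat_rows = [{"source": "alpaca", "id": i} for i in range(2000)]

blended = blend_datasets(math_rows, chat_rows)
math_share = sum(row["source"] == "GRAD" for row in blended) / len(blended)
print(f"{len(blended)} rows, {math_share:.1%} math")
```

With these sizes the math share comes out just under one half, which is why the blend is close to, but not exactly, 50/50.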
|
| 42 |
+
|
| 43 |
+
### Usage & Python Inference Guide
|
| 44 |
+
The model is highly responsive to the standard **Alpaca Instruction/Response template**.
|
| 45 |
+
**Important Inference Note:** For best results, use a `repetition_penalty` of roughly **1.15**. This acts as a crucial guardrail to prevent the model from infinitely looping through mathematical steps on overly simple arithmetic queries.
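
To make the effect of the penalty concrete, here is a minimal sketch of the commonly used repetition-penalty rule, which the `repetition_penalty` option in Transformers is understood to follow: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. The function name and the toy logits are illustrative assumptions, not part of any library.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.15):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of four tokens; tokens 0 and 1 were already generated.
logits = [2.3, -1.0, 0.5, 4.6]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0, 1])
print(penalized)
```

A value of 1.0 leaves the logits unchanged, so 1.15 is a fairly mild setting that discourages loops without forbidding legitimate reuse of symbols in a proof.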
|
| 46 |
+
**1. Installation Requirements**
|
| 47 |
```bash
|
| 48 |
+
pip install unsloth transformers accelerate torch
|
|
|
|
| 49 |
```
|
| 50 |
+
**2. Fast Inference Script**
|
| 51 |
```python
|
| 52 |
from unsloth import FastLanguageModel
|
| 53 |
import torch
|
| 54 |
+
# Configuration
|
| 55 |
+
repo_name = "Xerv-AI/Ada"
|
|
|
|
| 56 |
max_seq_length = 2048
|
| 57 |
+
# Load the model and tokenizer (4-bit recommended for low-VRAM)
|
|
|
|
|
|
|
| 58 |
model, tokenizer = FastLanguageModel.from_pretrained(
|
| 59 |
model_name = repo_name,
|
| 60 |
max_seq_length = max_seq_length,
|
| 61 |
dtype = None,
|
| 62 |
load_in_4bit = True,
|
| 63 |
)
|
| 64 |
+
# Enable optimized inference mode
|
|
|
|
| 65 |
FastLanguageModel.for_inference(model)
|
| 66 |
+
# Define the universal prompt template
|
|
|
|
| 67 |
universal_prompt = """### Instruction:
|
| 68 |
{}
|
|
|
|
| 69 |
### Response:
|
| 70 |
{}"""
|
| 71 |
+
# Prepare your query
|
| 72 |
+
query = "Provide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]."
|
|
|
|
|
|
|
| 73 |
inputs = tokenizer(
|
| 74 |
[universal_prompt.format(query, "")],
|
| 75 |
return_tensors = "pt"
|
| 76 |
).to("cuda")
|
| 77 |
+
print("Generating analytical response...")
|
| 78 |
+
# Generate the output
|
|
|
|
|
|
|
| 79 |
outputs = model.generate(
|
| 80 |
**inputs,
|
| 81 |
max_new_tokens = 1024,
|
| 82 |
+
max_length = None,
|
| 83 |
use_cache = True,
|
| 84 |
+
repetition_penalty = 1.15, # Critical: prevents generation loops
|
| 85 |
pad_token_id = tokenizer.eos_token_id
|
| 86 |
)
|
| 87 |
+
# Decode and print the result
|
|
|
|
| 88 |
response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
|
| 89 |
print(f"\n{'='*50}\nOutput:\n{'='*50}")
|
| 90 |
print(response.split("### Response:\n")[-1])
|
| 91 |
```
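
The prompt handling in the script above can be isolated into two small helpers, which makes it easier to test without loading the model. This is a sketch only: `build_prompt` and `extract_response` are hypothetical names, and the template string mirrors the one in the updated script.

```python
ALPACA_TEMPLATE = "### Instruction:\n{}\n### Response:\n{}"


def build_prompt(instruction):
    """Fill the instruction slot and leave the response slot empty for generation."""
    return ALPACA_TEMPLATE.format(instruction, "")


def extract_response(decoded_text):
    """Keep only the text after the last response marker in the decoded output."""
    return decoded_text.split("### Response:\n")[-1].strip()


prompt = build_prompt("What is 2 + 2?")
# Simulate decoded model output: the prompt followed by a generated answer.
print(extract_response(prompt + "2 + 2 = 4"))
```

Splitting on the last marker matters because the decoded output contains the prompt itself, so the answer is whatever follows the final `### Response:` line.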
|
| 92 |
+
### Safety & Alignment Guardrails
|
| 93 |
+
Despite being fine-tuned on raw mathematical logic and conversational instruction data, Ada retains its foundational safety alignments. Because only 1% to 2% of the parameters were actively updated via LoRA (and subsequently merged), the original base Qwen2.5 weights responsible for safety remain largely intact.
|
| 94 |
+
* **Content Moderation:** The model actively refuses to generate explicit, illegal, or harmful content, relying on the RLHF and DPO safety guardrails instilled during Alibaba's original post-training phase.
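
The "1% to 2%" figure can be sanity-checked with a short calculation. The sketch below assumes the layer dimensions published for the Qwen2.5 1.5B family (hidden size 1536, MLP size 8960, 28 layers, 2 key/value heads of dimension 128) and a total of roughly 1.54 billion parameters; these values are assumptions to verify against the model's `config.json`, not figures taken from this README.

```python
# Assumed Qwen2.5-1.5B dimensions; verify against the model's config.json.
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128
RANK = 16  # LoRA rank from the training configuration
TOTAL_PARAMS = 1_540_000_000  # approximate size of the base model

# (input dim, output dim) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(d_in, d_out, rank):
    """LoRA adds an (d_in x rank) and a (rank x d_out) matrix per target module."""
    return rank * (d_in + d_out)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * LAYERS
share = trainable / TOTAL_PARAMS
print(f"{trainable:,} trainable LoRA parameters ({share:.2%} of the base model)")
```

Under these assumptions the adapters hold about 18.5 million parameters, a little over 1% of the base model. Note that this counts trainable adapter weights; once merged, the low-rank update is added into every targeted weight matrix.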
|
| 95 |
+
### Limitations & Known Biases
|
| 96 |
+
While Ada punches well above its 1.5B weight class, it is important to acknowledge the limitations inherent to Small Language Models:
|
| 97 |
+
* **Arithmetic Hallucinations:** Ada is exceptionally capable at symbolic logic, structural breakdowns, and mathematical theory. However, like many SLMs, it can occasionally suffer from minor arithmetic errors (e.g., basic addition/subtraction mistakes) deep within multi-page proofs. Always verify raw calculations.
|
| 98 |
+
* **Language Constraint:** The model is optimized exclusively for **English** text and standard mathematical notation.
|
| 99 |
+
* **Prompt Sensitivity:** Ada performs at its absolute peak when math queries explicitly ask for a "proof," "step-by-step breakdown," or "logical analysis" within the instruction block.
|
| 100 |
+
* **World Knowledge:** It lacks the broad, encyclopedic trivia knowledge found in massive 70B+ parameter models.
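
Because arithmetic slips are a known limitation, it helps to check model output against an independent calculation. As a minimal sketch, the eigenvalue query used in the inference script can be verified with the closed-form solution for a 2x2 matrix; `eigenvalues_2x2` is a hypothetical helper written for this example.

```python
import math


def eigenvalues_2x2(matrix):
    """Real eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = matrix
    trace = a + d
    determinant = a * d - b * c
    # Roots of lambda^2 - trace * lambda + determinant = 0.
    discriminant = trace * trace - 4 * determinant
    if discriminant < 0:
        raise ValueError("matrix has complex eigenvalues")
    root = math.sqrt(discriminant)
    return ((trace - root) / 2, (trace + root) / 2)


print(eigenvalues_2x2([[2, 1], [1, 2]]))  # (1.0, 3.0)
```

If the model's proof reports anything other than 1 and 3 for this matrix, the error lies in its arithmetic rather than in the method.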
|
| 101 |
+
### Acknowledgements
|
| 102 |
+
* **Alibaba Cloud:** For the phenomenal, state-of-the-art base Qwen2.5-Math architecture.
|
| 103 |
+
* **Unsloth AI:** For the Triton-optimized training kernels that made compiling and fine-tuning this model possible and highly efficient on consumer hardware.
|
| 104 |
+
* **Xerv-AI:** For the curation of the GRAD synthetic dataset powering the advanced reasoning capabilities.
|