---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- unsloth
- qwen
- qwen2.5
- math
- reasoning
- alpaca
- pytorch
- custom-finetune
- lora-merged
base_model: unsloth/Qwen2.5-Math-1.5B
datasets:
- Xerv-AI/GRAD
- yahma/alpaca-cleaned
inference:
  parameters:
    repetition_penalty: 1.15
    max_new_tokens: 256
    temperature: 0.5
  examples:
    - text: "### Instruction:\nProvide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]].\n### Response:\n"
widget:
  - example_title: Fibonacci (Python)
    messages:
      - role: system
        content: You are a chatbot who can help code!
      - role: user
        content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.
---
## Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM
**Ada** is a lightweight reasoning Small Language Model (SLM) fine-tuned from **Qwen2.5-Math-1.5B**. It is built to bridge the gap between specialized graduate-level mathematical proofs and everyday conversational use, and it targets the "catastrophic forgetting" of general chat ability that math-heavy fine-tunes often show.
Whether you need a step-by-step calculus breakdown, a topological proof in LaTeX, or a simple conversational assistant for daily tasks, Ada aims to handle each within a 1.5-billion-parameter footprint.
### Model Overview
Standard math-specific LLMs frequently suffer from domain overfitting. When prompted with basic conversational queries, they either hallucinate lengthy pseudo-proofs or fail entirely to understand the user's intent. **Xerv-AI/Ada** was meticulously engineered to resolve this by utilizing a carefully balanced, dual-distribution training dataset, allowing it to act as both a rigorous STEM assistant and a general-purpose chat model.
| Specification | Details |
| :--- | :--- |
| **Model Name** | Xerv-AI/Ada |
| **Base Architecture** | unsloth/Qwen2.5-Math-1.5B |
| **Parameter Count** | 1.5 Billion |
| **Primary Capabilities** | Graduate-level STEM reasoning, logical deduction, and mathematical proofs. |
| **Secondary Capabilities** | General conversational instruction-following, roleplay, and basic coding. |
| **Training Framework** | QLoRA via Unsloth (Triton kernels). |
| **Precision** | Merged 16-bit (Fine-tuned in 4-bit). |
| **License** | Apache-2.0 |

### Core Capabilities & Strengths

* **Balanced Generalization:** Ada moves between casual conversation and analytical problem-solving without forcing a proof-style format onto simple requests.
* **Advanced STEM Reasoning:** Tuned to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics.
* **Hardware Optimized for Edge Deployment:** Designed to run efficiently on low-VRAM consumer hardware (such as a single 16 GB NVIDIA T4 GPU, Mac M-series chips, or edge devices) using 4-bit quantization.
* **Structured Formatting:** Produces readable Markdown and clearly separated logic steps.

### Architecture & Training Methodology

Ada was trained with Supervised Fine-Tuning (SFT) targeting the attention and MLP projection layers of the base model. Training ran with **Unsloth** on a standard Google Colab NVIDIA T4 GPU and used Low-Rank Adaptation (LoRA) on a 4-bit base (QLoRA); the adapters were then merged into a standalone 16-bit Hugging Face model.

* **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
* **LoRA Rank (r):** 16
* **LoRA Alpha:** 16
* **Optimizer:** adamw_8bit
* **Learning Rate:** 2e-4
* **Effective Batch Size:** 8 (batch size 2 with 4 gradient accumulation steps)

### The Dataset: Dual-Distribution Blending

To generalize and limit catastrophic forgetting, Ada was fine-tuned on a roughly 50/50 blend of two distinct datasets, batched and streamed from Parquet files:

| Dataset | Sample Size | Description & Purpose |
| :--- | :--- | :--- |
| **Xerv-AI/GRAD** | ~1.93k rows | A proprietary synthetic dataset containing exceptionally long (average 8,000 characters) graduate and research-level mathematical proofs. This instills deep reasoning and strict formatting. |
| **yahma/alpaca-cleaned** | ~2.00k rows | A refined subset of the standard Alpaca dataset. This teaches the model conversational flow, roleplay, basic Q&A, and crucially, *when not to use complex math*. |
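
The blend described above can be sketched in a few lines of plain Python. This is an illustration only, not the author's training pipeline: the `blend`, `to_text`, and `source_share` helpers and the toy records are hypothetical, the row counts are the approximate figures from the table, and it assumes training used the same Alpaca-style template that the inference script uses.

```python
import random

# Assumed training template: the same Alpaca-style layout used at inference time.
ALPACA_TEMPLATE = "### Instruction:\n{}\n### Response:\n{}"


def blend(math_rows, chat_rows, seed=0):
    """Tag each record with its source, concatenate both lists, and shuffle."""
    mixed = [dict(row, source="math") for row in math_rows]
    mixed += [dict(row, source="chat") for row in chat_rows]
    random.Random(seed).shuffle(mixed)  # seeded so the order is reproducible
    return mixed


def to_text(row):
    """Render one record as a single training string."""
    return ALPACA_TEMPLATE.format(row["instruction"], row["output"])


def source_share(rows, source):
    """Fraction of rows that came from the given source."""
    return sum(row["source"] == source for row in rows) / len(rows)


# Toy stand-ins sized like the table (~1.93k and ~2.00k rows).
math_rows = [{"instruction": f"Prove lemma {i}.", "output": f"Proof {i}."} for i in range(1930)]
chat_rows = [{"instruction": f"Question {i}?", "output": f"Answer {i}."} for i in range(2000)]

mixed = blend(math_rows, chat_rows)
math_share = source_share(mixed, "math")
print(len(mixed), round(math_share, 3))  # 3930 0.491
```

With these counts the math share is about 49%, which is what "roughly 50/50" means here.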
### Usage & Python Inference Guide
The model is highly responsive to the standard **Alpaca Instruction/Response template**.
**Important Inference Note:** For best results, use a repetition_penalty of roughly **1.15**. This acts as a crucial guardrail to prevent the model from infinitely looping through mathematical steps on overly simple arithmetic queries.
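
To see why a penalty slightly above 1 discourages loops, here is a minimal sketch of the rescaling that a repetition penalty is commonly understood to apply: scores of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. The function name and the toy logits are hypothetical, and the sketch works on plain lists rather than on the tensors a real generation loop uses.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.15):
    """Return a copy of `logits` with already-generated token ids made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Toy vocabulary of four tokens; tokens 0 and 1 have already been generated.
logits = [2.3, -1.0, 0.5, 4.6]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0])
print(penalized)
```

Tokens that have not appeared yet (indices 2 and 3 here) keep their original scores, so only repeated tokens are discouraged.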
**1. Installation Requirements**
```bash
pip install unsloth transformers accelerate torch
```
**2. Fast Inference Script**
```python
from unsloth import FastLanguageModel
import torch

# Configuration
repo_name = "Xerv-AI/Ada"
max_seq_length = 2048

# Load the model and tokenizer (4-bit recommended for low-VRAM)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = repo_name,
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Enable optimized inference mode
FastLanguageModel.for_inference(model)

# Define the universal prompt template
universal_prompt = """### Instruction:
{}
### Response:
{}"""

# Prepare your query
query = "Provide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]."
inputs = tokenizer(
    [universal_prompt.format(query, "")],
    return_tensors = "pt"
).to("cuda")

print("Generating analytical response...")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens = 1024,
    max_length = None,
    use_cache = True,
    repetition_penalty = 1.15,  # Critical: prevents generation loops
    pad_token_id = tokenizer.eos_token_id
)

# Decode and print the result
response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
print(f"\n{'='*50}\nOutput:\n{'='*50}")
print(response.split("### Response:\n")[-1])
```
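
The prompt construction and response extraction in the script above can be pulled into two small helpers, which makes them easy to test without loading the model. This is a sketch: `build_prompt` and `extract_response` are hypothetical names, and trimming at a following `### Instruction:` marker is an assumption about how you may want to handle a model that keeps generating a new turn, not documented behavior of Ada.

```python
PROMPT_TEMPLATE = "### Instruction:\n{}\n### Response:\n{}"
RESPONSE_MARKER = "### Response:\n"
INSTRUCTION_MARKER = "### Instruction:"


def build_prompt(instruction: str) -> str:
    """Fill the Alpaca-style template, leaving the response slot empty."""
    return PROMPT_TEMPLATE.format(instruction.strip(), "")


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, cut at any new instruction block."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    if not marker:
        # No marker found: fall back to the whole decoded string.
        return decoded.strip()
    # Assumption: anything after a fresh instruction marker is an unwanted extra turn.
    return tail.split(INSTRUCTION_MARKER)[0].strip()


prompt = build_prompt("Simplify 2x + 3x.")
decoded = prompt + "2x + 3x = 5x.\n### Instruction:\nNext task"
print(extract_response(decoded))  # 2x + 3x = 5x.
```

Unlike the script's `split(...)[-1]`, this version keeps the text after the first marker, so a stray second `### Response:` block in the output does not hide the actual answer.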
### Performance Summary
| Benchmark | Accuracy |
| :--- | :--- |
| **GSM8K** | **40.00%** |
| **MATH** | **60.00%** |
| **MATH-Hard** | **50.00%** |
| **GRAD** | **40.00%** |
### Safety & Alignment Guardrails
Ada was fine-tuned only on mathematical proofs and conversational instruction data, so its safety behavior is inherited from the base model rather than trained in. The LoRA adapters amount to roughly 1% to 2% of the model's parameter count; after merging, each targeted weight matrix differs from the base Qwen2.5 weights only by a low-rank update, which is expected to leave most base behavior in place.
* **Content Moderation:** The model is expected to refuse explicit, illegal, or harmful requests, relying on safety behavior inherited from Alibaba's base model rather than on any safety-specific fine-tuning in Ada. (RLHF and DPO, where applied, are post-training steps rather than part of pre-training.)
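
The 1% to 2% figure can be sanity-checked with a short calculation. A LoRA adapter of rank r on a d_in × d_out linear layer adds r · (d_in + d_out) trainable parameters. The layer shapes below are assumptions taken from the commonly published configuration for Qwen2.5 models of this size (they are not stated in this card), and the 1.5 billion denominator is the nominal figure from the specification table.

```python
# Assumed layer shapes for a Qwen2.5-1.5B-class model (not stated in this card).
HIDDEN = 1536        # hidden size
INTERMEDIATE = 8960  # MLP intermediate size
KV_DIM = 256         # key/value projection width under grouped-query attention
LAYERS = 28          # number of transformer blocks
RANK = 16            # LoRA rank r from the training section

# (input width, output width) of each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in SHAPES.values())
trainable = per_layer * LAYERS
fraction = trainable / 1.5e9  # nominal parameter count from the spec table

print(f"{trainable:,} trainable parameters, {fraction:.2%} of the base model")
# 18,464,768 trainable parameters, 1.23% of the base model
```

Under these assumptions the adapters hold about 18.5 million parameters, which falls inside the stated range. With LoRA alpha equal to the rank (16 and 16), the update scaling factor alpha / r is 1.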
### Limitations & Known Biases
While Ada punches well above its 1.5B weight class, it is important to acknowledge the limitations inherent to Small Language Models:
* **Arithmetic Hallucinations:** Ada is exceptionally capable at symbolic logic, structural breakdowns, and mathematical theory. However, like many SLMs, it can occasionally suffer from minor arithmetic errors (e.g., basic addition/subtraction mistakes) deep within multi-page proofs. Always verify raw calculations.
* **Language Constraint:** The model is optimized exclusively for **English** text and standard mathematical notation.
* **Prompt Sensitivity:** Ada performs at its absolute peak when math queries explicitly ask for a "proof," "step-by-step breakdown," or "logical analysis" within the instruction block.
* **World Knowledge:** It lacks the broad, encyclopedic trivia knowledge found in massive 70B+ parameter models.
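
The advice to verify raw calculations can be made concrete for the eigenvalue prompt used throughout this card. The sketch below is an independent check, not part of the model: it solves the characteristic polynomial λ² − tr(A)·λ + det(A) = 0 for a 2×2 matrix, and the function name is hypothetical.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]], smallest first."""
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("matrix has complex eigenvalues")
    root = math.sqrt(discriminant)
    return ((trace - root) / 2, (trace + root) / 2)


# The example matrix from the card: [[2, 1], [1, 2]].
print(eigenvalues_2x2(2, 1, 1, 2))  # (1.0, 3.0)
```

Comparing the model's final answer against a check like this catches arithmetic slips even when the proof structure is sound.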
### Acknowledgements
* **Alibaba Cloud:** For the phenomenal, state-of-the-art base Qwen2.5-Math architecture.
* **Unsloth AI:** For the Triton-optimized training kernels that made compiling and fine-tuning this model possible and highly efficient on consumer hardware.
* **Xerv-AI:** For the curation of the GRAD synthetic dataset powering the advanced reasoning capabilities.