Text Generation
Safetensors
PyTorch
English
qwen2
unsloth
qwen
qwen2.5
math
reasoning
alpaca
custom-finetune
lora-merged
Instructions to use Xerv-AI/Ada with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Local Apps Settings
- Unsloth Studio
How to use Xerv-AI/Ada with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Xerv-AI/Ada to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Xerv-AI/Ada to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Xerv-AI/Ada to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="Xerv-AI/Ada", max_seq_length=2048, )
| license: apache-2.0 | |
| language: | |
| - en | |
| pipeline_tag: text-generation | |
| tags: | |
| - unsloth | |
| - qwen | |
| - qwen2.5 | |
| - math | |
| - reasoning | |
| - alpaca | |
| - pytorch | |
| - custom-finetune | |
| - lora-merged | |
| base_model: unsloth/Qwen2.5-Math-1.5B | |
| datasets: | |
| - Xerv-AI/GRAD | |
| - yahma/alpaca-cleaned | |
| inference: | |
| parameters: | |
| repetition_penalty: 1.15 | |
| max_new_tokens: 256 | |
| temperature: 0.5 | |
| examples: | |
| - text: "### Instruction:\nProvide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]].\n### Response:\n" | |
| widget: | |
| - example_title: Fibonacci (Python) | |
| messages: | |
| - role: system | |
| content: You are a chatbot who can help code! | |
| - role: user | |
| content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI. | |
| ## π Xerv-AI/Ada: The Multi-Modal Mathematical Generalist SLM | |
| **Ada** is an ultra-lightweight, high-speed, and highly optimized reasoning Small Language Model (SLM) derived from the powerful **Qwen2.5-Math-1.5B** architecture. Engineered specifically to bridge the gap between hyper-specialized graduate-level mathematical proofs and standard conversational utility, Ada solves the notorious "catastrophic forgetting" problem often found in math-heavy fine-tunes. | |
| Whether you need a step-by-step calculus breakdown, a topological proof in LaTeX, or just a simple conversational assistant for daily tasks, Ada delivers state-of-the-art performance for a 1.5 Billion parameter model. | |
| ### π Model Overview | |
| Standard math-specific LLMs frequently suffer from domain overfitting. When prompted with basic conversational queries, they either hallucinate lengthy pseudo-proofs or fail entirely to understand the user's intent. **Xerv-AI/Ada** was meticulously engineered to resolve this by utilizing a carefully balanced, dual-distribution training dataset, allowing it to act as both a rigorous STEM assistant and a general-purpose chat model. | |
| | Specification | Details | | |
| | :--- | :--- | | |
| | **Model Name** | Xerv-AI/Ada | | |
| | **Base Architecture** | unsloth/Qwen2.5-Math-1.5B | | |
| | **Parameter Count** | 1.5 Billion | | |
| | **Primary Capabilities** | Graduate-level STEM reasoning, logical deduction, and mathematical proofs. | | |
| | **Secondary Capabilities** | General conversational instruction-following, roleplay, and basic coding. | | |
| | **Training Framework** | QLoRA via Unsloth (Triton kernels). | | |
| | **Precision** | Merged 16-bit (Fine-tuned in 4-bit). | | |
| | **License** | Apache-2.0 | <br> ### π¬ Core Capabilities & Strengths <br> * **Balanced Generalization:** Ada seamlessly transitions between casual conversation and intense analytical problem-solving without format-forced hallucinations. <br> * **Advanced STEM Reasoning:** Fully optimized to generate detailed, multi-step logical proofs in advanced algebra, calculus, topology, and physics. <br> * **Hardware Optimized for Edge Deployment:** Designed to run at maximum inference throughput on low-VRAM consumer hardware (such as a single 16GB NVIDIA T4 GPU, Mac M-series chips, or edge devices) using 4-bit quantization. <br> * **Impeccable Formatting:** Native understanding of structural formatting, easily outputting highly readable markdown and structured logic steps. <br> ### π Architecture & Training Methodology <br> Ada was trained using Supervised Fine-Tuning (SFT) targeting the attention mechanisms of the base model. Utilizing **Unsloth** on a standard Google Colab NVIDIA T4 GPU, the training leveraged Low-Rank Adaptation (LoRA) to maximize efficiency before being merged into a standalone 16-bit Hugging Face model. <br> * **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj <br> * **LoRA Rank (r):** 16 <br> * **LoRA Alpha:** 16 <br> * **Optimizer:** adamw_8bit <br> * **Learning Rate:** 2e-4 <br> * **Effective Batch Size:** 8 (Batch size 2 with 4 Gradient Accumulation steps) <br> ### π The Dataset: Dual-Distribution Blending <br> To achieve generalization and prevent catastrophic forgetting, Ada was fine-tuned on a strict 50/50 blend of two distinct datasets, batched and streamed via high-throughput Parquet files: | |
| | Dataset | Sample Size | Description & Purpose | | |
| | :--- | :--- | :--- | | |
| | **Xerv-AI/GRAD** | ~1.93k rows | A proprietary synthetic dataset containing exceptionally long (average 8,000 characters) graduate and research-level mathematical proofs. This instills deep reasoning and strict formatting. | | |
| | **yahma/alpaca-cleaned** | ~2.00k rows | A refined subset of the standard Alpaca dataset. This teaches the model conversational flow, roleplay, basic Q&A, and crucially, *when not to use complex math*. | | |
| ### π» Usage & Python Inference Guide | |
| The model is highly responsive to the standard **Alpaca Instruction/Response template**. | |
| **Important Inference Note:** For best results, use a repetition_penalty of roughly **1.15**. This acts as a crucial guardrail to prevent the model from infinitely looping through mathematical steps on overly simple arithmetic queries. | |
| **1. Installation Requirements** | |
| ```bash | |
| pip install unsloth transformers accelerate torch | |
| ``` | |
| **2. Fast Inference Script** | |
| ```python | |
| from unsloth import FastLanguageModel | |
| import torch | |
| # Configuration | |
| repo_name = "Xerv-AI/Ada" | |
| max_seq_length = 2048 | |
| # Load the model and tokenizer (4-bit recommended for low-VRAM) | |
| model, tokenizer = FastLanguageModel.from_pretrained( | |
| model_name = repo_name, | |
| max_seq_length = max_seq_length, | |
| dtype = None, | |
| load_in_4bit = True, | |
| ) | |
| # Enable optimized inference mode | |
| FastLanguageModel.for_inference(model) | |
| # Define the universal prompt template | |
| universal_prompt = """### Instruction: | |
| {} | |
| ### Response: | |
| {}""" | |
| # Prepare your query | |
| query = "Provide a step-by-step logical proof finding the eigenvalues of the matrix [[2, 1], [1, 2]]." | |
| inputs = tokenizer( | |
| [universal_prompt.format(query, "")], | |
| return_tensors = "pt" | |
| ).to("cuda") | |
| print("Generating analytical response...") | |
| # Generate the output | |
| outputs = model.generate( | |
| **inputs, | |
| max_new_tokens = 1024, | |
| max_length = None, | |
| use_cache = True, | |
| repetition_penalty = 1.15, # Critical: prevents generation loops | |
| pad_token_id = tokenizer.eos_token_id | |
| ) | |
| # Decode and print the result | |
| response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0] | |
| print(f"\n{'='*50}\nOutput:\n{'='*50}") | |
| print(response.split("### Response:\n")[-1]) | |
| ``` | |
| ### Performance Summary | |
| | Dataset | Accuracy | | |
| | :--- | :--- | | |
| | **GSM8K** | **40.00%** | | |
| | **MATH** |**60.00%** | | |
| | **MATH-Hard** |**50.00%** | | |
| | **GRAD** |**40.00%** | | |
| ### π‘οΈ Safety & Alignment Guardrails | |
| Despite being fine-tuned on raw mathematical logic and conversational instruction data, Ada successfully retains its foundational safety alignments. Because only 1% to 2% of the parameters were actively updated via LoRA (and subsequently merged), the original base Qwen2.5 weights responsible for safety remain fully intact. | |
| * **Content Moderation:** The model actively refuses to generate explicit, illegal, or harmful content, relying on the RLHF and DPO safety guardrails instilled during Alibaba's original pre-training phase. | |
| ### β οΈ Limitations & Known Biases | |
| While Ada punches well above its 1.5B weight class, it is important to acknowledge the limitations inherent to Small Language Models: | |
| * **Arithmetic Hallucinations:** Ada is exceptionally capable at symbolic logic, structural breakdowns, and mathematical theory. However, like many SLMs, it can occasionally suffer from minor arithmetic errors (e.g., basic addition/subtraction mistakes) deep within multi-page proofs. Always verify raw calculations. | |
| * **Language Constraint:** The model is optimized exclusively for **English** text and standard mathematical notation. | |
| * **Prompt Sensitivity:** Ada performs at its absolute peak when math queries explicitly ask for a "proof," "step-by-step breakdown," or "logical analysis" within the instruction block. | |
| * **World Knowledge:** It lacks the broad, encyclopedic trivia knowledge found in massive 70B+ parameter models. | |
| ### π€ Acknowledgements | |
| * **Alibaba Cloud:** For the phenomenal, state-of-the-art base Qwen2.5-Math architecture. | |
| * **Unsloth AI:** For the Triton-optimized training kernels that made compiling and fine-tuning this model possible and highly efficient on consumer hardware. | |
| * **Xerv-AI:** For the curation of the GRAD synthetic dataset powering the advanced reasoning capabilities. |