---
language:
- en
license: apache-2.0
base_model: unsloth/Qwen2.5-Math-1.5B-Instruct-bnb-4bit
tags:
- stem
- mathematics
- physics
- unsloth
- qwen2.5-math
- reasoning
- stss-framework
- logic
- analytical
- science
- meta-aggregation
- 4bit
- merged-f16
library_name: transformers
datasets:
- Xerv-AI/TART
metrics:
- accuracy
- math_verify
model_creator: Xerv-AI
model_name: MAXWELL
pipeline_tag: text-generation
model-index:
- name: MAXWELL (Qwen2.5-Math-1.5B-Instruct-STSS)
  results:
  - task:
      type: text-generation
      name: Grade School Mathematics
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 70.0
      name: Exact Match (Zero-Shot)
  - task:
      type: text-generation
      name: Competition Mathematics
    dataset:
      name: MATH-Hard
      type: lighteval/MATH-Hard
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 60.0
      name: Exact Match (Boxed)
  - task:
      type: text-generation
      name: Professional Knowledge
    dataset:
      name: MMLU-Pro
      type: TIGER-Lab/MMLU-Pro
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 45.0
      name: Multiple Choice Accuracy
  - task:
      type: text-generation
      name: Invitational Math
    dataset:
      name: AIME 2026
      type: MathArena/aime_2026
      split: train
    metrics:
    - type: accuracy
      value: 10.0
      name: Accuracy
  - task:
      type: text-generation
      name: Advanced Graduate Reasoning
    dataset:
      name: Humanity's Last Exam
      type: cais/hle
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 0.0
      name: Exact String Match
model_type: qwen2
quantization: 4-bit (bitsandbytes)
merged_format: fp16
inference_framework:
  name: STSS (Systematic Temperature-Sweep Synthesis)
  phases:
  - generation_sweep: [0.1, 0.3, 0.5, 0.7, 0.9]
  - aggregation_method: neural_synthesis
  - logic_anchor: triboelectric_induction_verification
max_position_embeddings: 4096
rope_scaling:
  type: linear
  factor: 2.0
hardware_specification:
  gpu: Tesla T4
  vram: 16GB
  optimization: Unsloth-Fast-Inference
---

# MAXWELL: Model Card

This document describes the technical specifications, training methodology, and inference architecture of the MAXWELL model, focusing on architectural parameters and observed computational behavior.

## 1. Model Details

### 1.1 Overview

MAXWELL is a fine-tuned, specialized variant of the Qwen2.5-Math-1.5B-Instruct architecture, optimized for high-precision analytical reasoning, mathematical computation, and physics problem-solving. It was trained in 4-bit quantization via the Unsloth framework and subsequently merged to 16-bit precision for deployment stability.

### 1.2 Core Specifications

| Specification | Value |
| :--- | :--- |
| **Developer** | Xerv-AI |
| **Model Name** | MAXWELL |
| **Base Architecture** | Qwen2.5-Math-1.5B-Instruct |
| **Parameter Count** | ~1.5 Billion |
| **Training Precision** | 4-bit (BitsAndBytes) |
| **Deployment Precision** | Merged FP16 (`merged_16bit`) |
| **Max Context Length** | 4096 Tokens (via RoPE Scaling) |
| **Training Iterations** | 6500 Checkpoints |
| **Hardware Used** | Dual Tesla T4 GPUs (16GB VRAM each) |

## 2. Inference Architecture: STSS

MAXWELL is designed to operate within a custom inference framework, **Systematic Temperature-Sweep Synthesis (STSS)**. This method replaces standard single-shot autoregressive generation with a two-phase meta-reasoning protocol intended to reduce hallucination rates.
### 2.1 Phase I: Spectrum Generation

Instead of sampling at a fixed temperature, the framework has the model generate a set of candidate responses $\mathcal{S}$ across a defined temperature grid $G_\tau$:

* **Low entropy ($T \in [0.1, 0.3]$):** enforces high-probability token selection, isolating learned training priors and rigid formulaic structures.
* **High entropy ($T \in [0.7, 0.9]$):** samples further into the tail of the distribution, forcing exploration of alternative logical branches.
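The Phase I sweep reduces to a small loop over the grid. The sketch below is model-free so it runs without a GPU: `sample` is a hypothetical stand-in for a single LLM generation pass, not part of the MAXWELL API.

```python
# Model-free sketch of the Phase I temperature sweep. `sample` is a
# hypothetical stand-in for one LLM generation pass, not a real MAXWELL API.
def spectrum_generation(question, sample, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Collect one candidate answer per temperature in the grid G_tau."""
    return [(t, sample(question, temperature=t)) for t in grid]

# Toy sampler: low temperatures stay on the modal answer, high ones explore.
def toy_sampler(question, temperature):
    base = "5 minutes"
    return base if temperature < 0.7 else base + " (alternative derivation)"

pool = spectrum_generation("5 machines make 5 widgets in 5 minutes...", toy_sampler)
assert len(pool) == 5  # one candidate per grid temperature
```

The real pipeline in Section 5 follows the same shape, with `model.generate` in place of the toy sampler.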
### 2.2 Phase II: Neural Aggregation

The model is re-prompted with the entire generated set $\mathcal{S}$ in its context window, acting as an aggregator function $f_{\text{agg}}$ that synthesizes the final output $R_{\text{final}} = f_{\text{agg}}(q, \mathcal{S})$, where $q$ is the original query. This aggregation pass is executed at $T = 0.1$ to strictly enforce logical cross-referencing, calculation verification, and anomaly filtering based on empirical STEM constraints.
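The context packing for this pass is plain string assembly. The helper below is a simplified sketch of how the candidate set is serialized into one aggregation prompt, not the exact production format:

```python
# Simplified sketch of Phase II context packing: the candidate set S is
# serialized into a single user turn for the low-temperature aggregation pass.
def build_aggregation_prompt(question, solution_pool):
    body = "\n".join(f"[Temp {t}]: {answer}" for t, answer in solution_pool)
    return (
        "<|im_start|>user\n"
        f"PROBLEM: {question}\nSOLUTIONS:\n{body}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_aggregation_prompt("What is 2 + 2?", [(0.1, "4"), (0.9, "4")])
assert "[Temp 0.1]: 4" in prompt and "PROBLEM: What is 2 + 2?" in prompt
```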
## 3. Empirical Performance Observations

Based on inference testing logs, the model exhibits the following characteristics:

* **Pattern-recognition override:** In cognitive reflection tests (e.g., the "5 machines, 5 minutes, 5 widgets" problem), MAXWELL maintains logical consistency across all temperature settings, returning the correct deterministic answer of "5 minutes" even at $T = 0.9$.
* **Triboelectric physics accuracy:** The model requires explicit anchoring prompts during aggregation to override common dataset biases regarding electrostatic charge polarities (e.g., explicitly defining glass rubbed with silk as positively charged).
* **Zero-shot consensus:** For trivial inputs (e.g., "hi"), the STSS framework reaches 100% consensus across the temperature spectrum and bypasses the full aggregation step, returning a standardized response.
## 4. Limitations & Computational Overhead

### 4.1 Token Saturation

Because the STSS framework injects five complete reasoning paths into the Phase II prompt, long-form calculus or multi-step proofs can exceed the context window and trigger truncation. `max_seq_length` must be initialized to at least 4096 to support the required RoPE scaling.
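A back-of-envelope budget shows why the scaled window is required. The 450-token per-candidate cap matches the pipeline in Section 5; the 2048-token native window is inferred from the card's own 2.0x linear RoPE scaling and is an assumption:

```python
# Rough Phase II context budget (token counts are approximate).
candidates = 5
tokens_per_candidate = 450                     # max_new_tokens cap from the pipeline
injected = candidates * tokens_per_candidate   # 2250 tokens of candidate text alone
native_window, scaled_window = 2048, 4096      # before / after 2.0x linear RoPE scaling

assert injected > native_window   # the pool alone would overflow a 2048 window
assert injected < scaled_window   # it fits once RoPE scaling doubles the window
```

Add the system instructions, the restated problem, and the 1024-token aggregation budget on top, and the margin inside 4096 tokens shrinks quickly, which is why long proofs truncate.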
### 4.2 Compute Multiplier

Standard LLM inference performs one generation pass. The MAXWELL STSS architecture requires **six** passes (five spectrum sweeps plus one neural aggregation), resulting in roughly a $6\times$ multiplier on latency and token-generation cost compared to a standard baseline query.
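The overhead is easy to quantify with a toy cost model; the baseline latency figure below is purely hypothetical, for illustration:

```python
# Toy cost model for the 6x multiplier: five sweep passes plus one aggregation.
SWEEP_PASSES = 5
AGGREGATION_PASSES = 1

def stss_latency(single_pass_latency_s):
    """Total wall-clock estimate, assuming passes run sequentially."""
    return (SWEEP_PASSES + AGGREGATION_PASSES) * single_pass_latency_s

baseline_s = 2.0                          # hypothetical single-pass latency
assert stss_latency(baseline_s) == 12.0   # 6x the baseline cost
```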
## 5. Official Implementation Code

To reproduce the STSS inference loop without context truncation, use the following pipeline.
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Configuration
MODEL_NAME = "Xerv-AI/MAXWELL"
MAX_CONTEXT = 4096

# Load the 4-bit base model with the RoPE-scaled context window
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_NAME,
    max_seq_length = MAX_CONTEXT,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
streamer = TextStreamer(tokenizer, skip_prompt=True)

def maxwell_stss_inference(question):
    # Phase I: generate one candidate response per temperature in the sweep
    temperatures = [0.1, 0.3, 0.5, 0.7, 0.9]
    solution_pool = []

    for t in temperatures:
        inputs = tokenizer(
            [f"<|im_start|>system\nYou are a highly analytical STEM assistant.<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"],
            return_tensors = "pt",
        ).to("cuda")

        output = model.generate(
            **inputs,
            max_new_tokens = 450,
            do_sample = True,  # sampling must be enabled for temperature to apply
            temperature = t,
            use_cache = True,
        )
        # Strip the prompt and chat-template tokens from the decoded output
        decoded = (
            tokenizer.batch_decode(output)[0]
            .split("<|im_start|>assistant\n")[-1]
            .replace("<|im_end|>", "")
            .strip()
        )
        solution_pool.append(f"[Temp {t}]: {decoded}")

    # Phase II: low-temperature aggregation over the full candidate pool
    agg_prompt = f"""<|im_start|>system
You are a STEM Professor. Compare the 5 solutions below.
Even if they all agree, you must:
1. Explain WHY the consensus is correct.
2. Formulate a final, perfect response using LaTeX.
<|im_end|>
<|im_start|>user
PROBLEM: {question}
SOLUTIONS:
{chr(10).join(solution_pool)}
<|im_end|>
<|im_start|>assistant
<reasoning>
Based on the provided candidates, here is the final verification:"""
    final_inputs = tokenizer([agg_prompt], return_tensors="pt").to("cuda")

    final_output = model.generate(
        **final_inputs,
        max_new_tokens = 1024,
        do_sample = True,
        temperature = 0.1,
        streamer = streamer,
        use_cache = True,
    )
    return "Generation Complete."
```