---
language:
- en
license: apache-2.0
base_model: unsloth/Qwen2.5-Math-1.5B-Instruct-bnb-4bit
tags:
- stem
- mathematics
- physics
- unsloth
- qwen2.5-math
- reasoning
- stss-framework
- logic
- analytical
- science
- meta-aggregation
- 4bit
- merged-f16
library_name: transformers
datasets:
- Xerv-AI/TART
metrics:
- accuracy
- math_verify
model_creator: Xerv-AI
model_name: MAXWELL
pipeline_tag: text-generation

# Leaderboard & Benchmark Specifications
model-index:
- name: MAXWELL (Qwen2.5-Math-1.5B-Instruct-STSS)
  results:
  - task:
      type: text-generation
      name: Grade School Mathematics
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 70.0
      name: Exact Match (Zero-Shot)
  - task:
      type: text-generation
      name: Competition Mathematics
    dataset:
      name: MATH-Hard
      type: lighteval/MATH-Hard
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 60.0
      name: Exact Match (Boxed)
  - task:
      type: text-generation
      name: Professional Knowledge
    dataset:
      name: MMLU-Pro
      type: TIGER-Lab/MMLU-Pro
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 45.0
      name: Multiple Choice Accuracy
  - task:
      type: text-generation
      name: Invitational Math
    dataset:
      name: AIME 2026
      type: MathArena/aime_2026
      split: train
    metrics:
    - type: accuracy
      value: 10.0
      name: Accuracy
  - task:
      type: text-generation
      name: Advanced Graduate Reasoning
    dataset:
      name: Humanity's Last Exam
      type: cais/hle
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 0.0
      name: Exact String Match
      
# Technical Architecture Settings
model_type: qwen2
quantization: 4-bit (bitsandbytes)
merged_format: fp16
inference_framework: 
  name: STSS (Systematic Temperature-Sweep Synthesis)
  phases:
    - generation_sweep: [0.1, 0.3, 0.5, 0.7, 0.9]
    - aggregation_method: neural_synthesis
    - logic_anchor: triboelectric_induction_verification
max_position_embeddings: 4096
rope_scaling:
  type: linear
  factor: 2.0

# Deployment Hardware
hardware_specification:
  gpu: Tesla T4
  vram: 16GB
  optimization: Unsloth-Fast-Inference
  
---


# MAXWELL: Model Card
This document provides the technical specifications, training methodologies, and inference architecture for the MAXWELL model. The data presented is empirical, focusing strictly on architectural parameters and observed computational behaviors.
## 1. Model Details
### 1.1 Overview
MAXWELL is a fine-tuned, specialized variant of the Qwen2.5-Math-1.5B-Instruct architecture. It is optimized for high-precision analytical reasoning, mathematical computation, and physics problem-solving. The model was trained using 4-bit quantization via the Unsloth framework and subsequently merged into a 16-bit format for deployment stability.
### 1.2 Core Specifications

| Specification | Value |
| :--- | :--- |
| **Developer** | Xerv-AI |
| **Model Name** | MAXWELL |
| **Base Architecture** | Qwen2.5-Math-1.5B-Instruct |
| **Parameter Count** | ~1.5 Billion |
| **Training Precision** | 4-bit (BitsAndBytes) |
| **Deployment Precision** | Merged FP16 (merged_16bit) |
| **Max Context Length** | 4096 Tokens (via RoPE Scaling) |
| **Training Iterations** | 6500 Checkpoints |
| **Hardware Used** | Dual Tesla T4 GPUs (16GB VRAM each) |

## 2. Inference Architecture: STSS
MAXWELL is uniquely designed to operate within a custom inference framework defined as **Systematic Temperature-Sweep Synthesis (STSS)**. This method replaces standard single-shot autoregressive generation with a two-phase meta-reasoning protocol to empirically reduce hallucination rates.
### 2.1 Phase I: Spectrum Generation
Instead of sampling at a fixed temperature, the framework forces the model to generate a set of candidate responses $\mathcal{S}$ across a defined temperature grid $G_\tau$:
 * **Low Entropy ($T \in [0.1, 0.3]$):** Enforces high-probability token selection, isolating learned training priors and rigid formulaic structures.
 * **High Entropy ($T \in [0.7, 0.9]$):** Shifts probability mass toward the tail of the distribution, forcing the exploration of alternative logical branches.
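In symbols, Phase I can be restated as follows (an illustrative formalization of the description above; $p_\theta(\cdot \mid Q; T)$ denotes the model's sampling distribution for question $Q$ at temperature $T$):

$$
G_\tau = \{0.1,\ 0.3,\ 0.5,\ 0.7,\ 0.9\}, \qquad
\mathcal{S} = \{\, s_T \mid s_T \sim p_\theta(\cdot \mid Q;\ T),\ T \in G_\tau \,\}
$$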
### 2.2 Phase II: Neural Aggregation
The model is re-prompted with the entire generated set $\mathcal{S}$ in its context window, acting as an aggregator function $f_{agg}$ that synthesizes the final output: $R_{final} = f_{agg}(\mathcal{S})$.
This aggregation is explicitly executed at $T = 0.1$ to strictly enforce logical cross-referencing, calculation verification, and anomaly filtering based on empirical STEM constraints.
## 3. Empirical Performance Observations
Based on inference testing logs, the model exhibits the following data-driven characteristics:
 * **Pattern-Recognition Override:** In cognitive reflection tests (e.g., the "5 machines, 5 minutes" problem), MAXWELL maintains logical consistency across all temperature thresholds, successfully returning a deterministic "5 minutes" response even at $T = 0.9$.
 * **Triboelectric Physics Accuracy:** Requires explicit anchoring prompts during aggregation to override common dataset biases regarding electrostatic charge polarities (e.g., explicitly defining Glass + Silk = Positive).
 * **Zero-Shot Consensus:** When presented with non-complex strings (e.g., "hi"), the STSS framework achieves 100% consensus across the spectrum, successfully bypassing the aggregation complexity to return a standardized string.
## 4. Limitations & Computational Overhead
### 4.1 Token Saturation
Because the STSS framework injects five complete reasoning paths into the Phase II prompt, long-form calculus or multi-step proofs can exceed the context window and trigger truncation. The `max_seq_length` must be initialized to at least 4096 to support the required RoPE scaling.
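To see why truncation occurs, consider a back-of-envelope token budget for the Phase II prompt. The per-path limit matches the Phase I `max_new_tokens` in Section 5; the 200-token overhead for the system prompt, problem statement, and labels is an assumed figure, not a measurement:

```python
# Rough Phase II context budget for STSS (illustrative, assumed overhead).
MAX_CONTEXT = 4096
NUM_PATHS = 5
MAX_TOKENS_PER_PATH = 450   # Phase I max_new_tokens per candidate
PROMPT_OVERHEAD = 200       # assumed: system prompt + problem + path labels

phase2_prompt_tokens = NUM_PATHS * MAX_TOKENS_PER_PATH + PROMPT_OVERHEAD
headroom = MAX_CONTEXT - phase2_prompt_tokens
print(phase2_prompt_tokens, headroom)  # 2450 1646
```

With full-length reasoning paths, the aggregation pass is left with well under half the window, which is why long proofs saturate the context.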
### 4.2 Compute Multiplier
Standard LLM inference processes one generation pass. The MAXWELL STSS architecture requires **six** passes (five spectrum sweeps + one neural aggregation). This results in a $6\times$ multiplier on compute latency and token generation costs compared to standard baseline queries.
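As a worst-case illustration, the per-pass token limits from the reference implementation in Section 5 give the following cost model (wall-clock latency will of course vary with hardware and actual generation lengths):

```python
# STSS worst-case cost per query: 5 spectrum passes + 1 aggregation pass.
SPECTRUM_PASSES = 5
AGGREGATION_PASSES = 1
TOKENS_PER_SPECTRUM_PASS = 450    # Phase I max_new_tokens
TOKENS_PER_AGGREGATION = 1024     # Phase II max_new_tokens

total_passes = SPECTRUM_PASSES + AGGREGATION_PASSES
total_tokens = (SPECTRUM_PASSES * TOKENS_PER_SPECTRUM_PASS
                + AGGREGATION_PASSES * TOKENS_PER_AGGREGATION)

print(total_passes)   # 6 generation passes per query
print(total_tokens)   # 3274 generated tokens per query (worst case)
```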
## 5. Official Implementation Code
To reproduce the optimal STSS inference loop without context truncation, use the following pipeline.
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch
# Configuration
MODEL_NAME = "Xerv-AI/MAXWELL"
MAX_CONTEXT = 4096 
# Load Base
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_NAME,
    max_seq_length = MAX_CONTEXT,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
streamer = TextStreamer(tokenizer, skip_prompt=True)
def maxwell_stss_inference(question):
    # Phase I: Spectrum
    temperatures = [0.1, 0.3, 0.5, 0.7, 0.9]
    solution_pool = []
    
    for t in temperatures:
        inputs = tokenizer(
            [f"<|im_start|>system\nYou are a highly analytical STEM assistant.<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"],
            return_tensors = "pt"
        ).to("cuda")
        
        output = model.generate(
            **inputs, 
            max_new_tokens=450, 
            do_sample=True,   # required for temperature to take effect
            temperature=t, 
            use_cache=True
        )
        decoded = tokenizer.batch_decode(output)[0].split("<|im_start|>assistant\n")[-1].replace("<|im_end|>", "").strip()
        solution_pool.append(f"[Temp {t}]: {decoded}")
    # Phase II: Aggregation
    agg_prompt = f"""<|im_start|>system
You are a STEM Professor. Compare the 5 solutions below.
Even if they all agree, you must:
1. Explain WHY the consensus is correct.
2. Formulate a final, perfect response using LaTeX.
<|im_end|>
<|im_start|>user
PROBLEM: {question}
SOLUTIONS:
{chr(10).join(solution_pool)}
<|im_end|>
<|im_start|>assistant
<reasoning>
Based on the provided candidates, there is a 100% consensus. Here is the final verification:"""
    final_inputs = tokenizer([agg_prompt], return_tensors="pt").to("cuda")
    
    final_output = model.generate(
        **final_inputs, 
        max_new_tokens=1024, 
        do_sample=True,   # required for temperature to take effect
        temperature=0.1, 
        streamer=streamer,
        use_cache=True
    )
    # Return the decoded assistant turn rather than a status string
    final_text = tokenizer.batch_decode(final_output)[0].split("<|im_start|>assistant\n")[-1]
    return final_text.replace("<|im_end|>", "").strip()
```