---
language:
- en
license: apache-2.0
base_model: unsloth/Qwen2.5-Math-1.5B-Instruct-bnb-4bit
tags:
- stem
- mathematics
- physics
- unsloth
- qwen2.5-math
- reasoning
- stss-framework
- logic
- analytical
- science
- meta-aggregation
- 4bit
- merged-f16
library_name: transformers
datasets:
- Xerv-AI/TART
metrics:
- accuracy
- math_verify
model_creator: Xerv-AI
model_name: MAXWELL
pipeline_tag: text-generation
model-index:
- name: MAXWELL (Qwen2.5-Math-1.5B-Instruct-STSS)
  results:
  - task:
      type: text-generation
      name: Grade School Mathematics
    dataset:
      name: GSM8K
      type: gsm8k
      split: test
    metrics:
    - type: accuracy
      value: 70.0
      name: Exact Match (Zero-Shot)
  - task:
      type: text-generation
      name: Competition Mathematics
    dataset:
      name: MATH-Hard
      type: lighteval/MATH-Hard
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 60.0
      name: Exact Match (Boxed)
  - task:
      type: text-generation
      name: Professional Knowledge
    dataset:
      name: MMLU-Pro
      type: TIGER-Lab/MMLU-Pro
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 45.0
      name: Multiple Choice Accuracy
  - task:
      type: text-generation
      name: Invitational Math
    dataset:
      name: AIME 2026
      type: MathArena/aime_2026
      split: train
    metrics:
    - type: accuracy
      value: 10.0
      name: Accuracy
  - task:
      type: text-generation
      name: Advanced Graduate Reasoning
    dataset:
      name: Humanity's Last Exam
      type: cais/hle
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 0.0
      name: Exact String Match
model_type: qwen2
quantization: 4-bit (bitsandbytes)
merged_format: fp16
inference_framework:
  name: STSS (Systematic Temperature-Sweep Synthesis)
  phases:
  - generation_sweep: [0.1, 0.3, 0.5, 0.7, 0.9]
  - aggregation_method: neural_synthesis
  - logic_anchor: triboelectric_induction_verification
max_position_embeddings: 4096
rope_scaling:
  type: linear
  factor: 2.0
hardware_specification:
  gpu: Tesla T4
  vram: 16GB
  optimization: Unsloth-Fast-Inference
---

# MAXWELL: Model Card

This document describes the technical specifications, training methodology, and inference architecture of the MAXWELL model, focusing on architectural parameters and observed computational behavior.

## 1. Model Details

### 1.1 Overview

MAXWELL is a fine-tuned, specialized variant of the Qwen2.5-Math-1.5B-Instruct architecture, optimized for high-precision analytical reasoning, mathematical computation, and physics problem-solving. It was trained in 4-bit quantization via the Unsloth framework and subsequently merged to 16-bit precision for deployment stability.

### 1.2 Core Specifications

| Specification | Value |
| :--- | :--- |
| **Developer** | Xerv-AI |
| **Model Name** | MAXWELL |
| **Base Architecture** | Qwen2.5-Math-1.5B-Instruct |
| **Parameter Count** | ~1.5 Billion |
| **Training Precision** | 4-bit (BitsAndBytes) |
| **Deployment Precision** | Merged FP16 (`merged_16bit`) |
| **Max Context Length** | 4096 Tokens (via RoPE Scaling) |
| **Training Iterations** | 6500 Checkpoints |
| **Hardware Used** | Dual Tesla T4 GPUs (16GB VRAM each) |

## 2. Inference Architecture: STSS

MAXWELL is designed to operate within a custom inference framework, **Systematic Temperature-Sweep Synthesis (STSS)**. This method replaces standard single-shot autoregressive generation with a two-phase meta-reasoning protocol intended to reduce hallucination rates.
### 2.1 Phase I: Spectrum Generation

Instead of sampling at a fixed temperature, the framework has the model generate a set of candidate responses $\mathcal{S}$ across a defined temperature grid $G_\tau$:

* **Low entropy ($T \in [0.1, 0.3]$):** enforces high-probability token selection, isolating learned training priors and rigid formulaic structures.
* **High entropy ($T \in [0.7, 0.9]$):** samples further into the tail of the distribution, forcing exploration of alternative logical branches.
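The Phase I sweep reduces to a small loop over the grid. The sketch below is model-free so it runs without a GPU: `sample` is a hypothetical stand-in for a single LLM generation pass, not part of the MAXWELL API.

```python
# Model-free sketch of the Phase I temperature sweep. `sample` is a
# hypothetical stand-in for one LLM generation pass, not a real MAXWELL API.
def spectrum_generation(question, sample, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Collect one candidate answer per temperature in the grid G_tau."""
    return [(t, sample(question, temperature=t)) for t in grid]

# Toy sampler: low temperatures stay on the modal answer, high ones explore.
def toy_sampler(question, temperature):
    base = "5 minutes"
    return base if temperature < 0.7 else base + " (alternative derivation)"

pool = spectrum_generation("5 machines make 5 widgets in 5 minutes...", toy_sampler)
assert len(pool) == 5  # one candidate per grid temperature
```

The real pipeline in Section 5 follows the same shape, with `model.generate` in place of the toy sampler.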
### 2.2 Phase II: Neural Aggregation

The model is re-prompted with the entire generated set $\mathcal{S}$ in its context window, acting as an aggregator function $f_{\text{agg}}$ that synthesizes the final output $R_{\text{final}} = f_{\text{agg}}(q, \mathcal{S})$, where $q$ is the original query. This aggregation pass is executed at $T = 0.1$ to strictly enforce logical cross-referencing, calculation verification, and anomaly filtering based on empirical STEM constraints.
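The context packing for this pass is plain string assembly. The helper below is a simplified sketch of how the candidate set is serialized into one aggregation prompt, not the exact production format:

```python
# Simplified sketch of Phase II context packing: the candidate set S is
# serialized into a single user turn for the low-temperature aggregation pass.
def build_aggregation_prompt(question, solution_pool):
    body = "\n".join(f"[Temp {t}]: {answer}" for t, answer in solution_pool)
    return (
        "<|im_start|>user\n"
        f"PROBLEM: {question}\nSOLUTIONS:\n{body}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_aggregation_prompt("What is 2 + 2?", [(0.1, "4"), (0.9, "4")])
assert "[Temp 0.1]: 4" in prompt and "PROBLEM: What is 2 + 2?" in prompt
```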
## 3. Empirical Performance Observations

Based on inference testing logs, the model exhibits the following characteristics:

* **Pattern-recognition override:** In cognitive reflection tests (e.g., the "5 machines, 5 minutes, 5 widgets" problem), MAXWELL maintains logical consistency across all temperature settings, returning the correct deterministic answer of "5 minutes" even at $T = 0.9$.
* **Triboelectric physics accuracy:** The model requires explicit anchoring prompts during aggregation to override common dataset biases regarding electrostatic charge polarities (e.g., explicitly defining glass rubbed with silk as positively charged).
* **Zero-shot consensus:** For trivial inputs (e.g., "hi"), the STSS framework reaches 100% consensus across the temperature spectrum and bypasses the full aggregation step, returning a standardized response.
## 4. Limitations & Computational Overhead

### 4.1 Token Saturation

Because the STSS framework injects five complete reasoning paths into the Phase II prompt, long-form calculus or multi-step proofs can exceed the context window and trigger truncation. `max_seq_length` must be initialized to at least 4096 to support the required RoPE scaling.
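A back-of-envelope budget shows why the scaled window is required. The 450-token per-candidate cap matches the pipeline in Section 5; the 2048-token native window is inferred from the card's own 2.0x linear RoPE scaling and is an assumption:

```python
# Rough Phase II context budget (token counts are approximate).
candidates = 5
tokens_per_candidate = 450                     # max_new_tokens cap from the pipeline
injected = candidates * tokens_per_candidate   # 2250 tokens of candidate text alone
native_window, scaled_window = 2048, 4096      # before / after 2.0x linear RoPE scaling

assert injected > native_window   # the pool alone would overflow a 2048 window
assert injected < scaled_window   # it fits once RoPE scaling doubles the window
```

Add the system instructions, the restated problem, and the 1024-token aggregation budget on top, and the margin inside 4096 tokens shrinks quickly, which is why long proofs truncate.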
### 4.2 Compute Multiplier

Standard LLM inference performs one generation pass. The MAXWELL STSS architecture requires **six** passes (five spectrum sweeps plus one neural aggregation), resulting in roughly a $6\times$ multiplier on latency and token-generation cost compared to a standard baseline query.
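The overhead is easy to quantify with a toy cost model; the baseline latency figure below is purely hypothetical, for illustration:

```python
# Toy cost model for the 6x multiplier: five sweep passes plus one aggregation.
SWEEP_PASSES = 5
AGGREGATION_PASSES = 1

def stss_latency(single_pass_latency_s):
    """Total wall-clock estimate, assuming passes run sequentially."""
    return (SWEEP_PASSES + AGGREGATION_PASSES) * single_pass_latency_s

baseline_s = 2.0                          # hypothetical single-pass latency
assert stss_latency(baseline_s) == 12.0   # 6x the baseline cost
```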
## 5. Official Implementation Code

To reproduce the STSS inference loop without context truncation, use the following pipeline.
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Configuration
MODEL_NAME = "Xerv-AI/MAXWELL"
MAX_CONTEXT = 4096

# Load the 4-bit base model with the RoPE-scaled context window
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_NAME,
    max_seq_length = MAX_CONTEXT,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
streamer = TextStreamer(tokenizer, skip_prompt=True)

def maxwell_stss_inference(question):
    # Phase I: generate one candidate response per temperature in the sweep
    temperatures = [0.1, 0.3, 0.5, 0.7, 0.9]
    solution_pool = []

    for t in temperatures:
        inputs = tokenizer(
            [f"<|im_start|>system\nYou are a highly analytical STEM assistant.<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"],
            return_tensors = "pt",
        ).to("cuda")

        output = model.generate(
            **inputs,
            max_new_tokens = 450,
            do_sample = True,  # sampling must be enabled for temperature to apply
            temperature = t,
            use_cache = True,
        )
        # Strip the prompt and chat-template tokens from the decoded output
        decoded = (
            tokenizer.batch_decode(output)[0]
            .split("<|im_start|>assistant\n")[-1]
            .replace("<|im_end|>", "")
            .strip()
        )
        solution_pool.append(f"[Temp {t}]: {decoded}")

    # Phase II: low-temperature aggregation over the full candidate pool
    agg_prompt = f"""<|im_start|>system
You are a STEM Professor. Compare the 5 solutions below.
Even if they all agree, you must:
1. Explain WHY the consensus is correct.
2. Formulate a final, perfect response using LaTeX.
<|im_end|>
<|im_start|>user
PROBLEM: {question}
SOLUTIONS:
{chr(10).join(solution_pool)}
<|im_end|>
<|im_start|>assistant
<reasoning>
Based on the provided candidates, here is the final verification:"""
    final_inputs = tokenizer([agg_prompt], return_tensors="pt").to("cuda")

    final_output = model.generate(
        **final_inputs,
        max_new_tokens = 1024,
        do_sample = True,
        temperature = 0.1,
        streamer = streamer,
        use_cache = True,
    )
    return "Generation Complete."
```