GPT-OSS-20B NuminaMath
Overview
This repository provides the GPT-OSS-20B model fine-tuned on the NuminaMath-TIR dataset, which consists of roughly 70k examples, to improve mathematical olympiad reasoning and structured problem solving.
The adapters are designed to be used with the base model gpt-oss-20b, a Mixture-of-Experts (MoE) transformer architecture. Fine-tuning focuses on improving the model’s ability to generate step-by-step reasoning, symbolic manipulation, and detailed mathematical explanations when solving math problems.
Instead of updating the full model weights, parameter-efficient fine-tuning (PEFT) was used to modify only a small number of parameters in the attention layers. This allows the adapters to significantly improve reasoning ability while keeping training compute requirements relatively low.
The resulting LoRA adapters can be loaded on top of the base model to enhance its performance on mathematical olympiad reasoning tasks such as algebra, arithmetic, and problem-solving explanations.
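A minimal loading sketch with the PEFT library is shown below. The adapter repo id is a placeholder and the base-model id is an assumption; substitute the actual identifiers.

```python
# Sketch: load the base model in BF16 and attach the LoRA adapters.
# "openai/gpt-oss-20b" and the adapter path are assumed/placeholder ids.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openai/gpt-oss-20b"          # assumed base-model repo id
adapter_id = "path/to/these-adapters"   # placeholder adapter location

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the fine-tuning precision
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```

`PeftModel.from_pretrained` leaves the base weights untouched and injects the low-rank adapter matrices at inference time.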
Model Details
| Field | Value |
|---|---|
| Base Model | gpt-oss-20b |
| Architecture | Mixture-of-Experts Transformer |
| Fine-Tuning Method | LoRA (PEFT) |
| Precision | BF16 |
| Context Length | 8192 tokens |
| Training Hardware | NVIDIA H100 |
| Framework | PyTorch + Transformers + PEFT |
Training Data
Dataset
The model was fine-tuned using the NuminaMath-TIR dataset, which contains mathematical problems paired with structured reasoning traces and final answers.
Dataset link: https://huggingface.co/datasets/AI-MO/NuminaMath-TIR
The dataset covers several mathematical domains, including:
- arithmetic
- algebra
- number theory
- geometry
- calculus
- reasoning-based problem solving
The dataset emphasizes step-by-step explanations, allowing the model to learn how to produce reasoning chains rather than only final answers.
Dataset Processing
The dataset was originally provided as a CSV file and processed prior to training.
Processing pipeline:
- Loaded using pandas
- Columns normalized to `prompt` and `response`
- Empty rows removed
- Converted to Hugging Face Dataset format
- Randomized train/validation split
Dataset split:
| Split | Percentage |
|---|---|
| Train | 95% |
| Validation | 5% |
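The pipeline above can be sketched with pandas on a tiny in-memory stand-in for the CSV. The raw column names (`problem`, `solution`) are assumptions; only the normalized `prompt`/`response` names come from this document. In the real pipeline the frame was then converted with `datasets.Dataset.from_pandas`.

```python
# Minimal sketch of the described preprocessing (assumed raw column names).
import io
import pandas as pd

csv_text = (
    "problem,solution\n"
    "What is 2+2?,4\n"
    ",\n"                      # an empty row, to be dropped
    "Solve x+1=3.,x=2\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Normalize the columns to prompt/response
df = df.rename(columns={"problem": "prompt", "solution": "response"})

# Remove empty rows
df = df.dropna(subset=["prompt", "response"]).reset_index(drop=True)

# Randomized 95/5 train/validation split
# (the real pipeline converted df with datasets.Dataset.from_pandas first)
df = df.sample(frac=1.0, random_state=42)
n_val = max(1, round(0.05 * len(df)))
val_df, train_df = df.iloc[:n_val], df.iloc[n_val:]
```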
Instruction Format
Training samples were converted into the following chat-style instruction format compatible with the GPT-OSS tokenizer.
```
<|im_start|>user
{prompt}
<|im_end|>
<|im_start|>assistant
{response}
<|im_end|>
```
This format enables the model to learn structured conversational reasoning and aligns with the instruction format used in many modern LLMs.
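The formatting step can be sketched as a small function that wraps each `prompt`/`response` pair in the markup above (the function name is illustrative):

```python
# Sketch: render one dataset row into the chat-style training format.
def format_example(example: dict) -> str:
    return (
        "<|im_start|>user\n"
        f"{example['prompt']}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{example['response']}\n"
        "<|im_end|>"
    )

sample = {"prompt": "What is 2+2?", "response": "4"}
text = format_example(sample)
```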
Training Procedure
The model was fine-tuned using LoRA adapters applied only to attention layers.
Because gpt-oss-20b is a Mixture-of-Experts (MoE) architecture, LoRA was intentionally not applied to expert layers in order to preserve the routing structure and maintain training stability.
LoRA Target Modules
Adapters were applied to the following projection layers:
- `q_proj`
- `k_proj`
- `v_proj`
- `o_proj`
These correspond to the query, key, value, and output projections within the attention mechanism.
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha | 128 |
| Dropout | 0.05 |
| Bias | none |
Only attention projections were modified, ensuring minimal disruption to the base model while still enabling meaningful behavioral improvements.
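The configuration above maps directly onto `peft.LoraConfig`; a sketch (argument names follow the PEFT API, and attaching to a loaded base model is shown only as a comment):

```python
# Sketch: the LoRA setup from the tables above, expressed with PEFT.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # rank
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # attention projections only; expert/MoE layers are left untouched
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# model = get_peft_model(base_model, lora_config)
```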
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning Rate | 2e-4 |
| Optimizer | AdamW (fused) |
| Adam β1 | 0.9 |
| Adam β2 | 0.95 |
| Weight Decay | 0.01 |
| Warmup Ratio | 0.03 |
| Max Gradient Norm | 1.0 |
Batch configuration:
| Parameter | Value |
|---|---|
| Per Device Batch Size | 4 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 16 |
Maximum sequence length:
8192 tokens
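The hyperparameters above can be collected into a `transformers.TrainingArguments` sketch (`output_dir` is a placeholder; the remaining values come from the tables):

```python
# Sketch: training hyperparameters as a TrainingArguments config fragment.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt-oss-20b-numinamath-lora",  # placeholder
    num_train_epochs=2,
    learning_rate=2e-4,
    optim="adamw_torch_fused",                 # fused AdamW
    adam_beta1=0.9,
    adam_beta2=0.95,
    weight_decay=0.01,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,             # effective batch size 16
    bf16=True,
)
```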
Training Infrastructure
Training was performed on the following hardware:
1× NVIDIA H100 GPU
Training optimizations included:
- Flash Attention 2
- BF16 mixed precision
- TF32 enabled
- Gradient checkpointing
- Memory-optimized LoRA configuration
MoE compatibility adjustments included:
- LoRA applied only to attention layers
- CPU offloading disabled
- Gradient checkpointing configured with `use_reentrant=False`
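The optimization and compatibility flags above can be sketched as follows (assumes a CUDA GPU with Flash Attention 2 installed; the base-model id is an assumption):

```python
# Sketch: TF32, BF16, Flash Attention 2, and non-reentrant checkpointing.
import torch
from transformers import AutoModelForCausalLM

torch.backends.cuda.matmul.allow_tf32 = True   # enable TF32 matmuls
torch.backends.cudnn.allow_tf32 = True

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",                      # assumed base-model id
    torch_dtype=torch.bfloat16,                # BF16 mixed precision
    attn_implementation="flash_attention_2",
    device_map="auto",                         # no CPU offloading
)
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
```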
Training frameworks used:
- PyTorch
- Hugging Face Transformers
- PEFT
- Hugging Face Datasets
Evaluation
Validation loss was computed periodically on the held-out split during training.
Metrics monitored:
- training loss
- validation loss
The model was trained for exactly 2 epochs on the entire dataset without automated checkpoint selection. The final validation loss after 2 full epochs was 0.4039.
Intended Use
This model is intended for:
- mathematical reasoning research
- educational demonstrations
- experimentation with reasoning-focused fine-tuning
- evaluation of math-capable language models
It is not intended for high-stakes mathematical or scientific applications.
Limitations
Despite improvements from fine-tuning, the model still has several limitations:
- The model may generate incorrect reasoning steps.
- Mathematical derivations may lack formal rigor.
- Some areas of mathematics may be underrepresented in the dataset.
- Performance depends strongly on the capabilities of the base model.
Users should treat model outputs as assistive suggestions rather than authoritative answers.
Ethical Considerations
Language models trained for reasoning may produce confident but incorrect explanations.
For educational or academic use:
- outputs should be verified independently
- the model should not be treated as an authoritative mathematical source
Acknowledgements
This work builds upon the open-source ecosystem including:
- Hugging Face Transformers
- the PEFT library for parameter-efficient fine-tuning
- the NuminaMath dataset
- research on Mixture-of-Experts transformer architectures
Citation
Dataset:
https://huggingface.co/datasets/AI-MO/NuminaMath-TIR
Training Notebook:
https://www.kaggle.com/code/tensorhydra/gpt-oss-20b-finetune-numinamath