Instructions to use Edmon02/mathphd-plus-plus-0.5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Edmon02/mathphd-plus-plus-0.5b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Edmon02/mathphd-plus-plus-0.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Edmon02/mathphd-plus-plus-0.5b")
model = AutoModelForCausalLM.from_pretrained("Edmon02/mathphd-plus-plus-0.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Edmon02/mathphd-plus-plus-0.5b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Edmon02/mathphd-plus-plus-0.5b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Edmon02/mathphd-plus-plus-0.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Edmon02/mathphd-plus-plus-0.5b

SGLang

How to use Edmon02/mathphd-plus-plus-0.5b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Edmon02/mathphd-plus-plus-0.5b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Edmon02/mathphd-plus-plus-0.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Edmon02/mathphd-plus-plus-0.5b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Edmon02/mathphd-plus-plus-0.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Edmon02/mathphd-plus-plus-0.5b with Docker Model Runner:
```
docker model run hf.co/Edmon02/mathphd-plus-plus-0.5b
```

mathphd-plus-plus-0.5b / README.md

Edmon02

Update README.md

bc174f2 verified 14 days ago

preview code

raw

history blame contribute delete

5.79 kB

	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- math
	- reasoning
	- chain-of-thought
	- qwen2
	- conversational
	- rlvr
	base_model: Qwen/Qwen2.5-0.5B-Instruct
	---

	# MathPhD++ 0.5B

	MathPhD++ is a small (≈0.5B parameter) language model fine-tuned for mathematical reasoning in natural language. It is built on [Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) and trained with the MathPhD++ open-source pipeline (see linked code repository in your Hub “Model sources” if you publish it): supervised fine-tuning (SFT) on curated math instruction data with structured `<thinking>` / `<answer>` (and related) tags, optional process reward modeling (PRM), and reinforcement learning from verifiable rewards (GRPO) using SymPy-backed correctness checks.

	This Hub release is intended as a reproducible checkpoint for research and experimentation on math LLMs at the edge of what fits comfortably on a single consumer or Colab GPU.

	## Model summary

	\| Attribute \| Value \|
	\|-----------\|--------\|
	\| Architecture \| Qwen2 (causal LM), ~0.5B parameters \|
	\| Precision \| FP16 (typical Hub export) \|
	\| Chat format \| ChatML (`<\|im_start\|>` / `<\|im_end\|>`) — prefer `tokenizer.apply_chat_template` when available \|
	\| Primary use \| Step-by-step math word problems, competition-style reasoning (informal proofs / chain-of-thought) \|
	\| Developed by \| Edmon (Edmon02) — community research project \|
	\| Finetuned from \| `Qwen/Qwen2.5-0.5B-Instruct` \|

	## Training data (high level)

	SFT mixes multiple public sources (non-exhaustive; see project config for exact caps):

	- MetaMath-style QA
	- Competition MATH (train)
	- GSM8K (train)
	- OpenMathInstruct-2 (subset)
	- NuminaMath-CoT (subset)

	Examples are formatted in ChatML with structured assistant outputs (reasoning blocks and final answers) to encourage verifiable extraction and consistent formatting for downstream RL.

	## Evaluation (reported from project notebook run)

	Results below are indicative and used a 200-sample cap per benchmark (`QUICK_TEST`-style eval). For publication-quality numbers, run full GSM8K test (1,319) and a standard MATH split with fixed protocol.

	\| Benchmark \| Subset / protocol \| Accuracy \|
	\|-----------\|-------------------\|----------\|
	\| GSM8K \| 200 / test \| 18.5% (37/200) \|
	\| MATH \| 200 / MATH-500 \| 6.0% (12/200) \|

	These scores reflect the SFT-loaded policy evaluated after the pipeline fix that loads fine-tuned weights from checkpoint storage (not the raw base model). A 0.5B model remains capacity-limited on MATH; GSM8K is the more informative “did SFT help?” signal at this scale.

	## How to use

	### Transformers (generate)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "Edmon02/mathphd-plus-plus-0.5b"
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	torch_dtype=torch.float16,
	device_map="auto",
	trust_remote_code=True,
	)

	problem = "What is the sum of the first 100 positive integers?"
	prompt = (
	"<\|im_start\|>system\nYou are MathPhD++, an advanced mathematical reasoning assistant. "
	"Show your complete reasoning step-by-step.<\|im_end\|>\n"
	f"<\|im_start\|>user\n{problem}<\|im_end\|>\n"
	"<\|im_start\|>assistant\n"
	)
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	**inputs,
	max_new_tokens=512,
	do_sample=False,
	pad_token_id=tokenizer.pad_token_id,
	)
	print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
	```

	Use greedy or low temperature for benchmarking; use sampling for exploratory interaction.

	## Limitations

	- Small model: Will underperform larger instruction models on hard competition math and long proofs.
	- Informal reasoning: Outputs are not formally verified unless you pair the model with an external proof checker or code execution sandbox.
	- Data contamination: Public math benchmarks overlap train/eval sources; treat leaderboard-style claims with care unless you hold out data strictly.
	- Language: Primarily English math text; mixed-language or non-math prompts are out of distribution.

	## Bias, safety, and responsible use

	This model inherits behaviors and limitations of the base Qwen2.5 model and the fine-tuning corpora. It may produce confident but incorrect mathematics. Do not use as a sole authority for safety-critical, financial, medical, or legal reasoning. Prefer human review and independent verification.

	## Environmental note

	If your Hub UI shows an unrelated arXiv paper (e.g. carbon footprint of ML), that is often an automatic metadata artifact. This model card is the authoritative description; consider removing incorrect `arxiv:` tags under model settings.

	## Links

	- Checkpoints / artifacts (author): [Google Drive — mathphd_checkpoints](https://drive.google.com/drive/folders/14T6zF9B_Zh0JbKUW2nFEWz7QrYtW_r85?usp=sharing) (SFT, PRM, GRPO, eval exports — access as permitted by owner)
	- Base model: [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

	## Citation

	If you use this model, cite the base model and this Hub repository as appropriate:

	```bibtex
	@misc{mathphd_plus_plus_05b,
	title = {MathPhD++ 0.5B: Math Reasoning Model (Qwen2.5-0.5B-Instruct fine-tune)},
	author = {Edmon02},
	year = {2026},
	howpublished = {\url{https://huggingface.co/Edmon02/mathphd-plus-plus-0.5b}},
	}
	```

	---

	Model card written for professional Hub documentation. Update the GitHub URL and evaluation table when you publish full-benchmark runs.