	---
	tags:
	- Coder
	- Math
	- qwen2
	- thinking
	- reasoning
	model-index:
	- name: Palmyra-mini-thinking-b
	  results: []
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	---


	<div align="center">
	<h1>Palmyra-mini-thinking-b</h1>

	</div>

	<p align="center">
	<img src="https://huggingface.co/Writer/palmyra-mini-thinking-b/resolve/main/logo-mini-b%20benchmark-performance.png?download=true" width="800"/>
	</p>

	### Model Description

	- Language(s) (NLP): English
	- License: Apache-2.0
	- Finetuned from model: nvidia/OpenReasoning-Nemotron-1.5B
	- Context window: 131,072 tokens
	- Parameters: 1.7 billion

	## Introduction

	Palmyra-mini-thinking-b is a reasoning model built for complex, multi-step problem solving. It performs best on mathematical and programming tasks, a result of specialized training aimed at problems that require deep, multi-step thinking.

	## Mathematical Prowess

	The model is strongest in mathematics. It scores 0.925 on AMC23, a benchmark of advanced high school mathematics, and 0.882 on MATH500, which covers a wide range of mathematical problems. On competition-level problems it scores 0.6 on AIME24 (pass@1, avg-of-1) and 0.5733 on Olympiadbench (extractive_match). These results suggest the model can support mathematical reasoning in both educational and research applications.

	## Excellence in Competitive Programming

	Palmyra-mini-thinking-b also performs well on competitive programming. It scores 0.6343 on the Codeforces (pass_rate) benchmark, which tests the ability to understand algorithmic problems and generate correct, efficient code. This suggests the model is suited to code generation, debugging, and algorithm design tasks for software developers and computer science researchers.

	## Benchmark Scores (sampling params: temperature 0.6, top_p 0.95)

	Pass@1 (avg-of-64)

	| Benchmark | Pass@1 (avg-of-64) | Majority@64 |
	| :-------- | :----------------- | :---------- |
	| AIME24 | 59.43% | 71.67% |
	| AIME25 | 49.69% | 60.00% |
	| GPQA | 42.01% | 47.22% |
	| HMMT25 | 27.86% | 30.00% |
	| HLE | 5.22% | N/A |
	| MMLU-PRO | 55.49% | 60.60% |
	| MATH500 | 93.80% | 95.40% |
	| LCB | 34.51% | N/A |

	LCB (LiveCodeBench) here uses version v6_2408_2505.
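
The two metric columns in the table above can be read as follows: Pass@1 (avg-of-64) averages correctness over 64 sampled answers per problem, and Majority@64 scores only the most frequent answer among those samples. The sketch below illustrates these definitions on toy data. It assumes the usual meaning of the metric names and is not the `nemoskills` implementation; the function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Sequence


def pass_at_1_avg_of_k(correct: Sequence[Sequence[bool]]) -> float:
    """Per-problem fraction of correct samples, averaged over all problems."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return sum(per_problem) / len(per_problem)


def majority_at_k(answers: Sequence[Sequence[str]], references: Sequence[str]) -> float:
    """Fraction of problems whose most frequent sampled answer matches the reference."""
    hits = 0
    for samples, ref in zip(answers, references):
        top_answer, _ = Counter(samples).most_common(1)[0]
        hits += top_answer == ref
    return hits / len(references)


# Toy data (hypothetical): 2 problems, 4 sampled answers each.
answers = [["4", "4", "5", "4"], ["7", "9", "9", "8"]]
references = ["4", "7"]
correct = [[a == ref for a in samples] for samples, ref in zip(answers, references)]

print(pass_at_1_avg_of_k(correct))         # 0.5  (mean of 3/4 and 1/4)
print(majority_at_k(answers, references))  # 0.5  (problem 1 right, problem 2 wrong)
```

The two metrics can diverge in either direction: majority voting helps when the correct answer is the most common one, and hurts when a wrong answer dominates the samples.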


	Pass@1 (avg-of-1)

	| Benchmark | Score (%) |
	|:-----------------------------------------------------------------|------------:|
	| GSM8K (strict-match) | 42.68% |
	| Minerva Math (exact match) | 7.08% |
	| MMLU-PRO (exact match) | 29.26% |
	| MATH (Hendrycks) | 0.16% |
	| IFEval (inst_level_loose_acc) | 32.97% |
	| MathQA (acc) | 30.45% |
	| HumanEval (pass@1) | 7.32% |
	| BBH (get-answer)(exact match) | 28.80% |
	| MBPP | 16.80% |
	| GPQA (diamond, pass@1: 8 samples) | 39.58% |
	| AIME24 (pass@1)(avg-of-1) | 60.00% |
	| AIME25 (pass@1)(avg-of-1) | 50.00% |
	| Livecodebench-codegen (livecodebench/code_generation_lite v4_v5) | 28.73% |
	| AMC23 | 92.50% |
	| MATH500 | 88.20% |
	| Minerva | 29.41% |
	| Olympiadbench (extractive_match) | 57.33% |
	| Codecontests (pass_rate) | 20.18% |
	| Codeforces (pass_rate) | 63.43% |
	| Taco (pass_rate) | 34.56% |
	| APPS (all_levels) | 5.84% |
	| HMMT (Feb 2025) (extractive_match) | 23.33% |
	| Average | 35.94% |

	## Use with Transformers

	You can run conversational inference using the Transformers Auto classes with the `generate()` function. Here's an example:

	```py
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "Writer/palmyra-mini-thinking-b"

	tokenizer = AutoTokenizer.from_pretrained(model_id)

	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.float16,
	    device_map="auto",
	    # Requires the flash-attn package; remove this argument to use the default attention.
	    attn_implementation="flash_attention_2",
	)

	messages = [
	    {
	        "role": "user",
	        "content": "You have a 3-liter jug and a 5-liter jug. How can you measure exactly 4 liters of water?",
	    }
	]

	# Build the prompt with the model's chat template and move it to the model's device.
	input_ids = tokenizer.apply_chat_template(
	    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
	).to(model.device)

	gen_conf = {
	    "max_new_tokens": 256,
	    "eos_token_id": tokenizer.eos_token_id,
	    "do_sample": True,  # temperature and top_p only take effect when sampling
	    "temperature": 0.3,
	    "top_p": 0.9,
	}

	with torch.inference_mode():
	    output_id = model.generate(input_ids, **gen_conf)

	# Decode only the newly generated tokens, skipping the prompt.
	output_text = tokenizer.decode(output_id[0][input_ids.shape[1] :])

	print(output_text)
	```
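
Because this is a thinking model, the decoded text may contain a reasoning trace before the final answer. The helper below is a minimal sketch for separating the two. It assumes, without confirmation from this card, that the trace ends with a `</think>` tag (optionally opened by `<think>`); the function name `split_reasoning` is hypothetical, and the tag should be adjusted if the model's chat template uses a different delimiter.

```python
from typing import Tuple


def split_reasoning(text: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Return (reasoning, answer). Reasoning is empty if close_tag is absent."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # Drop an optional opening tag from the reasoning part.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hypothetical model output, for illustration only.
sample = "<think>Fill the 5-liter jug, pour into the 3-liter jug.</think>Fill the 5-liter jug first."
reasoning, answer = split_reasoning(sample)
print(answer)  # Fill the 5-liter jug first.
```

If generation stops at `max_new_tokens` before the trace is closed, the closing tag never appears and the helper returns the whole text as the answer, so a larger token budget may be needed for long reasoning chains.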

	## Running with vLLM
	```sh
	vllm serve Writer/palmyra-mini-thinking-b
	```
	```sh
	curl -X POST http://localhost:8000/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "Writer/palmyra-mini-thinking-b",
	    "messages": [
	      {
	        "role": "user",
	        "content": "You have a 3-liter jug and a 5-liter jug. How can you measure exactly 4 liters of water?"
	      }
	    ],
	    "max_tokens": 8000,
	    "temperature": 0.2
	  }'
	```
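
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: same URL, model name, `max_tokens`, and `temperature`. The helper names `build_payload` and `chat` are hypothetical, and the response parsing assumes the server returns the OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # endpoint from the curl example
MODEL_ID = "Writer/palmyra-mini-thinking-b"


def build_payload(prompt: str, max_tokens: int = 8000, temperature: float = 0.2) -> dict:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str, url: str = API_URL, timeout: float = 600.0) -> str:
    """POST the prompt to a running server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from `vllm serve` running, `print(chat("You have a 3-liter jug and a 5-liter jug. How can you measure exactly 4 liters of water?"))` prints the model's reply. The long default timeout is a guess to accommodate responses of up to 8000 tokens.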

	## Ethical Considerations

	As with any language model, there is a potential for generating biased or inaccurate information. Users should be aware of these limitations and use the model responsibly.


	### Footnotes

	- Base model: This model builds on NVIDIA's OpenReasoning-Nemotron-1.5B (`https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B`).
	- Evaluation methodology:
	  - Pass@1 (avg-of-1): computed using `lm_eval` and `lighteval`.
	  - Pass@1 (avg-of-64) and Majority@64: computed using `nemoskills`.

	### Citation and Related Information


	To cite this model:
	```
	@misc{Palmyra-mini-thinking-b,
	author = {Writer Engineering team},
	title = {{Palmyra-mini: A powerful LLM designed for math and coding}},
	howpublished = {\url{https://dev.writer.com}},
	year = 2025,
	month = Sep
	}
	```
	Contact Hello@writer.com