# Argonne-2.5-think

Argonne-2.5-think is a reasoning SFT checkpoint trained from PursuitOfDataScience/Argonne-2.5-ctx13568 on PursuitOfDataScience/0.5M-thinking.

## Model architecture

| Component | Specification |
|---|---|
| Parameters | 1,273,807,360 (~1.27B) |
| Layers | 28 transformer blocks |
| Hidden size | 1,792 |
| Attention heads | 14 query / 7 key-value (GQA) |
| Context length | 13,568 tokens |
| Vocabulary size | 151,669 |
| Position encoding | RoPE (theta = 10,000) |
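The head geometry implied by these numbers can be sanity-checked with a few lines of arithmetic (plain Python, no checkpoint required; the variable names are illustrative):

```python
# Architecture numbers from the table above.
hidden_size = 1792
num_query_heads = 14
num_kv_heads = 7

# Per-head dimension and the GQA grouping factor implied by these values.
head_dim = hidden_size // num_query_heads    # 1792 / 14 = 128
gqa_group = num_query_heads // num_kv_heads  # 2 query heads share each KV head

print(head_dim, gqa_group)  # 128 2
```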

## Training details

| Item | Value |
|---|---|
| Input model | PursuitOfDataScience/Argonne-2.5-ctx13568 |
| Training data | PursuitOfDataScience/0.5M-thinking |
| Training script | `cot-sft.py` |
| Checkpoint dtype | bfloat16 |
| Weight format | 5 sharded safetensors |

## Tokenizer

This model uses the Qwen3 tokenizer family via the `Qwen2Tokenizer` compatibility class.

## Source code

The release was built from the main branch of the GitHub repository: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main


## Recommended generation config

| Item | Value |
|---|---|
| Context length | 13,568 tokens |
| Continuation length | 1,024 new tokens |
| Decoding | Sampling (`do_sample=True`) |
| Temperature | 0.7 |
| Top-p | 0.9 |
| Top-k | 40 |
| No-repeat n-gram size | 10 |
| Repetition penalty | 1.0 |
| Seed | False (no fixed seed) |
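These settings map directly onto standard `transformers` generation keyword arguments; a minimal sketch (the dict name is illustrative):

```python
# Recommended decoding settings from the table, expressed as generate() kwargs.
gen_kwargs = {
    "do_sample": True,           # sampling rather than greedy decoding
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "no_repeat_ngram_size": 10,
    "repetition_penalty": 1.0,   # 1.0 = no penalty applied
}

# Usage (with model and input_ids prepared as in the inference example):
# output_ids = model.generate(input_ids, max_length=..., **gen_kwargs)
```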

## Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Argonne-2.5-think"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

messages = [
    {"role": "user", "content": "Why were elements heavier than lithium not produced in large amounts during Big Bang nucleosynthesis?"}
]
# Build the prompt with the chat template; enable_thinking=True activates the
# reasoning ("think") mode this checkpoint was trained for.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=True,
)
input_ids = torch.tensor([prompt_ids], dtype=torch.long, device=device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        # The custom generate uses max_length: prompt length + 1024 new tokens,
        # capped at the model's context window.
        max_length=min(model.config.max_position_embeddings, input_ids.shape[1] + 1024),
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.0,
        no_repeat_ngram_size=10,
    )

# Strip the prompt and truncate at the first EOS token before decoding.
gen_ids = output_ids[0, input_ids.shape[1]:].tolist()
eos_id = tokenizer.eos_token_id
if eos_id is not None and eos_id in gen_ids:
    gen_ids = gen_ids[: gen_ids.index(eos_id)]

print(tokenizer.decode(gen_ids, skip_special_tokens=True).strip())
```

## Usage notes

- Load with `trust_remote_code=True`.
- Use the chat template via `tokenizer.apply_chat_template(..., add_generation_prompt=True, enable_thinking=True)`.
- The custom `generate` method uses `max_length`, so the example sets `max_length = input_length + continuation_length`.
- Weights are published as 5 bf16 safetensors shards.
- The checkpoint inherits the tokenizer and chat template from the long-context input model.
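The `max_length` bookkeeping from the third note can be isolated as a small helper (illustrative name; the constants come from the tables above):

```python
CONTEXT_LENGTH = 13568  # model context window, in tokens
CONTINUATION = 1024     # recommended new-token budget

def generation_budget(input_len: int) -> int:
    """max_length value: prompt length plus continuation, capped at the context window."""
    return min(CONTEXT_LENGTH, input_len + CONTINUATION)

print(generation_budget(500))    # 1524
print(generation_budget(13000))  # 13568 (capped: 13000 + 1024 exceeds the window)
```

For prompts near the context limit, the cap means fewer than 1,024 new tokens will actually be generated.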

## Citation

```bibtex
@misc{argonne25think,
  author = {PursuitOfDataScience},
  title = {Argonne-2.5-think},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PursuitOfDataScience/Argonne-2.5-think}
}
```