YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Files

v3 run β€” original GRPO (Apr 2026)

  • final/adapter_model.safetensors (445 MB) β€” inference-ready, load with PEFT
  • final/adapter_config.json β€” PEFT config
  • final/{tokenizer.json, chat_template.jinja, tokenizer_config.json} β€” tokenizer
  • checkpoints/checkpoint-N/adapter_model.safetensors β€” intermediate adapters at steps 10, 20, 30, 40, 50, 60, 70, 74

v4-3 run β€” hard-curriculum continuation (May 2026)

Continued from v3's checkpoint-74 for 50 more steps with a harder curriculum and tightened reward shaping. See checkpoints/v4-3-may01-hard-curriculum/README.md for the full diff.

  • checkpoints/v4-3-may01-hard-curriculum/checkpoint-{10,20,30,40,50}/ β€” intermediate adapters (adapter_config.json + adapter_model.safetensors)

Inference

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "<path-to-Qwen3.5-9B-smtlib-sft-fixed>",
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("Debargha/Qwen3.5-9B-GRPO-Adapters",
                                    subfolder="final")

# v3 final adapter:
model = PeftModel.from_pretrained(base, "Debargha/Qwen3.5-9B-GRPO-Adapters",
                                  subfolder="final")

# v4-3 latest adapter (step 50, on top of v3 checkpoint-74):
# model = PeftModel.from_pretrained(
#     base, "Debargha/Qwen3.5-9B-GRPO-Adapters",
#     subfolder="checkpoints/v4-3-may01-hard-curriculum/checkpoint-50",
# )
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support