Trace RCA β€” Qwen/Qwen3-1.7B (GRPO + LoRA)

Trained on the Trace RCA Gym environment.

Training evidence

  • reward_plot.png β€” reward curves (regenerated every 5 episodes during training)
  • metrics.jsonl β€” per-episode rich metrics (fault_type, tier, difficulty, reward components)
  • fault_type_metrics.json β€” per-fault-type rolling success
  • compare.png, compare.md β€” before/after comparison on pinned scenarios
  • adapter/ β€” the LoRA adapter (load with PEFT)
  • tb_episodes/ β€” TensorBoard episode scalars

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "siddham0909/trace-rca-qwen3-1.7b", subfolder="adapter")
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for siddham0909/trace-rca-qwen3-1.7b

Finetuned
Qwen/Qwen3-1.7B
Adapter
(510)
this model

Space using siddham0909/trace-rca-qwen3-1.7b 1