Use real LLM call for proxy check + baseline scores for task validation 5e3e79e junaid0600 committed on Apr 10
Clean inference.py using baseline scores strictly between 0 and 1 b02ec3c junaid0600 committed on Apr 10
Normalize all rewards to strictly (0.001, 0.999) range in step() 42a1cbd junaid0600 committed on Apr 10
Clamp grader scores strictly between 0.001 and 0.999 in endpoint and model f2d88cb junaid0600 committed on Apr 10
Fix rewards never exactly 0.0 or 1.0 using proper normalization 7dff36b junaid0600 committed on Apr 10