fix: clamp ALL rewards/scores to strict (0.01, 0.99) - every output path 29994af ragavrida committed 4 days ago
fix: match EXACT sample inference pattern - HF_TOKEN or API_KEY, getenv with fallbacks eef2674 ragavrida committed 4 days ago
fix: test LLM proxy FIRST before env connection - ensures API call is made 393727e ragavrida committed 4 days ago
fix: use exactly os.environ[API_BASE_URL] and os.environ[API_KEY] - no fallbacks 470383e ragavrida committed 4 days ago
fix: bulletproof inference - never crash, always emit START/END, defensive parsing 4c49198 ragavrida committed 4 days ago
fix: add error handling for from_docker_image + full traceback logging 723c4a5 ragavrida committed 4 days ago
fix: match reference inference pattern - HF_TOKEN, from_docker_image, no fallback URL a4e3468 ragavrida committed 4 days ago
fix: use only platform API_BASE_URL and API_KEY, no fallbacks 2538ea3 ragavrida committed 4 days ago
feat: 3 tasks with programmatic graders + OPENAI_API_KEY support af0f6eb ragavrida committed 4 days ago
docs: rewrite README - adaptive curriculum front and center 45e50e5 ragavrida committed 4 days ago
feat: adaptive curriculum - environment learns from agent and gets harder c63ea5a ragavrida committed 4 days ago
feat: real data wired in, visual map, all 5 improvements complete 5e5efc0 ragavrida committed 5 days ago
feat: real data, Gymnasium wrapper, baseline comparison, research framing 67e22e7 ragavrida committed 5 days ago
feat: SupplyChainEnv - global supply chain disruption RL environment af6c6b1 ragavrida Claude Opus 4.6 (1M context) committed 5 days ago