feat: add OpenEnv TRL wrapper, expand dataset, and add W&B eval tracking 6fa4fbd Mohammed-Altaf committed on 18 days ago
refactor: harden imports, add training extras, and rewrite README 5dd60b9 Mohammed-Altaf committed on 19 days ago