feat: add OpenEnv TRL wrapper, expand dataset, and add W&B eval tracking 6fa4fbd Mohammed-Altaf committed on Apr 25
refactor: harden imports, add training extras, and rewrite README 5dd60b9 Mohammed-Altaf committed on Apr 25