cff-version: 1.2.0
title: 'TRL: Transformers Reinforcement Learning'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Leandro
    family-names: von Werra
  - given-names: Younes
    family-names: Belkada
  - given-names: Lewis
    family-names: Tunstall
  - given-names: Edward
    family-names: Beeching
  - given-names: Tristan
    family-names: Thrush
  - given-names: Nathan
    family-names: Lambert
  - given-names: Shengyi
    family-names: Huang
  - given-names: Kashif
    family-names: Rasul
  - given-names: Quentin
    family-names: Gallouédec
repository-code: 'https://github.com/huggingface/trl'
abstract: >-
  TRL (Transformers Reinforcement Learning) is an
  open-source toolkit for aligning transformer models via
  post-training. It provides practical, scalable
  implementations of SFT, reward modeling, DPO, and GRPO
  within the Hugging Face ecosystem.
keywords:
  - transformers
  - reinforcement learning
  - preference optimization
  - language model alignment
  - post-training
license: Apache-2.0
version: '1.2'
date-released: '2020-03-27'