# TRL

## Docs

- [RapidFire AI Integration](https://huggingface.co/docs/trl/pr_5607/rapidfire_integration.md)
- [Installation](https://huggingface.co/docs/trl/pr_5607/installation.md)
- [LoRA Without Regret](https://huggingface.co/docs/trl/pr_5607/lora_without_regret.md)
- [CPO Trainer](https://huggingface.co/docs/trl/pr_5607/cpo_trainer.md)
- [Quickstart](https://huggingface.co/docs/trl/pr_5607/quickstart.md)
- [TRL - Transformers Reinforcement Learning](https://huggingface.co/docs/trl/pr_5607/index.md)
- [Use model after training](https://huggingface.co/docs/trl/pr_5607/use_model.md)
- [PPO Trainer](https://huggingface.co/docs/trl/pr_5607/ppo_trainer.md)
- [KTO Trainer](https://huggingface.co/docs/trl/pr_5607/kto_trainer.md)
- [SDPO](https://huggingface.co/docs/trl/pr_5607/sdpo_trainer.md)
- [ORPO Trainer](https://huggingface.co/docs/trl/pr_5607/orpo_trainer.md)
- [Generalized Knowledge Distillation Trainer](https://huggingface.co/docs/trl/pr_5607/gkd_trainer.md)
- [Speeding Up Training](https://huggingface.co/docs/trl/pr_5607/speeding_up_training.md)
- [PAPO Trainer](https://huggingface.co/docs/trl/pr_5607/papo_trainer.md)
- [Reducing Memory Usage](https://huggingface.co/docs/trl/pr_5607/reducing_memory_usage.md)
- [BCO Trainer](https://huggingface.co/docs/trl/pr_5607/bco_trainer.md)
- [Community Tutorials](https://huggingface.co/docs/trl/pr_5607/community_tutorials.md)
- [SDFT](https://huggingface.co/docs/trl/pr_5607/sdft_trainer.md)
- [RLOO Trainer](https://huggingface.co/docs/trl/pr_5607/rloo_trainer.md)
- [SFT Trainer](https://huggingface.co/docs/trl/pr_5607/sft_trainer.md)
- [Dataset formats and types](https://huggingface.co/docs/trl/pr_5607/dataset_formats.md)
- [Scripts Utilities](https://huggingface.co/docs/trl/pr_5607/script_utils.md)
- [Training customization](https://huggingface.co/docs/trl/pr_5607/customization.md)
- [PEFT Integration](https://huggingface.co/docs/trl/pr_5607/peft_integration.md)
- [Command Line Interfaces (CLIs)](https://huggingface.co/docs/trl/pr_5607/clis.md)
- [Reward Functions](https://huggingface.co/docs/trl/pr_5607/rewards.md)
- [PRM Trainer](https://huggingface.co/docs/trl/pr_5607/prm_trainer.md)
- [Unsloth Integration](https://huggingface.co/docs/trl/pr_5607/unsloth_integration.md)
- [Trackio Integration](https://huggingface.co/docs/trl/pr_5607/trackio_integration.md)
- [Experimental](https://huggingface.co/docs/trl/pr_5607/experimental_overview.md)
- [GFPO](https://huggingface.co/docs/trl/pr_5607/gfpo.md)
- [XPO Trainer](https://huggingface.co/docs/trl/pr_5607/xpo_trainer.md)
- [Nash-MD Trainer](https://huggingface.co/docs/trl/pr_5607/nash_md_trainer.md)
- [DPO Trainer](https://huggingface.co/docs/trl/pr_5607/dpo_trainer.md)
- [Post-Training Toolkit Integration](https://huggingface.co/docs/trl/pr_5607/ptt_integration.md)
- [Data Utilities](https://huggingface.co/docs/trl/pr_5607/data_utils.md)
- [OpenEnv Integration for Training LLMs with Environments](https://huggingface.co/docs/trl/pr_5607/openenv.md)
- [MiniLLM Trainer](https://huggingface.co/docs/trl/pr_5607/minillm_trainer.md)
- [Distillation Trainer](https://huggingface.co/docs/trl/pr_5607/distillation_trainer.md)
- [Liger Kernel Integration](https://huggingface.co/docs/trl/pr_5607/liger_kernel_integration.md)
- [Kernels Hub Integration and Usage](https://huggingface.co/docs/trl/pr_5607/kernels_hub.md)
- [NeMo Gym Integration](https://huggingface.co/docs/trl/pr_5607/nemo_gym.md)
- [DeepSpeed Integration](https://huggingface.co/docs/trl/pr_5607/deepspeed_integration.md)
- [GSPO-token](https://huggingface.co/docs/trl/pr_5607/gspo_token.md)
- [Asynchronous GRPO](https://huggingface.co/docs/trl/pr_5607/async_grpo_trainer.md)
- [Reward Modeling](https://huggingface.co/docs/trl/pr_5607/reward_trainer.md)
- [Chat template utilities](https://huggingface.co/docs/trl/pr_5607/chat_template_utils.md)
- [General Online Logit Distillation (GOLD) Trainer](https://huggingface.co/docs/trl/pr_5607/gold_trainer.md)
- [Callbacks](https://huggingface.co/docs/trl/pr_5607/callbacks.md)
- [Examples](https://huggingface.co/docs/trl/pr_5607/example_overview.md)
- [GRPO With Replay Buffer](https://huggingface.co/docs/trl/pr_5607/grpo_with_replay_buffer.md)
- [SSD](https://huggingface.co/docs/trl/pr_5607/ssd_trainer.md)
- [Online DPO Trainer](https://huggingface.co/docs/trl/pr_5607/online_dpo_trainer.md)
- [Paper Index](https://huggingface.co/docs/trl/pr_5607/paper_index.md)
- [Training with Jobs](https://huggingface.co/docs/trl/pr_5607/jobs_training.md)
- [GRPO Trainer](https://huggingface.co/docs/trl/pr_5607/grpo_trainer.md)
- [Distributing Training](https://huggingface.co/docs/trl/pr_5607/distributing_training.md)
- [MergeModelCallback](https://huggingface.co/docs/trl/pr_5607/merge_model_callback.md)
- [vLLM Integration](https://huggingface.co/docs/trl/pr_5607/vllm_integration.md)
- [BEMA for Reference Model](https://huggingface.co/docs/trl/pr_5607/bema_for_reference_model.md)