# TRL
## Docs
- [RapidFire AI Integration](https://huggingface.co/docs/trl/pr_5607/rapidfire_integration.md)
- [Installation](https://huggingface.co/docs/trl/pr_5607/installation.md)
- [LoRA Without Regret](https://huggingface.co/docs/trl/pr_5607/lora_without_regret.md)
- [CPO Trainer](https://huggingface.co/docs/trl/pr_5607/cpo_trainer.md)
- [Quickstart](https://huggingface.co/docs/trl/pr_5607/quickstart.md)
- [TRL - Transformers Reinforcement Learning](https://huggingface.co/docs/trl/pr_5607/index.md)
- [Use model after training](https://huggingface.co/docs/trl/pr_5607/use_model.md)
- [PPO Trainer](https://huggingface.co/docs/trl/pr_5607/ppo_trainer.md)
- [KTO Trainer](https://huggingface.co/docs/trl/pr_5607/kto_trainer.md)
- [SDPO](https://huggingface.co/docs/trl/pr_5607/sdpo_trainer.md)
- [ORPO Trainer](https://huggingface.co/docs/trl/pr_5607/orpo_trainer.md)
- [Generalized Knowledge Distillation Trainer](https://huggingface.co/docs/trl/pr_5607/gkd_trainer.md)
- [Speeding Up Training](https://huggingface.co/docs/trl/pr_5607/speeding_up_training.md)
- [PAPO Trainer](https://huggingface.co/docs/trl/pr_5607/papo_trainer.md)
- [Reducing Memory Usage](https://huggingface.co/docs/trl/pr_5607/reducing_memory_usage.md)
- [BCO Trainer](https://huggingface.co/docs/trl/pr_5607/bco_trainer.md)
- [Community Tutorials](https://huggingface.co/docs/trl/pr_5607/community_tutorials.md)
- [SDFT](https://huggingface.co/docs/trl/pr_5607/sdft_trainer.md)
- [RLOO Trainer](https://huggingface.co/docs/trl/pr_5607/rloo_trainer.md)
- [SFT Trainer](https://huggingface.co/docs/trl/pr_5607/sft_trainer.md)
- [Dataset formats and types](https://huggingface.co/docs/trl/pr_5607/dataset_formats.md)
- [Scripts Utilities](https://huggingface.co/docs/trl/pr_5607/script_utils.md)
- [Training customization](https://huggingface.co/docs/trl/pr_5607/customization.md)
- [PEFT Integration](https://huggingface.co/docs/trl/pr_5607/peft_integration.md)
- [Command Line Interfaces (CLIs)](https://huggingface.co/docs/trl/pr_5607/clis.md)
- [Reward Functions](https://huggingface.co/docs/trl/pr_5607/rewards.md)
- [PRM Trainer](https://huggingface.co/docs/trl/pr_5607/prm_trainer.md)
- [Unsloth Integration](https://huggingface.co/docs/trl/pr_5607/unsloth_integration.md)
- [Trackio Integration](https://huggingface.co/docs/trl/pr_5607/trackio_integration.md)
- [Experimental](https://huggingface.co/docs/trl/pr_5607/experimental_overview.md)
- [GFPO](https://huggingface.co/docs/trl/pr_5607/gfpo.md)
- [XPO Trainer](https://huggingface.co/docs/trl/pr_5607/xpo_trainer.md)
- [Nash-MD Trainer](https://huggingface.co/docs/trl/pr_5607/nash_md_trainer.md)
- [DPO Trainer](https://huggingface.co/docs/trl/pr_5607/dpo_trainer.md)
- [Post-Training Toolkit Integration](https://huggingface.co/docs/trl/pr_5607/ptt_integration.md)
- [Data Utilities](https://huggingface.co/docs/trl/pr_5607/data_utils.md)
- [OpenEnv Integration for Training LLMs with Environments](https://huggingface.co/docs/trl/pr_5607/openenv.md)
- [MiniLLM Trainer](https://huggingface.co/docs/trl/pr_5607/minillm_trainer.md)
- [Distillation Trainer](https://huggingface.co/docs/trl/pr_5607/distillation_trainer.md)
- [Liger Kernel Integration](https://huggingface.co/docs/trl/pr_5607/liger_kernel_integration.md)
- [Kernels Hub Integration and Usage](https://huggingface.co/docs/trl/pr_5607/kernels_hub.md)
- [NeMo Gym Integration](https://huggingface.co/docs/trl/pr_5607/nemo_gym.md)
- [DeepSpeed Integration](https://huggingface.co/docs/trl/pr_5607/deepspeed_integration.md)
- [GSPO-token](https://huggingface.co/docs/trl/pr_5607/gspo_token.md)
- [Asynchronous GRPO](https://huggingface.co/docs/trl/pr_5607/async_grpo_trainer.md)
- [Reward Modeling](https://huggingface.co/docs/trl/pr_5607/reward_trainer.md)
- [Chat template utilities](https://huggingface.co/docs/trl/pr_5607/chat_template_utils.md)
- [General Online Logit Distillation (GOLD) Trainer](https://huggingface.co/docs/trl/pr_5607/gold_trainer.md)
- [Callbacks](https://huggingface.co/docs/trl/pr_5607/callbacks.md)
- [Examples](https://huggingface.co/docs/trl/pr_5607/example_overview.md)
- [GRPO With Replay Buffer](https://huggingface.co/docs/trl/pr_5607/grpo_with_replay_buffer.md)
- [SSD](https://huggingface.co/docs/trl/pr_5607/ssd_trainer.md)
- [Online DPO Trainer](https://huggingface.co/docs/trl/pr_5607/online_dpo_trainer.md)
- [Paper Index](https://huggingface.co/docs/trl/pr_5607/paper_index.md)
- [Training with Jobs](https://huggingface.co/docs/trl/pr_5607/jobs_training.md)
- [GRPO Trainer](https://huggingface.co/docs/trl/pr_5607/grpo_trainer.md)
- [Distributing Training](https://huggingface.co/docs/trl/pr_5607/distributing_training.md)
- [MergeModelCallback](https://huggingface.co/docs/trl/pr_5607/merge_model_callback.md)
- [vLLM Integration](https://huggingface.co/docs/trl/pr_5607/vllm_integration.md)
- [BEMA for Reference Model](https://huggingface.co/docs/trl/pr_5607/bema_for_reference_model.md)
