Macro-Action RLHF - an ernie-research Collection

updated Sep 20, 2025

[ICLR'25] [MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions](https://openreview.net/forum?id=WWXjMYZxfH)

ernie-research/TLDR-Gemma-2B-MA-PPO-Fixed5
3B • Updated Feb 14, 2025 • 4 • 2

ernie-research/HH-RLHF-Gemma-2B-MA-PPO-Fixed5
3B • Updated Feb 14, 2025 • 6 • 2

ernie-research/TLDR-Gemma-7B-MA-PPO-Fixed5
9B • Updated Feb 14, 2025 • 9 • 1

ernie-research/APPS-Gemma-7B-MA-PPO-Fixed10
9B • Updated Feb 14, 2025 • 6 • 1

ernie-research/HH-RLHF-Gemma-7B-MA-PPO-Fixed5
9B • Updated Feb 14, 2025 • 4 • 1

ernie-research/APPS-Gemma-2B-MA-PPO-Fixed10
3B • Updated Feb 14, 2025 • 5 • 1

ernie-research/TLDR-Gemma-2-27B-MA-PPO-Fixed5
27B • Updated Feb 14, 2025 • 5 • 1

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Paper • 2410.02743 • Published Oct 3, 2024 • 9