arxiv:2605.05566

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Published on May 7 · Submitted by Chengsong Huang on May 8

Abstract

LoPE addresses the zero-advantage problem in reinforcement learning with verifiable rewards by using Lorem Ipsum perturbations to enhance exploration in large language model training.

AI-generated summary

Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the "zero-advantage problem": when all sampled rollouts for a query fail, the relative advantage collapses to zero. Consequently, the model loses effective training signals for these questions, wasting the training data and computational budget. While simply increasing the sampling budget for these questions is a common remedy, the static sampling policy inherently constrains reasoning exploration, limiting the success rate. In this paper, we propose Lorem Perturbation for Exploration (LoPE), a simple yet effective training framework to break this exploration bottleneck. We posit that task-irrelevant prompt-space perturbations can shift the model's output distribution enough to unlock orthogonal reasoning pathways for hard questions. Specifically, LoPE prepends sequences stochastically assembled from Lorem Ipsum vocabulary (a pseudo-Latin placeholder text) to the prompts before resampling. Experiments across 1.7B, 4B, and 7B models demonstrate that LoPE significantly outperforms resampling with the original prompts. Further analysis reveals that other Latin-based random sequences with low perplexity are also effective perturbations. Our results establish LoPE as a strong baseline for broadening exploration in LLM reinforcement learning.
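
The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary sample, the prefix length, the separator between prefix and prompt, and the helper names `group_relative_advantages`, `lorem_prefix`, and `perturb_prompt` are all assumptions made for illustration. The advantage function uses the usual group standardization (reward minus group mean, divided by group standard deviation); how LoPE feeds the perturbed rollouts back into the policy update is not specified in the abstract and is not modeled here.

```python
import random
import statistics

# Assumption: a small illustrative sample of Lorem Ipsum words,
# not the vocabulary used in the paper.
LOREM_VOCAB = [
    "lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
    "adipiscing", "elit", "sed", "eiusmod", "tempor", "incididunt",
]


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def lorem_prefix(rng, n_words=16):
    """Assemble a random word sequence from the vocabulary (hypothetical helper)."""
    return " ".join(rng.choice(LOREM_VOCAB) for _ in range(n_words))


def perturb_prompt(prompt, rng, n_words=16):
    """Prepend a task-irrelevant prefix; the blank-line separator is an assumption."""
    return lorem_prefix(rng, n_words) + "\n\n" + prompt


rng = random.Random(0)

# Every rollout failed: all advantages are zero, so the query yields no training signal.
all_failed = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# A single success in the group restores a non-zero signal.
one_success = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# Prompt that would be used when resampling the failed query.
perturbed = perturb_prompt("Solve: 12 * 13 = ?", rng)
```

In this sketch the perturbed prompt would only be used for queries whose original group came back with identical rewards, which matches the abstract's framing of perturbation as a resampling strategy for hard questions.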

Community

prompt space perturbation broadens reasoning exploration

Interesting breakdown of this paper on arXivLens: https://arxivlens.com/PaperView/Details/nonsense-helps-prompt-space-perturbation-broadens-reasoning-exploration-9503-32304ddf
Covers the executive summary, detailed methodology, and practical applications.


