arxiv:2606.00400

Dynamic Proxy-Mixing: Transferring Replay Controllers from Small to Large Models for Continual Instruction Tuning

Published on May 29


Abstract

PROXYMIX tackles continual instruction tuning by dynamically controlling replay with a controller learned on a small proxy model and transferred to a larger target, adapting the replay mixture to varying task vulnerabilities and training stages.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Continual instruction tuning updates a language model through a sequence of new domains, yet each update can progressively erode previously learned capabilities and alignment behavior. Replay is the standard mitigation, but fixed replay ratios are inherently limited because the optimal mixture varies with the current domain, the training stage, and the evolving vulnerability of prior behaviors. We propose PROXYMIX, a framework that learns a dynamic replay controller on a small proxy model and transfers the frozen controller to a larger target. The controller never observes future tasks; it constructs its state from normalized validation losses and their temporal dynamics, and it outputs a masked mixture over the current task and the accessible replay buffers. Our core empirical hypothesis, which we call forgetting mirroring, is that task vulnerability rankings remain largely consistent across model scales even when absolute loss magnitudes differ. We validate this assumption empirically before transferring controllers across scales. On LLaMA-3-8B across five continual instruction tuning sequences, PROXYMIX improves average accuracy by 3.4 points, reduces final forgetting by 3.5 points, and raises the safety score by 5.8 points over the strongest non-oracle baseline, at roughly 50× lower policy learning cost than Oracle Target RL. The framework is leakage-free and architecture-independent at the interface level, and we also identify settings where the proxy assumption breaks down, highlighting limitations for robust deployment.
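The abstract specifies the controller only at the interface level: a state built from normalized validation losses and their temporal dynamics, and a masked mixture over the current task and accessible replay buffers. The following is a minimal sketch of that interface under our own assumptions, not the paper's implementation. We assume "normalized" means dividing each task's validation loss by its value before continual training, that "temporal dynamics" is the change since the previous evaluation, and that the policy is a simple linear scorer with fixed weights standing in for the learned, frozen RL policy. The names `controller_state`, `masked_mixture`, and `FrozenLinearController`, and all numbers, are hypothetical.

```python
import numpy as np


def controller_state(val_losses, init_losses, prev_norm=None):
    # Assumption: normalize each task's validation loss by its pre-continual-training value,
    # so states are comparable across models whose absolute losses differ.
    norm = np.asarray(val_losses, dtype=float) / np.asarray(init_losses, dtype=float)
    # Assumption: "temporal dynamics" = change in normalized loss since the last evaluation.
    if prev_norm is None:
        delta = np.zeros_like(norm)
    else:
        delta = norm - np.asarray(prev_norm, dtype=float)
    return norm, delta


def masked_mixture(logits, accessible):
    # Softmax restricted to accessible sources; masked sources get exactly zero weight.
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(accessible, dtype=bool)
    z = np.where(mask, logits, -np.inf)
    z = z - z[mask].max()  # subtract the max for numerical stability
    w = np.exp(z)          # exp(-inf) == 0.0 for masked sources
    return w / w.sum()


class FrozenLinearController:
    """Hypothetical frozen policy: one linear score per source. In PROXYMIX the policy is
    learned on a small proxy model; here the weights are fixed purely for illustration."""

    def __init__(self, w_norm=2.0, w_delta=5.0, current_bonus=1.0):
        self.w_norm = w_norm
        self.w_delta = w_delta
        self.current_bonus = current_bonus

    def mixture(self, val_losses, init_losses, prev_norm, current_task):
        norm, delta = controller_state(val_losses, init_losses, prev_norm)
        # Rising normalized loss (vulnerability) increases a source's replay score.
        logits = self.w_norm * norm + self.w_delta * delta
        logits[current_task] += self.current_bonus
        # Tasks 0..current_task-1 are replay buffers, current_task is the new domain;
        # later tasks are masked so the controller never sees future data.
        accessible = np.arange(len(norm)) <= current_task
        return masked_mixture(logits, accessible)


controller = FrozenLinearController()
init = np.array([2.0, 2.2, 2.5, 2.4])  # made-up losses for a 4-task sequence
prev = np.array([2.1, 2.3, 2.5, 2.4])
now = np.array([2.4, 2.3, 2.0, 2.4])
prev_norm, _ = controller_state(prev, init)
w_proxy = controller.mixture(now, init, prev_norm, current_task=2)

# A hypothetical larger target whose losses are uniformly lower by a constant factor
# yields the same normalized state, so the frozen controller outputs the same mixture.
scale = 0.6
prev_norm_t, _ = controller_state(scale * prev, scale * init)
w_target = controller.mixture(scale * now, scale * init, prev_norm_t, current_task=2)
```

In this toy run, task 0's loss has risen the most since the last evaluation, so it receives the largest replay share, and the future task 3 receives none. The constant-factor case is deliberately idealized: forgetting mirroring only claims that vulnerability rankings agree across scales, not that losses are proportional, so real proxy-to-target transfer would not reproduce the mixture exactly.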
