Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Paper • 2502.17262 • Published • 22
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion
Paper • 2502.04235 • Published • 23
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
Paper • 2505.07293 • Published • 28
shenke
shenke18
AI & ML interests
None yet
Recent Activity
authored a paper 1 day ago
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
authored a paper 1 day ago
Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima
authored a paper 11 months ago
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs