Text Generation
Transformers
Safetensors
step3p5
conversational
custom_code
Eval Results

Context Management Reproducibility | 可复现性 ?

#27
by pandemo - opened

Hi StepFun team, thank you so much for open-sourcing such impressive models and sharing your research!

Just a quick question on the discard-all strategy used for BrowseComp:

When the context length exceeds the threshold and the agent “discards its entire context,”(ref from the Step 3.5 Flash paper), does that mean everything accumulated (tool calls, reasoning, observations, etc.) is removed except the system prompt and initial user message/question?

Also, is the agent framework used for BrowseComp/HLE evaluation the same as (or similar to) the one in Step-DeepResearch?

Thanks again for your amazing work! 🙏


Hi 阶跃星辰团队,感谢你们开源如此出色的模型并分享你们的研究成果!

有一个关于 BrowseComp 中使用的 discard-all 策略的小问题:

当 context length 超过阈值,agent “discards its entire context”(引用自 Step 3.5 Flash 论文)时,是否意味着此前累计的所有内容(tool calls、reasoning、observations 等)都会被移除,仅保留 system prompt 和 initial user message/question?

另外,用于 BrowseComp/HLE 评测的 agent framework,是否与 Step-DeepResearch 中使用的框架相同或类似?

再次感谢你们出色的工作!🙏

StepFun org

To answer your questions:

  1. Discard-all Strategy: You are correct. When the context length hits the threshold, the agent clears all accumulated tool calls, reasoning steps, and observations. It effectively resets the memory, retaining only the system prompt and the initial user message to keep the core objective in focus while freeing up space for new exploration.
  2. Agent Framework: The framework used for BrowseComp/HLE is indeed very similar to the Step-DeepResearch architecture. Both rely on a core ReAct loop managed by a dedicated Context Manager. The primary difference lies in the toolsets: BrowseComp utilizes a specialized suite of internal optimization tools tailored specifically for complex web browsing and high-level reasoning tasks.

Thank you for the clarification @ccchen1006 , this is really helpful for community reproducibility🙏.

Out of curiosity, are you able to share any more details about the agent framework used, in particular concerning the "specialized suite of internal optimization tools" you mentioned? Or are there any plans to open-source the agent framework in the future?


谢谢澄清 @ccchen1006 ,这对社区的可复现性真的很有帮助🙏。

出于好奇,您是否可以分享更多关于所使用的 agent framework 的细节,尤其是您提到的 “specialized suite of internal optimization tools”?另外,未来是否有计划将该 agent framework 开源?

Sign up or log in to comment