Paul S PRO
SuperPauly
AI & ML interests
None yet
Recent Activity
liked a model 2 days ago
aifeifei798/QiMing-Gemma-3-Socratic-4b
liked a model 2 days ago
HumeAI/tada-1b
liked a Space 2 days ago
ginigen-ai/smol-worldcup
Organizations
None yet
Agent Loops, Character, Work Ethics & Behavior
- Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
  Paper • 2512.23611 • Published • 6
- Context as a Tool: Context Management for Long-Horizon SWE-Agents
  Paper • 2512.22087 • Published • 3
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 57
- Very Large-Scale Multi-Agent Simulation in AgentScope
  Paper • 2407.17789 • Published • 38
Evaluation Methods & Metrics
- RubricBench: Aligning Model-Generated Rubrics with Human Standards
  Paper • 2603.01562 • Published • 57
- T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
  Paper • 2603.03790 • Published • 113
- SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
  Paper • 2505.20411 • Published • 93
- SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
  Paper • 2602.23866 • Published • 83
Py
Demixing Models & Datasets