VisualSphinx

community

https://visualsphinx.github.io/

AI & ML interests

LLM Data

Recent Activity

zhangchenxu authored a paper 7 days ago

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

zhangchenxu authored a paper 7 days ago

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

zhangchenxu authored a paper 7 days ago

PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory

authored 7 papers 7 days ago

SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge

Paper • 2505.21605 • Published May 27, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Paper • 2510.09781 • Published Oct 10, 2025 • 27

PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory

Paper • 2512.06688 • Published Dec 7, 2025 • 2

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Paper • 2603.27771 • Published Mar 29, 2026 • 52

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Paper • 2605.12684 • Published May 12, 2026 • 11

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Paper • 2606.05080 • Published 22 days ago • 30

Steering Multimodal Large Language Models Decoding for Context-Aware Safety

Paper • 2509.19212 • Published Sep 23, 2025

authored 2 papers about 1 month ago

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Paper • 2510.18003 • Published Oct 20, 2025

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

Paper • 2605.12684 • Published May 12, 2026 • 11

published and updated a dataset 8 months ago

VisualSphinx/VisualSphinx-V1-Raw-Panels

Viewer • Updated Oct 30, 2025 • 110k • 595

authored a paper 9 months ago

TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments

Paper • 2510.01179 • Published Oct 1, 2025 • 29

updated a Space 11 months ago

README

updated 5 datasets about 1 year ago

VisualSphinx/VisualSphinx-V1-Rules

Viewer • Updated Jun 4, 2025 • 102k • 53

VisualSphinx/VisualSphinx-Seeds

Viewer • Updated Jun 4, 2025 • 6.25k • 74 • 1

VisualSphinx/VisualSphinx-V1-Benchmark

Viewer • Updated Jun 4, 2025 • 935 • 59

VisualSphinx/VisualSphinx-V1-RL-20K

Viewer • Updated Jun 4, 2025 • 20k • 82 • 2

VisualSphinx/VisualSphinx-V1-Raw

Viewer • Updated Jun 4, 2025 • 662k • 755 • 5

in VisualSphinx/VisualSphinx-V1-Raw about 1 year ago

Update task category

#3 opened about 1 year ago by

Add task category

#2 opened about 1 year ago by