Collections including paper arxiv:2602.23866

SWE-rebench-V2 is a curated dataset of software-engineering tasks derived from real GitHub issues and pull requests.

about 20 hours ago

nebius/SWE-rebench-V2

Viewer • Updated about 20 hours ago • 32.1k • 28 • 13
nebius/SWE-rebench-V2-PRs

Viewer • Updated about 20 hours ago • 126k • 23 • 6
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Paper • 2602.23866 • Published 5 days ago • 48

about 6 hours ago

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Paper • 2602.23866 • Published 5 days ago • 48
nebius/SWE-rebench-V2

Viewer • Updated about 20 hours ago • 32.1k • 28 • 13

about 9 hours ago

The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

Paper • 2506.18403 • Published Jun 23, 2025 • 3
ReCode: Updating Code API Knowledge with Reinforcement Learning

Paper • 2506.20495 • Published Jun 25, 2025 • 10
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Paper • 2507.23348 • Published Jul 31, 2025 • 12
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

Paper • 2509.09614 • Published Sep 11, 2025 • 7

about 9 hours ago

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

Paper • 2309.16583 • Published Sep 28, 2023 • 13
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Paper • 2511.21750 • Published Nov 23, 2025 • 6
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published Dec 24, 2025 • 4

agentic-coding-bench

about 3 hours ago

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Paper • 2602.23866 • Published 5 days ago • 48

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published Jan 21 • 21
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 53
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

about 11 hours ago

DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training

Paper • 2504.17565 • Published Apr 24, 2025 • 2
AI-MO/NuminaMath-1.5

Viewer • Updated Jan 29 • 896k • 2.63k • 175
PrimeIntellect/synthetic-code-understanding

Viewer • Updated Feb 15, 2025 • 60.6k • 48 • 19
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Paper • 2507.07095 • Published Jul 9, 2025 • 56
