ibragim-bad
posted an update 2 days ago
🎄 67,074 Qwen3-Coder OpenHands trajectories + 2 RFT checkpoints.

We release: 67,000+ trajectories from 3,800 resolved issues in 1,800+ Python repos.
About 3x more successful trajectories and 1.5x more repos than our previous dataset.
Trajectories are long: on average 64 turns, up to 100 turns and 131k context length.

> RFT on this data, SWE-bench Verified:
Qwen3-30B-Instruct: 25.7% → 50.3% Pass@1.
Qwen3-235B-Instruct: 46.2% → 61.7% Pass@1.
Also strong gains on SWE-rebench September.
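
To put the two SWE-bench Verified numbers side by side, here is a small sketch that turns them into absolute gains (percentage points) and relative factors. The scores are copied from the lines above; the `gains` helper is just an illustrative name, not part of the release.

```python
# Pass@1 on SWE-bench Verified (percent), before and after RFT, as quoted above.
scores = {
    "Qwen3-30B-Instruct": (25.7, 50.3),
    "Qwen3-235B-Instruct": (46.2, 61.7),
}


def gains(before, after):
    """Return (absolute gain in percentage points, relative improvement factor)."""
    return round(after - before, 1), round(after / before, 2)


for model, (before, after) in scores.items():
    points, factor = gains(before, after)
    print(f"{model}: +{points} points, {factor}x")
```

The smaller model gains more in both absolute and relative terms, which is what you would expect given its lower starting point.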

> We also ran large-scale evals.
We run OpenHands with turn limits of 100 and 500.
We compare models under both limits.
We run on SWE-bench Verified and several months of SWE-rebench.

!!! We also check tests written by the models.
We measure how often tests are correct.
We check how often the final patch passes its own tests.
This gives a pool of tests for verifiers and auto graders.
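
A minimal sketch of how the two test metrics above could be aggregated, assuming each trajectory has already been labeled. The `TrajectoryCheck` record and its field names are hypothetical, the sample rows are made up for illustration, and how a test is judged "correct" is left to the blog post.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryCheck:
    # Hypothetical per-trajectory labels; not the schema of the released dataset.
    wrote_tests: bool             # the model wrote at least one test
    tests_correct: bool           # those tests were judged correct
    patch_passes_own_tests: bool  # the final patch passes the model's own tests


def rate(flags):
    """Fraction of True values; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(checks):
    # Only trajectories where the model actually wrote tests count.
    with_tests = [c for c in checks if c.wrote_tests]
    return {
        "tests_correct_rate": rate(c.tests_correct for c in with_tests),
        "self_pass_rate": rate(c.patch_passes_own_tests for c in with_tests),
    }


# Toy labels for illustration only; these are not results from the release.
checks = [
    TrajectoryCheck(True, True, True),
    TrajectoryCheck(True, False, True),
    TrajectoryCheck(True, True, False),
    TrajectoryCheck(False, False, False),
]
summary = summarize(checks)
print(summary)
```

Trajectories without any model-written tests are excluded from both denominators here; counting them as failures instead is an equally reasonable choice and would lower both rates.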

> Fully permissive licenses
Dataset and models: https://huggingface.co/collections/nebius/openhands-trajectories

Blog post: https://nebius.ai/blog/posts/openhands-trajectories-with-qwen3-instruct
