Add SWE-bench Pro evaluation result (50.7%)

#101
Opened by SaylorTwift (HF Staff)
No description provided.

What setup did you use for inference and evaluation? My score is only around 30%.

We report the result from the model's tech report.

Ready to merge
This branch is ready to get merged automatically.
