Django, Sympy and Sphinx generated datasets not released?
Hi,
The SERA paper section on repository specialization says:
"""
To emulate this scenario, we use SERA to generate data from the three largest repositories in
SWE-Bench Verified: Django, Sympy, and Sphinx...
Aggregating across commits, we obtain between 46,000 and 54,000 trajectories for each repository combined
across both rollouts. ... we train on 8,000 trajectories per repository rather than the full dataset; however, we
release all generated trajectories to enable future research to explore larger-scale specialization
"""
I did not see the SVG data/trajectories for Django, Sympy, or Sphinx among the 6 SERA-related datasets posted in https://huggingface.co/collections/allenai/open-coding-agents. Are you still planning to release this data? If so, do you have an estimated timeline?
Thank you,
Robert
@robert-neoteny-ai here's the link: https://huggingface.co/collections/allenai/open-coding-agents-specialization.
Sorry for the delay! Let me know if you have any questions. Because Sphinx has a smaller codebase, we run SVG twice per Sphinx function; we did not notice any degradation from doing this.