---
license: mit
---

# SWE-Bench Trajectory Eval Bundle (v1)

Companion artifact for the trajectory-probe downstream eval of the code-graph-v7 encoders (W1, I6, ...).

## Contents

- `traj_full_bundle.tar.gz` (488 MB), which contains:
  - `specs.jsonl`: 2456 SWE-Bench Verified agent trajectories harvested from the `swe-bench-submissions` S3 bucket. Fields: `instance_id`, `traj_id`, `repo`, `base_commit`, `patches` (1 entry = final model patch), `resolved`.
  - `repos/`: partial, blobless (`--filter=blob:none`) clones of the 12 target repos (django, sympy, sphinx, matplotlib, scikit-learn, astropy, xarray, pytest, pylint, requests, seaborn, flask). ~671 MB uncompressed. Blobs are pulled lazily on each `base_commit` checkout.
  - `graphjepa/`: pipeline code (`trajectory_pipeline`, `trajectory_realize`, `trajectory_probe`, `trajectory_harvest`) plus `scripts/trajectory_full.sh`.
- `harvest.log`: stdout from the S3 harvester that produced `specs.jsonl`.

## Downstream workflow

```bash
# unpack the bundle and move its contents into the expected layout
tar -xzf traj_full_bundle.tar.gz
rsync -a traj_full/graphjepa/ graphjepa/
mkdir -p outputs/traj_real
cp traj_full/specs.jsonl outputs/traj_real/
mv traj_full/repos outputs/traj_real/repos

# realize (4 sharded workers by repo)
SHARDS=4 bash graphjepa/scripts/trajectory_full.sh
tail -f outputs/traj_real/logs/realize_shard*.log

# merge manifests + probe with each encoder
cat outputs/traj_real/manifest_shard*.jsonl > outputs/traj_real/manifest.jsonl
for NAME in W1_softplus_s0 I6_joint_s0; do
  .venv/bin/python -m graphjepa.trajectory_probe \
    --manifest outputs/traj_real/manifest.jsonl \
    --ckpt outputs/$NAME/ckpt_final.pt \
    --pool mean --split-by repo \
    --output outputs/traj_real/probe_${NAME}.json
done
```

## Provenance

Specs were harvested from 5 SWE-Bench Verified submissions:

| Submission | N | Resolved | Rate |
|---|---|---|---|
| 20240620_sweagent_claude3.5sonnet | 485 | 168 | 34.6% |
| 20241022_tools_claude-3-5-sonnet-updated | 483 | 245 | 50.7% |
| 20241028_agentless-1.5_gpt4o | 495 | 194 | 39.2% |
| 20241029_OpenHands-CodeAct-2.1-sonnet | 493 | 265 | 53.8% |
| 20250405_amazon-q-developer-2025 | 500 | 330 | 66.0% |
| **total** | **2456** | **1202** | **48.9%** |

The specs cover 500 unique `instance_id` values and 499 unique `base_commit` values, with a median of 5 trajectories per commit (different agents attempting the same task).
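
The totals quoted above (trajectory count, unique instances and commits, median trajectories per commit, overall resolve rate) can be re-derived from `specs.jsonl` as a sanity check after unpacking. The following is a minimal sketch, not part of the bundled pipeline: `summarize_specs` is a hypothetical helper name, and it assumes each line is one JSON object whose `resolved` field is a boolean (or at least truthy/falsy).

```python
import json
from collections import Counter
from pathlib import Path
from statistics import median


def summarize_specs(lines):
    """Summarize an iterable of JSONL lines, one trajectory record per line.

    Hypothetical helper; assumes `resolved` is a boolean-like field.
    """
    # Skip blank lines so a trailing newline does not break parsing.
    records = [json.loads(line) for line in lines if line.strip()]
    per_commit = Counter(r["base_commit"] for r in records)
    resolved = sum(1 for r in records if r["resolved"])
    return {
        "trajectories": len(records),
        "unique_instances": len({r["instance_id"] for r in records}),
        "unique_base_commits": len(per_commit),
        "median_traj_per_commit": median(per_commit.values()) if per_commit else 0,
        "resolved": resolved,
        "resolve_rate": resolved / len(records) if records else 0.0,
    }


if __name__ == "__main__":
    # Path taken from the downstream workflow; adjust if the file lives elsewhere.
    path = Path("outputs/traj_real/specs.jsonl")
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for key, value in summarize_specs(fh).items():
                print(f"{key}: {value}")
```

If the harvest matches the provenance table, the printed counts should agree with the totals row and the unique-ID figures; a mismatch would suggest a truncated or partially extracted `specs.jsonl`.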