# APIShift Data Pyramid
Three layers of training and evaluation data.
## Layer 1: Real scraped OpenAPI version pairs
Scraped from public Git histories of five widely used vendor APIs using
`scripts/scrape_specs.py`. All spec files are extracted verbatim from
tagged commits in the vendors' official OpenAPI repos.
| Provider | Source repo | Spec format | Versions scraped | (v1,v2) pairs |
|----------|-------------|-------------|-----------------|---------------|
| Stripe | stripe/openapi | JSON (spec3.sdk.json) | ~80 | ~240 |
| GitHub | github/rest-api-description | YAML | ~50 | ~150 |
| Twilio | twilio/twilio-oai | YAML (3 product lines) | ~40×3 | ~360 |
| Slack | slackapi/slack-api-specs | JSON | ~30 | ~80 |
| OpenAI | openai/openai-openapi | YAML | ~20 | ~50 |
**Total: ~200 spec versions → ~840+ real (v1, v2) pairs**
Pairs are generated three ways:
- **Adjacent** (n→n+1): realistic "I stayed one version behind" case
- **Skip-one** (n→n+2): "I missed a release" case
- **Long-range** (random gap of 5 to 20 versions): "I am very far behind" case
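The three pairing strategies above can be sketched as follows. This is a minimal illustration, not the contents of `scripts/build_pair_index.py`: the function name `build_pairs`, the tuple layout, and the choice of one long-range pair per starting version are all assumptions.

```python
import random


def build_pairs(versions, seed=42, min_gap=5, max_gap=20):
    """Return (kind, v1, v2) tuples for one provider's ordered version list.

    Hypothetical helper: emits at most one pair of each kind per starting version.
    """
    rng = random.Random(seed)  # local RNG so the result is reproducible
    pairs = []
    n = len(versions)
    for i in range(n):
        if i + 1 < n:
            pairs.append(("adjacent", versions[i], versions[i + 1]))
        if i + 2 < n:
            pairs.append(("skip_one", versions[i], versions[i + 2]))
        if i + min_gap < n:
            # Clamp the gap so the target index stays inside the list.
            gap = rng.randint(min_gap, min(max_gap, n - 1 - i))
            pairs.append(("long_range", versions[i], versions[i + gap]))
    return pairs


versions = [f"v{k}" for k in range(1, 11)]  # placeholder version labels
pairs = build_pairs(versions)
```

With ten placeholder versions this yields 9 adjacent, 8 skip-one, and 5 long-range pairs. Under this assumed scheme each version contributes up to three pairs, which is in line with the versions-to-pairs ratios in the table.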
Spec files live in `scenarios/layer1_real/<provider>/`.
The global index is `scenarios/layer1_real/_global_index.json`.
To reproduce:
```bash
python scripts/scrape_specs.py --out scenarios/layer1_real
python scripts/extract_client_samples.py --out scenarios/layer1_real
python scripts/build_pair_index.py --out scenarios/layer1_real
```
## Layer 2: Synthetic perturbation
`scenarios/layer2_synthetic/mutator.py` takes a real OpenAPI spec and
applies one or more typed mutations from a 12-class taxonomy:
| Mutation class | Severity |
|----------------|---------|
| field_renamed | low |
| type_narrowed | medium |
| required_field_added | medium |
| endpoint_removed | high |
| enum_narrowed | high |
| response_shape_changed | medium |
| auth_scheme_changed | high |
| field_removed | high |
| param_required_added | medium |
| default_changed | low |
| method_changed | high |
| status_code_removed | high |
Each mutation produces a `(mutated_spec, ground_truth_change_record)` pair
so the grader has reliable labels.
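As an illustration of what one typed mutation might look like, here is a sketch of `field_renamed` applied to a schema property. It is not the code in `mutator.py`: the function `rename_field` and every key of the change record are assumptions made for this example.

```python
import copy


def rename_field(spec, schema_name, old, new):
    """Return (mutated_spec, ground_truth_change_record) for a field rename.

    Hypothetical helper; the record layout is illustrative only.
    """
    mutated = copy.deepcopy(spec)  # never modify the real base spec in place
    schema = mutated["components"]["schemas"][schema_name]
    props = schema["properties"]
    props[new] = props.pop(old)
    if old in schema.get("required", []):
        # Keep the `required` list consistent with the renamed property.
        schema["required"] = [new if f == old else f for f in schema["required"]]
    record = {
        "mutation_class": "field_renamed",
        "severity": "low",
        "location": f"#/components/schemas/{schema_name}/properties/{old}",
        "old": old,
        "new": new,
    }
    return mutated, record


# Tiny hand-written spec fragment, used only to exercise the sketch.
spec = {
    "openapi": "3.0.0",
    "components": {"schemas": {"Invoice": {
        "type": "object",
        "required": ["id", "amount"],
        "properties": {"id": {"type": "string"}, "amount": {"type": "integer"}},
    }}},
}
mutated, record = rename_field(spec, "Invoice", "amount", "amount_due")
```

Returning the record alongside the mutated spec means the label is produced by the same code path that made the change, so the two cannot drift apart.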
When `layer1_real/` is populated, the mutator picks a random real spec from
any of the five providers as the mutation base. This gives 200+ unique
starting points instead of one hand-typed seed, and a combinatorial expansion
to 40,000+ unique synthetic scenarios.
Used for: bulk training (difficulty weights sampled by the CurriculumAgent).
## Layer 3: Held-out evaluation
A locked set of famous real migrations the agent never sees during training:
- Stripe v22 → v23 (webhook signature change)
- GitHub PAT deprecation (token → bearer)
- Twilio Messages.json → /v2/messaging restructure
- Slack auth.revoke → admin.tokens.revoke deprecation
- Stripe v18 invoice field rename
These are the five inline seed scenarios in `scenarios/library.py`, plus the
20 most migration-rich adjacent pairs from the scraped set that were carved
out into `scenarios/layer3_holdout/` after the scrape.
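The carve-out of adjacent pairs could be sketched like this. How "migration-rich" is actually measured is not stated here, so the `change_count` score, the record keys, and the function `carve_holdout` are all assumptions for illustration.

```python
def carve_holdout(pairs, k=20):
    """Split pairs into (train, holdout), holding out the top-k adjacent pairs.

    Hypothetical helper: ranks by an assumed `change_count` field and breaks
    ties on `id` so the split is deterministic.
    """
    adjacent = [p for p in pairs if p["kind"] == "adjacent"]
    ranked = sorted(adjacent, key=lambda p: (-p["change_count"], p["id"]))
    holdout = ranked[:k]
    holdout_ids = {p["id"] for p in holdout}
    # Anything held out must never appear in the training pool.
    train = [p for p in pairs if p["id"] not in holdout_ids]
    return train, holdout


# Synthetic placeholder records, only to exercise the sketch.
pairs = [
    {
        "id": f"pair-{i:03d}",
        "kind": "adjacent" if i % 2 == 0 else "skip_one",
        "change_count": i % 7,
    }
    for i in range(100)
]
train, holdout = carve_holdout(pairs)
```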
Used for: the final benchmark reported in the README.
## Real-data discipline
Every Layer 1 and Layer 3 scenario links back to a real public Git commit
in a vendor's official OpenAPI repo. Layer 2 mutations are typed against the
documented OpenAPI breaking change taxonomy. Nothing is fabricated.
Deterministic seeds (per-provider sort + fixed random seed 42) make every
training run reproducible from a fresh clone.
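A sketch of how sorting plus a fixed seed can make base-spec selection independent of filesystem or dictionary ordering. The function `pick_bases` and the input layout are assumptions; only the seed value 42 and the per-provider sort come from the text above.

```python
import random

SEED = 42  # fixed seed named in this document


def pick_bases(specs_by_provider, n, seed=SEED):
    """Pick n (provider, spec_name) bases reproducibly.

    Hypothetical helper: sorts providers and spec names before sampling so
    the candidate order never depends on how the input was enumerated.
    """
    ordered = [
        (provider, name)
        for provider in sorted(specs_by_provider)
        for name in sorted(specs_by_provider[provider])
    ]
    rng = random.Random(seed)  # local RNG, leaves global random state untouched
    return [rng.choice(ordered) for _ in range(n)]


# Same content, different enumeration order: the picks should be identical.
specs = {"stripe": ["v2", "v1"], "github": ["v1", "v3", "v2"]}
shuffled = {"github": ["v3", "v2", "v1"], "stripe": ["v1", "v2"]}
first = pick_bases(specs, 5)
second = pick_bases(shuffled, 5)
```

Sorting before seeding matters because a fixed seed alone only fixes the sequence of random indices; if the candidate list order varies between machines, the same indices select different specs.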