# APIShift Data Pyramid

Three layers of training and evaluation data.

## Layer 1: Real scraped OpenAPI version pairs

Scraped from public Git histories of five widely used vendor APIs using
`scripts/scrape_specs.py`. All spec files are extracted verbatim from
tagged commits in the vendors' official OpenAPI repos.

| Provider | Source repo | Spec format | Versions scraped | (v1, v2) pairs |
|----------|-------------|-------------|------------------|----------------|
| Stripe | stripe/openapi | JSON (spec3.sdk.json) | ~80 | ~240 |
| GitHub | github/rest-api-description | YAML | ~50 | ~150 |
| Twilio | twilio/twilio-oai | YAML (3 product lines) | ~40×3 | ~360 |
| Slack | slackapi/slack-api-specs | JSON | ~30 | ~80 |
| OpenAI | openai/openai-openapi | YAML | ~20 | ~50 |

**Total: ~200 spec versions → ~840+ real (v1, v2) pairs**

Pairs are generated with three strategies:

- **Adjacent** (n→n+1): the realistic "I stayed one version behind" case
- **Skip-one** (n→n+2): the "I missed a release" case
- **Long-range** (n→n+k, with k drawn at random from 5 to 20): the "I am very far behind" case
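
The three strategies above can be sketched as follows. This is a minimal illustration, not the code in `scripts/build_pair_index.py`: the function name `build_pairs`, its parameters, and the tuple layout are assumptions, and it takes a version list that is already sorted chronologically.

```python
import random


def build_pairs(versions, seed=42, min_gap=5, max_gap=20):
    """Return (strategy, v1, v2) tuples for a chronologically sorted version list.

    Hypothetical helper: the name, signature, and output shape are illustrative.
    """
    rng = random.Random(seed)  # fixed seed keeps the long-range picks repeatable
    pairs = []
    n = len(versions)
    for i in range(n):
        if i + 1 < n:
            pairs.append(("adjacent", versions[i], versions[i + 1]))
        if i + 2 < n:
            pairs.append(("skip_one", versions[i], versions[i + 2]))
        if i + min_gap < n:
            # Clamp the upper bound so the target index stays inside the list.
            gap = rng.randint(min_gap, min(max_gap, n - 1 - i))
            pairs.append(("long_range", versions[i], versions[i + gap]))
    return pairs


versions = [f"v{k}" for k in range(1, 31)]
pairs = build_pairs(versions)
```

Under these assumptions, a provider with 30 versions yields 29 adjacent, 28 skip-one, and 25 long-range pairs, 82 in total.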

Spec files live in `scenarios/layer1_real/<provider>/`.
The global index is `scenarios/layer1_real/_global_index.json`.

To reproduce:

```bash
python scripts/scrape_specs.py --out scenarios/layer1_real
python scripts/extract_client_samples.py --out scenarios/layer1_real
python scripts/build_pair_index.py --out scenarios/layer1_real
```

## Layer 2: Synthetic perturbation

`scenarios/layer2_synthetic/mutator.py` takes a real OpenAPI spec and
applies one or more typed mutations from a 12-class taxonomy:

| Mutation class | Severity |
|----------------|----------|
| field_renamed | low |
| type_narrowed | medium |
| required_field_added | medium |
| endpoint_removed | high |
| enum_narrowed | high |
| response_shape_changed | medium |
| auth_scheme_changed | high |
| field_removed | high |
| param_required_added | medium |
| default_changed | low |
| method_changed | high |
| status_code_removed | high |

Each mutation produces a `(mutated_spec, ground_truth_change_record)` pair
so the grader has reliable labels.
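
To make the `(mutated_spec, ground_truth_change_record)` contract concrete, here is a minimal sketch of a single `field_removed` mutation. It does not reproduce `mutator.py`: the function `remove_field`, the record keys, and the toy `Invoice` schema are all assumptions chosen for illustration.

```python
import copy


def remove_field(spec, schema_name, field):
    """Apply a field_removed mutation; return (mutated_spec, change_record).

    Hypothetical helper: the record layout below is an assumed format.
    """
    mutated = copy.deepcopy(spec)  # never modify the real spec in place
    schema = mutated["components"]["schemas"][schema_name]
    schema["properties"].pop(field)
    if field in schema.get("required", []):
        schema["required"].remove(field)
    record = {
        "mutation_class": "field_removed",
        "severity": "high",  # severity taken from the taxonomy table
        "location": f"#/components/schemas/{schema_name}/properties/{field}",
    }
    return mutated, record


# Toy spec fragment, invented for this example.
spec = {
    "openapi": "3.0.0",
    "components": {"schemas": {"Invoice": {
        "type": "object",
        "required": ["id", "amount"],
        "properties": {"id": {"type": "string"}, "amount": {"type": "integer"}},
    }}},
}
mutated, record = remove_field(spec, "Invoice", "amount")
```

Because the record is emitted by the same function that applies the change, the label cannot drift from what was actually mutated, which is what makes it usable as ground truth.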

When `layer1_real/` is populated, the mutator picks a random real spec from
any of the five providers as the mutation base. This gives 200+ unique
starting points instead of one hand-typed seed, for a combinatorial
expansion to 40,000+ unique synthetic scenarios.

Used for: bulk training (difficulty weights sampled by the CurriculumAgent).

## Layer 3: Held-out evaluation

A locked set of famous real migrations the agent never sees during training:

- Stripe v22 → v23 (webhook signature change)
- GitHub PAT deprecation (token → bearer)
- Twilio Messages.json → /v2/messaging restructure
- Slack auth.revoke → admin.tokens.revoke deprecation
- Stripe v18 invoice field rename

These are the five inline seed scenarios in `scenarios/library.py`, plus the
20 most migration-rich adjacent pairs from the scraped set that were carved
out into `scenarios/layer3_holdout/` after the scrape.
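
One way to keep the carve-out from leaking into training is to filter the pair index against the held-out set before sampling. The sketch below assumes each pair is a dict with `provider`, `v1`, and `v2` keys; that record shape and the name `drop_holdout` are hypothetical, not the repo's actual index format.

```python
def drop_holdout(pairs, holdout):
    """Return the training pairs, excluding any pair present in the held-out set.

    Hypothetical helper: assumes dict records keyed by provider, v1, v2.
    """
    holdout_keys = {(p["provider"], p["v1"], p["v2"]) for p in holdout}
    return [
        p for p in pairs
        if (p["provider"], p["v1"], p["v2"]) not in holdout_keys
    ]


# Invented records for illustration only.
all_pairs = [
    {"provider": "stripe", "v1": "v21", "v2": "v22"},
    {"provider": "stripe", "v1": "v22", "v2": "v23"},
    {"provider": "slack", "v1": "v3", "v2": "v4"},
]
held_out = [{"provider": "stripe", "v1": "v22", "v2": "v23"}]
train_pairs = drop_holdout(all_pairs, held_out)
```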

Used for: the final benchmark reported in the README.

## Real-data discipline

Every Layer 1 and Layer 3 scenario links back to a real public Git commit
in a vendor's official OpenAPI repo. Layer 2 mutations are typed against the
documented OpenAPI breaking-change taxonomy. Nothing is fabricated.

Deterministic seeds (per-provider sort + fixed random seed 42) make every
training run reproducible from a fresh clone.
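
The "per-provider sort + fixed seed" idea can be illustrated with a short sketch. It assumes the spec paths are available as a mapping from provider name to file paths; the function `pick_base_specs` and the example paths are hypothetical.

```python
import random


def pick_base_specs(spec_paths_by_provider, k, seed=42):
    """Pick k mutation-base specs reproducibly.

    Hypothetical helper: sorting first removes any dependence on filesystem
    or dict ordering, so the seeded generator sees the same list every run.
    """
    ordered = [
        path
        for provider in sorted(spec_paths_by_provider)
        for path in sorted(spec_paths_by_provider[provider])
    ]
    rng = random.Random(seed)  # local generator, independent of global state
    return [rng.choice(ordered) for _ in range(k)]


# Invented paths; the two mappings differ only in ordering.
run_a = pick_base_specs(
    {"stripe": ["stripe/b.json", "stripe/a.json"], "github": ["github/x.yaml"]}, 5
)
run_b = pick_base_specs(
    {"github": ["github/x.yaml"], "stripe": ["stripe/a.json", "stripe/b.json"]}, 5
)
```

Here `run_a` and `run_b` are identical even though the inputs were listed in different orders, which is the property a fresh clone relies on.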