Source: `apishift-env/docs/DATA_PYRAMID.md` by yaswanth169, commit `3040bf7` ("Initial APIShift env push").


# APIShift Data Pyramid

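APIShift's training and evaluation data is organized into three layers: real scraped spec pairs (Layer 1), synthetic perturbations of real specs (Layer 2), and a held-out evaluation set (Layer 3).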

## Layer 1: Real scraped OpenAPI version pairs

Layer 1 is scraped from the public Git histories of five widely used vendor APIs using `scripts/scrape_specs.py`. All spec files are extracted verbatim from tagged commits in each vendor's official OpenAPI repo.

| Provider | Source repo | Spec format | Versions scraped | (v1, v2) pairs |
|---|---|---|---|---|
| Stripe | `stripe/openapi` | JSON (`spec3.sdk.json`) | ~80 | ~240 |
| GitHub | `github/rest-api-description` | YAML | ~50 | ~150 |
| Twilio | `twilio/twilio-oai` | YAML (3 product lines) | ~40 × 3 | ~360 |
| Slack | `slackapi/slack-api-specs` | JSON | ~30 | ~80 |
| OpenAI | `openai/openai-openapi` | YAML | ~20 | ~50 |

Total: ~200 spec versions → ~840+ real (v1, v2) pairs

Pairs are formed from each provider's version history in three ways:

- **Adjacent** (n → n+1): the realistic "I stayed one version behind" case.
- **Skip-one** (n → n+2): the "I missed a release" case.
- **Long-range** (random gap of 5 to 20 versions): the "I am very far behind" case.
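A minimal sketch of the three pairing strategies, assuming each provider's versions are already sorted chronologically and that long-range pairs take one random gap per starting version. The `build_pairs` name and the one-long-range-pair-per-version rule are hypothetical; the real logic in `scripts/build_pair_index.py` may differ.

```python
import random


def build_pairs(versions, rng, min_gap=5, max_gap=20):
    """Return (v1, v2, kind) tuples from one provider's chronologically sorted versions.

    Assumption (hypothetical): one long-range pair per starting version, with a gap
    drawn uniformly from min_gap..max_gap (inclusive) when that many newer versions exist.
    """
    pairs = []
    n = len(versions)
    for i in range(n):
        if i + 1 < n:  # adjacent: stayed one version behind
            pairs.append((versions[i], versions[i + 1], "adjacent"))
        if i + 2 < n:  # skip-one: missed a release
            pairs.append((versions[i], versions[i + 2], "skip_one"))
        max_fit = min(max_gap, n - 1 - i)  # largest gap that stays in range
        if max_fit >= min_gap:  # long-range: very far behind
            gap = rng.randint(min_gap, max_fit)
            pairs.append((versions[i], versions[i + gap], "long_range"))
    return pairs


if __name__ == "__main__":
    versions = [f"v{k}" for k in range(1, 81)]  # e.g. ~80 scraped Stripe versions
    pairs = build_pairs(versions, random.Random(42))
    print(len(pairs))  # 232 = 79 adjacent + 78 skip-one + 75 long-range
```

With 80 versions this yields 232 pairs, close to the three-pairs-per-version ratio in the table above. Passing an explicit `random.Random` instance keeps the long-range gaps reproducible.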

Spec files live in `scenarios/layer1_real/<provider>/`. The global index is `scenarios/layer1_real/_global_index.json`.

To reproduce:

```bash
python scripts/scrape_specs.py --out scenarios/layer1_real
python scripts/extract_client_samples.py --out scenarios/layer1_real
python scripts/build_pair_index.py --out scenarios/layer1_real
```

## Layer 2: Synthetic perturbation

`scenarios/layer2_synthetic/mutator.py` takes a real OpenAPI spec and applies one or more typed mutations from a 12-class taxonomy:

| Mutation class | Severity |
|---|---|
| `field_renamed` | low |
| `type_narrowed` | medium |
| `required_field_added` | medium |
| `endpoint_removed` | high |
| `enum_narrowed` | high |
| `response_shape_changed` | medium |
| `auth_scheme_changed` | high |
| `field_removed` | high |
| `param_required_added` | medium |
| `default_changed` | low |
| `method_changed` | high |
| `status_code_removed` | high |
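The taxonomy maps naturally onto a lookup table. In this sketch the `Severity` enum, the `MUTATION_SEVERITY` dict, and the rule that a multi-mutation scenario is as severe as its worst mutation are all assumptions, not the actual API of `mutator.py`.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Severity per mutation class, transcribed from the table above.
MUTATION_SEVERITY = {
    "field_renamed": Severity.LOW,
    "type_narrowed": Severity.MEDIUM,
    "required_field_added": Severity.MEDIUM,
    "endpoint_removed": Severity.HIGH,
    "enum_narrowed": Severity.HIGH,
    "response_shape_changed": Severity.MEDIUM,
    "auth_scheme_changed": Severity.HIGH,
    "field_removed": Severity.HIGH,
    "param_required_added": Severity.MEDIUM,
    "default_changed": Severity.LOW,
    "method_changed": Severity.HIGH,
    "status_code_removed": Severity.HIGH,
}


def scenario_severity(mutation_classes):
    """Hypothetical rule: a scenario is as severe as its worst mutation."""
    if not mutation_classes:
        raise ValueError("a scenario needs at least one mutation")
    # IntEnum members compare as ints, so max() returns the most severe member.
    return max(MUTATION_SEVERITY[m] for m in mutation_classes)


if __name__ == "__main__":
    print(scenario_severity(["field_renamed", "type_narrowed"]).name)  # MEDIUM
```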

Each mutation produces a `(mutated_spec, ground_truth_change_record)` pair, so the grader has reliable labels.
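To illustrate the `(mutated_spec, ground_truth_change_record)` contract, here is a hypothetical `field_renamed` mutation on component schemas. The function name, the `_v2` renaming rule, and the record's keys are invented for illustration and are not the format used by `mutator.py`.

```python
import copy
import random


def mutate_field_renamed(spec, rng):
    """Rename one component-schema property; return (mutated_spec, change_record).

    Hypothetical sketch: the "_v2" naming rule and the record keys are illustrative.
    """
    mutated = copy.deepcopy(spec)  # leave the real base spec untouched
    schemas = mutated.get("components", {}).get("schemas", {})
    # All (schema, property) targets, sorted so a seeded rng picks reproducibly.
    targets = sorted(
        (schema_name, prop)
        for schema_name, schema in schemas.items()
        for prop in schema.get("properties", {})
    )
    if not targets:
        raise ValueError("spec has no schema properties to rename")
    schema_name, old = rng.choice(targets)
    schema = schemas[schema_name]
    props = schema["properties"]
    new = old + "_v2"
    while new in props:  # avoid clobbering an existing property
        new += "_v2"
    props[new] = props.pop(old)
    # Keep the schema's required list consistent with the rename.
    if old in schema.get("required", []):
        schema["required"] = [new if r == old else r for r in schema["required"]]
    record = {
        "mutation_class": "field_renamed",
        "severity": "low",
        "location": f"#/components/schemas/{schema_name}/properties/{old}",
        "old_name": old,
        "new_name": new,
    }
    return mutated, record


if __name__ == "__main__":
    spec = {"components": {"schemas": {"Charge": {
        "type": "object",
        "properties": {"amount": {"type": "integer"}},
        "required": ["amount"],
    }}}}
    mutated, record = mutate_field_renamed(spec, random.Random(42))
    print(record["new_name"], mutated["components"]["schemas"]["Charge"]["required"])
    # amount_v2 ['amount_v2']
```

Deep-copying the base spec matters when many scenarios share one real spec as their starting point.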

When `layer1_real/` is populated, the mutator picks a random real spec from any of the five providers as its mutation base. This gives 200+ unique starting points instead of a single hand-typed seed, which expands combinatorially to 40,000+ unique synthetic scenarios.

**Used for:** bulk training, with difficulty weights sampled by the `CurriculumAgent`.
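One way the mutator could combine a random real base spec with curriculum-driven difficulty is sketched below. The `sample_scenario` function and the shape of `difficulty_weights` (a weight per severity level) are assumptions; the document does not specify the `CurriculumAgent` interface.

```python
import random


def sample_scenario(base_specs, difficulty_weights, classes_by_severity, rng, n_mutations=1):
    """Pick one real base spec and n_mutations mutation classes.

    Hypothetical interface: difficulty_weights maps a severity ("low", "medium",
    "high") to a sampling weight, standing in for the CurriculumAgent's output.
    base_specs should be in a deterministic (sorted) order so a seeded rng
    reproduces the same scenario.
    """
    base = rng.choice(base_specs)
    severities = list(classes_by_severity)
    weights = [difficulty_weights.get(s, 0.0) for s in severities]
    # First draw a severity level per mutation, then a class within that level.
    drawn = rng.choices(severities, weights=weights, k=n_mutations)
    classes = [rng.choice(classes_by_severity[s]) for s in drawn]
    return base, classes


if __name__ == "__main__":
    classes_by_severity = {
        "low": ["field_renamed", "default_changed"],
        "high": ["endpoint_removed", "method_changed"],
    }
    # Early in a curriculum: mostly low-severity mutations.
    weights = {"low": 0.8, "high": 0.2}
    specs = ["github/v1.yaml", "stripe/v1.json"]  # hypothetical paths
    print(sample_scenario(specs, weights, classes_by_severity, random.Random(42), n_mutations=2))
```

Drawing a severity first and a class second lets the curriculum shift difficulty without caring how many classes sit at each level.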

## Layer 3: Held-out evaluation

A locked set of well-known real migrations that the agent never sees during training:

- Stripe v22 → v23 (webhook signature change)
- GitHub PAT deprecation (token → bearer)
- Twilio `Messages.json` → `/v2/messaging` restructure
- Slack `auth.revoke` → `admin.tokens.revoke` deprecation
- Stripe v18 invoice field rename

The five migrations above are the inline seed scenarios in `scenarios/library.py`. Layer 3 also includes the 20 most migration-rich adjacent pairs from the scraped set, which were carved out into `scenarios/layer3_holdout/` after the scrape.
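The carve-out step could look like the following, assuming each pair record carries an `id`, a `kind`, and a change count `n_changes` as its measure of how migration-rich it is. All three field names and the `carve_out_holdout` function are hypothetical.

```python
def carve_out_holdout(pairs, k=20):
    """Split pairs into (train, holdout) by moving the k most change-heavy adjacent pairs out.

    Hypothetical sketch: "migration-rich" is approximated by the n_changes field.
    """
    adjacent = [p for p in pairs if p["kind"] == "adjacent"]
    # Sort by change count (descending), then by id so ties break deterministically.
    ranked = sorted(adjacent, key=lambda p: (-p["n_changes"], p["id"]))
    holdout_ids = {p["id"] for p in ranked[:k]}
    train = [p for p in pairs if p["id"] not in holdout_ids]
    holdout = [p for p in pairs if p["id"] in holdout_ids]
    return train, holdout


if __name__ == "__main__":
    pairs = [{"id": f"p{i}", "kind": "adjacent", "n_changes": i} for i in range(30)]
    train, holdout = carve_out_holdout(pairs, k=20)
    print(len(train), len(holdout))  # 10 20
```

This removes only the selected pairs themselves; a stricter split would also drop other pairs that touch a held-out version.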

**Used for:** the final benchmark reported in the README.

## Real-data discipline

Every Layer 1 and Layer 3 scenario traces back to a real public Git commit in a vendor's official OpenAPI repo. Layer 2 mutations are typed against the documented OpenAPI breaking-change taxonomy. Nothing is fabricated. Deterministic seeding (a per-provider sort plus a fixed random seed of 42) makes every training run reproducible from a fresh clone.
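A sketch of the per-provider sort plus fixed seed, assuming the `scenarios/layer1_real/<provider>/` layout with `.json`/`.yaml` spec files. The function name and the use of the seed for a shuffle are illustrative, not taken from the scripts.

```python
import random
from pathlib import Path

SPEC_SUFFIXES = {".json", ".yaml", ".yml"}


def deterministic_spec_order(root, seed=42):
    """Return spec paths under root/<provider>/ in a reproducible shuffled order.

    Assumptions: one sub-directory per provider holding .json/.yaml spec files;
    using the seed for a shuffle is illustrative.
    """
    root = Path(root)
    files = []
    # Path.iterdir() yields entries in arbitrary order, so sort providers and files.
    for provider_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files.extend(sorted(
            f for f in provider_dir.iterdir()
            if f.is_file() and f.suffix in SPEC_SUFFIXES
        ))
    rng = random.Random(seed)  # fixed seed: same order on every fresh clone
    rng.shuffle(files)
    return [f.relative_to(root).as_posix() for f in files]
```

For example, `deterministic_spec_order("scenarios/layer1_real")` would return the same list on any machine running the same Python version. Sorting first matters because directory listings can differ between filesystems, and a fixed seed only reproduces the same shuffle if its input order is identical.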