
PayOps Environment — Test Cases & Testing Guide

This document covers every testable behaviour of the PayOps OpenEnv, organised by endpoint and scenario. Each test shows the exact command to run, the expected response, and what a failure looks like.


Prerequisites

# 1. Start the server (run from /Users/padmapriya)
PYTHONPATH=/Users/padmapriya uvicorn payops_env.server.app:app --host 0.0.0.0 --port 8000

# 2. Confirm it is up (should return {"status":"ok",...})
curl -s http://localhost:8000/health

All curl commands below assume the server is running on localhost:8000.


T-01 Health Check

Goal: Confirm the server is alive and returns version metadata.

curl -s http://localhost:8000/health

Expected output

{"status": "ok", "environment": "payops_env", "version": "2.0.0"}

Failure indicator: Connection refused, or any field missing / wrong value.
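
The failure indicator can also be checked programmatically. The following is a minimal sketch, assuming the /health body has already been parsed into a dict. `check_health` and `EXPECTED_HEALTH` are hypothetical names (not part of payops_env), and the expected values are copied from the output shown above.

```python
# Hypothetical helper: compares a parsed /health body with the values from T-01.
EXPECTED_HEALTH = {"status": "ok", "environment": "payops_env", "version": "2.0.0"}


def check_health(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means every field matched."""
    problems = []
    for key, want in EXPECTED_HEALTH.items():
        if key not in payload:
            problems.append(f"missing field: {key}")
        elif payload[key] != want:
            problems.append(f"{key}: expected {want!r}, got {payload[key]!r}")
    return problems


# Demo with the body documented above; a live run would pass the parsed response.
sample = {"status": "ok", "environment": "payops_env", "version": "2.0.0"}
print(check_health(sample))  # []
```

Returning a list of problems rather than a bare boolean makes it easier to see which field drifted when the check fails.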


T-02 Schema Endpoint

Goal: Verify that action, observation, and state JSON schemas are served correctly.

curl -s http://localhost:8000/schema | python3 -m json.tool

Expected output (condensed)

{
  "action":      { "title": "PayOpsAction", "type": "object", ... },
  "observation": { "title": "PayOpsObservation", "type": "object", ... },
  "state":       { "title": "PayOpsState", "type": "object", ... }
}

Checks to verify manually:

  • action.properties includes action_type, transaction_id, reason, confidence
  • observation.properties includes risk_score, flags, kyc_status, velocity_1h
  • HTTP status code is 200
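
The property checks in the list above could be automated along these lines. This is a sketch only: `missing_schema_fields` and `REQUIRED_FIELDS` are hypothetical names, and the code assumes each section of the /schema response is a JSON Schema object with a top-level `properties` mapping, as the condensed output suggests.

```python
# Hypothetical helper mirroring the manual checklist for /schema.
REQUIRED_FIELDS = {
    "action": ["action_type", "transaction_id", "reason", "confidence"],
    "observation": ["risk_score", "flags", "kyc_status", "velocity_1h"],
}


def missing_schema_fields(schema: dict) -> dict[str, list[str]]:
    """Map each schema section to the required properties it lacks."""
    missing = {}
    for section, fields in REQUIRED_FIELDS.items():
        props = schema.get(section, {}).get("properties", {})
        absent = [name for name in fields if name not in props]
        if absent:
            missing[section] = absent
    return missing


# Toy schema in which the action section lacks "confidence".
sample = {
    "action": {"properties": {"action_type": {}, "transaction_id": {}, "reason": {}}},
    "observation": {
        "properties": {"risk_score": {}, "flags": {}, "kyc_status": {}, "velocity_1h": {}}
    },
}
print(missing_schema_fields(sample))  # {'action': ['confidence']}
```

An empty dict means every listed property is present. The HTTP status code still has to be checked separately, for example with curl's `-w "%{http_code}"` option as used in T-11.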

T-03 Tasks Endpoint

Goal: Confirm all 20 tasks are returned with the correct difficulty distribution.

curl -s http://localhost:8000/tasks | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('Total tasks:', d['count'])
from collections import Counter
c = Counter(t['difficulty'] for t in d['tasks'])
print('By difficulty:', dict(c))
print()
for t in d['tasks']:
    print(f\"  {t['task_id']:12} [{t['difficulty']:8}] correct={t['correct_action']}\")
"

Expected output

Total tasks: 20
By difficulty: {'easy': 4, 'medium': 6, 'hard': 6, 'critical': 4}

  EASY-001     [easy    ] correct=approve
  EASY-002     [easy    ] correct=reject
  EASY-003     [easy    ] correct=approve
  EASY-004     [easy    ] correct=flag
  MED-001      [medium  ] correct=escalate
  MED-002      [medium  ] correct=hold
  MED-003      [medium  ] correct=flag
  MED-004      [medium  ] correct=flag
  MED-005      [medium  ] correct=hold
  MED-006      [medium  ] correct=escalate
  HARD-001     [hard    ] correct=escalate
  HARD-002     [hard    ] correct=reject
  HARD-003     [hard    ] correct=reject
  HARD-004     [hard    ] correct=approve
  HARD-005     [hard    ] correct=escalate
  HARD-006     [hard    ] correct=flag
  CRIT-001     [critical] correct=approve
  CRIT-002     [critical] correct=reject
  CRIT-003     [critical] correct=escalate
  CRIT-004     [critical] correct=reject

Note: correct_action values for jitter-variant tasks (EASY-004, MED-001/003/004/006, HARD-001/006, CRIT-001/003/004) may differ per episode seed — the above shows default values.

Failure indicator: count != 20, missing difficulty tier, wrong correct_action.
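
The count and distribution checks can be expressed as a small validator. This is a sketch under the assumption that the /tasks body has the `count` and `tasks` keys used in the command above; `check_tasks` and `EXPECTED_BY_DIFFICULTY` are hypothetical names. It deliberately skips `correct_action`, since the note above says that value can vary for jitter-variant tasks.

```python
from collections import Counter

# Distribution taken from the expected output of T-03.
EXPECTED_BY_DIFFICULTY = {"easy": 4, "medium": 6, "hard": 6, "critical": 4}


def check_tasks(payload: dict) -> list[str]:
    """Return a list of problems found in a parsed /tasks response."""
    problems = []
    tasks = payload.get("tasks", [])
    if payload.get("count") != len(tasks):
        problems.append(f"count={payload.get('count')} but {len(tasks)} tasks listed")
    counts = Counter(t["difficulty"] for t in tasks)
    for tier, want in EXPECTED_BY_DIFFICULTY.items():
        got = counts.get(tier, 0)
        if got != want:
            problems.append(f"{tier}: expected {want}, got {got}")
    return problems


# Demo with a synthetic payload that matches the expected distribution.
demo_tasks = [{"difficulty": d} for d, n in EXPECTED_BY_DIFFICULTY.items() for _ in range(n)]
print(check_tasks({"count": len(demo_tasks), "tasks": demo_tasks}))  # []
```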


T-04 Reset

Goal: Reset the environment and confirm the first task is EASY-001.

curl -s -X POST http://localhost:8000/reset | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('task_id          :', d['task_id'])
print('transaction_id   :', d['transaction_id'])
print('difficulty        :', d['task_difficulty'])
print('status            :', d['status'])
print('done              :', d['done'])
print('reward            :', d['reward'])
print('cumulative_reward :', d['cumulative_reward'])
print('risk_score        :', d['risk_score'])
"

Expected output

task_id          : EASY-001
transaction_id   : TXN-E001
difficulty        : easy
status            : pending
done              : False
reward            : 0.0
cumulative_reward : 0.0
risk_score        : 0.05

Failure indicator: done=true, reward != 0, wrong task_id.


T-05 Correct Action — Full Credit (+1.0)

Goal: Submit the correct action for EASY-001 (approve) and receive reward +1.0.

curl -s -X POST http://localhost:8000/reset > /dev/null   # fresh start

curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"approve","transaction_id":"TXN-E001"}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
print('reward           :', d['reward'])
print('correct info     :', d['info'].get('correct_action'))
print('action taken     :', d['info'].get('action_taken'))
"

Expected output

reward           : 1.0
correct info     : approve
action taken     : approve

T-06 Wrong Action — Penalty (approve on fraud = -1.0)

Goal: Skip to EASY-002 (textbook fraud) and approve it. Expect a -1.0 penalty.

curl -s -X POST http://localhost:8000/reset > /dev/null
# Step past EASY-001
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"approve","transaction_id":"TXN-E001"}' > /dev/null

# Now on EASY-002 (correct=reject). Try approving it.
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"approve","transaction_id":"TXN-E002"}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
print('reward       :', d['reward'])
print('correct was  :', d['info'].get('correct_action'))
"

Expected output

reward       : -1.0
correct was  : reject

T-07 Partial Credit Action

Goal: On MED-001 (correct=escalate), submit flag — should earn +0.5 partial credit.

curl -s -X POST http://localhost:8000/reset > /dev/null
# Step through EASY tasks 1-4 with any actions
for ACTION in approve reject approve reject; do
  curl -s -X POST http://localhost:8000/step \
    -H "Content-Type: application/json" \
    -d "{\"action_type\":\"$ACTION\",\"transaction_id\":\"dummy\"}" > /dev/null
done

# Now on MED-001 (correct=escalate). Submit flag.
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"flag","transaction_id":"TXN-M001"}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
print('reward       :', d['reward'])
print('correct was  :', d['info'].get('correct_action'))
print('partial?     :', 0 < d['reward'] < 1.0)
"

Expected output

reward       : 0.5
correct was  : escalate
partial?     : True

T-08 Inspect Action — Information Reveal

Goal: Use inspect on EASY-001 to receive investigation notes and a small reward (+0.15). The episode should NOT advance (still on same transaction).

curl -s -X POST http://localhost:8000/reset > /dev/null

curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"inspect","transaction_id":"TXN-E001"}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
print('reward            :', d['reward'])
print('status            :', d['status'])
print('task_id           :', d['task_id'])    # should still be EASY-001
print('inspection_notes  :', d['inspection_notes'])
"

Expected output

reward            : 0.15
status            : inspected
task_id           : EASY-001
inspection_notes  : Sender account opened 3 years ago. Consistent transaction history. KYC fully verified.

T-09 Double Inspect — No Double-Dipping

Goal: Inspect the same transaction twice. Second inspect should return reward 0.0 (already inspected).

curl -s -X POST http://localhost:8000/reset > /dev/null

# First inspect — reward 0.15
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"inspect","transaction_id":"TXN-E001"}' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('First inspect reward:', d['reward'])"

# Second inspect — reward 0.0
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"inspect","transaction_id":"TXN-E001"}' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('Second inspect reward:', d['reward'])"

Expected output

First inspect reward: 0.15
Second inspect reward: 0.0
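
The rule this test exercises can be pictured with a toy model. The sketch below is not the server's code: `InspectLedger` is a hypothetical class that only illustrates "pay the inspect reward once per transaction", using the 0.15 and 0.0 values from the expected output.

```python
class InspectLedger:
    """Toy model: the inspect reward is paid once per transaction ID."""

    FIRST_INSPECT_REWARD = 0.15  # value taken from T-08 / T-09

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def inspect(self, transaction_id: str) -> float:
        # A repeat inspection of the same transaction earns nothing.
        if transaction_id in self._seen:
            return 0.0
        self._seen.add(transaction_id)
        return self.FIRST_INSPECT_REWARD


ledger = InspectLedger()
print(ledger.inspect("TXN-E001"))  # 0.15
print(ledger.inspect("TXN-E001"))  # 0.0
```

Keying the ledger on the transaction ID means a different transaction still earns its own first-inspect reward. Whether the real environment tracks this per transaction or per episode step is an assumption here.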

T-10 Invalid Action Type

Goal: Send an unsupported action type and receive a 422 validation error.

curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"delete","transaction_id":"TXN-E001"}' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('detail:', d.get('detail','')[:60])"

Expected output

detail: Invalid action_type 'delete'. Valid values: ['approve', 'escal

HTTP status code should be 422.


T-11 Step Without Reset

Goal: Call /step without calling /reset first. Should return a 400 error.

# Kill and restart server to guarantee clean state
# Then immediately step without reset:
curl -s -o /dev/null -w "%{http_code}" -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"approve","transaction_id":"TXN-E001"}'

Expected output

400

T-12 State Endpoint Tracking

Goal: Confirm /state reflects the episode progress correctly.

curl -s -X POST http://localhost:8000/reset > /dev/null

curl -s http://localhost:8000/state | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('step_count            :', d['step_count'])
print('transactions_processed:', d['transactions_processed'])
print('total_tasks           :', d['total_tasks'])
print('done                  :', d['done'])
"

# Take one step
curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"approve","transaction_id":"TXN-E001"}' > /dev/null

curl -s http://localhost:8000/state | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('step_count            :', d['step_count'])
print('transactions_processed:', d['transactions_processed'])
print('last_action           :', d['last_action'])
print('cumulative_reward     :', d['cumulative_reward'])
"

Expected output (before step)

step_count            : 0
transactions_processed: 0
total_tasks           : 12
done                  : False

Expected output (after step)

step_count            : 1
transactions_processed: 1
last_action           : approve
cumulative_reward     : 1.0

T-13 Complete Episode — Done Flag

Goal: Step through all 12 tasks and confirm done=true on the last step.

curl -s -X POST http://localhost:8000/reset > /dev/null

python3 - <<'EOF'
import httpx

BASE = "http://localhost:8000"
ACTIONS = [
    "approve","reject","approve","flag",   # easy
    "escalate","hold","flag","flag",       # medium
    "escalate","reject","reject","approve" # hard (perfect sequence)
]

client = httpx.Client()
txn_ids = [t["transaction_id"] for t in client.get(f"{BASE}/tasks").json()["tasks"]]

for i, (action, txn) in enumerate(zip(ACTIONS, txn_ids)):
    resp = client.post(f"{BASE}/step", json={"action_type": action, "transaction_id": txn}).json()
    print(f"Step {i+1:2d}  {txn:12}  action={action:10}  reward={resp['reward']:+.2f}  done={resp['done']}")

client.close()
EOF

Expected output (last line)

Step 12  TXN-H004     action=approve      reward=+1.00  done=True

All other steps should show done=False.


T-14 Grader Endpoint

Goal: Grade and score the episode immediately after completing all steps.

# Run the perfect sequence first (T-13 above), then:
curl -s http://localhost:8000/grader | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('total_reward      :', d['total_reward'])
print('max_possible      :', d['max_possible_reward'])
print('normalised_score  :', d['normalised_score'])
print('passed            :', d['passed'])
print()
for t in d['per_task']:
    mark = '✓' if t['correct'] else '✗'
    print(f\"  {mark} {t['task_id']:12} action={t['action_taken']:10} correct={t['correct_action']:10} reward={t['reward']:+.2f}\")
"

Expected output (perfect run)

total_reward      : 12.0
max_possible      : 12.0
normalised_score  : 1.0
passed            : True

  ✓ EASY-001     action=approve     correct=approve     reward=+1.00
  ✓ EASY-002     action=reject      correct=reject      reward=+1.00
  ...
  ✓ HARD-004     action=approve     correct=approve     reward=+1.00

T-15 Grader Without Episode

Goal: Call /grader before any steps — should return a 400 error.

curl -s -X POST http://localhost:8000/reset > /dev/null
curl -s http://localhost:8000/grader

Expected output

{"error": "No actions recorded. Run /reset then /step first."}

T-16 Baseline Endpoint

Goal: Confirm /baseline runs the rule-based agent and returns a normalised score ≥ 0.5.

curl -s -X POST http://localhost:8000/baseline | python3 -c "
import sys, json
d = json.load(sys.stdin)
print('normalised_score :', d['normalised_score'])
print('total_reward     :', d['total_reward'])
print('steps            :', d['steps'])
print('passed (>=0.5)   :', d['normalised_score'] >= 0.5)
print()
for t in d['scores']:
    mark = '✓' if t['correct'] else '✗'
    print(f\"  {mark} {t['task_id']:12} [{t['difficulty']:6}] action={t['action_taken']:10} reward={t['reward']:+.2f}\")
"

Expected output

normalised_score : 0.7292
total_reward     : 8.75
steps            : 12
passed (>=0.5)   : True

  ✓ EASY-001     [easy  ] action=approve     reward=+1.00
  ✓ EASY-002     [easy  ] action=reject      reward=+1.00
  ...
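
The numbers above are consistent with a simple ratio: 8.75 / 12 ≈ 0.7292. The sketch below reproduces that arithmetic. The function name `normalise`, the clamping to [0, 1], and the rounding to four places are assumptions chosen to match the documented outputs, not a description of the actual grader.

```python
def normalise(total_reward: float, max_possible: float) -> float:
    """Assumed formula: reward ratio clamped to [0, 1], rounded to 4 places."""
    if max_possible <= 0:
        raise ValueError("max_possible must be positive")
    ratio = total_reward / max_possible
    return round(min(1.0, max(0.0, ratio)), 4)


score = normalise(8.75, 12.0)
print(score, score >= 0.5)  # 0.7292 True
```

Under the same assumption, a perfect run gives `normalise(12.0, 12.0) == 1.0`, and a negative total reward clamps to 0.0.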

T-17 Optional Reason and Confidence Fields

Goal: Submit an action with optional reason and confidence. Should be accepted without error.

curl -s -X POST http://localhost:8000/reset > /dev/null

curl -s -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{
    "action_type": "flag",
    "transaction_id": "TXN-E001",
    "reason": "Low risk but flagging for audit trail verification",
    "confidence": 0.82
  }' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('HTTP OK, reward:', d['reward'])"

Expected output

HTTP OK, reward: 0.3

(0.3 = partial credit for flagging when the correct action is approve)


T-18 Hard Task — Model Poisoning (HARD-001)

Goal: Demonstrate that the low risk_score (0.18) is a trap. The correct action is escalate.

curl -s -X POST http://localhost:8000/reset > /dev/null
# Advance to HARD-001 (task 9) — step through easy + medium with dummy actions
python3 - <<'EOF'
import httpx
BASE = "http://localhost:8000"
c = httpx.Client()
tasks = c.get(f"{BASE}/tasks").json()["tasks"]

# Steps 1-8: easy + medium
for t in tasks[:8]:
    c.post(f"{BASE}/step", json={"action_type": "approve", "transaction_id": t["transaction_id"]})

# Step 9: HARD-001 — agent trusts the low risk_score and approves (wrong!)
resp_wrong = c.post(f"{BASE}/step", json={
    "action_type": "approve", "transaction_id": "TXN-H001"
}).json()
print("Trusted ML score → approve")
print("  reward   :", resp_wrong["reward"])   # expect -0.5
print("  correct  :", resp_wrong["info"]["correct_action"])

# Reset and do it correctly
c.post(f"{BASE}/reset")
for t in tasks[:8]:
    c.post(f"{BASE}/step", json={"action_type": "approve", "transaction_id": t["transaction_id"]})

resp_correct = c.post(f"{BASE}/step", json={
    "action_type": "escalate", "transaction_id": "TXN-H001"
}).json()
print("\nOverrode ML score → escalate")
print("  reward   :", resp_correct["reward"])  # expect +1.0
c.close()
EOF

Expected output

Trusted ML score → approve
  reward   : -0.5
  correct  : escalate

Overrode ML score → escalate
  reward   : 1.0

T-19 Inspect Reveals Hidden Context (HARD-001)

Goal: Inspect HARD-001 to reveal the mule-account intelligence note before deciding.

curl -s -X POST http://localhost:8000/reset > /dev/null
# Advance to HARD-001
python3 - <<'EOF'
import httpx
BASE = "http://localhost:8000"
c = httpx.Client()
tasks = c.get(f"{BASE}/tasks").json()["tasks"]
for t in tasks[:8]:
    c.post(f"{BASE}/step", json={"action_type": "approve", "transaction_id": t["transaction_id"]})

# Inspect HARD-001
resp = c.post(f"{BASE}/step", json={
    "action_type": "inspect", "transaction_id": "TXN-H001"
}).json()
print("Inspect reward :", resp["reward"])
print("Notes          :", resp["inspection_notes"])
c.close()
EOF

Expected output

Inspect reward : 0.15
Notes          : Account created 7 days ago. This is the first outbound transfer. Receiver matches a pattern of solicitor-impersonation mule accounts flagged in last month's intelligence bulletin. Risk model underscored due to clean transaction history (new account).

T-20 WebSocket Session

Goal: Run a full reset → step sequence over the WebSocket endpoint.

pip install websockets -q   # if not already installed

python3 - <<'EOF'
import asyncio, json, websockets

async def test_ws():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as ws:
        # Reset
        await ws.send(json.dumps({"type": "reset"}))
        obs = json.loads(await ws.recv())
        print("Reset  →", obs["transaction_id"], "risk:", obs["risk_score"])

        # Step – approve
        await ws.send(json.dumps({
            "type": "step",
            "action_type": "approve",
            "transaction_id": obs["transaction_id"]
        }))
        obs2 = json.loads(await ws.recv())
        print("Step   →", "reward:", obs2["reward"], "next:", obs2["transaction_id"])

        # State
        await ws.send(json.dumps({"type": "state"}))
        state = json.loads(await ws.recv())
        print("State  →", "steps:", state["step_count"], "txns:", state["transactions_processed"])

asyncio.run(test_ws())
EOF

Expected output

Reset  → TXN-E001 risk: 0.05
Step   → reward: 1.0 next: TXN-E002
State  → steps: 1 txns: 1

T-21 Baseline Agent Script (Standalone)

Goal: Run the standalone Python baseline script independently of the server.

cd /Users/padmapriya
PYTHONPATH=/Users/padmapriya python3 payops_env/scripts/baseline_agent.py

Expected output (last few lines)

Episode Summary

Steps            : 12
Total reward     : +8.75
Max possible     : 12.00
Normalised score : 0.7292
Passed (≥0.5)    : YES ✓


T-22 All Actions Are Valid on Each Task

Goal: Confirm every action type is accepted without error (even if penalised).

python3 - <<'EOF'
import httpx

BASE = "http://localhost:8000"
ACTIONS = ["approve", "reject", "flag", "escalate", "inspect", "hold"]

c = httpx.Client()
for action in ACTIONS:
    c.post(f"{BASE}/reset")
    resp = c.post(f"{BASE}/step", json={
        "action_type": action,
        "transaction_id": "TXN-E001"
    })
    print(f"action={action:10}  HTTP={resp.status_code}  reward={resp.json()['reward']:+.2f}")
c.close()
EOF

Expected output

action=approve     HTTP=200  reward=+1.00
action=reject      HTTP=200  reward=-0.50
action=flag        HTTP=200  reward=+0.30
action=escalate    HTTP=200  reward=-0.25
action=inspect     HTTP=200  reward=+0.15
action=hold        HTTP=200  reward=-0.25

T-23 Full Perfect Episode (Score = 1.0)

Goal: Submit all 12 correct actions and confirm normalised_score = 1.0.

python3 - <<'EOF'
import httpx

BASE = "http://localhost:8000"
c = httpx.Client()

tasks = c.get(f"{BASE}/tasks").json()["tasks"]
c.post(f"{BASE}/reset")

for t in tasks:
    resp = c.post(f"{BASE}/step", json={
        "action_type": t["correct_action"],
        "transaction_id": t["transaction_id"]
    }).json()
    mark = "✓" if resp["reward"] == 1.0 else "✗"
    print(f"{mark} {t['task_id']:12} action={t['correct_action']:10} reward={resp['reward']:+.2f}")

score = c.get(f"{BASE}/grader").json()
print()
print("Normalised score:", score["normalised_score"])
print("Passed          :", score["passed"])
c.close()
EOF

Expected output

✓ EASY-001     action=approve     reward=+1.00
✓ EASY-002     action=reject      reward=+1.00
✓ EASY-003     action=approve     reward=+1.00
✓ EASY-004     action=flag        reward=+1.00
✓ MED-001      action=escalate    reward=+1.00
✓ MED-002      action=hold        reward=+1.00
✓ MED-003      action=flag        reward=+1.00
✓ MED-004      action=flag        reward=+1.00
✓ HARD-001     action=escalate    reward=+1.00
✓ HARD-002     action=reject      reward=+1.00
✓ HARD-003     action=reject      reward=+1.00
✓ HARD-004     action=approve     reward=+1.00

Normalised score: 1.0
Passed          : True

T-24 Worst-Case Episode (Approve Everything)

Goal: Approve all 12 transactions, ignoring every risk signal, and confirm a very low score.

python3 - <<'EOF'
import httpx

BASE = "http://localhost:8000"
c = httpx.Client()

tasks = c.get(f"{BASE}/tasks").json()["tasks"]
c.post(f"{BASE}/reset")

total = 0
for t in tasks:
    resp = c.post(f"{BASE}/step", json={
        "action_type": "approve",
        "transaction_id": t["transaction_id"]
    }).json()
    total += resp["reward"]
    print(f"{t['task_id']:12} correct={t['correct_action']:10} reward={resp['reward']:+.2f}")

score = c.get(f"{BASE}/grader").json()
print()
print(f"Total reward     : {total:+.2f}")
print(f"Normalised score : {score['normalised_score']}")
print(f"Passed           : {score['passed']}")
c.close()
EOF

Expected outcome: Several -1.0 and -0.5 penalties. Normalised score near or equal to 0.0. passed=False.


Quick Reference — Expected Rewards Per Action

| Scenario | Action | Reward |
|---|---|---|
| Correct decision | any | +1.0 |
| Inspect (first time) | inspect | +0.15 |
| Inspect (already inspected) | inspect | 0.0 |
| Partial credit (task-specific) | adjacent | +0.2 to +0.6 |
| Approve fraud/escalation | approve | -1.0 |
| Approve flagged/held | approve | -0.5 |
| Reject legitimate tx | reject | -0.5 |
| Any other wrong action | any | -0.25 |

Quick Reference — Correct Actions per Task

| Task ID | Difficulty | Correct Action | Key Signal |
|---|---|---|---|
| EASY-001 | easy | approve | risk=0.05, no flags, verified KYC |
| EASY-002 | easy | reject | sanctioned country, unknown sender, risk=0.97 |
| EASY-003 | easy | approve | small refund, risk=0.03, verified |
| EASY-004 | easy | flag | velocity_1h=15 (ATM burst) |
| MED-001 | medium | escalate | large B2B, cross-border, medium-risk country |
| MED-002 | medium | hold | KYC pending, large internal transfer |
| MED-003 | medium | flag | amount 3× historical average |
| MED-004 | medium | flag | crypto exchange, moderate risk |
| HARD-001 | hard | escalate | risk_score=0.18 is poisoned; manual flags say escalate |
| HARD-002 | hard | reject | APP scam, mule account pattern |
| HARD-003 | hard | reject | structuring/smurfing, KYC failed |
| HARD-004 | hard | approve | legitimate FX settlement; looks scary, is fine |

Running All Tests in One Script

Save the following as run_tests.sh and execute from /Users/padmapriya:

#!/usr/bin/env bash
# run_tests.sh — smoke-test all PayOps endpoints

set -e
BASE="http://localhost:8000"
PASS=0
FAIL=0

check() {
  local name="$1"
  local got="$2"
  local want="$3"
  if echo "$got" | grep -qF -- "$want"; then
    echo "  ✓  $name"
    # ((PASS++)) returns non-zero when PASS is 0, which would abort under set -e
    PASS=$((PASS + 1))
  else
    echo "  ✗  $name  (expected '$want', got '$got')"
    FAIL=$((FAIL + 1))
  fi
}

echo "=== PayOps Test Suite ==="

check "T-01 health"        "$(curl -s $BASE/health)"              '"status":"ok"'
check "T-02 schema"        "$(curl -s $BASE/schema)"              '"PayOpsAction"'
check "T-03 tasks count"   "$(curl -s $BASE/tasks)"               '"count":12'
check "T-04 reset"         "$(curl -s -X POST $BASE/reset)"       '"task_id":"EASY-001"'
check "T-05 correct step"  "$(curl -s -X POST $BASE/step -H 'Content-Type: application/json' -d '{"action_type":"approve","transaction_id":"TXN-E001"}')"  '"reward":1.0'
check "T-10 invalid action" "$(curl -s -X POST $BASE/step -H 'Content-Type: application/json' -d '{"action_type":"delete","transaction_id":"TXN-E001"}')" "Invalid action_type"
check "T-16 baseline"      "$(curl -s -X POST $BASE/baseline)"    '"normalised_score"'

echo ""
echo "Results: $PASS passed, $FAIL failed"

Then run it:

cd /Users/padmapriya
bash payops_env/run_tests.sh

Expected output

=== PayOps Test Suite ===
  ✓  T-01 health
  ✓  T-02 schema
  ✓  T-03 tasks count
  ✓  T-04 reset
  ✓  T-05 correct step
  ✓  T-10 invalid action
  ✓  T-16 baseline

Results: 7 passed, 0 failed
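
The shell `check` function matches raw substrings such as `"status":"ok"`, so it only passes when the server emits compact JSON with no space after the colon. A whitespace-insensitive alternative is to parse the body and compare the field. The following is a minimal sketch; `check_field` is a hypothetical helper and only handles top-level keys.

```python
import json


def check_field(body: str, key: str, want) -> bool:
    """True if body is a JSON object whose top-level key equals want."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get(key) == want


print(check_field('{"status":"ok"}', "status", "ok"))   # True
print(check_field('{"status": "ok"}', "status", "ok"))  # True
print(check_field("not json", "status", "ok"))          # False
```

Both spellings of the same payload pass here, whereas a substring match on `"status":"ok"` would reject the second one.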

Interactive API Explorer

FastAPI serves auto-generated interactive docs. Open in a browser while the server is running:

http://localhost:8000/docs      ← Swagger UI (try endpoints in-browser)
http://localhost:8000/redoc     ← ReDoc documentation