Here is Claude's plan:

 Fix: Multi-book display + Full Pre-Game Model Stack for HR%

 Context

 Props are now rendering (Caesars only, model HR% blank). Three issues to fix:

 Issue A — Only one sportsbook showing (Caesars)
 Stop-at-first provider logic returns Caesars (from the Odds API) and never runs the scraper for
 DK/FD/BetMGM. Fix: run every provider, concatenate the results, and dedup to the best odds per
 player+book+market.

 Issue B — Model HR% blank (pre-season statcast empty)
 load_statcast_recent() returns an empty DataFrame pre-season (no 2026 games yet), so the name index
 is empty and every player returns "unavailable". Fix: fall back to
 load_statcast_previous_season_full() (2025).

 Issue C — Model uses only batter baseline
 analytics/props_mapper.py currently applies only compute_batter_baseline(). Full pre-game
 model includes: park factor, pitcher quality, zone matchup, arsenal matchup, rolling form.
 Key data gaps: (1) statcast_df is batter-perspective (player_type=batter), so player_name is always
 the batter and there are no pitcher-keyed rows. (2) Props rows carry no pitcher name or venue.
 Both must be resolved.

 ---
 Files to Modify / Create

 1. data/live_prop_odds.py — concat-all with dedup (Fix A)
 2. app.py — statcast fallback + pitcher statcast loader + Props call site changes (Fixes B + C)
 3. data/statcast.py — add player_type param to _query_statcast()
 4. data/mlb_starters.py (NEW) — probable starters from MLB Stats API
 5. analytics/props_mapper.py — full pre-game model stack (Fix C)
 6. visualization/props_page.py — pass pitcher_statcast_df + probable_starters through

 ---
 Fix A: data/live_prop_odds.py — concat-all with dedup

 Replace stop-at-first loop. Run ALL providers, concatenate, dedup keeping best odds.

 # CURRENT loop (stop-at-first):
 for provider in providers:
     ...
     if not df.empty:
         return normalize_prop_odds(df)   # exits on first non-empty
 return pd.DataFrame()

 # NEW loop (concat-all, then dedup):
 frames = []
 for provider in providers:
     name = getattr(provider, "__name__", type(provider).__name__)
     try:
         fetch_fn = getattr(provider, "fetch_all_upcoming_hr_props", None)
         if fetch_fn is None:
             continue
         df = fetch_fn(sportsbooks=sportsbooks)
         if df is not None and not df.empty:
             frames.append(df)
     except Exception as e:
         # One failing provider must not hide the rows from the others.
         logger.warning("[odds_provider_fetch] provider=%s failure: %s", name, e, exc_info=True)

 if not frames:
     return pd.DataFrame()

 merged = pd.concat(frames, ignore_index=True)
 merged = normalize_prop_odds(merged)

 # Dedup: keep one row per (player_name, sportsbook_key, market); best odds win.
 # Higher American odds always pay more (-110 > -120, +450 > +400), so sort descending.
 if not merged.empty and "sportsbook_key" in merged.columns:
     merged["_odds_score"] = pd.to_numeric(merged["odds_american"], errors="coerce").fillna(-9999)
     merged = (
         merged
         .sort_values("_odds_score", ascending=False, kind="stable")
         .drop_duplicates(subset=["player_name", "sportsbook_key", "market"], keep="first")
         .drop(columns=["_odds_score"])
         .reset_index(drop=True)
     )

 # Guard the display column so the log line itself cannot raise a KeyError.
 books = (
     sorted(merged["sportsbook"].dropna().unique().tolist())
     if "sportsbook" in merged.columns else []
 )
 logger.warning(
     "[fetch_all_upcoming_hr_props] providers=%d frames=%d merged_rows=%d unique_books=%s",
     len(providers), len(frames), len(merged), books,
 )
 return merged
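 Sorting by odds_american in descending order picks the best price because, for American odds, a larger number always pays more: -110 beats -120, +450 beats +400, and any positive price beats any negative one. The standalone sketch below checks that ordering via decimal odds and applies the same dedup rule. The helper names are illustrative, and accepting "+450"-style strings is an assumption about what normalize_prop_odds() might leave behind.

```python
import pandas as pd


def american_to_decimal(odds: float) -> float:
    """Decimal odds (total return per 1 unit staked) for an American price."""
    if odds > 0:
        return 1.0 + odds / 100.0
    return 1.0 + 100.0 / abs(odds)


def keep_best_price(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the highest-paying row per (player_name, sportsbook_key, market)."""
    out = df.copy()
    # Accept ints or strings like "+450"; unparseable or missing odds sort last (-9999).
    cleaned = out["odds_american"].astype(str).str.replace("+", "", regex=False)
    out["_odds_score"] = pd.to_numeric(cleaned, errors="coerce").fillna(-9999)
    return (
        out.sort_values("_odds_score", ascending=False, kind="stable")
        .drop_duplicates(subset=["player_name", "sportsbook_key", "market"], keep="first")
        .drop(columns=["_odds_score"])
        .reset_index(drop=True)
    )


# Larger American number -> larger decimal payout, so a descending sort puts the best price first.
prices = [-120, -110, 100, 400, 450]
payouts = [american_to_decimal(p) for p in prices]
assert payouts == sorted(payouts)
```

 Rows with no usable price survive only when no priced row exists for the same player, book, and market, which matches the -9999 fallback in the loop above.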

 ---
 Fix B: app.py — statcast fallback for pre-season (line 3409 area)

 # CURRENT:
 render_props(load_statcast_recent(), conn=conn, raw_props=load_upcoming_hr_props())

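 # NEW: fall back to the previous full season when no current-season games exist yet
 _statcast_for_props = load_statcast_recent()
 if _statcast_for_props.empty:
     _statcast_for_props = load_statcast_previous_season_full()
 render_props(_statcast_for_props, conn=conn, raw_props=load_upcoming_hr_props())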

 ---
 Fix C part 1: data/statcast.py — add player_type param

 _query_statcast() currently hardcodes "player_type": "batter". Add a parameter so pitcher
 perspective can be fetched separately.

 def _query_statcast(start_date: str, end_date: str, season: str, player_type: str = "batter") -> pd.DataFrame:
     params = {
         ...
         "player_type": player_type,   # was hardcoded "batter"
         ...
     }

 fetch_statcast_range() keeps its existing signature unchanged (defaults to batter).
 Add new public function:
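 # Needs `from datetime import datetime` at module level (add it if data/statcast.py lacks it).
 def fetch_statcast_range_pitcher(start_date: str, end_date: str) -> pd.DataFrame:
     """Fetch pitcher-perspective Statcast (player_name = pitcher name)."""
     # Assumes start_date and end_date fall within the same season.
     season = str(datetime.strptime(start_date, "%Y-%m-%d").year)
     return _query_statcast(start_date, end_date, season=season, player_type="pitcher")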

 ---
 Fix C part 2: app.py — load pitcher statcast + probable starters

 Add alongside load_statcast_previous_season_full():

 @st.cache_data(ttl=60 * 60 * 12, show_spinner=False)
 def load_statcast_previous_season_full_pitcher() -> pd.DataFrame:
     """Previous full season (2025 when run in 2026), pitcher perspective: player_name = pitcher name."""
     today = pd.Timestamp.now(tz="UTC").date()   # Timestamp.utcnow() is deprecated in pandas 2.2+
     previous_year = today.year - 1
     start_date = pd.Timestamp(year=previous_year, month=1, day=1).date()
     end_date = pd.Timestamp(year=previous_year, month=12, day=31).date()
     from data.statcast import fetch_statcast_range_pitcher
     from data.statcast import normalize_statcast
     from features.pitch_features import add_pitch_features
     raw = fetch_statcast_range_pitcher(start_date.isoformat(), end_date.isoformat())
     normalized = normalize_statcast(raw)
     return add_pitch_features(normalized)

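 Update the Props call site. fetch_probable_starters_for_props() is not cached itself, so wrap it in a
 Streamlit-cached loader defined at module level next to the other loaders, as part 3 intends
 (the wrapper name and TTL below are illustrative):

 @st.cache_data(ttl=60 * 30, show_spinner=False)
 def load_probable_starters() -> dict:
     """Cached wrapper. Note: a failed fetch returns {} and is cached for the TTL as well."""
     from data.mlb_starters import fetch_probable_starters_for_props
     return fetch_probable_starters_for_props()   # {(away_team, home_team): {...}}

 _statcast_for_props = load_statcast_recent()
 if _statcast_for_props.empty:
     _statcast_for_props = load_statcast_previous_season_full()

 _pitcher_statcast = load_statcast_previous_season_full_pitcher()
 _probable_starters = load_probable_starters()

 render_props(
     _statcast_for_props,
     conn=conn,
     raw_props=load_upcoming_hr_props(),
     pitcher_statcast_df=_pitcher_statcast,
     probable_starters=_probable_starters,
 )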

 ---
 Fix C part 3: data/mlb_starters.py (NEW FILE)

 Fetches probable starters from MLB Stats API for the next 7 days. Returns a dict mapping
 (away_team, home_team) -> {"home_pitcher": str, "away_pitcher": str} for use in props_mapper.

 Uses the public statsapi.mlb.com/api/v1/schedule endpoint with hydrate=probablePitcher.
 Cached in the calling layer (Streamlit cache_data). No API key required.

 Team matching is by name, not MLB team ID: each game in the response carries teams.home.team.name
 and teams.away.team.name alongside teams.home.probablePitcher.fullName (and the away equivalent).

 # data/mlb_starters.py
 from __future__ import annotations
 import logging
 from datetime import timedelta
 import requests
 import pandas as pd

 _log = logging.getLogger(__name__)

 _SCHEDULE_URL = "https://statsapi.mlb.com/api/v1/schedule"

 def fetch_probable_starters_for_props() -> dict[tuple[str, str], dict[str, str | None]]:
     """
     Returns {(away_team, home_team): {"home_pitcher": name_or_None, "away_pitcher": name_or_None}}
     for MLB games in the next 7 days. When a matchup repeats (a series or doubleheader), the
     earliest game that is not yet final wins, since that is the game current props refer to.
     """
     # Schedule dates are local game dates; a US Eastern "today" avoids dropping tonight's
     # games once the UTC date has already rolled over.
     today = pd.Timestamp.now(tz="America/New_York").date()
     end_date = today + timedelta(days=7)
     params = {
         "sportId": 1,
         "startDate": today.isoformat(),
         "endDate": end_date.isoformat(),
         "hydrate": "probablePitcher",
         "gameType": "R,F,D,L,W",   # regular season + postseason; spring training ("S") excluded
     }
     try:
         r = requests.get(_SCHEDULE_URL, params=params, timeout=15)
         r.raise_for_status()
         payload = r.json()
     except Exception as exc:   # network error, HTTP error, or invalid JSON
         _log.warning("[mlb_starters] schedule fetch failed: %s", exc)
         return {}

     result: dict[tuple[str, str], dict[str, str | None]] = {}
     for date_entry in payload.get("dates", []):   # assumed to be in chronological order
         for game in date_entry.get("games", []):
             if game.get("status", {}).get("abstractGameState") == "Final":
                 continue   # already played; props target the next game
             teams = game.get("teams", {})
             away, home = teams.get("away", {}), teams.get("home", {})
             away_name = away.get("team", {}).get("name", "")
             home_name = home.get("team", {}).get("name", "")
             if away_name and home_name:
                 # setdefault: a later game in the same series must not overwrite the next one
                 result.setdefault((away_name, home_name), {
                     "home_pitcher": home.get("probablePitcher", {}).get("fullName"),
                     "away_pitcher": away.get("probablePitcher", {}).get("fullName"),
                 })
     _log.warning(
         "[mlb_starters] games_with_starters=%d",
         sum(1 for v in result.values() if v["home_pitcher"] or v["away_pitcher"]),
     )
     return result

 Note: both sources generally use full club names such as "New York Yankees", but exact string
 equality is not guaranteed (punctuation such as "St. Louis", or naming after the Athletics' move out
 of Oakland). Add a _normalize_team(name) helper that lowercases and strips punctuation for
 comparison, with a small alias map for anything normalization cannot fix.

 ---
 Fix C part 4: analytics/props_mapper.py — full pre-game model stack

 4a. Add HOME_TEAM_TO_STADIUM mapping (30 MLB teams → canonical stadium name)

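 # Maps Odds API home_team names → canonical names accepted by models/stadium_lookup.resolve_stadium()
 # Verify against resolve_stadium(): several clubs changed venue or naming recently (e.g. the Athletics'
 # move to Sacramento, the Rays' 2025 season at Steinbrenner Field, Guaranteed Rate Field → Rate Field).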
 HOME_TEAM_TO_STADIUM: dict[str, str] = {
     "Baltimore Orioles": "oriole park at camden yards",
     "Boston Red Sox": "fenway park",
     "New York Yankees": "yankee stadium",
     "Tampa Bay Rays": "tropicana field",
     "Toronto Blue Jays": "rogers centre",
     "Chicago White Sox": "guaranteed rate field",
     "Cleveland Guardians": "progressive field",
     "Detroit Tigers": "comerica park",
     "Kansas City Royals": "kauffman stadium",
     "Minnesota Twins": "target field",
     "Houston Astros": "minute maid park",
     "Los Angeles Angels": "angel stadium",
     "Oakland Athletics": "athletics ballpark",
     "Seattle Mariners": "t-mobile park",
     "Texas Rangers": "globe life field",
     "Atlanta Braves": "truist park",
     "Miami Marlins": "loandepot park",
     "New York Mets": "citi field",
     "Philadelphia Phillies": "citizens bank park",
     "Washington Nationals": "nationals park",
     "Chicago Cubs": "wrigley field",
     "Cincinnati Reds": "great american ball park",
     "Milwaukee Brewers": "american family field",
     "Pittsburgh Pirates": "pnc park",
     "St. Louis Cardinals": "busch stadium",
     "Arizona Diamondbacks": "chase field",
     "Colorado Rockies": "coors field",
     "Los Angeles Dodgers": "dodger stadium",
     "San Diego Padres": "petco park",
     "San Francisco Giants": "oracle park",
 }

 4b. Add batter team lookup helper

 def _lookup_batter_team(
     player_name_normalized: str,
     props_row_away_team: str,
     props_row_home_team: str,
     statcast_df: pd.DataFrame,
 ) -> str | None:
     """
     Returns "home" or "away" for which team the batter plays on, or None if unknown.
     Uses the batter's most recent statcast rows: inning_topbot == "Top" means the batter's
     club is away_team, otherwise home_team. Statcast stores team abbreviations (e.g. "NYY"),
     so map them to full names before comparing with the props row's away/home teams.
     """
     if statcast_df.empty or "player_name" not in statcast_df.columns:
         return None
     # (implementation: filter by player_name, take the latest game, derive the batting team
     #  from inning_topbot, then compare against props_row_away_team / props_row_home_team)

 4c. Replace _get_pregame_context_adjustments() with _get_full_pregame_adjustments()

 New signature:
 def _get_full_pregame_adjustments(
     props_row: Any,
     batter_features: dict,         # already computed batter baseline features
     statcast_df: pd.DataFrame,     # batter-perspective (player_name = batter)
     pitcher_statcast_df: pd.DataFrame,  # pitcher-perspective (player_name = pitcher)
     probable_starters: dict,       # {(away_team, home_team): {home_pitcher, away_pitcher}}
 ) -> tuple[float, str]:            # (total_adj, source_detail_str)

 Returns the total additive HR probability adjustment and a "+"-joined source string
 (e.g. "baseline+pitcher_quality+park+rolling_form+zone_matchup+arsenal_matchup").

 Components are applied in order; each is wrapped in try/except and becomes a no-op on failure:

 1. Park factor: HOME_TEAM_TO_STADIUM[home_team] → resolve_stadium() → compute_park_adjustment(),
    clamped to ±0.015.
 2. Probable pitcher lookup: match (away_team, home_team) in the probable_starters dict (normalize
    both sides). Determine the batter's team, then select the opposing starting pitcher.
 3. Pitcher quality (proper): build_pitcher_feature_row(pitcher_statcast_df, pitcher_name) →
    compute_pitcher_adjustment(batter_features, pitcher_row) from models/pitcher_adjustment.py.
    Use the result's hr_adj directly, clamped to ±0.010.
 4. Zone matchup (if data is available): build_batter_zone_feature_row(statcast_df, player_name)
    (from the remote DB via batter_zone_store) + build_pitcher_zone_feature_row(pitcher_statcast_df,
    pitcher_name) → compute_zone_matchup_adjustment(). Use the hr_zone_boost - baseline_hr delta,
    clamped to ±0.010.
 5. Arsenal matchup: build_batter_arsenal_feature_row(statcast_df, statcast_name) +
    build_pitcher_arsenal_feature_row(pitcher_statcast_df, pitcher_name) →
    compute_arsenal_matchup_adjustment(). Use the arsenal_hr_boost delta, clamped to ±0.010.
 6. Rolling form: build_batter_rolling_form_row(statcast_df, statcast_name, reference_date=today) +
    build_pitcher_rolling_form_row(pitcher_statcast_df, pitcher_name, reference_date=today) →
    compute_upcoming_rolling_adjustment(batter_roll, pitcher_roll, batter_features, pitcher_row).
    Use rolling_hr_adjustment directly.

 4d. Update map_hr_props_to_model() signature

 def map_hr_props_to_model(
     props_df: pd.DataFrame,
     statcast_df: pd.DataFrame,
     prob_fn: ... | None = None,
     pitcher_stats_df: pd.DataFrame | None = None,      # existing param (kept)
     pitcher_statcast_df: pd.DataFrame | None = None,   # NEW: pitcher-perspective statcast
     probable_starters: dict | None = None,             # NEW: {(away,home): {pitchers}}
 ) -> pd.DataFrame:

 Inside the per-row loop, replace _get_pregame_context_adjustments() call with
 _get_full_pregame_adjustments() call using the new params.

 ---
 Fix C part 5: visualization/props_page.py — pass new params

 Update render_props() signature to accept and forward:
 def render_props(
     statcast_df: pd.DataFrame,
     conn=None,
     raw_props: pd.DataFrame | None = None,
     pitcher_statcast_df: pd.DataFrame | None = None,   # NEW
     probable_starters: dict | None = None,             # NEW
 ) -> None:

 At line 107:
 # CURRENT:
 mapped = map_hr_props_to_model(filtered_raw, statcast_df)

 # NEW:
 mapped = map_hr_props_to_model(
     filtered_raw,
     statcast_df,
     pitcher_statcast_df=pitcher_statcast_df,
     probable_starters=probable_starters,
 )

 ---
 Expected Behavior After All Fixes

 ┌───────────────────────────────────────┬────────────────────────────────┬─────────────────┬──────────────────────────────┐
 │               Scenario                │          Books shown           │    Model HR%    │       Model components       │
 ├───────────────────────────────────────┼────────────────────────────────┼─────────────────┼──────────────────────────────┤
 │ Season in progress, all books posting │ All books (Odds API + scraper) │ Recent statcast │ All pre-game models          │
 ├───────────────────────────────────────┼────────────────────────────────┼─────────────────┼──────────────────────────────┤
 │ Pre-season, starters announced        │ 4 books (scraper)              │ 2025 season     │ All pre-game models          │
 ├───────────────────────────────────────┼────────────────────────────────┼─────────────────┼──────────────────────────────┤
 │ Pre-season, no starters yet           │ 4 books (scraper)              │ 2025 season     │ Baseline + park (no pitcher) │
 ├───────────────────────────────────────┼────────────────────────────────┼─────────────────┼──────────────────────────────┤
 │ All providers fail                    │ "No HR props" warning          │ —               │ —                            │
 └───────────────────────────────────────┴────────────────────────────────┴─────────────────┴──────────────────────────────┘

 ---
 Verification

 1. Props tab loads without error
 2. Multiple sportsbooks appear in Book column
 3. Model HR% column shows percentages for most players
 4. Source column shows enriched source string (e.g. "baseline+pitcher_quality+park")
 5. Check logs: [fetch_all_upcoming_hr_props] unique_books= shows 2+ books
 6. Check logs: [mlb_starters] games_with_starters=N shows >0 once probable starters are announced