Spaces:
Sleeping
Sleeping
| Here is Claude's plan: βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ Fix: Multi-book display + Full Pre-Game Model Stack for HR% | |
| Context | |
| Props are now rendering (Caesars only, model HR% blank). Three issues to fix: | |
| Issue A β Only one sportsbook showing (Caesars) | |
| Stop-at-first provider logic returns Caesars (from Odds API) and never runs the scraper for | |
| DK/FD/BetMGM. Fix: concat-all providers, dedup by best odds per player+book+market. | |
| Issue B β Model HR% blank (pre-season statcast empty) | |
| load_statcast_recent() returns empty pre-season (no 2026 games). Name index is empty β all | |
| players return "unavailable". Fix: fall back to load_statcast_previous_season_full() (2025). | |
| Issue C β Model uses only batter baseline | |
| analytics/props_mapper.py currently applies only compute_batter_baseline(). Full pre-game | |
| model includes: park factor, pitcher quality, zone matchup, arsenal matchup, rolling form. | |
| Key data gap: (1) statcast_df is batter-perspective (player_type=batter) β no pitcher rows. | |
| (2) Props rows have no pitcher name or venue. Both must be resolved. | |
| --- | |
| Files to Modify / Create | |
| 1. data/live_prop_odds.py β concat-all with dedup (Fix A) | |
| 2. app.py β statcast fallback + pitcher statcast loader + Props call site changes (Fixes B + C) | |
| 3. data/statcast.py β add player_type param to _query_statcast() | |
| 4. data/mlb_starters.py (NEW) β probable starters from MLB Stats API | |
| 5. analytics/props_mapper.py β full pre-game model stack (Fix C) | |
| 6. visualization/props_page.py β pass pitcher_statcast_df + probable_starters through | |
| --- | |
| Fix A: data/live_prop_odds.py β concat-all with dedup | |
| Replace stop-at-first loop. Run ALL providers, concatenate, dedup keeping best odds. | |
| # CURRENT loop (stop-at-first): | |
| for provider in providers: | |
| ... | |
| if not df.empty: | |
| return normalize_prop_odds(df) # exits on first non-empty | |
| return pd.DataFrame() | |
| # NEW loop (concat-all, then dedup): | |
| frames = [] | |
| for provider in providers: | |
| try: | |
| fetch_fn = getattr(provider, "fetch_all_upcoming_hr_props", None) | |
| if fetch_fn is None: | |
| continue | |
| df = fetch_fn(sportsbooks=sportsbooks) | |
| if not df.empty: | |
| frames.append(df) | |
| except Exception as e: | |
| logger.warning(f"[odds_provider_fetch] failure: {e}", exc_info=True) | |
| continue | |
| if not frames: | |
| return pd.DataFrame() | |
| merged = pd.concat(frames, ignore_index=True) | |
| merged = normalize_prop_odds(merged) | |
| # Dedup: keep one row per (player_name, sportsbook_key, market) β best odds wins | |
| if not merged.empty and "sportsbook_key" in merged.columns: | |
| merged["_odds_score"] = merged["odds_american"].apply( | |
| lambda x: int(x) if pd.notna(x) else -9999 | |
| ) | |
| merged = ( | |
| merged | |
| .sort_values("_odds_score", ascending=False) | |
| .drop_duplicates(subset=["player_name", "sportsbook_key", "market"], keep="first") | |
| .drop(columns=["_odds_score"]) | |
| .reset_index(drop=True) | |
| ) | |
| logger.warning( | |
| "[fetch_all_upcoming_hr_props] providers=%d frames=%d merged_rows=%d unique_books=%s", | |
| len(providers), len(frames), len(merged), | |
| sorted(merged["sportsbook"].dropna().unique().tolist()) if not merged.empty else [], | |
| ) | |
| return merged | |
| --- | |
| Fix B: app.py β statcast fallback for pre-season (line 3409 area) | |
| # CURRENT: | |
| render_props(load_statcast_recent(), conn=conn, raw_props=load_upcoming_hr_props()) | |
| # NEW: | |
| _statcast_for_props = load_statcast_recent() | |
| if _statcast_for_props.empty: | |
| _statcast_for_props = load_statcast_previous_season_full() | |
| --- | |
| Fix C part 1: data/statcast.py β add player_type param | |
| _query_statcast() currently hardcodes "player_type": "batter". Add a parameter so pitcher | |
| perspective can be fetched separately. | |
| def _query_statcast(start_date: str, end_date: str, season: str, player_type: str = "batter") -> pd.DataFrame: | |
| params = { | |
| ... | |
| "player_type": player_type, # was hardcoded "batter" | |
| ... | |
| } | |
| fetch_statcast_range() keeps its existing signature unchanged (defaults to batter). | |
| Add new public function: | |
| def fetch_statcast_range_pitcher(start_date: str, end_date: str) -> pd.DataFrame: | |
| """Fetch pitcher-perspective Statcast (player_name = pitcher name).""" | |
| season = str(datetime.strptime(start_date, "%Y-%m-%d").year) | |
| return _query_statcast(start_date, end_date, season=season, player_type="pitcher") | |
| --- | |
| Fix C part 2: app.py β load pitcher statcast + probable starters | |
| Add alongside load_statcast_previous_season_full(): | |
| @st.cache_data(ttl=60 * 60 * 12, show_spinner=False) | |
| def load_statcast_previous_season_full_pitcher() -> pd.DataFrame: | |
| """2025 season, pitcher perspective β player_name = pitcher name.""" | |
| today = pd.Timestamp.utcnow().date() | |
| previous_year = today.year - 1 | |
| start_date = pd.Timestamp(year=previous_year, month=1, day=1).date() | |
| end_date = pd.Timestamp(year=previous_year, month=12, day=31).date() | |
| from data.statcast import fetch_statcast_range_pitcher | |
| from data.statcast import normalize_statcast | |
| from features.pitch_features import add_pitch_features | |
| raw = fetch_statcast_range_pitcher(start_date.isoformat(), end_date.isoformat()) | |
| normalized = normalize_statcast(raw) | |
| return add_pitch_features(normalized) | |
| Update the Props call site: | |
| _statcast_for_props = load_statcast_recent() | |
| if _statcast_for_props.empty: | |
| _statcast_for_props = load_statcast_previous_season_full() | |
| _pitcher_statcast = load_statcast_previous_season_full_pitcher() | |
| from data.mlb_starters import fetch_probable_starters_for_props | |
| _probable_starters = fetch_probable_starters_for_props() # {(away_team, home_team): {...}} | |
| render_props( | |
| _statcast_for_props, | |
| conn=conn, | |
| raw_props=load_upcoming_hr_props(), | |
| pitcher_statcast_df=_pitcher_statcast, | |
| probable_starters=_probable_starters, | |
| ) | |
| --- | |
| Fix C part 3: data/mlb_starters.py (NEW FILE) | |
| Fetches probable starters from MLB Stats API for the next 7 days. Returns a dict mapping | |
| (away_team, home_team) -> {"home_pitcher": str, "away_pitcher": str} for use in props_mapper. | |
| Uses the public statsapi.mlb.com/api/v1/schedule endpoint with hydrate=probablePitcher. | |
| Cached in the calling layer (Streamlit cache_data). No API key required. | |
| MLB API team IDs for team name matching β response includes teams.home.team.name and | |
| teams.away.team.name alongside teams.home.probablePitcher.fullName. | |
| # data/mlb_starters.py | |
| from __future__ import annotations | |
| import logging | |
| from datetime import timedelta | |
| import requests | |
| import pandas as pd | |
| _log = logging.getLogger(__name__) | |
| _SCHEDULE_URL = "https://statsapi.mlb.com/api/v1/schedule" | |
| def fetch_probable_starters_for_props() -> dict[tuple[str, str], dict[str, str | None]]: | |
| """ | |
| Returns {(away_team, home_team): {"home_pitcher": name_or_None, "away_pitcher": name_or_None}} | |
| for all MLB games in the next 7 days. | |
| """ | |
| today = pd.Timestamp.utcnow().date() | |
| end_date = today + timedelta(days=7) | |
| params = { | |
| "sportId": 1, | |
| "startDate": today.isoformat(), | |
| "endDate": end_date.isoformat(), | |
| "hydrate": "probablePitcher", | |
| "gameType": "R,F,D,L,W", | |
| } | |
| try: | |
| r = requests.get(_SCHEDULE_URL, params=params, timeout=15) | |
| r.raise_for_status() | |
| except Exception as exc: | |
| _log.warning("[mlb_starters] schedule fetch failed: %s", exc) | |
| return {} | |
| result: dict[tuple[str, str], dict[str, str | None]] = {} | |
| for date_entry in r.json().get("dates", []): | |
| for game in date_entry.get("games", []): | |
| teams = game.get("teams", {}) | |
| away_name = teams.get("away", {}).get("team", {}).get("name", "") | |
| home_name = teams.get("home", {}).get("team", {}).get("name", "") | |
| away_pitcher = teams.get("away", {}).get("probablePitcher", {}).get("fullName") | |
| home_pitcher = teams.get("home", {}).get("probablePitcher", {}).get("fullName") | |
| if away_name and home_name: | |
| result[(away_name, home_name)] = { | |
| "home_pitcher": home_pitcher, | |
| "away_pitcher": away_pitcher, | |
| } | |
| _log.warning("[mlb_starters] games_with_starters=%d", sum(1 for v in result.values() if v["home_pitcher"] or v["away_pitcher"])) | |
| return result | |
| Note: MLB API team names (e.g. "New York Yankees") may differ from props row team names (The | |
| Odds API uses "New York Yankees" format). A fuzzy match or alias map may be needed; add | |
| _normalize_team(name) helper that lowercases + strips punctuation for comparison. | |
| --- | |
| Fix C part 4: analytics/props_mapper.py β full pre-game model stack | |
| 4a. Add HOME_TEAM_TO_STADIUM mapping (30 MLB teams β canonical stadium name) | |
| # Maps Odds API home_team names β canonical names accepted by models/stadium_lookup.resolve_stadium() | |
| HOME_TEAM_TO_STADIUM: dict[str, str] = { | |
| "Baltimore Orioles": "oriole park at camden yards", | |
| "Boston Red Sox": "fenway park", | |
| "New York Yankees": "yankee stadium", | |
| "Tampa Bay Rays": "tropicana field", | |
| "Toronto Blue Jays": "rogers centre", | |
| "Chicago White Sox": "guaranteed rate field", | |
| "Cleveland Guardians": "progressive field", | |
| "Detroit Tigers": "comerica park", | |
| "Kansas City Royals": "kauffman stadium", | |
| "Minnesota Twins": "target field", | |
| "Houston Astros": "minute maid park", | |
| "Los Angeles Angels": "angel stadium", | |
| "Oakland Athletics": "athletics ballpark", | |
| "Seattle Mariners": "t-mobile park", | |
| "Texas Rangers": "globe life field", | |
| "Atlanta Braves": "truist park", | |
| "Miami Marlins": "loandepot park", | |
| "New York Mets": "citi field", | |
| "Philadelphia Phillies": "citizens bank park", | |
| "Washington Nationals": "nationals park", | |
| "Chicago Cubs": "wrigley field", | |
| "Cincinnati Reds": "great american ball park", | |
| "Milwaukee Brewers": "american family field", | |
| "Pittsburgh Pirates": "pnc park", | |
| "St. Louis Cardinals": "busch stadium", | |
| "Arizona Diamondbacks": "chase field", | |
| "Colorado Rockies": "coors field", | |
| "Los Angeles Dodgers": "dodger stadium", | |
| "San Diego Padres": "petco park", | |
| "San Francisco Giants": "oracle park", | |
| } | |
| 4b. Add batter team lookup helper | |
| def _lookup_batter_team( | |
| player_name_normalized: str, | |
| props_row_away_team: str, | |
| props_row_home_team: str, | |
| statcast_df: pd.DataFrame, | |
| ) -> str | None: | |
| """ | |
| Returns "home" or "away" for which team the batter plays on, or None if unknown. | |
| Uses statcast game records: find rows where this player appears and check if the | |
| game's away_team or home_team matches the props row matchup. | |
| """ | |
| # Try to match batter's statcast rows against the props game teams | |
| if statcast_df.empty or "player_name" not in statcast_df.columns: | |
| return None | |
| # (implementation: filter statcast by player_name, check home_team/away_team columns) | |
| 4c. Replace _get_pregame_context_adjustments() with _get_full_pregame_adjustments() | |
| New signature: | |
| def _get_full_pregame_adjustments( | |
| props_row: Any, | |
| batter_features: dict, # already computed batter baseline features | |
| statcast_df: pd.DataFrame, # batter-perspective (player_name = batter) | |
| pitcher_statcast_df: pd.DataFrame, # pitcher-perspective (player_name = pitcher) | |
| probable_starters: dict, # {(away_team, home_team): {home_pitcher, away_pitcher}} | |
| ) -> tuple[float, str]: # (total_adj, source_detail_str) | |
| Returns total additive HR probability adjustment and a pipe-separated source string | |
| (e.g. "baseline+pitcher_quality+park+rolling_form+zone_matchup+arsenal_matchup"). | |
| Components applied in order, each try/except, no-op on failure: | |
| 1. Park factor β via HOME_TEAM_TO_STADIUM[home_team] β resolve_stadium() β | |
| compute_park_adjustment() β clamp to Β±0.015 | |
| 2. Probable pitcher lookup β match (away_team, home_team) in probable_starters dict | |
| (fuzzy normalize both sides). Determine batter's team β select opposing starting pitcher. | |
| 3. Pitcher quality (proper) β build_pitcher_feature_row(pitcher_statcast_df, pitcher_name) | |
| β compute_pitcher_adjustment(batter_features, pitcher_row) from | |
| models/pitcher_adjustment.py. Use result's hr_adj directly. Clamp Β±0.010. | |
| 4. Zone matchup (if data available) β build_batter_zone_feature_row(statcast_df, player_name) | |
| (from remote DB via batter_zone_store) + build_pitcher_zone_feature_row(pitcher_statcast_df, pitcher_name) β compute_zone_matchup_adjustment(). Use hr_zone_boost | |
| - baseline_hr delta, | |
| clamped Β±0.010. | |
| 5. Arsenal matchup β build_batter_arsenal_feature_row(statcast_df, statcast_name) | |
| - build_pitcher_arsenal_feature_row(pitcher_statcast_df, pitcher_name) | |
| β compute_arsenal_matchup_adjustment(). Use arsenal_hr_boost delta, clamped Β±0.010. | |
| 6. Rolling form β build_batter_rolling_form_row(statcast_df, statcast_name, reference_date=today) | |
| - build_pitcher_rolling_form_row(pitcher_statcast_df, pitcher_name, reference_date=today) | |
| β compute_upcoming_rolling_adjustment(batter_roll, pitcher_roll, batter_features, pitcher_row). | |
| Use rolling_hr_adjustment directly. | |
| 4d. Update map_hr_props_to_model() signature | |
| def map_hr_props_to_model( | |
| props_df: pd.DataFrame, | |
| statcast_df: pd.DataFrame, | |
| prob_fn: ... | None = None, | |
| pitcher_stats_df: pd.DataFrame | None = None, # existing param (kept) | |
| pitcher_statcast_df: pd.DataFrame | None = None, # NEW: pitcher-perspective statcast | |
| probable_starters: dict | None = None, # NEW: {(away,home): {pitchers}} | |
| ) -> pd.DataFrame: | |
| Inside the per-row loop, replace _get_pregame_context_adjustments() call with | |
| _get_full_pregame_adjustments() call using the new params. | |
| --- | |
| Fix C part 5: visualization/props_page.py β pass new params | |
| Update render_props() signature to accept and forward: | |
| def render_props( | |
| statcast_df: pd.DataFrame, | |
| conn=None, | |
| raw_props: pd.DataFrame | None = None, | |
| pitcher_statcast_df: pd.DataFrame | None = None, # NEW | |
| probable_starters: dict | None = None, # NEW | |
| ) -> None: | |
| At line 107: | |
| # CURRENT: | |
| mapped = map_hr_props_to_model(filtered_raw, statcast_df) | |
| # NEW: | |
| mapped = map_hr_props_to_model( | |
| filtered_raw, | |
| statcast_df, | |
| pitcher_statcast_df=pitcher_statcast_df, | |
| probable_starters=probable_starters, | |
| ) | |
| --- | |
| Expected Behavior After All Fixes | |
| βββββββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββββββββββββββββ | |
| β Scenario β Books shown β Model HR% β Model components β | |
| βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β Season in progress, all books posting β All books (Odds API + scraper) β Recent statcast β All pre-game models β | |
| βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β Pre-season, starters announced β 4 books (scraper) β 2025 season β All pre-game models β | |
| βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β Pre-season, no starters yet β 4 books (scraper) β 2025 season β Baseline + park (no pitcher) β | |
| βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β All providers fail β "No HR props" warning β β β β β | |
| βββββββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββ΄ββββββββββββββββββ΄βββββββββββββββββββββββββββββββ | |
| --- | |
| Verification | |
| 1. Props tab loads without error | |
| 2. Multiple sportsbooks appear in Book column | |
| 3. Model HR% column shows percentages for most players | |
| 4. Source column shows enriched source string (e.g. "baseline+pitcher_quality+park") | |
| 5. Check logs: [fetch_all_upcoming_hr_props] unique_books= shows 2+ books | |
| 6. Check logs: [mlb_starters] games_with_starters=N shows >0 when games are announced | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| Claude has written up a plan and is ready to execute. Would you like to proceed? | |
| β― 1. Yes, auto-accept edits | |
| 2. Yes, manually approve edits | |
| 3. Type here to tell Claude what to change |