Spaces:
Sleeping
Sleeping
File size: 19,210 Bytes
3b144a1 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 | Here is Claude's plan: βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ Fix: Multi-book display + Full Pre-Game Model Stack for HR%
Context
Props are now rendering (Caesars only, model HR% blank). Three issues to fix:
Issue A β Only one sportsbook showing (Caesars)
Stop-at-first provider logic returns Caesars (from Odds API) and never runs the scraper for
DK/FD/BetMGM. Fix: concat-all providers, dedup by best odds per player+book+market.
Issue B β Model HR% blank (pre-season statcast empty)
load_statcast_recent() returns empty pre-season (no 2026 games). Name index is empty β all
players return "unavailable". Fix: fall back to load_statcast_previous_season_full() (2025).
Issue C β Model uses only batter baseline
analytics/props_mapper.py currently applies only compute_batter_baseline(). Full pre-game
model includes: park factor, pitcher quality, zone matchup, arsenal matchup, rolling form.
Key data gap: (1) statcast_df is batter-perspective (player_type=batter) β no pitcher rows.
(2) Props rows have no pitcher name or venue. Both must be resolved.
---
Files to Modify / Create
1. data/live_prop_odds.py β concat-all with dedup (Fix A)
2. app.py β statcast fallback + pitcher statcast loader + Props call site changes (Fixes B + C)
3. data/statcast.py β add player_type param to _query_statcast()
4. data/mlb_starters.py (NEW) β probable starters from MLB Stats API
5. analytics/props_mapper.py β full pre-game model stack (Fix C)
6. visualization/props_page.py β pass pitcher_statcast_df + probable_starters through
---
Fix A: data/live_prop_odds.py β concat-all with dedup
Replace stop-at-first loop. Run ALL providers, concatenate, dedup keeping best odds.
# CURRENT loop (stop-at-first):
for provider in providers:
...
if not df.empty:
return normalize_prop_odds(df) # exits on first non-empty
return pd.DataFrame()
# NEW loop (concat-all, then dedup):
frames = []
for provider in providers:
try:
fetch_fn = getattr(provider, "fetch_all_upcoming_hr_props", None)
if fetch_fn is None:
continue
df = fetch_fn(sportsbooks=sportsbooks)
if not df.empty:
frames.append(df)
except Exception as e:
logger.warning(f"[odds_provider_fetch] failure: {e}", exc_info=True)
continue
if not frames:
return pd.DataFrame()
merged = pd.concat(frames, ignore_index=True)
merged = normalize_prop_odds(merged)
# Dedup: keep one row per (player_name, sportsbook_key, market) β best odds wins
if not merged.empty and "sportsbook_key" in merged.columns:
merged["_odds_score"] = merged["odds_american"].apply(
lambda x: int(x) if pd.notna(x) else -9999
)
merged = (
merged
.sort_values("_odds_score", ascending=False)
.drop_duplicates(subset=["player_name", "sportsbook_key", "market"], keep="first")
.drop(columns=["_odds_score"])
.reset_index(drop=True)
)
logger.warning(
"[fetch_all_upcoming_hr_props] providers=%d frames=%d merged_rows=%d unique_books=%s",
len(providers), len(frames), len(merged),
sorted(merged["sportsbook"].dropna().unique().tolist()) if not merged.empty else [],
)
return merged
---
Fix B: app.py β statcast fallback for pre-season (line 3409 area)
# CURRENT:
render_props(load_statcast_recent(), conn=conn, raw_props=load_upcoming_hr_props())
# NEW:
_statcast_for_props = load_statcast_recent()
if _statcast_for_props.empty:
_statcast_for_props = load_statcast_previous_season_full()
---
Fix C part 1: data/statcast.py β add player_type param
_query_statcast() currently hardcodes "player_type": "batter". Add a parameter so pitcher
perspective can be fetched separately.
def _query_statcast(start_date: str, end_date: str, season: str, player_type: str = "batter") -> pd.DataFrame:
params = {
...
"player_type": player_type, # was hardcoded "batter"
...
}
fetch_statcast_range() keeps its existing signature unchanged (defaults to batter).
Add new public function:
def fetch_statcast_range_pitcher(start_date: str, end_date: str) -> pd.DataFrame:
"""Fetch pitcher-perspective Statcast (player_name = pitcher name)."""
season = str(datetime.strptime(start_date, "%Y-%m-%d").year)
return _query_statcast(start_date, end_date, season=season, player_type="pitcher")
---
Fix C part 2: app.py β load pitcher statcast + probable starters
Add alongside load_statcast_previous_season_full():
@st.cache_data(ttl=60 * 60 * 12, show_spinner=False)
def load_statcast_previous_season_full_pitcher() -> pd.DataFrame:
"""2025 season, pitcher perspective β player_name = pitcher name."""
today = pd.Timestamp.utcnow().date()
previous_year = today.year - 1
start_date = pd.Timestamp(year=previous_year, month=1, day=1).date()
end_date = pd.Timestamp(year=previous_year, month=12, day=31).date()
from data.statcast import fetch_statcast_range_pitcher
from data.statcast import normalize_statcast
from features.pitch_features import add_pitch_features
raw = fetch_statcast_range_pitcher(start_date.isoformat(), end_date.isoformat())
normalized = normalize_statcast(raw)
return add_pitch_features(normalized)
Update the Props call site:
_statcast_for_props = load_statcast_recent()
if _statcast_for_props.empty:
_statcast_for_props = load_statcast_previous_season_full()
_pitcher_statcast = load_statcast_previous_season_full_pitcher()
from data.mlb_starters import fetch_probable_starters_for_props
_probable_starters = fetch_probable_starters_for_props() # {(away_team, home_team): {...}}
render_props(
_statcast_for_props,
conn=conn,
raw_props=load_upcoming_hr_props(),
pitcher_statcast_df=_pitcher_statcast,
probable_starters=_probable_starters,
)
---
Fix C part 3: data/mlb_starters.py (NEW FILE)
Fetches probable starters from MLB Stats API for the next 7 days. Returns a dict mapping
(away_team, home_team) -> {"home_pitcher": str, "away_pitcher": str} for use in props_mapper.
Uses the public statsapi.mlb.com/api/v1/schedule endpoint with hydrate=probablePitcher.
Cached in the calling layer (Streamlit cache_data). No API key required.
MLB API team IDs for team name matching β response includes teams.home.team.name and
teams.away.team.name alongside teams.home.probablePitcher.fullName.
# data/mlb_starters.py
from __future__ import annotations
import logging
from datetime import timedelta
import requests
import pandas as pd
_log = logging.getLogger(__name__)
_SCHEDULE_URL = "https://statsapi.mlb.com/api/v1/schedule"
def fetch_probable_starters_for_props() -> dict[tuple[str, str], dict[str, str | None]]:
"""
Returns {(away_team, home_team): {"home_pitcher": name_or_None, "away_pitcher": name_or_None}}
for all MLB games in the next 7 days.
"""
today = pd.Timestamp.utcnow().date()
end_date = today + timedelta(days=7)
params = {
"sportId": 1,
"startDate": today.isoformat(),
"endDate": end_date.isoformat(),
"hydrate": "probablePitcher",
"gameType": "R,F,D,L,W",
}
try:
r = requests.get(_SCHEDULE_URL, params=params, timeout=15)
r.raise_for_status()
except Exception as exc:
_log.warning("[mlb_starters] schedule fetch failed: %s", exc)
return {}
result: dict[tuple[str, str], dict[str, str | None]] = {}
for date_entry in r.json().get("dates", []):
for game in date_entry.get("games", []):
teams = game.get("teams", {})
away_name = teams.get("away", {}).get("team", {}).get("name", "")
home_name = teams.get("home", {}).get("team", {}).get("name", "")
away_pitcher = teams.get("away", {}).get("probablePitcher", {}).get("fullName")
home_pitcher = teams.get("home", {}).get("probablePitcher", {}).get("fullName")
if away_name and home_name:
result[(away_name, home_name)] = {
"home_pitcher": home_pitcher,
"away_pitcher": away_pitcher,
}
_log.warning("[mlb_starters] games_with_starters=%d", sum(1 for v in result.values() if v["home_pitcher"] or v["away_pitcher"]))
return result
Note: MLB API team names (e.g. "New York Yankees") may differ from props row team names (The
Odds API uses "New York Yankees" format). A fuzzy match or alias map may be needed; add
_normalize_team(name) helper that lowercases + strips punctuation for comparison.
---
Fix C part 4: analytics/props_mapper.py β full pre-game model stack
4a. Add HOME_TEAM_TO_STADIUM mapping (30 MLB teams β canonical stadium name)
# Maps Odds API home_team names β canonical names accepted by models/stadium_lookup.resolve_stadium()
HOME_TEAM_TO_STADIUM: dict[str, str] = {
"Baltimore Orioles": "oriole park at camden yards",
"Boston Red Sox": "fenway park",
"New York Yankees": "yankee stadium",
"Tampa Bay Rays": "tropicana field",
"Toronto Blue Jays": "rogers centre",
"Chicago White Sox": "guaranteed rate field",
"Cleveland Guardians": "progressive field",
"Detroit Tigers": "comerica park",
"Kansas City Royals": "kauffman stadium",
"Minnesota Twins": "target field",
"Houston Astros": "minute maid park",
"Los Angeles Angels": "angel stadium",
"Oakland Athletics": "athletics ballpark",
"Seattle Mariners": "t-mobile park",
"Texas Rangers": "globe life field",
"Atlanta Braves": "truist park",
"Miami Marlins": "loandepot park",
"New York Mets": "citi field",
"Philadelphia Phillies": "citizens bank park",
"Washington Nationals": "nationals park",
"Chicago Cubs": "wrigley field",
"Cincinnati Reds": "great american ball park",
"Milwaukee Brewers": "american family field",
"Pittsburgh Pirates": "pnc park",
"St. Louis Cardinals": "busch stadium",
"Arizona Diamondbacks": "chase field",
"Colorado Rockies": "coors field",
"Los Angeles Dodgers": "dodger stadium",
"San Diego Padres": "petco park",
"San Francisco Giants": "oracle park",
}
4b. Add batter team lookup helper
def _lookup_batter_team(
player_name_normalized: str,
props_row_away_team: str,
props_row_home_team: str,
statcast_df: pd.DataFrame,
) -> str | None:
"""
Returns "home" or "away" for which team the batter plays on, or None if unknown.
Uses statcast game records: find rows where this player appears and check if the
game's away_team or home_team matches the props row matchup.
"""
# Try to match batter's statcast rows against the props game teams
if statcast_df.empty or "player_name" not in statcast_df.columns:
return None
# (implementation: filter statcast by player_name, check home_team/away_team columns)
4c. Replace _get_pregame_context_adjustments() with _get_full_pregame_adjustments()
New signature:
def _get_full_pregame_adjustments(
props_row: Any,
batter_features: dict, # already computed batter baseline features
statcast_df: pd.DataFrame, # batter-perspective (player_name = batter)
pitcher_statcast_df: pd.DataFrame, # pitcher-perspective (player_name = pitcher)
probable_starters: dict, # {(away_team, home_team): {home_pitcher, away_pitcher}}
) -> tuple[float, str]: # (total_adj, source_detail_str)
Returns total additive HR probability adjustment and a pipe-separated source string
(e.g. "baseline+pitcher_quality+park+rolling_form+zone_matchup+arsenal_matchup").
Components applied in order, each try/except, no-op on failure:
1. Park factor β via HOME_TEAM_TO_STADIUM[home_team] β resolve_stadium() β
compute_park_adjustment() β clamp to Β±0.015
2. Probable pitcher lookup β match (away_team, home_team) in probable_starters dict
(fuzzy normalize both sides). Determine batter's team β select opposing starting pitcher.
3. Pitcher quality (proper) β build_pitcher_feature_row(pitcher_statcast_df, pitcher_name)
β compute_pitcher_adjustment(batter_features, pitcher_row) from
models/pitcher_adjustment.py. Use result's hr_adj directly. Clamp Β±0.010.
4. Zone matchup (if data available) β build_batter_zone_feature_row(statcast_df, player_name)
(from remote DB via batter_zone_store) + build_pitcher_zone_feature_row(pitcher_statcast_df, pitcher_name) β compute_zone_matchup_adjustment(). Use hr_zone_boost
- baseline_hr delta,
clamped Β±0.010.
5. Arsenal matchup β build_batter_arsenal_feature_row(statcast_df, statcast_name)
- build_pitcher_arsenal_feature_row(pitcher_statcast_df, pitcher_name)
β compute_arsenal_matchup_adjustment(). Use arsenal_hr_boost delta, clamped Β±0.010.
6. Rolling form β build_batter_rolling_form_row(statcast_df, statcast_name, reference_date=today)
- build_pitcher_rolling_form_row(pitcher_statcast_df, pitcher_name, reference_date=today)
β compute_upcoming_rolling_adjustment(batter_roll, pitcher_roll, batter_features, pitcher_row).
Use rolling_hr_adjustment directly.
4d. Update map_hr_props_to_model() signature
def map_hr_props_to_model(
props_df: pd.DataFrame,
statcast_df: pd.DataFrame,
prob_fn: ... | None = None,
pitcher_stats_df: pd.DataFrame | None = None, # existing param (kept)
pitcher_statcast_df: pd.DataFrame | None = None, # NEW: pitcher-perspective statcast
probable_starters: dict | None = None, # NEW: {(away,home): {pitchers}}
) -> pd.DataFrame:
Inside the per-row loop, replace _get_pregame_context_adjustments() call with
_get_full_pregame_adjustments() call using the new params.
---
Fix C part 5: visualization/props_page.py β pass new params
Update render_props() signature to accept and forward:
def render_props(
statcast_df: pd.DataFrame,
conn=None,
raw_props: pd.DataFrame | None = None,
pitcher_statcast_df: pd.DataFrame | None = None, # NEW
probable_starters: dict | None = None, # NEW
) -> None:
At line 107:
# CURRENT:
mapped = map_hr_props_to_model(filtered_raw, statcast_df)
# NEW:
mapped = map_hr_props_to_model(
filtered_raw,
statcast_df,
pitcher_statcast_df=pitcher_statcast_df,
probable_starters=probable_starters,
)
---
Expected Behavior After All Fixes
βββββββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββββββββββββββββ
β Scenario β Books shown β Model HR% β Model components β
βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€
β Season in progress, all books posting β All books (Odds API + scraper) β Recent statcast β All pre-game models β
βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€
β Pre-season, starters announced β 4 books (scraper) β 2025 season β All pre-game models β
βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€
β Pre-season, no starters yet β 4 books (scraper) β 2025 season β Baseline + park (no pitcher) β
βββββββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββΌββββββββββββββββββΌβββββββββββββββββββββββββββββββ€
β All providers fail β "No HR props" warning β β β β β
βββββββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββ΄ββββββββββββββββββ΄βββββββββββββββββββββββββββββββ
---
Verification
1. Props tab loads without error
2. Multiple sportsbooks appear in Book column
3. Model HR% column shows percentages for most players
4. Source column shows enriched source string (e.g. "baseline+pitcher_quality+park")
5. Check logs: [fetch_all_upcoming_hr_props] unique_books= shows 2+ books
6. Check logs: [mlb_starters] games_with_starters=N shows >0 when games are announced
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Claude has written up a plan and is ready to execute. Would you like to proceed?
β― 1. Yes, auto-accept edits
2. Yes, manually approve edits
3. Type here to tell Claude what to change |