	# FAQ

	## Is this model for all football leagues?
	No.
	This public release is intended only for Spanish La Liga.

	## Can I use it for Premier League or Champions League?
	Not as a validated release.
	You can experiment privately, but the published bundle is not documented or evaluated for those competitions.

	## Why do I need a history CSV?
	Because `predict_match(...)` builds features from past match data.
	The model does not fetch live match history by itself.

	In simple terms:
	- the model contains learned prediction patterns
	- the history CSV provides the current football context needed for a future fixture

	Without that context, the package cannot know:
	- each team's recent form
	- recent goals for and against
	- recent rest timing
	- recent tactical identity signals

	This package should be understood as:
	- a model-and-inference bundle
	- not a bundled football data service
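
To make the role of the history CSV concrete, here is a minimal sketch of how recent context could be derived from past rows. It is not the bundle's actual feature builder: the column names (`date`, `home_team`, `away_team`, `home_goals`, `away_goals`), the helper `recent_context`, and the five-match window are all assumptions for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical column names and synthetic rows; the bundle's real schema may differ.
SAMPLE_HISTORY = """date,home_team,away_team,home_goals,away_goals
2024-08-17,Team A,Team B,2,1
2024-08-24,Team C,Team A,0,0
2024-08-31,Team A,Team D,1,3
"""

def recent_context(history_csv: str, team: str, fixture_date: date, window: int = 5) -> dict:
    """Summarise a team's last `window` matches played before `fixture_date`."""
    rows = []
    for row in csv.DictReader(io.StringIO(history_csv)):
        played = date.fromisoformat(row["date"])
        if played >= fixture_date:
            continue  # only past matches may feed a future fixture
        if team == row["home_team"]:
            rows.append((played, int(row["home_goals"]), int(row["away_goals"])))
        elif team == row["away_team"]:
            rows.append((played, int(row["away_goals"]), int(row["home_goals"])))
    rows.sort()  # oldest first, so the last entries are the most recent
    recent = rows[-window:]
    if not recent:
        return {"matches": 0, "goals_for": 0, "goals_against": 0, "rest_days": None}
    return {
        "matches": len(recent),
        "goals_for": sum(gf for _, gf, _ in recent),
        "goals_against": sum(ga for _, _, ga in recent),
        "rest_days": (fixture_date - recent[-1][0]).days,
    }

context = recent_context(SAMPLE_HISTORY, "Team A", date(2024, 9, 14))
```

With an empty history the helper has nothing to summarise, which is the situation the package is in when no history CSV is supplied.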

	## Do I need to manually enter fields like `home_player_red_cards_total_prev5`?
	No, not for the normal public flow.
	That is exactly why the bundle includes a feature-building wrapper.
	Most users should use:
	- `predict_match(...)`
	- `predict_match_simple(...)`

	The raw feature interface is only for advanced users who already manage engineered features themselves.

	## What does `abstain_recommended` mean?
	It means the prediction for this fixture is unstable enough that the exact scoreline should be treated with caution.
	In those cases, rely on the overall probability shape rather than the single predicted score.
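
One way a consuming application might act on this flag is sketched below. Only `abstain_recommended` is a name from this FAQ; the keys `outcome_probabilities` and `scoreline`, and the helper `headline`, are hypothetical and should be replaced with the fields your predictions actually contain.

```python
def headline(prediction: dict) -> str:
    """Choose what to display, hiding the exact score when abstention is advised."""
    # `abstain_recommended` is named in the FAQ; the other keys are hypothetical.
    probs = prediction["outcome_probabilities"]
    favourite = max(probs, key=probs.get)
    summary = f"Most likely outcome: {favourite} ({probs[favourite]:.0%})"
    if prediction.get("abstain_recommended", False):
        return summary + "; exact score withheld"
    return summary + f"; predicted score {prediction['scoreline']}"

example = {
    "outcome_probabilities": {"home": 0.45, "draw": 0.30, "away": 0.25},
    "scoreline": "1-0",
    "abstain_recommended": True,
}
message = headline(example)
```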

	## What is `xg_delta`?
	`xg_delta` means:
	- `expected_home_goals - expected_away_goals`

	How to read it:
	- positive: the home side has the stronger expected scoring outlook
	- negative: the away side has the stronger expected scoring outlook
	- near zero: the match is more balanced
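
The reading rules above can be sketched in a few lines. The subtraction follows the definition given here; the `balance_band` threshold for "near zero" and the helper names are assumptions, since the FAQ does not state a cutoff.

```python
def xg_delta(expected_home_goals: float, expected_away_goals: float) -> float:
    """Difference in expected goals, as defined in the FAQ."""
    return expected_home_goals - expected_away_goals

def describe_xg_delta(delta: float, balance_band: float = 0.15) -> str:
    """Turn a delta into a label; `balance_band` is an illustrative threshold only."""
    if abs(delta) <= balance_band:
        return "balanced"
    return "home edge" if delta > 0 else "away edge"

label = describe_xg_delta(xg_delta(1.8, 1.1))
```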

	## Why are there simple and advanced outputs?
	Because many products only need a few fields, while developers may want more diagnostics.

	Use these for a simple product-facing response:
	- `predict_match_simple(...)`
	- `predict_features_simple(...)`

	Use the full methods if you want richer diagnostics.

	## What if my CSV only has the minimum columns?
	The bundle will still run, but prediction quality may be weaker because the richer optional fields fall back to default values.

	## Why does the model talk about 48 signals if the sample CSV has fewer raw columns?
	Because the wrapper builds the final model input row before inference.

	Some signals come from:
	- raw columns already present in the history CSV
	- rolling features derived from past match results
	- fallback defaults when richer optional columns are missing

	So:
	- the model expects `48` numeric signals at prediction time
	- your history CSV does not need to contain all `48` as raw columns
	- but a richer CSV helps the wrapper build a stronger feature row
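
The idea of filling a fixed-width model row from whatever is available can be sketched as follows. This is not the wrapper's real logic: the signal list is shortened to four entries for readability (the real bundle expects 48), the default values are invented, and only `home_player_red_cards_total_prev5` is a name taken from this FAQ.

```python
import math

# Hypothetical, shortened signal list with invented defaults.
# Only `home_player_red_cards_total_prev5` is a name taken from the FAQ.
SIGNAL_DEFAULTS = {
    "home_goals_for_prev5": 0.0,
    "away_goals_for_prev5": 0.0,
    "home_rest_days": 7.0,
    "home_player_red_cards_total_prev5": 0.0,
}

def build_feature_row(available: dict) -> tuple[list[float], list[str]]:
    """Return the ordered numeric row plus the names that used a fallback default."""
    row, defaulted = [], []
    for name, default in SIGNAL_DEFAULTS.items():
        value = available.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = default
            defaulted.append(name)
        row.append(float(value))
    return row, defaulted

row, defaulted = build_feature_row({"home_goals_for_prev5": 8, "away_goals_for_prev5": 5})
```

Tracking which signals were defaulted gives a rough indication of how much a sparse CSV is costing you.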

	## Will I get the exact same answers as your internal environment?
	Not necessarily.

	You should expect the same model logic, but identical predictions are not guaranteed unless the input history data is also effectively identical.

	Differences in:
	- historical rows
	- team IDs
	- Elo values
	- tactical IDs
	- coach IDs
	- rolling-form inputs

	can all change the final prediction.
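
If you need to check whether two environments are feeding the model the same history, a content fingerprint is one option. The sketch below is an assumption-laden illustration, not part of the bundle: the helper `history_fingerprint` is hypothetical, and it treats two files as equal when they hold the same cell values regardless of row or column order.

```python
import csv
import hashlib
import io

def history_fingerprint(history_csv: str) -> str:
    """Hash CSV content so that row order and column order do not matter."""
    rows = list(csv.reader(io.StringIO(history_csv)))
    if not rows:
        return hashlib.sha256(b"").hexdigest()
    header, body = rows[0], rows[1:]
    canonical = sorted(
        "|".join(f"{key}={value.strip()}" for key, value in sorted(zip(header, row)))
        for row in body
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()

first = "date,team\n2024-01-01,A\n2024-01-08,B\n"
reordered = "team,date\nB,2024-01-08\nA,2024-01-01\n"
changed = "date,team\n2024-01-01,A\n2024-01-08,C\n"
same_history = history_fingerprint(first) == history_fingerprint(reordered)
```

Matching fingerprints only show that the raw rows agree; a different bundle version or different wrapper defaults could still change the output.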

	## How should I name teams in my CSV?
	Keep naming consistent.
	The wrapper normalizes common variations, including accent and case differences, but stable naming is still best.

	Examples of name variants that are best written one consistent way:
	- `Atletico Madrid` and `Atlético de Madrid`
	- `Mallorca` and `Real Mallorca`
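
If you want to clean names before they reach the bundle, accent and case folding can be done with the standard library. This sketch does not reproduce the wrapper's own normalization, and the `ALIASES` table is a hypothetical example built from the pairs above.

```python
import unicodedata

def normalize_team_name(name: str) -> str:
    """Strip accents, fold case, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())

# Hypothetical alias table; extend it with the variants found in your own data.
ALIASES = {
    "atletico de madrid": "atletico madrid",
    "real mallorca": "mallorca",
}

def canonical_team_name(name: str) -> str:
    """Map a raw name to one stable spelling."""
    key = normalize_team_name(name)
    return ALIASES.get(key, key)

canonical = canonical_team_name("Atlético de Madrid")
```

Applying one such function to every row before writing the CSV keeps a team's history from being split across two spellings.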

	## Are the sample CSV files real production data?
	No.
	They are synthetic examples included only to make the bundle runnable out of the box.

	They are useful for:
	- learning the package shape
	- testing integration
	- understanding the expected CSV schema

	They are not meant to represent:
	- a full production dataset
	- a complete public La Liga historical archive
	- the exact private environment used during internal experimentation

	## Can I use the package without providing historical data?
	Not for `predict_match(...)`.

	If you do not want to provide historical data, your alternative is the advanced path:
	- `predict_features(...)`
	- `predict_features_simple(...)`

	Those methods require you to provide the engineered numeric features directly.
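
Before calling the advanced methods, it may help to sanity-check the engineered features yourself. The sketch below assumes a flat sequence of numbers; whether the bundle expects a list, a dict, or another container is not stated in this FAQ, and `validate_feature_vector` is a hypothetical helper. The count of 48 comes from the FAQ.

```python
import math

EXPECTED_SIGNALS = 48  # signal count stated in the FAQ

def validate_feature_vector(values) -> list[float]:
    """Coerce to floats and reject wrong lengths or non-finite entries."""
    vector = [float(v) for v in values]
    if len(vector) != EXPECTED_SIGNALS:
        raise ValueError(f"expected {EXPECTED_SIGNALS} signals, got {len(vector)}")
    bad = [i for i, v in enumerate(vector) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"non-finite values at positions {bad}")
    return vector

checked = validate_feature_vector([0] * EXPECTED_SIGNALS)
```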

	## Are we expected to publish large historical CSV files too?
	No.
	In most cases you should not publish large internal historical datasets unless redistribution rights are explicit.
	The safer pattern is:
	- publish the model bundle
	- publish schema docs
	- publish synthetic samples
	- let users bring their own historical CSV