FAQ
Is this model for all football leagues?
No. This public release is intended only for Spanish La Liga.
Can I use it for Premier League or Champions League?
Not as a validated release. You can experiment privately, but the published bundle is not documented or evaluated for those competitions.
Why do I need a history CSV?
Because predict_match(...) builds features from past match data.
The model does not fetch live match history by itself.
In simple terms:
- the model contains learned prediction patterns
- the history CSV provides the current football context needed for a future fixture
Without that context, the package cannot know:
- each team's recent form
- recent goals for and against
- recent rest timing
- recent tactical identity signals
This package should be understood as:
- a model-and-inference bundle
- not a bundled football data service
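The rolling-context idea above can be sketched with pandas. This is only an illustration of how a wrapper might derive form and rest signals from a history CSV; the column names and window size here are invented, not the bundle's actual schema.

```python
import pandas as pd

# Hypothetical history rows; real column names may differ from the bundle's schema.
history = pd.DataFrame({
    "date": pd.to_datetime(["2024-08-10", "2024-08-18", "2024-08-25", "2024-09-01"]),
    "team": ["Sevilla"] * 4,
    "goals_for": [1, 0, 2, 3],
    "goals_against": [2, 1, 1, 0],
})
history = history.sort_values("date")

# Recent form: goals scored over the previous matches (window of 3 here for
# brevity; the bundle's field names suggest windows like the previous 5).
history["gf_prev3"] = history.groupby("team")["goals_for"].transform(
    lambda s: s.shift(1).rolling(3, min_periods=1).sum()
)

# Rest timing: days since each team's previous match.
history["rest_days"] = history.groupby("team")["date"].diff().dt.days
```

Note that `shift(1)` keeps each row's features strictly based on earlier matches, which is what makes the history CSV necessary: without past rows, there is nothing to shift over.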
Do I need to manually enter fields like home_player_red_cards_total_prev5?
No, not for the normal public flow. That is exactly why the bundle includes a feature-building wrapper. Most users should use:
predict_match(...) or predict_match_simple(...)
The raw feature interface is only for advanced users who already manage engineered features themselves.
What does abstain_recommended mean?
It means the fixture is fragile enough that the exact score should be treated cautiously. In those cases, the scoreline is less trustworthy than the overall probability shape.
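A consumer of the output might branch on this flag roughly as follows. The field names in this result dict are illustrative stand-ins, not the bundle's documented schema.

```python
# Hypothetical result shape; key names here are invented for illustration.
result = {
    "home_win_prob": 0.48,
    "draw_prob": 0.29,
    "away_win_prob": 0.23,
    "predicted_score": (1, 1),
    "abstain_recommended": True,
}

if result["abstain_recommended"]:
    # Fragile fixture: report the probability shape, not the exact scoreline.
    summary = (
        f"Home {result['home_win_prob']:.0%} / "
        f"Draw {result['draw_prob']:.0%} / "
        f"Away {result['away_win_prob']:.0%}"
    )
else:
    home, away = result["predicted_score"]
    summary = f"Predicted score: {home}-{away}"
```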
What is xg_delta?
xg_delta means:
expected_home_goals - expected_away_goals
How to read it:
- positive: the home side has the stronger expected scoring outlook
- negative: the away side has the stronger expected scoring outlook
- near zero: the match is more balanced
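The definition and the reading rules above fit in a few lines. The `balanced_band` threshold below is an illustrative choice, not a value defined by the bundle.

```python
def xg_delta(expected_home_goals: float, expected_away_goals: float) -> float:
    """Positive favours the home side; negative favours the away side."""
    return expected_home_goals - expected_away_goals

def read_xg_delta(delta: float, balanced_band: float = 0.15) -> str:
    # 0.15 is an arbitrary illustrative band for "near zero".
    if abs(delta) <= balanced_band:
        return "balanced"
    return "home edge" if delta > 0 else "away edge"
```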
Why are there simple and advanced outputs?
Because many products only need a few fields, while developers may want more diagnostics.
Use these for a simple product-facing response:
predict_match_simple(...) or predict_features_simple(...)
Use the full methods if you want richer diagnostics.
What if my CSV only has the minimum columns?
The bundle will still run, but prediction quality may be weaker because the richer optional fields will fall back to default values.
Why does the model talk about 48 signals if the sample CSV has fewer raw columns?
Because the wrapper builds the final model input row before inference.
Some signals come from:
- raw columns already present in the history CSV
- rolling features derived from past match results
- fallback defaults when richer optional columns are missing
So:
- the model expects 48 numeric signals at prediction time
- your history CSV does not need to contain all 48 as raw columns
- but a richer CSV helps the wrapper build a stronger feature row
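The three sources listed above can be sketched as a single assembly step. Everything here (signal names, defaults) is invented for illustration; the real wrapper has its own schema and fallback rules.

```python
# Illustrative only: these names and defaults are NOT the bundle's real schema.
RAW_SIGNALS = ["home_elo", "away_elo"]
ROLLING_SIGNALS = ["home_goals_for_prev5", "home_goals_against_prev5"]
DEFAULTS = {"home_player_red_cards_total_prev5": 0.0}

def build_feature_row(raw_row: dict, rolling: dict) -> dict:
    """Assemble the final model input row from three sources."""
    row = {}
    for name in RAW_SIGNALS:
        row[name] = raw_row[name]               # taken directly from the history CSV
    for name in ROLLING_SIGNALS:
        row[name] = rolling.get(name, 0.0)      # derived from past match results
    for name, default in DEFAULTS.items():
        row[name] = raw_row.get(name, default)  # fallback when the column is absent
    return row
```

The point is that the CSV feeds the row but is not the row itself: the wrapper fills the gap between raw columns and the full signal vector.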
Will I get the exact same answers as your internal environment?
Not necessarily.
You should expect the same model logic, but identical predictions are not guaranteed unless the input history data is also effectively identical.
Differences in:
- historical rows
- team IDs
- Elo values
- tactical IDs
- coach IDs
- rolling-form inputs
can all change the final prediction.
How should I name teams in my CSV?
Keep naming consistent. The wrapper normalizes common variations, including accent and case differences, but stable naming is still best.
Examples:
- Atletico Madrid and Atlético de Madrid
- Mallorca and Real Mallorca
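A minimal accent-and-case normalizer looks like the sketch below. This only covers the accent and case variations mentioned above; the bundle's actual normalization rules may be broader (e.g. handling prefixes like "Real").

```python
import unicodedata

def normalize_team_name(name: str) -> str:
    """Strip accents, lowercase, and collapse whitespace (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())
```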
Are the sample CSV files real production data?
No. They are synthetic examples included only to make the bundle runnable out of the box.
They are useful for:
- learning the package shape
- testing integration
- understanding the expected CSV schema
They are not meant to represent:
- a full production dataset
- a complete public La Liga historical archive
- the exact private environment used during internal experimentation
Can I use the package without providing historical data?
Not for predict_match(...).
If you do not want to provide historical data, your alternative is the advanced path:
predict_features(...) or predict_features_simple(...)
Those methods require you to provide the engineered numeric features directly.
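The shape of that advanced path is roughly the following. The feature names are invented stand-ins, and the validation helper is hypothetical; the real bundle documents its own expected signal list.

```python
# Illustrative engineered-feature dict; names are NOT the bundle's real schema.
engineered_features = {
    "home_elo": 1612.0,
    "away_elo": 1548.0,
    "home_goals_for_prev5": 9.0,
    "away_goals_for_prev5": 6.0,
}

def validate_features(features: dict, expected: list) -> list:
    """Return the names of any expected signals missing from the caller's dict."""
    return [name for name in expected if name not in features]

# A caller would check completeness before handing the dict to the raw interface.
missing = validate_features(engineered_features,
                            ["home_elo", "away_elo", "rest_days_home"])
```

If `missing` is non-empty, you either supply the values yourself or accept whatever fallback behaviour the raw interface defines.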
Are we expected to publish large historical CSV files too?
No. In most cases you should not publish large internal historical datasets unless redistribution rights are explicit. The safer pattern is:
- publish the model bundle
- publish schema docs
- publish synthetic samples
- let users bring their own historical CSV