Model Calibration
How accurate are the ELO-based win probability estimates? For each of 64,098 historical games, the pre-match ELO ratings are used to predict the outcome, and that prediction is compared with the actual result.
Calibration Curve
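Each dot is a bucket of games grouped by predicted win probability, plotted against how often the predicted outcome actually occurred. The diagonal line marks perfect calibration: dots above it mean the model underestimates (the outcome happened more often than predicted), and dots below it mean the model overestimates. Dot size reflects the number of games in the bucket.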
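A calibration curve like this one can be computed by bucketing games on their predicted probability and comparing the mean prediction in each bucket with the observed result rate. The sketch below assumes 20 equal-width (5%) buckets and takes outcomes as numbers (1.0 for a win, 0.0 otherwise); how draws are scored on this page is not stated, so that choice is left to the caller. The function name `calibration_curve` is hypothetical.

```python
from collections import defaultdict


def calibration_curve(predictions, outcomes, n_buckets=20):
    """Compare predicted win probabilities with observed results, per bucket.

    predictions: predicted win probability per game, each in [0, 1].
    outcomes: observed result per game (assumed 1.0 = win, 0.0 = not a win;
              draw scoring is an assumption left to the caller).
    Returns (mean_predicted, observed_rate, count) for each non-empty bucket,
    ordered from the lowest-probability bucket to the highest.
    """
    # bucket index -> [sum of predictions, sum of outcomes, game count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for p, y in zip(predictions, outcomes):
        # Clamp so that p == 1.0 lands in the top bucket rather than a 21st one.
        b = min(int(p * n_buckets), n_buckets - 1)
        totals[b][0] += p
        totals[b][1] += y
        totals[b][2] += 1
    return [
        (s_pred / n, s_out / n, n)
        for _, (s_pred, s_out, n) in sorted(totals.items())
    ]
```

Each returned tuple corresponds to one dot: its x position is the mean prediction, its y position is the observed rate, and its size is the count. A dot lying on the diagonal has `mean_predicted` close to `observed_rate`.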
Game Count by Predicted Probability
The number of games in each 5% probability bucket. A well-spread distribution indicates that the model uses its full probability range.
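The histogram reduces to counting games per 5% bucket. As one rough, hypothetical way to quantify "uses its full range", the sketch also reports the fraction of the 20 buckets that contain at least one game; neither function name comes from the page itself.

```python
def bucket_counts(predictions, n_buckets=20):
    """Count games per equal-width probability bucket (20 buckets = 5% each)."""
    counts = [0] * n_buckets
    for p in predictions:
        # Clamp so that p == 1.0 is counted in the top bucket.
        counts[min(int(p * n_buckets), n_buckets - 1)] += 1
    return counts


def buckets_used(counts):
    """Fraction of buckets holding at least one game (a crude spread measure)."""
    return sum(1 for c in counts if c > 0) / len(counts)


counts = bucket_counts([0.01, 0.03, 0.5, 0.99, 1.0])
print(counts[0], counts[10], counts[19])  # 2 1 2
print(buckets_used(counts))               # 0.15
```

A model that only ever predicted values near 50% would populate a few middle buckets and score low on this measure, even if it happened to be well calibrated.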
How It Works
ELO ratings are maintained per (team, age-group) pair. Before each game, each team has a rating; the expected win probability for the home side is:
$$P(\text{home wins}) = \frac{1}{1 + 10^{(\mathrm{ELO}_{\text{away}} - \mathrm{ELO}_{\text{home}})/400}}$$
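The formula translates directly into code. The function name below is illustrative; as written, the formula has no separate home-advantage term, so none is added here.

```python
def home_win_probability(elo_home: float, elo_away: float) -> float:
    """Expected win probability for the home side from two ELO ratings."""
    return 1.0 / (1.0 + 10 ** ((elo_away - elo_home) / 400.0))


print(home_win_probability(1500, 1500))  # 0.5
```

Equal ratings give 0.5, and a 400-point edge gives the home side 1 / (1 + 10^-1) = 1 / 1.1, roughly 91%. Swapping the two ratings gives the complementary probability, so the two sides always sum to 1.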
The K-factor is 32. Between seasons, ratings regress 40% of the way toward 1500 to account for roster turnover. The season simulation uses these ratings plus a fixed 22% draw probability to run 10,000 Monte Carlo trials per flight.
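The parameters above can be combined into a minimal sketch of the rating update, the between-season regression, and the Monte Carlo season simulation. Several details are not stated on this page and are assumptions here: a draw is scored as 0.5 in the update; in the simulation a draw occurs with fixed probability 0.22 and otherwise the home side wins with the ELO expected probability; wins earn 3 points and draws 1; ratings stay fixed within a trial; and ties for first are broken at random. All function names (`update`, `regress`, `simulate_flight`) are hypothetical.

```python
import random
from collections import Counter

K_FACTOR = 32        # from the text
REGRESSION = 0.40    # fraction pulled back toward BASE_RATING between seasons
BASE_RATING = 1500
DRAW_PROB = 0.22     # fixed draw probability used by the simulation


def expected(elo_a, elo_b):
    """ELO expected score for side A against side B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def update(elo_home, elo_away, score_home, k=K_FACTOR):
    """Standard ELO update; score_home is 1 (win), 0.5 (draw, assumed) or 0."""
    delta = k * (score_home - expected(elo_home, elo_away))
    return elo_home + delta, elo_away - delta


def regress(elo, fraction=REGRESSION, base=BASE_RATING):
    """Move a rating `fraction` of the way back toward the base rating."""
    return elo + fraction * (base - elo)


def simulate_flight(ratings, fixtures, trials=10_000, start_points=None, seed=None):
    """Estimate each team's probability of finishing first in its flight.

    ratings: {team: elo}; fixtures: remaining (home, away) games.
    start_points: optional {team: points already earned} (hypothetical input).
    """
    rng = random.Random(seed)
    start = start_points or {}
    firsts = Counter()
    for _ in range(trials):
        points = {team: start.get(team, 0) for team in ratings}
        for home, away in fixtures:
            if rng.random() < DRAW_PROB:
                points[home] += 1
                points[away] += 1
            elif rng.random() < expected(ratings[home], ratings[away]):
                points[home] += 3
            else:
                points[away] += 3
        best = max(points.values())
        leaders = [team for team, pts in points.items() if pts == best]
        firsts[rng.choice(leaders)] += 1  # random tie-break (assumption)
    return {team: firsts[team] / trials for team in ratings}


print(update(1500, 1500, 1))  # (1516.0, 1484.0)
odds = simulate_flight({"A": 1800, "B": 1200}, [("A", "B"), ("B", "A")], seed=1)
```

With equal ratings, a home win moves 16 points (K times the 0.5 surprise) from the away side to the home side. Regression pulls a 1700 team to 1620 and a 1300 team to 1380, shrinking the spread while keeping the ordering.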