SONAR|HES-SO

SONAR|HES-SO brings together the publicly distributable bachelor's and master's theses of several HES-SO schools. See this page for details.

If you have any questions, please contact the HES-SO librarians: bibliotheques(at)hes-so.ch

Bachelor thesis

Addressing evaluation challenges on the expected goals (xG) metric in football analysis

  • Genève : Haute école de gestion de Genève

65 p.

Bachelor of Science HES-SO en Informatique de gestion: Haute école de gestion de Genève, 2024

English
The Expected Goals (xG) metric in football analysis measures the probability of a shot resulting in a goal using historical data. This study evaluates various machine learning models, including Logistic Regression, Random Forest, and XGBoost, against StatsBomb xG values to address evaluation challenges in xG models.
We analyzed data from over 3,000 games and 10 million events from the StatsBomb dataset, focusing on features like shot angle and distance. Because of significant class imbalance, where non-goal events far outnumber goal events, plain accuracy was less informative, so the models were evaluated using precision, recall, F1 score, and ROC-AUC.
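To illustrate why accuracy can mislead under class imbalance, here is a minimal standard-library sketch of the metrics named above. It is not the thesis's code: the function names and the toy data (100 shots, 10 goals) are hypothetical and chosen only for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = goal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (goal, non-goal) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of shots,
    so this is only suitable for small illustrative samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 100 shots, of which 10 are goals.
y_true = [1] * 10 + [0] * 90
# A useless baseline that predicts "no goal" for every shot.
always_no_goal = [0] * 100

baseline = classification_metrics(y_true, always_no_goal)
# accuracy is 0.9, yet recall and F1 are both 0.0
```

The baseline never identifies a single goal but still reaches 90% accuracy, which is why threshold-based metrics on the minority class and a ranking metric such as ROC-AUC are more informative here.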
Calibration plots showed that models were well-calibrated up to a probability of 0.5 but fluctuated beyond this. Logistic Regression aligned closely with actual xG values, while all models tended to underestimate high xG values. The study highlighted the importance of high-quality contextual data over model complexity and found hyperparameter tuning less effective.
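A calibration plot of the kind described above compares, per probability bin, the mean predicted xG with the observed goal rate. The following is a rough standard-library sketch of that binning step, assuming equal-width bins; the function name, the bin count, and the toy values are hypothetical and not taken from the thesis.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed goal rate, count)
    tuple per non-empty bin, ordered from low to high probability.
    """
    prob_sums = [0.0] * n_bins
    goal_counts = [0] * n_bins
    counts = [0] * n_bins
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        prob_sums[idx] += prob
        goal_counts[idx] += label
        counts[idx] += 1
    return [
        (prob_sums[i] / counts[i], goal_counts[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]


# Hypothetical toy data: four low-xG shots and two high-xG shots.
y_prob = [0.05, 0.05, 0.05, 0.05, 0.85, 0.85]
y_true = [0, 0, 0, 0, 1, 0]
bins = calibration_bins(y_true, y_prob)
```

A well-calibrated model has bins where the first two values are close to each other. Because high-xG shots are rare, the upper bins usually contain few samples, which is one plausible reason for curves that fluctuate at higher probabilities.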
Addressing class imbalance was identified as crucial for developing high-performing models, as it significantly impacts their accuracy and reliability. The recommended strategies include resampling techniques like oversampling the minority class or undersampling the majority class, adjusting class weights to give more importance to the minority class, and using Stratified K-Fold Cross-Validation to ensure balanced class proportions during model training and validation. Implementing these methods could improve the overall performance and robustness of the models.
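Two of the strategies listed above, class weighting and stratified fold assignment, can be sketched with the standard library alone. This is an illustrative approximation, not the thesis's implementation: the function names are hypothetical, the weighting uses the common "balanced" heuristic n_samples / (n_classes * class_count), and the fold assignment is a simple unshuffled round-robin per class.

```python
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses the common heuristic n_samples / (n_classes * class_count),
    so the rare class (goals) receives the larger weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * cnt) for cls, cnt in counts.items()}


def stratified_folds(labels, n_splits=5):
    """Assign sample indices to folds so each fold keeps the class ratio.

    Indices of each class are dealt round-robin across the folds.
    No shuffling is done here; real use would normally shuffle first.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical toy data: 100 shots, of which 10 are goals.
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)  # goals get weight 5.0
folds = stratified_folds(labels, n_splits=5)
goals_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With this toy data every fold holds 20 shots including 2 goals, so each validation split sees the same 10% goal rate as the full sample.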
Continuous validation with real-world data is essential for model relevance and accuracy. The study also emphasized the need for more data on female players to improve gender-specific shooting pattern analysis and model inclusivity.
In conclusion, developing accurate xG models in football analytics requires robust performance metrics, handling class imbalance, ensuring data quality, and ongoing real-world validation. Future work should focus mainly on data refinement and calibration methods, but also on advanced modelling techniques, to improve predictive accuracy and reliability.
Language
  • English
Classification
Computer science and technology
Notes
  • Haute école de gestion Genève
  • Informatique de gestion
  • hesso:hegge
Persistent URL
https://sonar.rero.ch/hesso/documents/331251
Statistics

Document views: 16
File downloads:
  • Senn-William.pdf: 26