About

Outline

Introduction
Motivation
Goals and Research Question
Literature Review
Project Outline
Resource Downloads

"Football is a simple game. Twenty-two men chase a ball for 90 minutes and at the end, the Germans always win." – Gary Lineker

I. Introduction

The FIFA World Cup is the pinnacle of international association football competition. Every four years, the World Cup captures the imagination of billions of people around the globe, and the World Cup final is the most-watched event on television. Part of what makes the World Cup so intriguing is that it always seems to be full of surprises, which has made it famously unpredictable. Even Gary Lineker’s famous quote was proven wrong: Germany, the defending world champions, crashed out in the group stage in dramatic fashion at the 2018 World Cup, the third tournament in a row in which the defending champions failed to reach the knockout round. Who could have seen that coming? And does that mean football is truly unpredictable?

While a certain amount of randomness and luck always remains in football, objective numerical metrics can give reasonably sound indications of the relative strength of teams. One such metric is FIFA’s own World Ranking for men’s football, which orders teams based on past performances and successes. However, this method has often underestimated lower-ranked teams and has been accused of bias towards larger countries. The Win, Importance, and Goal-adjusted GlickO (WIGGO) ranking method, developed by one of our group members with help from Prof. Rader, uses a modified Glicko model to provide a more accurate and up-to-date ranking of teams. These metrics, when combined with previous match results, can be useful in predicting the outcome of an upcoming match based on the relative abilities and past performances of the two competing teams.
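To make the idea of predicting a match from relative ratings concrete, here is a minimal sketch of the expected-score formula from the standard Glicko system. It is not the WIGGO model itself, whose modifications are not described in this section; the function names and the example ratings are our own illustrative assumptions. Note that an expected score treats a draw as half a win, so it is not the same thing as a win probability.

```python
import math

Q = math.log(10) / 400  # scaling constant used by the standard Glicko system


def g(rd: float) -> float:
    """Attenuation factor: moves from 1 toward 0 as the opponent's rating deviation grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score (0 to 1) of a team against an opponent under standard Glicko.

    A draw counts as 0.5, so this is an expected result, not a win probability.
    """
    exponent = -g(opp_rd) * (rating - opp_rating) / 400.0
    return 1.0 / (1.0 + 10.0**exponent)


# Hypothetical ratings, for illustration only.
even_match = expected_score(1500, 1500, opp_rd=50)
clear_favorite = expected_score(1700, 1500, opp_rd=0)
uncertain_opponent = expected_score(1700, 1500, opp_rd=300)
```

With these made-up inputs, two equally rated teams each get an expected score of 0.5, and the 200-point favorite's expected score moves closer to 0.5 as the uncertainty about the opponent's rating increases.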

II. Motivation

Due to the increasing profitability of sports betting in recent years, the need to accurately predict football games has grown immensely. While football pundits have always rendered predictions that are at least somewhat accurate, statistical forecasts such as FiveThirtyEight’s World Cup prediction model have recently proven extremely popular. In addition, sports predictions using neural networks (primarily feedforward networks such as multilayer perceptrons) have achieved high predictive accuracy in a variety of sports and contexts, including past World Cups.

Furthermore, there exists a sense among football purists that football as a sport will never “succumb” to the same statistical revolution that has swept through American sports like baseball, basketball, and even American football. Building a robust mathematical model that makes accurate football predictions would show that there is room for statistics and football to coexist, and, as data scientists, that would make us very happy.

III. Goals and Research Question

Because we are working with a unique dataset in the form of WIGGO Rankings and Ratings, our research questions and objectives are twofold:

Research Question 1: How do the FIFA Ranking and WIGGO Ranking systems compare in their abilities to predict World Cup game outcomes?
Research Question 2: Can we build a statistical model that predicts World Cup game outcomes as well as or better than the traditional “experts” – that is, pundits and bookmakers?

Objective 1: To compare the power of the standard FIFA Rankings and the WIGGO Rankings to predict World Cup game outcomes.
Objective 2: To create a stacked model that combines previous match results, match location data, and fatigue metrics with our two available ranking models, and that predicts the outcomes of games played at the 2018 FIFA World Cup with higher accuracy than Las Vegas bookmakers.
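As a rough illustration of what the comparison in Objective 1 could look like, the sketch below scores two rating systems by how often the higher-rated team wins. This is only one simple accuracy measure and not necessarily the one used in this project; the function name, team labels, ratings, and results are all made up for the example, and drawn matches are assumed to have been excluded.

```python
from typing import Dict, List, Tuple

Match = Tuple[str, str, str]  # (team_a, team_b, winner); decisive matches only


def favorite_accuracy(ratings: Dict[str, float], matches: List[Match]) -> float:
    """Share of matches won by the team with the higher rating."""
    correct = 0
    for team_a, team_b, winner in matches:
        # Ties in rating are resolved in favor of team_a (an arbitrary choice).
        favorite = team_a if ratings[team_a] >= ratings[team_b] else team_b
        correct += favorite == winner
    return correct / len(matches)


# Made-up ratings and results, for illustration only.
system_x = {"A": 1800.0, "B": 1650.0, "C": 1500.0}
system_y = {"A": 1600.0, "B": 1700.0, "C": 1550.0}
matches = [("A", "B", "A"), ("B", "C", "B"), ("A", "C", "C"), ("B", "A", "A")]

acc_x = favorite_accuracy(system_x, matches)
acc_y = favorite_accuracy(system_y, matches)
print(f"system X: {acc_x:.2f}, system Y: {acc_y:.2f}")
```

On this toy data the first system picks the winner in three of four matches and the second in one of four. A real comparison would use many more matches and would likely also score predicted probabilities, not just the pick of a favorite.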

IV. Literature Review

The most relevant research on the subject of men’s international football rankings has been carried out by Lasek et al. (2013), who showed that ranking methods used in other contexts clearly outperform the FIFA Ranking when it comes to predicting match outcomes. Their research measured the performance of the FIFA Ranking and Elo-based ranking methods on two separate prediction accuracy metrics. Of all the models they tested, they identified a variant of the Elo model currently in use for the FIFA Women’s World Ranking and another used by the website Eloratings.net as the strongest-performing individual ranking methods.

Further research by one of our team members, under the instruction of Kevin Rader of the Harvard Statistics Department faculty, has found that Glicko-based models perform even better than Elo-based models. That research yielded a new international football ranking called WIGGO that outperformed the best models previously identified by Lasek et al. and even other variants of the Glicko system.

Lasek et al., “The predictive power of ranking systems in association football,” Int. J. Applied Pattern Recognition, Vol. 1, No. 1, 2013.

Bieler, Goldberg, and Wiggins, “A Better FIFA Ranking,” Stat 91r Independent Research Project with Kevin Rader. Unpublished.

V. Project Outline

Outline our available data with helpful visualizations
Provide a detailed description of the WIGGO Ranking model
Compare WIGGO and FIFA in terms of their usefulness in predicting match outcomes
Build basic baseline models for predicting match outcomes
Improve upon our basic baseline models for predicting match outcomes
Compare the predictive accuracy of our final models to the accuracy of bookmakers in the context of the 2018 FIFA World Cup

VI. Resource Downloads

To download the .ZIP file with the Jupyter Notebooks for this project, click the download link below.