OpenML
Football---Expected-Goals-Match-Statistics

Status: active
Format: ARFF
License: CC0: Public Domain
Visibility: public
Uploaded 24-03-2022 by Dustin Carrion
  • Computer Systems Machine Learning
Context

In recent years, statisticians and data scientists alike have been trying to come up with new ways to evaluate team performance in football. Sometimes a result is not a fair reflection of a team's performance, and this is where expected goals come in. Expected goals is a relatively new football metric that uses the quality of passing and goalscoring opportunities to rank a team's performance. Understat.com provides these statistics by using neural networks to approximate this data, and I have therefore scraped statistics for matches played between the 2014-15 and 2019-20 seasons to provide the following dataset. The leagues included are:

  • English Premier League
  • La Liga
  • Bundesliga
  • Serie A
  • Ligue 1
  • Russian Football Premier League

Content

The dataset contains 22 columns, many of which are self-explanatory, such as date, home team etc. Some of the less common features are outlined below:

  • Chance - the percentage prediction of an outcome based on expected goals.
  • Expected Goals - the number of goals a team is expected to score based on performance.
  • Deep - number of passes completed within an estimated 20 yards from goal.
  • PPDA - number of passes allowed per defensive action in the opposition half.
  • Expected Points - number of points a team is expected to achieve in this game.

Inspiration

  • Is the expected goals feature an accurate representation of a team's performance?
  • How can this feature be improved?
  • Can we predict the outcome of future games based on previous games?
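The relationship between the Chance and Expected Points features can be illustrated with a small sketch. It assumes the usual league scoring of 3 points for a win and 1 for a draw, and that expected points are the probability-weighted sum of those outcomes. The description does not say how the Expected Points columns were actually computed, so the function name `expected_points` and the formula are illustrative assumptions, not the dataset author's method.

```python
def expected_points(win_pct: float, draw_pct: float) -> float:
    """Probability-weighted points for one team in one match.

    Assumption: 3 points for a win, 1 for a draw, 0 for a loss, with the
    chances given in percent (as in the *_Chance_% columns).
    """
    return (3.0 * win_pct + 1.0 * draw_pct) / 100.0


# Hypothetical chances for a single match (not taken from the dataset).
home_pct, draw_pct, away_pct = 50.0, 30.0, 20.0

home_xpts = expected_points(home_pct, draw_pct)
away_xpts = expected_points(away_pct, draw_pct)
```

Under this assumption the two teams' expected points always sum to 3 minus the draw probability, which gives a quick consistency check to run against the Home_Expected_Points and Away_Expected_Points columns.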

22 features

Unnamed:_0 (numeric): 10791 unique values, 0 missing
Date (string): 1072 unique values, 0 missing
League (string): 6 unique values, 0 missing
Home_Team (string): 163 unique values, 0 missing
Away_Team (string): 163 unique values, 0 missing
Home_Chance_% (string): 87 unique values, 0 missing
Draw_Chance_% (string): 62 unique values, 0 missing
Away_Chance_% (string): 87 unique values, 0 missing
Home_Goals (numeric): 11 unique values, 0 missing
Away_Goals (numeric): 10 unique values, 0 missing
Home_Expected_Goals (numeric): 488 unique values, 0 missing
Away_Expected_Goals (numeric): 417 unique values, 0 missing
Home_Shots (numeric): 43 unique values, 0 missing
Away_Shots (numeric): 35 unique values, 0 missing
Home_Shots_on_Target (numeric): 19 unique values, 0 missing
Away_Shots_on_Target (numeric): 16 unique values, 0 missing
Home_Deep (numeric): 36 unique values, 0 missing
Away_Deep (numeric): 30 unique values, 0 missing
Home_PPDA (numeric): 1922 unique values, 0 missing
Away_PPDA (numeric): 2139 unique values, 0 missing
Home_Expected_Points (numeric): 301 unique values, 0 missing
Away_Expected_Points (numeric): 301 unique values, 0 missing
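The three chance columns are stored as strings rather than numbers, so they need converting before any modelling. The sketch below assumes the values look like "52%" (digits with an optional trailing percent sign); the page does not show the raw format, so `parse_percent`, `parse_chances`, and the sample row are hypothetical and may need adjusting to the real data.

```python
CHANCE_COLUMNS = ["Home_Chance_%", "Draw_Chance_%", "Away_Chance_%"]


def parse_percent(value: str) -> float:
    """Convert a string such as '52%' or ' 52 ' to a probability in [0, 1].

    Assumption: the cell holds a plain number with an optional '%' suffix.
    """
    text = value.strip().rstrip("%").strip()
    return float(text) / 100.0


def parse_chances(row: dict) -> dict:
    """Return the three outcome probabilities of one match row as floats."""
    return {col: parse_percent(row[col]) for col in CHANCE_COLUMNS}


# Hypothetical row for illustration only (not taken from the dataset).
sample_row = {"Home_Chance_%": "52%", "Draw_Chance_%": "25%", "Away_Chance_%": "23%"}
chances = parse_chances(sample_row)
```

If the file is loaded into a pandas DataFrame, the same function can be applied per column with `df[col].map(parse_percent)`.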

19 properties

Number of instances (rows) of the dataset: 10791
Number of attributes (columns) of the dataset: 22
Number of distinct values of the target attribute (if it is nominal): (no value shown)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 15
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 68.18
Percentage of instances belonging to the most frequent class: (no value shown)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: (no value shown)
Percentage of missing values: 0
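The derived properties can be reproduced from the raw counts listed above. As a minimal check, the sketch below recomputes the percentage of numeric attributes and the attribute-to-instance ratio; the variable names are illustrative, and the assumption that the page rounds to two decimals is inferred from the displayed 68.18.

```python
n_instances = 10791   # rows
n_attributes = 22     # columns
n_numeric = 15        # numeric columns; the remaining 7 are string columns

# Percentage of numeric attributes, rounded to two decimals (assumed page format).
pct_numeric = round(100 * n_numeric / n_attributes, 2)

# Attributes divided by instances; about 0.002, which displays as 0 at two decimals.
dimensionality = n_attributes / n_instances
```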

0 tasks
