OpenML
All-Time-Premier-League-Player-Statistics

active ARFF CC BY-NC-SA 4.0 Visibility: public Uploaded 23-03-2022 by Onur Yildirim
  • Computer Systems Machine Learning
Context

I am a huge football fan, and the Premier League is one of my favourite football (or soccer, whichever you prefer) leagues. So, as my very first dataset, I thought this would be a great opportunity to make a dataset of player statistics of all seasons from the Premier League.

The Premier League, often referred to as the English Premier League or the EPL outside England, is the top level of the English football league system. Contested by 20 clubs, it operates on a system of promotion and relegation with the English Football League (EFL). Home to some of the most famous clubs, players, managers and stadiums in world football, the Premier League is the most-watched league on the planet, with one billion homes watching the action in 188 countries.

The league takes place between August and May and involves the teams playing each other home and away across the season, a total of 380 matches. Three points are awarded for a win, one point for a draw and none for a defeat, and the team with the most points at the end of the season wins the Premier League title. The teams that finish in the bottom three of the league table at the end of the campaign are relegated to the Championship, the second tier of English football. They are replaced by three clubs promoted from the Championship: the sides that finish first and second, plus a third promoted via the end-of-season playoffs.

Details about the dataset

Players in certain positions may not have certain statistics. For example, a goalkeeper may not have a value for "Shot Accuracy".

The filename format is: dataset - yyyy-mm-dd (the date on which the file was last updated).

Content

The data was acquired from: https://www.premierleague.com/

I wrote a BeautifulSoup4 web scraper in Python 3 that automatically outputs a CSV file of all the player statistics. The scraper takes about 20 minutes to run, although this varies with the bandwidth of the Internet connection. I made this program so that the dataset could be updated weekly: the statistics change after each match a player appears in, so a program like this is needed to keep the results up to date.

Planning this project took 2 days, writing the program in Python 3 took 7 days, and testing and bug fixing took another 5 days. The project was completed in the span of 2 weeks.

Acknowledgements

Source credits: https://www.premierleague.com/
Image credits: https://rb.gy/wuiwth

Inspiration

How do variables like age, nationality and club affect player performance?

Known issues in the dataset

Goals per match shows an abnormally high value for a few players, because the HTML displays an incorrect value during the first few milliseconds of loading the page. I am trying to fix this by computing the value analytically rather than scraping it directly from the website.
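The analytical fix mentioned under the known issues could look roughly like the sketch below, which recomputes goals per match from the Goals and Appearances columns instead of trusting the scraped value. This is only an illustration, not the author's actual fix: the function name goals_per_match, the rounding to two decimals, and the sample rows are all assumptions made for the example.

```python
from typing import Optional


def goals_per_match(goals: Optional[float],
                    appearances: Optional[float]) -> Optional[float]:
    """Recompute goals per match from two columns of the dataset.

    Returns None when either input is missing or the player has no
    appearances, so no division by zero can occur.
    """
    if goals is None or appearances is None or appearances <= 0:
        return None
    # Rounding to two decimals is an assumption about the desired precision.
    return round(goals / appearances, 2)


# Hypothetical rows for illustration only; these are not values from the dataset.
rows = [
    {"Name": "Player A", "Goals": 10, "Appearances": 40, "Goals_per_match": 0.25},
    {"Name": "Player B", "Goals": 3, "Appearances": 30, "Goals_per_match": 7.0},
    {"Name": "Player C", "Goals": 0, "Appearances": 0, "Goals_per_match": None},
]

for row in rows:
    # Store the recomputed value next to the scraped one so they can be compared.
    row["Goals_per_match_recomputed"] = goals_per_match(
        row["Goals"], row["Appearances"]
    )
```

Keeping the recomputed value in a separate field makes it easy to flag rows where the scraped and recomputed numbers disagree before overwriting anything.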

59 features

Name (string): 571 unique values, 0 missing
Jersey_Number (numeric): 67 unique values, 8 missing
Club (string): 20 unique values, 0 missing
Position (string): 4 unique values, 0 missing
Nationality (string): 57 unique values, 1 missing
Age (numeric): 22 unique values, 1 missing
Appearances (numeric): 198 unique values, 0 missing
Wins (numeric): 119 unique values, 0 missing
Losses (numeric): 101 unique values, 0 missing
Goals (numeric): 55 unique values, 0 missing
Goals_per_match (numeric): 71 unique values, 262 missing
Headed_goals (numeric): 20 unique values, 69 missing
Goals_with_right_foot (numeric): 43 unique values, 69 missing
Goals_with_left_foot (numeric): 29 unique values, 69 missing
Penalties_scored (numeric): 16 unique values, 262 missing
Freekicks_scored (numeric): 8 unique values, 262 missing
Shots (numeric): 153 unique values, 262 missing
Shots_on_target (numeric): 105 unique values, 262 missing
Shooting_accuracy_% (string): 50 unique values, 262 missing
Hit_woodwork (numeric): 23 unique values, 69 missing
Big_chances_missed (numeric): 48 unique values, 262 missing
Clean_sheets (numeric): 65 unique values, 309 missing
Goals_conceded (numeric): 120 unique values, 309 missing
Tackles (numeric): 213 unique values, 69 missing
Tackle_success_% (string): 54 unique values, 181 missing
Last_man_tackles (numeric): 12 unique values, 378 missing
Blocked_shots (numeric): 96 unique values, 69 missing
Interceptions (numeric): 179 unique values, 69 missing
Clearances (numeric): 216 unique values, 69 missing
Headed_Clearance (numeric): 166 unique values, 69 missing
Clearances_off_line (numeric): 9 unique values, 378 missing
Recoveries (numeric): 253 unique values, 181 missing
Duels_won (numeric): 266 unique values, 181 missing
Duels_lost (numeric): 257 unique values, 181 missing
Successful_50/50s (numeric): 137 unique values, 181 missing
Aerial_battles_won (numeric): 167 unique values, 181 missing
Aerial_battles_lost (numeric): 179 unique values, 181 missing
Own_goals (numeric): 7 unique values, 309 missing
Errors_leading_to_goal (numeric): 16 unique values, 112 missing
Assists (numeric): 43 unique values, 0 missing
Passes (numeric): 448 unique values, 0 missing
Passes_per_match (numeric): 443 unique values, 0 missing
Big_chances_created (numeric): 54 unique values, 69 missing
Crosses (numeric): 196 unique values, 69 missing
Cross_accuracy_% (string): 44 unique values, 181 missing
Through_balls (numeric): 54 unique values, 181 missing
Accurate_long_balls (numeric): 232 unique values, 112 missing
Saves (numeric): 44 unique values, 502 missing
Penalties_saved (numeric): 8 unique values, 502 missing
Punches (numeric): 32 unique values, 502 missing
High_Claims (numeric): 36 unique values, 502 missing
Catches (numeric): 28 unique values, 502 missing
Sweeper_clearances (numeric): 35 unique values, 502 missing
Throw_outs (numeric): 44 unique values, 502 missing
Goal_Kicks (numeric): 45 unique values, 502 missing
Yellow_cards (numeric): 49 unique values, 0 missing
Red_cards (numeric): 6 unique values, 0 missing
Fouls (numeric): 173 unique values, 0 missing
Offsides (numeric): 67 unique values, 69 missing
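Three of the features above (Shooting_accuracy_%, Tackle_success_% and Cross_accuracy_%) are typed as strings rather than numbers, so they would need converting before any numeric analysis. The sketch below shows one possible conversion. It assumes the values look like "45%", which is not confirmed by this page, and the helper name parse_percent is made up for the example. The "?" check reflects the ARFF convention for missing values.

```python
from typing import Optional


def parse_percent(value: Optional[str]) -> Optional[float]:
    """Convert a percentage string such as "45%" to the float 45.0.

    Assumption: values are a number optionally followed by a percent sign.
    Missing or unparseable values are returned as None.
    """
    if value is None:
        return None
    # Drop surrounding whitespace and a trailing percent sign, if any.
    text = value.strip().rstrip("%").strip()
    # "?" is how ARFF files mark a missing value.
    if not text or text == "?":
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical raw values for illustration only.
raw_values = ["45%", " 12.5 % ", "?", None, "n/a"]
parsed = [parse_percent(v) for v in raw_values]
```

Returning None for anything unparseable keeps the missing-value counts visible instead of silently turning bad cells into zeros.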

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 59
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 10224
Number of instances with at least one value missing: 571
Number of numeric attributes: 52
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.1
Percentage of numeric attributes: 88.14
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: not reported
Percentage of missing values: 30.35
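Several of these properties are derived from the others, and the arithmetic can be checked directly. The short sketch below recomputes the ratio and the two percentages from the counts on this page, and cross-checks the total number of missing values against the per-feature counts in the feature list. The variable names are chosen for the example, and the rounding precision is inferred from the values shown.

```python
# Counts taken from the properties and feature list on this page.
instances = 571
attributes = 59
numeric_attributes = 52
missing_values = 10224

# Number of attributes divided by the number of instances.
dimensionality = attributes / instances

# Percentage of numeric attributes.
pct_numeric = 100 * numeric_attributes / attributes

# Percentage of missing values over all cells (rows x columns).
pct_missing = 100 * missing_values / (instances * attributes)

# Cross-check: per-feature missing count -> number of features with that count,
# tallied from the feature list.
features_by_missing_count = {
    0: 13, 1: 2, 8: 1, 69: 12, 112: 2, 181: 9, 262: 7, 309: 3, 378: 2, 502: 8,
}
total_features = sum(features_by_missing_count.values())
total_missing = sum(count * n for count, n in features_by_missing_count.items())

print(round(dimensionality, 1), round(pct_numeric, 2), round(pct_missing, 2))
print(total_features, total_missing)
```

The recomputed figures agree with the listed 0.1, 88.14 and 30.35, and the per-feature counts sum to the reported 59 attributes and 10224 missing values.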

0 tasks
