OpenML
Cricket-Test-Matches-Inningswise-between-1900-2021

Status: active. Format: ARFF. License: Database: Open Database, Contents: Database Contents. Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.
  • Computer Systems Machine Learning


Context

Cricket datasets are not readily available for analysis on most portals. This dataset is an effort to provide Test match data for 2200+ Tests, cleaned to handle very special cases.

Content

The columns are:

  • MatchKey: unique key for each match (calculated by the dataset author)
  • BattingTeam
  • Score: runs made by the batting team in that innings
  • Wickets: wickets lost by the batting team at the end of the innings
  • InningsResult: whether or not the batting team declared in that innings
  • Overs: overs bowled by the opposition / faced by the batting team
  • RPO: run rate per over, Score/Overs
  • Lead: overall match lead of the batting team at the end of that innings
  • Inns: innings number within the match
  • Result: match result with respect to the batting team
  • Opposition: the bowling team in that innings
  • HostCountry: where the match was played
  • StartDate: start date of the Test match

An over consists of 6 balls. The decimal part of Overs is the number of balls bowled in the last, incomplete over, so it can be at most .5; a sixth ball completes the over and rolls the value up to the next whole number. For statistical analysis, it may be prudent to convert the decimal part to a true fraction of an over: 0.1 to 0.166, 0.2 to 0.33, 0.3 to 0.5, and so on. The same team may or may not bat in two consecutive innings.

Inspiration

Predict the match result after the end of the 3rd innings. Cricket fans generally have a good idea of the likely result given the state of the match after 3 innings, and sometimes even after 2. How about assigning a probability to each of the 3 possible results? The scope for further tasks is enormous.
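The overs conversion suggested in the description can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function names `overs_to_decimal` and `run_rate` are hypothetical, six-ball overs are assumed throughout (as the description states), and the Overs value is read as text so that the ball count is not disturbed by floating-point rounding. Whether the dataset's own RPO column was computed from the raw or the converted Overs value is not stated.

```python
def overs_to_decimal(overs_text: str) -> float:
    """Convert cricket overs notation to true fractional overs.

    In the notation, '12.3' means 12 complete overs plus 3 balls,
    which is 12.5 overs when an over has 6 balls (assumption).
    """
    whole, _, balls = overs_text.strip().partition(".")
    ball_count = int(balls) if balls else 0
    if not 0 <= ball_count <= 5:
        # With six-ball overs the part after the point can only be 0..5.
        raise ValueError(f"invalid ball count in overs value: {overs_text!r}")
    return int(whole) + ball_count / 6


def run_rate(score: int, overs_text: str) -> float:
    """Runs per over, using the converted overs value as the divisor."""
    overs = overs_to_decimal(overs_text)
    if overs == 0:
        raise ValueError("run rate is undefined for zero overs")
    return score / overs


if __name__ == "__main__":
    for raw in ("0.1", "0.2", "0.3", "12.3", "90"):
        print(raw, "->", round(overs_to_decimal(raw), 3))
```

Dividing Score by the raw Overs value instead would slightly overstate the run rate for any innings that ended mid-over, since 12.3 in the notation stands for 12.5 overs.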

14 features

MatchKey: numeric, 2249 unique values, 0 missing
BattingTeam: string, 13 unique values, 0 missing
Score: numeric, 681 unique values, 0 missing
Wickets: numeric, 11 unique values, 0 missing
InningsResult: string, 2 unique values, 0 missing
Overs: numeric, 1208 unique values, 0 missing
RPO: string, 500 unique values, 0 missing
Lead: numeric, 1060 unique values, 0 missing
Inns: numeric, 4 unique values, 0 missing
Result: string, 3 unique values, 0 missing
Opposition: string, 13 unique values, 0 missing
HostCountry: string, 11 unique values, 0 missing
StartDate: string, 2083 unique values, 0 missing
Unnamed:_13: numeric, 0 unique values, 8291 missing

19 properties

Number of instances (rows) of the dataset: 8291
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 8291
Number of instances with at least one value missing: 8291
Number of numeric attributes: 7
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 50
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 7.14
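
The derived percentages in this list follow from the counts reported on this page. All 8291 missing values come from the single empty column Unnamed:_13, which is also why every instance has a missing value. A quick check, using only figures from the page (the variable names are illustrative):

```python
# Counts as reported on the dataset page.
instances = 8291
attributes = 14
missing_values = 8291        # all from the empty Unnamed:_13 column
numeric_attributes = 7

# Share of missing cells among all cells in the table.
pct_missing = 100 * missing_values / (instances * attributes)

# Share of attributes that are numeric.
pct_numeric = 100 * numeric_attributes / attributes

# Attributes per instance; small enough that the page rounds it to 0.
attrs_per_instance = attributes / instances

print(round(pct_missing, 2))         # 7.14
print(round(pct_numeric, 2))         # 50.0
print(round(attrs_per_instance, 2))  # 0.0
```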

0 tasks
