Context
Cricket-based datasets are not readily available for analysis on most data portals. This dataset is an effort to provide Test match data for 2,200+ Tests, cleaned to handle a few special cases.
Content
The columns are:
MatchKey (Unique key for each match calculated by me)
BattingTeam
Score (runs made by BattingTeam in that innings)
Wickets (lost by the batting team at end of innings)
InningsResult (whether the batting team declared in that innings)
Overs (bowled by the Opposition, i.e. faced by BattingTeam)
RPO (runs per over) = Score / Overs
Lead (Overall match lead by Batting team till that innings end)
Inns (innings number within the match)
Result (with respect to Batting team)
Opposition (the bowling team in that innings)
HostCountry (where the match was played)
StartDate (of the test match)
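To make the Lead column concrete, here is a minimal sketch that recomputes a running lead from the per-innings scores of one match. It assumes (this is not confirmed by the dataset) that Lead is the batting team's cumulative runs minus the opponent's cumulative runs up to the end of that innings; the function name `leads` is hypothetical.

```python
from collections import defaultdict


def leads(innings: list[tuple[str, int]]) -> list[int]:
    """Running lead for the batting team of each innings, in match order.

    ASSUMPTION: Lead = batting team's cumulative runs minus the other
    team's cumulative runs, both counted up to the end of that innings.
    `innings` is a list of (BattingTeam, Score) pairs ordered by Inns.
    """
    totals: dict[str, int] = defaultdict(int)
    out = []
    for team, score in innings:
        totals[team] += score
        # Runs scored so far by everyone other than the current batting team.
        opponent_runs = sum(runs for t, runs in totals.items() if t != team)
        out.append(totals[team] - opponent_runs)
    return out


print(leads([("England", 400), ("Australia", 300), ("England", 200), ("Australia", 250)]))
# prints [400, -100, 300, -50]
```

Comparing this output with the Lead column for a few matches is a quick way to check whether the assumption above holds.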
An over has 6 balls.
In the Overs column, the digit after the decimal point is the number of balls bowled in the last, incomplete over. So the decimal part can be at most .5; a 6th ball completes the over, so x.6 becomes x+1.
For statistical analysis, it may be prudent to convert the decimal part of overs into a true fraction of an over: 0.1 → 0.167 (1/6), 0.2 → 0.333 (2/6), 0.3 → 0.5 (3/6), and so on.
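A minimal sketch of that conversion, assuming Overs is read as a float in the "overs.balls" notation described above and that every over has 6 balls. The helper names are hypothetical. Note that `runs_per_over` divides by true overs, so it can differ slightly from the dataset's RPO column, which is defined as Score / Overs using the notation value.

```python
def overs_to_balls(overs: float) -> int:
    """Convert cricket notation (e.g. 10.3 = 10 overs + 3 balls) to total balls.

    Assumes 6-ball overs, so the decimal digit must be 0..5.
    """
    completed = int(overs)
    # round() absorbs float noise such as 10.3 - 10 = 0.3000000000000007
    balls = round((overs - completed) * 10)
    if not 0 <= balls <= 5:
        raise ValueError(f"invalid overs value: {overs}")
    return completed * 6 + balls


def true_overs(overs: float) -> float:
    """Overs as a real number, e.g. 10.3 -> 10.5."""
    return overs_to_balls(overs) / 6


def runs_per_over(score: int, overs: float) -> float:
    """Run rate computed from true overs rather than the notation value."""
    return score / true_overs(overs)


print(true_overs(10.3))          # prints 10.5
print(runs_per_over(105, 10.3))  # prints 10.0
```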
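The same team may bat in two consecutive innings (for example, when the follow-on is enforced), so innings do not always alternate between the two teams.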
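To find such matches, one option is to sort each match's rows by Inns and flag any innings whose BattingTeam matches the previous innings. The sketch below assumes the rows for one MatchKey have already been reduced to (Inns, BattingTeam) pairs; `consecutive_batting` is a hypothetical helper.

```python
def consecutive_batting(innings: list[tuple[int, str]]) -> list[int]:
    """Return the Inns numbers where the batting team also batted in the previous innings.

    `innings` holds (Inns, BattingTeam) pairs for a single match, in any order.
    """
    ordered = sorted(innings)  # sort by innings number
    return [
        inns
        for (_, prev_team), (inns, team) in zip(ordered, ordered[1:])
        if team == prev_team
    ]


print(consecutive_batting([(1, "India"), (2, "Bangladesh"), (3, "Bangladesh"), (4, "India")]))
# prints [3]
```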
Inspiration
Predict the match result at the end of the 3rd innings. Cricket fans usually have a good idea of the likely result once three innings are complete, and sometimes even after two. Can a model assign a probability to each of the 3 possible results?
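As a starting point, a simple frequency baseline (not a trained classifier) can already produce such probabilities: bucket matches by the lead of the team batting 3rd at the end of the 3rd innings, then use the smoothed result frequencies in each bucket. The result labels, bucket width and function names below are hypothetical; the dataset's actual Result values may be spelled differently.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable

RESULTS = ("won", "lost", "draw")  # HYPOTHETICAL labels; check the Result column


def fit_baseline(rows: Iterable[tuple[int, str]], bin_width: int = 100) -> dict[int, Counter]:
    """Count results per lead bucket.

    `rows` holds (Lead after the 3rd innings, Result) for the team batting 3rd.
    """
    counts: dict[int, Counter] = defaultdict(Counter)
    for lead, result in rows:
        counts[lead // bin_width][result] += 1
    return counts


def predict(counts: dict[int, Counter], lead: int,
            bin_width: int = 100, alpha: float = 1.0) -> dict[str, float]:
    """Laplace-smoothed result probabilities for a given lead."""
    c = counts.get(lead // bin_width, Counter())
    total = sum(c[r] for r in RESULTS) + alpha * len(RESULTS)
    return {r: (c[r] + alpha) / total for r in RESULTS}


model = fit_baseline([(250, "won"), (260, "won"), (280, "draw"), (-50, "lost")])
print(predict(model, 270))  # "won" gets 0.5, "draw" 1/3, "lost" 1/6
```

With real data, a model that also uses overs, wickets and run rate (for example multinomial logistic regression) should be compared against this baseline.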
There is plenty of scope for further tasks beyond this.