Cricket is a sport where individual performances of players can impact the outcome of a match. This research paper proposes an LSTM-based approach for predicting the retirement age of test batters in cricket. The aim is to build a model that can accurately predict the retirement age of a player using their historical performance data. The proposed model can assist cricket teams in selecting and managing players more effectively, as well as aid cricket analysts in understanding the factors that influence the retirement age of test batters.
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture that is designed to address the vanishing gradient problem, which can occur when training traditional RNNs. In an LSTM network, the model can selectively remember or forget information over time, making it particularly useful for processing sequential data such as natural language text, speech, or time series data.
The architecture of an LSTM network includes a series of memory blocks that interact with each other, with each block containing several "gates" that regulate the flow of information. These gates are responsible for selectively adding or removing information from the memory block, based on whether it is deemed relevant or not.
LSTMs have been successfully applied in a wide range of tasks, such as language modeling, speech recognition, and time series forecasting, among others.
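The gating described above can be made concrete with a minimal sketch of one time step of a single-unit LSTM cell. This follows the standard textbook formulation and is not the specific network used in this study; the function name `lstm_step` and the layout of `params` are illustrative assumptions, and scalar weights are used only to keep the example short.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input and state).

    params maps each gate name to (input weight, recurrent weight, bias).
    This layout is an assumption made for illustration.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre_activation("forget"))       # share of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # share of the candidate to write
    o = sigmoid(pre_activation("output"))       # share of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write

    c = f * c_prev + i * g   # updated cell state (the "memory")
    h = o * math.tanh(c)     # updated hidden state (the output)
    return h, c

# With all-zero weights every gate evaluates to 0.5 and the candidate to 0,
# so the cell simply keeps half of its previous state.
zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```

The forget gate `f` scales the old memory and the input gate `i` scales the new candidate, which is the "selectively remember or forget" behaviour described above.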
- To predict the total number of innings a player will play in their career (first model).
- To predict the retirement age using the total innings predicted by the first model (second model).
- We collected the data for this study from the publicly available HowSTAT website, which includes cricket archives and statistical databases.
- The dataset consists of 100 retired test batters and 10 active batters, with each player having a different number of innings.
- The batting statistics for each innings include runs scored, balls faced, cumulative runs and other relevant metrics.
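Because each player has a different number of innings, the per-innings records form sequences of unequal length, and these generally need a common length before they can be batched for a recurrent network. The source does not say how this was handled. The sketch below assumes zero pre-padding with a mask; the function name `pad_sequences` and the three-value innings layout (runs, balls faced, cumulative runs) are hypothetical.

```python
def pad_sequences(sequences, pad_value=0.0):
    """Left-pad variable-length sequences of feature tuples to a common length.

    Returns the padded sequences and a mask (1 = real innings, 0 = padding).
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padding = [[pad_value] * n_features for _ in range(n_pad)]
        padded.append(padding + [list(step) for step in seq])
        mask.append([0] * n_pad + [1] * len(seq))
    return padded, mask

# Two made-up players: one with three innings, one with a single innings.
player_a = [(12, 30, 12), (55, 98, 67), (0, 4, 67)]
player_b = [(101, 180, 101)]
padded, mask = pad_sequences([player_a, player_b])
```

The mask lets a model ignore the padded steps so that they do not count as real innings.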
Data transformation involves the following stages:
- Discretization: Selecting the data relevant to the research.
- Normalization: Scaling the attribute values to a common range so that they can be compared directly.
- Cleaning: Filling in missing values to avoid discrepancies in the data.
- Integration: Combining the individual data files into a single dataset.
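The cleaning and normalization stages can be sketched as follows. The source does not state which imputation or scaling method was used, so mean imputation and min-max scaling are assumptions here, and the helper names `fill_missing` and `min_max_scale` are hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (assumed strategy)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up runs column with one missing entry.
runs = [45, None, 120, 0, 75]
cleaned = fill_missing(runs)     # the gap becomes the mean of 45, 120, 0, 75
scaled = min_max_scale(cleaned)  # every value now lies in [0, 1]
```

In practice the scaling bounds would be computed on the training data only and reused for the testing data, so that no information leaks from the test set.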
After the transformation, the dataset is divided into a training set and a testing set. The testing set is 25% of the size of the training set.
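A minimal sketch of such a split is given below. It takes the statement literally, with the testing set sized relative to the training set, and it assumes the split is made per player so that no player's innings appear on both sides. The function name `split_players`, the fixed seed, and the integer player IDs are hypothetical. If the intended meaning is 25% of the whole dataset, the test count would instead be `round(0.25 * len(ids))`.

```python
import random

def split_players(player_ids, test_to_train_ratio=0.25, seed=0):
    """Shuffle player IDs and split them so that test = ratio * train."""
    ids = list(player_ids)
    random.Random(seed).shuffle(ids)
    # test = ratio * train and train + test = n  =>  test = n * ratio / (1 + ratio)
    n_test = round(len(ids) * test_to_train_ratio / (1 + test_to_train_ratio))
    return ids[n_test:], ids[:n_test]

# 100 placeholder IDs standing in for the players in the dataset.
train_ids, test_ids = split_players(range(100))
```

Under this reading, 100 players yield 80 for training and 20 for testing.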
The final features given to the first model are:
- Current Age (in years)
- Debut Age (in years)
- Innings played to date
- Cumulative runs scored
- Number of fifties scored
- Number of innings in the last 3 years
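The features listed above could be derived from a player's per-innings records roughly as follows. This is a sketch under stated assumptions: the input layout (a list of `(date, runs)` pairs), the function name `model1_features`, and the definition of a fifty as a score from 50 to 99 are not given in the source.

```python
from datetime import date

def years_between(earlier, later):
    """Approximate elapsed time in years between two dates."""
    return (later - earlier).days / 365.25

def model1_features(innings, birth_date, as_of):
    """Build the first model's feature set from (date, runs) innings records."""
    played = sorted((d, r) for d, r in innings if d <= as_of)
    if not played:
        raise ValueError("player has no innings on or before as_of")
    debut_date = played[0][0]
    runs = [r for _, r in played]
    return {
        "current_age": years_between(birth_date, as_of),
        "debut_age": years_between(birth_date, debut_date),
        "innings_played": len(played),
        "cumulative_runs": sum(runs),
        # Assumption: a "fifty" is a score of 50 to 99, centuries excluded.
        "fifties": sum(1 for r in runs if 50 <= r < 100),
        "innings_last_3_years": sum(
            1 for d, _ in played if years_between(d, as_of) <= 3.0
        ),
    }

# Made-up career used only to exercise the function.
career = [
    (date(2010, 1, 1), 30),
    (date(2015, 6, 1), 55),
    (date(2019, 3, 1), 102),
    (date(2020, 1, 1), 64),
]
features = model1_features(career, birth_date=date(1990, 1, 1), as_of=date(2021, 1, 1))
```

Calling the function with successive `as_of` dates would give one feature row per point in a player's career, which is the kind of sequence an LSTM consumes.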
The final features given to the second model are:
- Debut Age (in years)
- Total innings predicted from model 1
- Cumulative runs scored
- Number of fifties scored
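The way the two models chain together can be sketched as below. The two predictors are passed in as plain callables, and the stubs used in the example are placeholders that return arbitrary numbers; they are not the trained LSTM models and say nothing about real results. The names `model2_features` and `predict_retirement_age` and the dictionary keys are hypothetical.

```python
def model2_features(player, predict_total_innings):
    """Assemble the second model's inputs, using the first model's prediction."""
    total_innings = predict_total_innings(player)
    return [
        player["debut_age"],
        total_innings,
        player["cumulative_runs"],
        player["fifties"],
    ]

def predict_retirement_age(player, predict_total_innings, predict_age):
    """Run the two-stage pipeline: model 1 feeds model 2."""
    return predict_age(model2_features(player, predict_total_innings))

# Placeholder stand-ins for the trained models (arbitrary outputs).
stub_model1 = lambda p: p["innings_played"] + 40
stub_model2 = lambda feats: 35.0

player = {"debut_age": 20.0, "innings_played": 120, "cumulative_runs": 5400, "fifties": 25}
feats = model2_features(player, stub_model1)
age = predict_retirement_age(player, stub_model1, stub_model2)
```

One consequence of this design is that any error in the first model's innings prediction carries over into the second model's input.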