This project aims to predict horse race outcomes using machine learning. The dataset covers races from 1990 to 2020 and includes detailed information on both races and individual horses. Because horse racing is complex and highly unpredictable, the project compares several machine learning models and feature engineering techniques to improve prediction accuracy.
- Data Cleansing
- Exploratory Data Analysis (EDA)
- Visualization
- Machine Learning
- Sports

The dataset consists of two main types of files:
- Races and Horses Data: Available for each year from 1990 to 2020.
- Forward Data (forward.csv): Contains information collected prior to race starts, including average odds from Oddschecker.com and current RPR and TR values.

Together, these files provide detailed information on individual horses and race outcomes.
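The average odds in forward.csv can be converted into implied win probabilities, a natural baseline feature. The sketch below assumes the odds are stored as fractional strings such as `"5/2"`; the actual column format is not documented here, and the function names are illustrative. Decimal odds d imply a probability of 1/d, and because bookmaker prices across a field sum to more than 1 (the overround), the probabilities are rescaled to sum to 1.

```python
from fractions import Fraction


def fractional_to_decimal(odds: str) -> float:
    """Convert fractional odds such as "5/2" to decimal odds (stake included)."""
    # Assumption: odds arrive as "a/b" strings; text like "Evens" would first
    # need mapping to "1/1". Decimal odds = fractional odds + 1.
    return float(Fraction(odds)) + 1.0


def implied_probabilities(odds_list: list[str]) -> list[float]:
    """Implied win probabilities for one race, rescaled to remove the overround."""
    raw = [1.0 / fractional_to_decimal(o) for o in odds_list]
    overround = sum(raw)  # usually > 1 for bookmaker prices
    return [p / overround for p in raw]


# Hypothetical three-runner race: evens, 2/1 and 3/1 (overround = 13/12).
probs = implied_probabilities(["1/1", "2/1", "3/1"])
print([round(p, 3) for p in probs])  # [0.462, 0.308, 0.231]
```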
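
Objectives:

- Predict the outcome of horse races (e.g., win or place).
- Identify the features that most strongly affect race outcomes.
- Address the imbalanced nature of the dataset: each race has only one winner, so winning runners make up a small minority of rows.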
- Develop a robust prediction model using historical data.
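Because each race has a single winner (dead heats aside), a win label is positive for only one runner per race, so the positive rate is roughly the number of races divided by the number of runners. A minimal sketch with hypothetical rows; treating "place" as a top-three finish is an assumption, since real place terms depend on the number of runners.

```python
from collections import Counter

# Hypothetical rows: (race_id, finishing_position) for two races.
rows = [
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8),
    (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
]


def win_label(position: int) -> int:
    """1 if the runner won, else 0."""
    return int(position == 1)


def place_label(position: int, places: int = 3) -> int:
    """Assumption: 'place' means finishing in the top `places` positions."""
    return int(position <= places)


wins = [win_label(pos) for _, pos in rows]
counts = Counter(wins)
positive_rate = counts[1] / len(wins)

# With one winner per race (ignoring dead heats), the positive rate
# equals number of races / number of runners.
n_races = len({race_id for race_id, _ in rows})
print(counts, positive_rate, n_races / len(rows))
```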

Preprocessing:

- Handle missing values.
- Normalize numeric fields such as race times and distances.
- Encode categorical variables numerically (e.g., one-hot encoding).
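The preprocessing steps above might look like this in plain Python (pandas or scikit-learn would be the usual tools). The column names are hypothetical, and median imputation, min-max scaling, and one-hot encoding are assumed as concrete choices for each step. In a real pipeline, the medians and min/max values should be computed on the training split only and reused for validation data.

```python
from statistics import median

# Hypothetical raw rows; None marks a missing value.
rows = [
    {"distance_m": 1600.0, "going": "Good", "weight_kg": 60.0},
    {"distance_m": 2400.0, "going": "Soft", "weight_kg": None},
    {"distance_m": None, "going": "Good", "weight_kg": 56.0},
    {"distance_m": 1200.0, "going": "Heavy", "weight_kg": 58.0},
]
numeric = ["distance_m", "weight_kg"]
categorical = ["going"]

# 1. Impute missing numeric values with the column median.
for col in numeric:
    med = median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = med

# 2. Min-max scale numeric columns to [0, 1].
for col in numeric:
    lo = min(r[col] for r in rows)
    hi = max(r[col] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    for r in rows:
        r[col] = (r[col] - lo) / span

# 3. One-hot encode categorical columns, replacing the original column.
for col in categorical:
    levels = sorted({r[col] for r in rows})
    for r in rows:
        value = r.pop(col)
        for level in levels:
            r[f"{col}_{level}"] = 1 if value == level else 0

print(rows[1])
```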

Feature engineering:

- Create new features from existing data (e.g., performance metrics).
- Aggregate features across multiple races to identify trends.
- Merge the race and horse datasets into a single combined dataset.
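A sketch of merging and aggregation, assuming the race and horse tables share a race identifier (called `race_id` here, a hypothetical name, as are the other fields). The historical features are computed in date order so that each runner's prior win rate uses only earlier races, which avoids leaking the outcome being predicted.

```python
from collections import defaultdict

# Hypothetical race-level and horse-level tables, joined on race_id.
races = {
    101: {"date": "2019-05-01", "distance_m": 1600},
    102: {"date": "2019-06-10", "distance_m": 2000},
    103: {"date": "2019-07-20", "distance_m": 1600},
}
horses = [
    {"race_id": 101, "horse": "A", "position": 1},
    {"race_id": 101, "horse": "B", "position": 2},
    {"race_id": 102, "horse": "A", "position": 3},
    {"race_id": 102, "horse": "B", "position": 1},
    {"race_id": 103, "horse": "A", "position": 1},
    {"race_id": 103, "horse": "B", "position": 2},
]

# Inner join: attach race attributes to each runner.
merged = [{**h, **races[h["race_id"]]} for h in horses if h["race_id"] in races]

# Walk races in date order (ISO dates sort correctly as strings) and record
# each horse's history *before* updating it with the current result.
merged.sort(key=lambda r: r["date"])
starts = defaultdict(int)
wins = defaultdict(int)
for r in merged:
    name = r["horse"]
    r["prior_starts"] = starts[name]
    r["prior_win_rate"] = wins[name] / starts[name] if starts[name] else 0.0
    starts[name] += 1
    wins[name] += int(r["position"] == 1)
```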

Exploratory data analysis and visualization:

- Summary statistics and distribution plots of key features.
- A correlation matrix to identify relationships between features.
- Feature importance analysis.
- Scatter plots, histograms, box plots, and heatmaps to visualize the data.
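Summary statistics and a correlation matrix can be computed as follows. The feature names and values are hypothetical, and the Pearson correlation is written out by hand to keep the sketch dependency-free; the resulting `corr` dictionary is what a correlation heatmap would display.

```python
import math
from statistics import fmean, median, stdev

# Hypothetical numeric features for five runners.
features = {
    "odds_decimal": [2.5, 4.0, 6.0, 10.0, 21.0],
    "rpr": [120, 112, 105, 98, 90],
    "age": [4, 6, 5, 3, 7],
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Summary statistics per feature.
summary = {
    name: {"mean": fmean(v), "median": median(v), "stdev": stdev(v)}
    for name, v in features.items()
}

# Correlation matrix as a nested dict: corr[a][b].
names = list(features)
corr = {a: {b: pearson(features[a], features[b]) for b in names} for a in names}
```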

Model selection and evaluation:

- Evaluate several machine learning models (e.g., regression, random forests, gradient boosting, neural networks).
- Use cross-validation to assess performance.
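Runners in the same race are not independent (exactly one of them wins), so one reasonable cross-validation design, assumed here rather than taken from the project, keeps every race entirely within a single fold. scikit-learn's `GroupKFold` implements the same grouping idea; the plain-Python version below shuffles the race ids and assigns whole races to folds round-robin.

```python
import random


def group_k_fold(groups: list[int], k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs where no group appears in both sets."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Assign whole groups (races) to folds round-robin.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        val_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, val_idx


# Hypothetical race ids for ten runners across four races.
race_ids = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
folds = list(group_k_fold(race_ids, k=2))
```

Because the data spans 30 years, a chronological split (train on earlier seasons, validate on later ones) is another option worth comparing.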

Handling class imbalance:

- SMOTE (Synthetic Minority Over-sampling Technique), under-sampling of the majority class, and class weight adjustments.
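The three imbalance techniques can be sketched as follows, using hypothetical toy data. `balanced_class_weights` uses the same formula as scikit-learn's `class_weight="balanced"`, and `smote` is a simplified SMOTE that interpolates between a minority sample and one of its k nearest minority neighbours (the imbalanced-learn library provides a full implementation). Resampling should be applied to the training folds only, never to validation data.

```python
import math
import random
from collections import Counter


def balanced_class_weights(y: list[int]) -> dict[int, float]:
    """weight_c = n_samples / (n_classes * count_c), so rare classes weigh more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_undersample(X, y, seed=0):
    """Binary case: keep all minority rows and an equal-sized random majority subset."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    major_idx = [i for i, c in enumerate(y) if c == majority]
    keep = rng.sample(major_idx, counts[minority])
    idx = sorted([i for i, c in enumerate(y) if c == minority] + keep)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote(minority: list[list[float]], n_new: int, k: int = 2, seed: int = 0):
    """Simplified SMOTE: new point = x + gap * (neighbour - x), gap in [0, 1).
    Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Hypothetical toy data: 8 losers (0) and 2 winners (1).
y = [0] * 8 + [1] * 2
X = [[float(i)] for i in range(10)]
weights = balanced_class_weights(y)  # {0: 0.625, 1: 2.5}
X_bal, y_bal = random_undersample(X, y)
minority_points = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
synthetic = smote(minority_points, n_new=4)
```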

Feature selection:

- Recursive Feature Elimination (RFE) and regularization techniques.
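Recursive Feature Elimination repeatedly scores the remaining features and discards the weakest. In this sketch the scoring function is a hypothetical stand-in with fixed importances; in practice it would refit a model (for example a random forest) on the current subset, which is what scikit-learn's `RFE` does using an estimator's `coef_` or `feature_importances_`.

```python
from typing import Callable


def rfe(
    features: list[str],
    importance_fn: Callable[[list[str]], dict[str, float]],
    n_keep: int,
    step: int = 1,
) -> list[str]:
    """Drop the `step` least important features until `n_keep` remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # normally: refit a model on this subset
        n_drop = min(step, len(remaining) - n_keep)
        weakest = sorted(remaining, key=lambda f: scores[f])[:n_drop]
        remaining = [f for f in remaining if f not in weakest]
    return remaining


# Hypothetical importances; a real importance_fn would refit a model each round.
TOY_IMPORTANCE = {"rpr": 0.9, "odds": 0.8, "age": 0.1, "draw": 0.3, "weight": 0.2}


def toy_importance(subset: list[str]) -> dict[str, float]:
    return {f: TOY_IMPORTANCE[f] for f in subset}


selected = rfe(list(TOY_IMPORTANCE), toy_importance, n_keep=2)
print(selected)  # ['rpr', 'odds']
```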

Hyperparameter tuning:

- Grid search and random search to find well-performing hyperparameters.
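Grid search evaluates every combination of candidate values, while random search evaluates a fixed number of sampled combinations, which scales better when the grid is large. The scoring function below is a hypothetical placeholder for a cross-validated metric, and the parameter names are illustrative; scikit-learn's `GridSearchCV` and `RandomizedSearchCV` provide full implementations.

```python
import itertools
import random
from typing import Any, Callable


def grid_search(grid: dict[str, list], score_fn: Callable[[dict[str, Any]], float]):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(space: dict[str, list], score_fn, n_iter: int, seed: int = 0):
    """Evaluate `n_iter` random combinations (sampled with replacement)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: dict[str, Any]) -> float:
    # Hypothetical stand-in for a cross-validated metric; peaks at depth 5, lr 0.1.
    return -abs(params["max_depth"] - 5) - 10 * abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1, 0.3]}
best_grid = grid_search(grid, toy_score)
best_random = random_search(grid, toy_score, n_iter=4)
```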

This README gives an overview of the Horse Race Prediction project, covering its objectives, dataset, methodology, and expected outcomes. For further details, refer to the project repository and documentation.
This project is licensed under the MIT License - see the LICENSE file for details.