Predict horse race outcomes using machine learning! This project leverages a dataset from 1990-2020, exploring various models and feature engineering techniques to improve prediction accuracy in the complex world of horse racing. Join us in this exciting challenge!

Samuelson777/Horse-Race-Prediction

Horse Race Prediction Project

Project Overview

This project aims to predict the outcomes of horse races using machine learning techniques. The dataset spans from 1990 to 2020 and includes detailed information on horse races and individual horses. Given the complexity and unpredictability of horse racing, this project explores various machine learning models and feature engineering techniques to enhance prediction accuracy.

Techniques Used

  • Data Cleansing
  • Exploratory Data Analysis (EDA)
  • Visualization
  • Machine Learning

Domain

  • Sports

Dataset Description

The dataset consists of two main types of files:

  • Races and Horses Data: Available for each year from 1990 to 2020.
  • Forward Data (forward.csv): Contains information collected before each race starts, including average odds from Oddschecker.com and the current RPR (Racing Post Rating) and TR (Topspeed rating) values.

Key Features

  • Detailed information on individual horses and race outcomes.

Project Goals

Primary Goal

  • Predict the outcome of horse races (e.g., win or place).

Secondary Goals

  • Identify significant features affecting race outcomes.
  • Address the imbalanced nature of the dataset.
  • Develop a robust prediction model using historical data.

Data Preprocessing

Data Cleaning

  • Handle missing values.
  • Normalize data (e.g., times, distances).
  • Convert categorical variables to numerical formats.
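
The three cleaning steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline; the field names (`distance_m`, `time_s`, `going`) and the choice of median imputation, min-max scaling, and integer label encoding are assumptions made for the example.

```python
from statistics import median

# Hypothetical records; the field names are illustrative, not the dataset's schema.
rows = [
    {"distance_m": 1600.0, "time_s": 98.4, "going": "Good"},
    {"distance_m": 2400.0, "time_s": None, "going": "Soft"},
    {"distance_m": 2000.0, "time_s": 125.0, "going": "Good"},
]

def impute_median(rows, key):
    """Replace missing (None) values of `key` with the median of the observed values."""
    fill = median(r[key] for r in rows if r[key] is not None)
    for r in rows:
        if r[key] is None:
            r[key] = fill

def min_max_scale(rows, key):
    """Rescale `key` to the [0, 1] range in place."""
    values = [r[key] for r in rows]
    lo, hi = min(values), max(values)
    for r in rows:
        r[key] = (r[key] - lo) / (hi - lo) if hi > lo else 0.0

def label_encode(rows, key):
    """Map each category of `key` to an integer code and return the mapping."""
    codes = {cat: i for i, cat in enumerate(sorted({r[key] for r in rows}))}
    for r in rows:
        r[key] = codes[r[key]]
    return codes

impute_median(rows, "time_s")
min_max_scale(rows, "distance_m")
going_codes = label_encode(rows, "going")
```

Integer codes imply an ordering between categories, so one-hot encoding may be a better fit for models that are sensitive to that.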

Feature Engineering

  • Create new features from existing data (e.g., performance metrics).
  • Aggregate features across multiple races to identify trends.
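
As one possible way to aggregate across races, the sketch below computes a horse's average finishing position over its previous races. The field names and the `avg_pos_prev` feature are hypothetical; the point of the example is that only earlier races feed the feature, so the result of the race being predicted does not leak into its own inputs.

```python
from collections import defaultdict, deque

# Hypothetical race history; field names are illustrative only.
runs = [
    {"horse": "A", "date": "2019-05-01", "position": 3},
    {"horse": "B", "date": "2019-05-01", "position": 1},
    {"horse": "A", "date": "2019-06-01", "position": 1},
    {"horse": "A", "date": "2019-07-01", "position": 2},
]

def add_recent_form(runs, window=3):
    """Add `avg_pos_prev`: the mean finishing position over the horse's
    previous `window` races (None when the horse has no earlier race)."""
    history = defaultdict(lambda: deque(maxlen=window))
    # ISO dates sort chronologically as strings.
    for run in sorted(runs, key=lambda r: r["date"]):
        past = history[run["horse"]]
        run["avg_pos_prev"] = sum(past) / len(past) if past else None
        past.append(run["position"])  # recorded only after the feature is computed
    return runs

add_recent_form(runs)
```

The same pattern extends to other rolling statistics, such as recent win rate or days since the last run.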

Data Integration

  • Merge race and horse datasets to create a comprehensive dataset.
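
A minimal sketch of the merge, assuming both file types share a race identifier. The key name `rid` and the other fields are assumptions for illustration. The join is an inner join: horse entries without a matching race row are dropped.

```python
# Hypothetical rows; `rid` as the shared race identifier is an assumption.
races = [
    {"rid": 101, "course": "Ascot", "distance_m": 1600},
    {"rid": 102, "course": "York", "distance_m": 2400},
]
horses = [
    {"rid": 101, "horse": "A", "position": 1},
    {"rid": 101, "horse": "B", "position": 2},
    {"rid": 103, "horse": "C", "position": 1},  # no matching race row
]

def merge_on_race(races, horses, key="rid"):
    """Inner join: one output row per horse entry, enriched with race fields."""
    race_by_id = {race[key]: race for race in races}
    merged = []
    for entry in horses:
        race = race_by_id.get(entry[key])
        if race is not None:  # drop entries whose race is missing
            merged.append({**race, **entry})
    return merged

merged = merge_on_race(races, horses)
```

With pandas, the equivalent would typically be a single `DataFrame.merge` call on the shared key.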

Exploratory Data Analysis (EDA)

Descriptive Statistics

  • Summary statistics and distribution plots of key features.

Correlation Analysis

  • Correlation matrix to identify relationships between features.
  • Feature importance analysis.
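
To make the correlation matrix concrete, here is a small standard-library sketch that computes Pearson correlations between numeric columns. The column names and values are invented for the example and do not come from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant column; reported as 0.0 in this sketch
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return a nested dict {a: {b: r_ab}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical feature columns.
columns = {
    "rating": [70, 80, 90, 100],
    "odds": [12.0, 8.0, 5.0, 2.0],
    "weight": [58, 57, 59, 58],
}
corr = correlation_matrix(columns)
```

In this toy data, higher ratings go with shorter odds, so `corr["rating"]["odds"]` is strongly negative.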

Visualization

  • Use scatter plots, histograms, box plots, and heatmaps to visualize the data.

Modeling Approach

Model Selection

  • Evaluate various machine learning models (e.g., Regression, Random Forest, Gradient Boosting, Neural Networks).
  • Use cross-validation for performance assessment.
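
The cross-validation step can be illustrated with a plain k-fold loop. The sketch below uses only the standard library; the threshold "model" on decimal odds is a hypothetical stand-in for any of the models listed above, and the data is made up. It assumes every training split contains both classes.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return the accuracy measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

# Hypothetical baseline: a threshold halfway between mean winner and loser odds.
def fit_threshold(odds, won):
    winners = [o for o, w in zip(odds, won) if w == 1]
    losers = [o for o, w in zip(odds, won) if w == 0]
    return (sum(winners) / len(winners) + sum(losers) / len(losers)) / 2

def predict_threshold(threshold, odds):
    return 1 if odds < threshold else 0

# Made-up decimal odds and win labels.
odds = [2.0, 3.0, 2.5, 4.0, 3.5, 10.0, 12.0, 15.0, 11.0, 20.0]
won = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = cross_validate(odds, won, fit_threshold, predict_threshold, k=5)
```

Runners in the same race are not independent, so splitting by race rather than by row may give a more honest estimate; that variation is not shown here.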

Handling Imbalanced Data

  • Apply techniques such as SMOTE (Synthetic Minority Over-sampling Technique), under-sampling, and class-weight adjustment.
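
Two of these techniques are simple enough to sketch with the standard library: class weights and random under-sampling. The weighting formula `n_samples / (n_classes * class_count)` is the common "balanced" heuristic; the labels are a made-up example of one winner in a ten-runner field. SMOTE is not shown, since it is usually taken from a library such as imbalanced-learn rather than written by hand.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the rarest class."""
    counts = Counter(labels)
    target = min(counts.values())
    rng = random.Random(seed)
    keep = []
    for cls in counts:
        idx = [i for i, label in enumerate(labels) if label == cls]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Hypothetical labels: one winner (1) among ten runners.
labels = [1] + [0] * 9
rows = list(range(10))  # stand-in for feature rows

weights = balanced_class_weights(labels)
rows_small, labels_small = random_under_sample(rows, labels)
```

Here the winner class receives weight 5.0 and the non-winner class about 0.56, so each class contributes equally to a weighted loss.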

Feature Selection

  • Recursive Feature Elimination (RFE) and regularization techniques.

Hyperparameter Tuning

  • Grid search and random search for optimal hyperparameters.
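
The difference between the two search strategies can be sketched as follows. Grid search evaluates every combination, while random search samples a fixed number of combinations. The parameter names and `toy_score` are placeholders; in practice the score would be a cross-validated metric from the chosen model.

```python
import itertools
import random

def grid_search(param_grid, score):
    """Evaluate every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

def random_search(param_grid, score, n_iter=5, seed=0):
    """Evaluate `n_iter` randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Placeholder for a cross-validated score; it peaks at max_depth=6, learning_rate=0.1.
def toy_score(params):
    return -(params["max_depth"] - 6) ** 2 - (params["learning_rate"] - 0.1) ** 2

# Hypothetical hyperparameter grid.
param_grid = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(param_grid, toy_score)
random_params, random_score = random_search(param_grid, toy_score, n_iter=5)
```

Grid search costs 12 evaluations on this grid; random search caps the cost at `n_iter` but may miss the best combination.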

Contribution

This README gives an overview of the Horse Race Prediction project: its objectives, dataset, and methodology. For further details, refer to the project repository and its documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.
