LEVERAGING CLUSTERING FOR BOOKING PATTERN FORECASTING: A CASE STUDY

Introduction

This thesis project evaluates models for forecasting booking patterns using data from Sunweb, a leading holiday package provider in Europe. The core objective is to determine whether incorporating clustering into forecasting models improves their accuracy. The rationale is straightforward: booking patterns within a cluster are more homogeneous, so models trained at the cluster level can capture the underlying trends more easily. To assess the value added by clustering, I compare ensemble models that integrate clustering information with conventional forecasting models and with a benchmark model.
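The rationale for cluster-level training can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the data, the two hypothetical clusters, and the choice of plain linear regression. It is not the code from the notebooks; it only shows why one model per homogeneous group can beat a single global model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 200

# Hypothetical setup: x = bookings observed so far, y = final bookings.
cluster = rng.integers(0, 2, size=n)        # assumed cluster labels (0 or 1)
x = rng.uniform(10, 100, size=n)
slope = np.where(cluster == 0, 1.2, 3.0)    # each cluster follows its own trend
y = slope * x + rng.normal(scale=2.0, size=n)
X = x.reshape(-1, 1)

# Simple hold-out split: first 150 rows for training, last 50 for testing.
train = np.arange(n) < 150
test = ~train

# One global model fitted on all training rows.
global_model = LinearRegression().fit(X[train], y[train])
global_mae = mean_absolute_error(y[test], global_model.predict(X[test]))

# One model per cluster, each predicting only for its own cluster.
pred = np.empty(test.sum())
for c in (0, 1):
    rows = train & (cluster == c)
    model = LinearRegression().fit(X[rows], y[rows])
    mask = cluster[test] == c
    pred[mask] = model.predict(X[test][mask])
cluster_mae = mean_absolute_error(y[test], pred)

print(f"global MAE: {global_mae:.2f}, cluster-level MAE: {cluster_mae:.2f}")
```

In this toy setting the cluster labels are known in advance; in practice they would come from a clustering step, and the gain depends on how well the clusters separate the booking behaviors.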

Research Question: To what extent can clustering techniques improve the accuracy of booking forecasts for holiday packages?

Navigation in Repository

├── code/
│   ├── forecasting_models.ipynb/
│   │   ├── Data preparation
│   │   ├── Model definition 
│   │   ├── Clustering
│   │   ├── Ensemble models
│   │   └── Evaluation
│   └── clustering.ipynb/
│       ├── Data preparation
│       ├── Clustering
│       └── Understand Clustering
├── visualizations/
│   ├── visualisations.ipynb
│   └── figures(13)    
├── README.md
├── cheatsheet.md
├── requirements.txt
└── final_results_tables.md

File Descriptions

forecasting_models.ipynb: contains the code for all forecasting models (1 benchmark model, 4 other forecasting models, 6 ensemble models), including hyperparameter tuning, model selection, and evaluation.
clustering.ipynb: contains the code for the 3 clustering models, including clustering optimization, visualisations, and the insights generated.
visualisations.ipynb: contains the code to reproduce all figures used in this thesis.

Description of the dataset

The dataset consists of cross-sectional data indexed by each holiday package’s departure week. For each departure week, it records the destination airport, the destination country, and the number of passengers who booked a holiday package in a given booking week. For every booking week present in the dataset, it also records the corresponding revenue and margin generated by Sunweb.
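As a rough illustration of this layout, the sketch below builds a tiny table and reshapes it into one booking pattern per departure week. The column names (`departure_week`, `weeks_before_departure`, `passengers`, and so on) and all values are hypothetical; the actual Sunweb schema is not shown in this README.

```python
import pandas as pd

# Hypothetical long-format records: one row per (departure week, booking week).
# Column names and values are illustrative only, not the real Sunweb data.
records = pd.DataFrame({
    "departure_week": ["2023-W27", "2023-W27", "2023-W27", "2023-W28", "2023-W28"],
    "destination_airport": ["HER", "HER", "HER", "AYT", "AYT"],
    "destination_country": ["Greece", "Greece", "Greece", "Turkey", "Turkey"],
    "weeks_before_departure": [3, 2, 1, 2, 1],
    "passengers": [10, 25, 40, 5, 30],
})

# One row per departure week and airport, one column per booking week.
# Columns are ordered from the earliest booking week to the latest;
# booking weeks without records are filled with 0 passengers.
pattern = records.pivot_table(
    index=["departure_week", "destination_airport"],
    columns="weeks_before_departure",
    values="passengers",
    aggfunc="sum",
    fill_value=0,
).sort_index(axis=1, ascending=False)

# Cumulative bookings as the departure week approaches.
cumulative = pattern.cumsum(axis=1)
print(cumulative)
```

Each row of `cumulative` is then one booking curve, which is the kind of object the forecasting and clustering steps operate on.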

Forecasting models used

Linear Regression
- Trained under two scenarios: model 4.1.2.1 and model 4.1.2.2
Weighted Linear Regression
- Trained under two scenarios: model 4.1.3.1 and model 4.1.3.2
Ridge Regression
- Hyperparameter tuning for alpha
Regression Tree
- Hyperparameter tuning for max_depth, min_samples_leaf, min_samples_split, max_features
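The tuning of the hyperparameters listed above could look roughly like the following sketch with scikit-learn's `GridSearchCV`. The synthetic data, the grid values, the scoring metric, and the 5-fold split are all assumptions for illustration; the grids and validation scheme actually used are in forecasting_models.ipynb.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the real features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

# Ridge regression: search over the regularisation strength alpha.
ridge_search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
ridge_search.fit(X, y)

# Regression tree: search over the four hyperparameters named above.
tree_search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 5],
        "min_samples_split": [2, 10],
        "max_features": [None, "sqrt"],
    },
    scoring="neg_mean_absolute_error",
    cv=5,
)
tree_search.fit(X, y)

print("Ridge:", ridge_search.best_params_)
print("Tree:", tree_search.best_params_)
```

For data ordered in time, a time-aware split such as `TimeSeriesSplit` may be a more appropriate choice for `cv` than the plain k-fold used here.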

Clustering models used

TS k-means Clustering
- Two models: Clustering 4.2.1.1, using the average booking patterns, and Clustering 4.2.1.2, using the booking patterns aggregated in pairs
K-means Clustering
- Clustering 4.2.2.1: uses global features extracted from the booking patterns
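The feature-based variant can be sketched as follows: summarise each booking curve with a few global features, then run k-means on those features. The synthetic curves and the four features chosen here are assumptions for illustration, not the features used in clustering.ipynb.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
weeks = np.arange(20)

# Two synthetic groups of normalised cumulative booking curves (assumption).
early = 1 - np.exp(-weeks / 3.0)       # most bookings arrive early
late = (weeks / weeks.max()) ** 4      # most bookings arrive close to departure
curves = np.vstack(
    [early + rng.normal(scale=0.02, size=20) for _ in range(15)]
    + [late + rng.normal(scale=0.02, size=20) for _ in range(15)]
)

def global_features(curve):
    """Summarise one booking curve with a few illustrative global features."""
    return np.array([
        curve.mean(),               # average booking level
        curve.std(),                # spread of the curve
        np.argmax(curve >= 0.5),    # first week reaching half of final bookings
        np.diff(curve).max(),       # largest week-on-week increase
    ])

features = np.array([global_features(c) for c in curves])

# Standardise so that no single feature dominates the Euclidean distance.
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
print(labels)
```

The time-series variant clusters the curves themselves rather than summary features; `tslearn`, which is among the listed packages, provides `TimeSeriesKMeans` for that purpose.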

Packages to include

List of required Python packages:

pandas
seaborn
matplotlib
numpy
scikit-learn
tqdm
tslearn
scipy
statsmodels
fastdtw

These dependencies are listed in requirements.txt. To install them, run pip install -r requirements.txt.

Conclusions

Clustering is a valuable method in booking pattern forecasting.
Ensemble models that used clustering outperformed all other models, including the benchmark model; the general models, by contrast, did not always beat the benchmark.
Because it is unsupervised, clustering is effective at grouping similar booking behaviors.
The best performing clustering method was the TS k-means algorithm.
Linear regression-based models generally outperformed non-linear ones.

Contact

Created by @annitziak - feel free to contact me!
