- Introduction
- Navigation in Repository
- File Descriptions
- Description of the dataset
- Forecasting models used
- Clustering models used
- Packages to include
- Conclusions
- Contact
This thesis project evaluates several models for forecasting booking patterns, using data from Sunweb, a leading holiday package provider in Europe. The core objective is to determine whether incorporating clustering into forecasting models improves their accuracy. The rationale is straightforward: booking patterns within a cluster are more homogeneous, so models trained at the cluster level can better capture the underlying trends. To assess the value added by clustering, I compare ensemble models, which integrate clustering information with other forecasting methods, against traditional forecasting models and a benchmark model.
Research Question: To what extent can clustering techniques improve the accuracy of booking forecasts for holiday packages?
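The cluster-level idea can be illustrated with a minimal sketch. It is not the code from the notebooks: the data is synthetic, the two pattern types ("early" and "late" bookers) are invented, and plain k-means with linear regression stands in for the models evaluated in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic cumulative booking curves: 60 departures x 10 booking weeks,
# drawn from two hypothetical pattern types (early vs. late bookers).
n_per_type, n_weeks = 30, 10
weekly = rng.uniform(5, 10, (2, n_per_type, n_weeks))
early = np.cumsum(weekly[0] * np.linspace(2.0, 0.5, n_weeks), axis=1)
late = np.cumsum(weekly[1] * np.linspace(0.5, 2.0, n_weeks), axis=1)
curves = np.vstack([early, late])

# Features: bookings observed in the first 6 weeks; target: final total.
X, y = curves[:, :6], curves[:, -1]

# Step 1: cluster on the shape of the observed part of each curve.
shape = X / X[:, -1:]  # normalise so clusters reflect shape, not volume
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shape)

# Step 2: fit one model per cluster, plus one global model for comparison.
global_model = LinearRegression().fit(X, y)
cluster_models = {
    c: LinearRegression().fit(X[labels == c], y[labels == c])
    for c in np.unique(labels)
}

# Step 3: predict each curve with the model of its own cluster.
cluster_pred = np.empty_like(y)
for c, model in cluster_models.items():
    cluster_pred[labels == c] = model.predict(X[labels == c])

# In-sample mean squared error of both approaches.
global_mse = np.mean((global_model.predict(X) - y) ** 2)
cluster_mse = np.mean((cluster_pred - y) ** 2)
```

The comparison here is in-sample, where per-cluster least-squares fits cannot do worse than the global fit. A meaningful assessment needs held-out departures, which is what the evaluation in the notebooks is for.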
├── code/
│ ├── forecasting_models.ipynb/
│ │ ├── Data preparation
│ │ ├── Model definition
│ │ ├── Clustering
│ │ ├── Ensemble models
│ │ └── Evaluation
│ └── clustering.ipynb/
│   ├── Data preparation
│   ├── Clustering
│   └── Understand Clustering
├── visualizations/
│ ├── visualisations.ipynb
│ └── figures(13)
├── README.md
├── cheatsheet.md
├── requirements.txt
└── final_results_tables.md
- forecasting_models.ipynb: includes the code for all the forecasting models (1 benchmark model, 4 other forecasting models, 6 ensemble models), including hyperparameter tuning, model selection, and evaluation.
- clustering.ipynb: includes all code for clustering optimization, visualisations, and the insights generated (3 clustering models).
- visualisations.ipynb: includes all code to reproduce the figures used in this thesis.
The dataset used in this analysis consists of cross-sectional data indexed by every holiday package’s departure week. For each departure week, the dataset includes the destination airport, the destination country, and the number of passengers who booked a holiday package during a specific booking week. For each booking week present in the dataset, the corresponding revenue and margin generated by Sunweb are also captured.
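As a rough illustration of how such records turn into booking patterns, the sketch below reshapes a toy table into one booking curve per departure. All column names and values are hypothetical (the real Sunweb schema is not shown here), and the booking week is assumed to be expressed as weeks before departure.

```python
import pandas as pd

# Toy records in long format; column names and values are hypothetical.
bookings = pd.DataFrame({
    "departure_week": ["2023-W27", "2023-W27", "2023-W27", "2023-W28", "2023-W28"],
    "destination_airport": ["AYT", "AYT", "AYT", "HER", "HER"],
    "destination_country": ["Turkey", "Turkey", "Turkey", "Greece", "Greece"],
    "weeks_before_departure": [3, 2, 1, 2, 1],
    "passengers": [10, 25, 40, 8, 30],
})

# One row per (departure week, airport), one column per booking week.
pattern = bookings.pivot_table(
    index=["departure_week", "destination_airport"],
    columns="weeks_before_departure",
    values="passengers",
    aggfunc="sum",
    fill_value=0,
)

# Order columns from the earliest booking week to the last one before departure.
pattern = pattern.sort_index(axis=1, ascending=False)

# Cumulative passengers over the booking weeks give the booking curve.
curve = pattern.cumsum(axis=1)
```

Missing booking weeks are filled with zero passengers, which is itself an assumption; a week with no record could also mean the data was simply not collected.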
- Linear Regression
  - Trained under two scenarios: model 4.1.2.1 and model 4.1.2.2
- Weighted Linear Regression
  - Trained under two scenarios: model 4.1.3.1 and model 4.1.3.2
- Ridge Regression
  - Hyperparameter tuning for `alpha`
- Regression Tree
  - Hyperparameter tuning for `max_depth`, `min_samples_leaf`, `min_samples_split`, `max_features`
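The tuning of the hyperparameters listed above can be sketched with scikit-learn's `GridSearchCV`. This is an assumption about tooling, not a copy of the notebook: the grids, the scoring metric, the 5-fold split, and the synthetic data are all illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the booking features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
coef = np.array([3.0, 1.5, 0.0, 0.0, 2.0, -1.0])
y = X @ coef + rng.normal(scale=0.5, size=120)

# Hypothetical search grids; the values are not those used in the thesis.
ridge_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
tree_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 10],
    "max_features": [None, "sqrt"],
}

def tune(estimator, grid):
    """Exhaustive grid search scored by (negated) mean absolute error."""
    search = GridSearchCV(estimator, grid, scoring="neg_mean_absolute_error", cv=5)
    return search.fit(X, y)

ridge_search = tune(Ridge(), ridge_grid)
tree_search = tune(DecisionTreeRegressor(random_state=0), tree_grid)

print("Ridge:", ridge_search.best_params_, -ridge_search.best_score_)
print("Tree: ", tree_search.best_params_, -tree_search.best_score_)
```

For data ordered in time, a splitter such as `TimeSeriesSplit` passed as `cv` may be a better fit than the default folds, since it avoids validating on weeks that precede the training weeks.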
- TS k-means Clustering
  - Two models: Clustering 4.2.1.1, using the average booking patterns, and Clustering 4.2.1.2, using the booking patterns aggregated in pairs
- K-means Clustering
  - Clustering 4.2.2.1: using global features extracted from the booking patterns
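The feature-based variant can be sketched as follows: summarise each booking pattern with a few global features, standardise them, and run k-means. The three features, the synthetic patterns, and the silhouette-based choice of the number of clusters are assumptions made for illustration; the features used for Clustering 4.2.2.1 may differ. The TS k-means variant would instead cluster the series directly, for example with `TimeSeriesKMeans` from tslearn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_weeks = 12
weeks = np.arange(n_weeks)

# Synthetic weekly bookings: 20 early-peaking and 20 late-peaking patterns.
early = np.exp(-0.5 * ((weeks - 3) / 1.5) ** 2)
late = np.exp(-0.5 * ((weeks - 9) / 1.5) ** 2)
patterns = np.vstack([
    100 * early + rng.uniform(0, 5, (20, n_weeks)),
    100 * late + rng.uniform(0, 5, (20, n_weeks)),
])

def global_features(p):
    """Hypothetical per-pattern summary features."""
    share = p / p.sum(axis=1, keepdims=True)
    return np.column_stack([
        p.argmax(axis=1),                      # week with the most bookings
        share @ weeks,                         # booking-weighted mean week
        share[:, : n_weeks // 2].sum(axis=1),  # share booked in the first half
    ])

features = StandardScaler().fit_transform(global_features(patterns))

# Fit k-means for several k and keep the silhouette score of each.
labels_by_k, scores = {}, {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels_by_k[k] = km.fit_predict(features)
    scores[k] = silhouette_score(features, labels_by_k[k])

best_k = max(scores, key=scores.get)
print("silhouette by k:", scores, "-> chosen k:", best_k)
```

Standardising before k-means matters here because the features live on different scales; without it, the peak week would dominate the Euclidean distance.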
List of required Python packages:
- pandas
- seaborn
- matplotlib
- numpy
- scikit-learn
- tqdm
- tslearn
- scipy
- statsmodels
- fastdtw
These dependencies are listed in requirements.txt. To install them, run `pip install -r requirements.txt`.
- Clustering is a valuable method in booking pattern forecasting.
- Ensemble models that used clustering outperformed all other models, including the benchmark model. The general models, by contrast, did not always outperform the benchmark.
- Clustering booking patterns is effective for grouping similar booking behaviors due to its unsupervised nature.
- The best performing clustering method was the TS k-means algorithm.
- Linear regression-based models generally outperformed non-linear ones.
Created by @annitziak - feel free to contact me!