Repository for the paper: "Urban mobility and learning: analyzing the influence of commuting time on students' GPA at Politecnico di Milano"
Published in Studies in Higher Education – DOI: 10.1080/03075079.2024.2374005
Authors: Arianna Burzacchi*, Lidia Rossi*, Tommaso Agasisti, Anna Maria Paganoni, Simone Vantini
* Joint first authorship
This repository contains the R code used to conduct the analysis described in the paper. The study investigates how commuting time affects the academic performance (GPA) of first-year engineering students at Politecnico di Milano.
The research follows a two-step pipeline:
1. Commuting Time Estimation
   Accessibility maps of Milan are created using GPS data and Kernel Regression Estimation to estimate travel times from home to campus.
2. Causal Impact Analysis
   A causal inference framework is applied to evaluate the effect of commuting time on GPA using multilevel polynomial mixed-effects models and balancing weighting methods (e.g., Entropy Balancing).
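The kernel regression idea behind the first step can be illustrated with a minimal base-R sketch. It is not the code used in the paper: the function name `nw_estimate`, the Gaussian kernel, the bandwidth `h = 2`, and the synthetic coordinates are all assumptions made for illustration.

```r
# Nadaraya-Watson kernel regression with a Gaussian kernel (base R only).
# x:  n x 2 matrix of trip origin coordinates (hypothetical units)
# y:  observed travel times to campus, in minutes
# x0: m x 2 matrix of locations at which to estimate travel time
# h:  bandwidth, in the same units as the coordinates
nw_estimate <- function(x, y, x0, h) {
  x <- as.matrix(x)
  x0 <- as.matrix(x0)
  apply(x0, 1, function(p) {
    # Squared Euclidean distance from p to every observed origin
    d2 <- rowSums(sweep(x, 2, p)^2)
    # Gaussian kernel weights: nearby trips count more
    w <- exp(-d2 / (2 * h^2))
    # Locally weighted average of the observed travel times
    sum(w * y) / sum(w)
  })
}

# Synthetic example: travel time grows with distance from a campus at (0, 0)
set.seed(1)
origins <- matrix(runif(400, -10, 10), ncol = 2)
minutes <- 5 + 3 * sqrt(rowSums(origins^2)) + rnorm(200, sd = 2)
grid <- rbind(c(1, 1), c(8, 8))
est <- nw_estimate(origins, minutes, grid, h = 2)
```

Evaluating `nw_estimate` on a dense grid of locations would give a surface comparable in spirit to an accessibility map. The bandwidth controls how smooth that surface is and would need to be tuned on real data.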
- Data-driven commuting time estimation from smartphone GPS data
- Multilevel causal modeling with mixed-effects polynomial regression
- Application of balancing methods (CBPS, Entropy Balancing, IPW)
- Emphasis on reproducibility and methodological rigor
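To make the weighting idea concrete, here is a simplified base-R sketch of inverse probability weighting (IPW) for a continuous treatment. It deliberately swaps in simpler tools than the ones named above: stabilised weights from normal densities instead of CBPS or Entropy Balancing, and a plain weighted `lm` instead of a multilevel mixed-effects model. The variable names and the simulated data are hypothetical.

```r
# Stabilised inverse probability weights for a continuous treatment,
# followed by a weighted polynomial regression (base R only).
set.seed(42)
n <- 500
income  <- rnorm(n)                              # hypothetical confounder
commute <- 40 + 8 * income + rnorm(n, sd = 10)   # minutes, depends on income
gpa     <- 26 - 0.03 * commute + 0.5 * income + rnorm(n)
d <- data.frame(gpa, commute, income)

# Denominator: density of the treatment given the covariates
den_fit <- lm(commute ~ income, data = d)
den <- dnorm(d$commute, mean = fitted(den_fit), sd = sigma(den_fit))

# Numerator: marginal density of the treatment (stabilises the weights)
num <- dnorm(d$commute, mean = mean(d$commute), sd = sd(d$commute))
d$w <- num / den

# Balance check: weighted correlation between treatment and confounder
wcor <- cov.wt(d[, c("commute", "income")], wt = d$w, cor = TRUE)$cor[1, 2]

# Weighted outcome model with a second-degree polynomial in commuting time
fit <- lm(gpa ~ poly(commute, 2, raw = TRUE), data = d, weights = w)
```

If the weights do their job, `wcor` should be much closer to zero than the unweighted `cor(d$commute, d$income)`, which is the sense in which the weighted sample is "balanced".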
mobility-and-learning/
├── README.md
├── LICENSE
├── data/ # Input data (NOT included)
│ ├── mobility/
│ └── learning/
├── scripts/ # All R scripts used in the analysis
│ ├── .R
├── output/ # Output figures, tables, and model results (NOT included)
├── figures/ # Selected plots used in the paper
├── paper/ # Pre-print PDF
├── install.R # Script to install required packages
└── .gitignore
- R version ≥ 4.1.0
- Required packages:
  sf, np, fields, lme4, lmerTest, WeightIt, marginaleffects, ggplot2, dplyr, tidyr
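Before running anything, it can help to check the R version and see which of the packages above are already installed. The snippet below is a small sketch using only base R. The package names are taken from the list above; the object names `required`, `r_ok`, and `missing_pkgs` are just illustrative.

```r
# Packages named in this README
required <- c("sf", "np", "fields", "lme4", "lmerTest", "WeightIt",
              "marginaleffects", "ggplot2", "dplyr", "tidyr")

# TRUE if the running R session meets the stated minimum version
r_ok <- getRversion() >= "4.1.0"

# Required packages that are not found in the current library paths
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (!r_ok) message("R >= 4.1.0 is required; found ", getRversion())
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```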
Install them all with:
source("install.R")
Clone the repository with:
git clone https://github.com/araiari/mobility-and-learning.git
This project uses two datasets:
- GPS Mobility Data (from Cuebiq Inc.):
  Used to estimate commuting time. GDPR-compliant and anonymized.
  Not included in this repo. To request access: cuebiq.com/data-for-good
- Student Data (from Politecnico di Milano):
  Includes GPA, gender, age, income level, high school background, and bachelor program.
  Pseudonymized, not publicly available due to privacy restrictions.
Note: Place both datasets in the data/ folder with the filenames expected by the scripts.
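A quick way to confirm the data are in place before running the analysis is sketched below. The helper `check_data_dirs` is hypothetical and not part of the repository. It only checks that the `data/mobility` and `data/learning` folders exist and contain at least one file, since the exact filenames are defined by the scripts themselves.

```r
# Report whether each expected data sub-folder exists and is non-empty.
check_data_dirs <- function(root = "data",
                            subdirs = c("mobility", "learning")) {
  paths <- file.path(root, subdirs)
  has_files <- vapply(paths, function(p) {
    dir.exists(p) && length(list.files(p)) > 0
  }, logical(1))
  names(has_files) <- subdirs
  has_files
}

status <- check_data_dirs()
if (!all(status)) {
  message("Missing or empty data folders: ",
          paste(names(status)[!status], collapse = ", "))
}
```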
Open scripts/ in RStudio and execute the scripts in order.
Each script is modular and can be adapted to new datasets or extended for future research.
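For a non-interactive run, the scripts could also be sourced from the R console. The helper below is a sketch, not part of the repository. It assumes that the script filenames sort alphabetically into the intended execution order (for example through a numeric prefix), which should be checked against the actual contents of scripts/ before relying on it.

```r
# Source every .R file in a directory, in alphabetical order.
# All scripts share one environment, so objects created by an earlier
# script are visible to the later ones.
run_scripts <- function(dir = "scripts") {
  files <- sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
  env <- new.env()
  for (f in files) {
    message("Running ", f)
    source(f, local = env)
  }
  invisible(files)
}

# Usage, from the repository root:
# run_scripts("scripts")
```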
This repository is licensed under the MIT License.
See the LICENSE file for full terms.