Flight_Data_Treatment
As mentioned in the Caveats of Basic Open Sky Data Query, OpenSky fills in the previous value whenever data is not available for a given second.
One treatment applied to the data is therefore to go over the lat and lon values and delete any data point that repeats the previous position. Removing these filled-in points yields more accurate interpolations later on.
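The repeat-removal step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `(time, lat, lon)` tuple layout, the function name `drop_repeated_positions`, and the sample coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical record layout: (unix_time, lat, lon).
Point = Tuple[int, float, float]


def drop_repeated_positions(points: List[Point]) -> List[Point]:
    """Drop every point whose (lat, lon) equals the last kept position."""
    kept: List[Point] = []
    for point in points:
        _, lat, lon = point
        # A repeated position is assumed to be a forward-filled value.
        if kept and (kept[-1][1], kept[-1][2]) == (lat, lon):
            continue
        kept.append(point)
    return kept


# Made-up sample track: seconds 1 and 3 repeat the previous position.
track = [
    (0, 40.10, -75.20),
    (1, 40.10, -75.20),
    (2, 40.12, -75.18),
    (3, 40.12, -75.18),
    (4, 40.15, -75.15),
]
cleaned = drop_repeated_positions(track)
```

Only consecutive repeats are dropped here, so a flight that later passes over an earlier position keeps both points.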
When attempting to categorize different flight paths, t-SNE consistently revealed two clusters separate from the rest, as can be seen below.
Looking at the flight paths corresponding to those clusters revealed that they were flights that, more often than not, "started late" or "ended early", as can be seen below in the plots for Cluster 2 and Cluster 3.
This is further evident when looking at the distributions of starting and ending longitudes and latitudes for the clusters. To address the issue, the coordinate-distance-thresh attribute was added to config.yml to set a threshold on the distance between a flight's recorded position and the starting airport.
A good value for coordinate-distance-thresh can be inferred from the table below, which shows the standard deviation of the starting and ending longitude and latitude for each cluster.
A threshold of ~$3\sigma$ of the valid clusters (all except 2 and 3) should suffice for most cases, so a default value of coordinate-distance-thresh = 0.12 should work well. That is, if a flight's latitude or longitude differs from the airport's nominal coordinates by more than that value, the flight is treated as invalid.
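As a rough sketch of how such a threshold check might look, the snippet below compares a flight's first and last positions against the airports' nominal coordinates. It assumes the threshold is in degrees, that it applies to latitude and longitude separately, that it applies to the destination airport as well as the starting one, and that flights outside it are the ones to reject. The function names and airport coordinates are hypothetical and not taken from the project.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]

# Default suggested in the text; assumed to be in degrees.
COORDINATE_DISTANCE_THRESH = 0.12


def within_threshold(point: LatLon, airport: LatLon,
                     thresh: float = COORDINATE_DISTANCE_THRESH) -> bool:
    """True if both lat and lon are within `thresh` of the airport."""
    lat, lon = point
    airport_lat, airport_lon = airport
    return abs(lat - airport_lat) <= thresh and abs(lon - airport_lon) <= thresh


def flight_is_complete(path: List[LatLon], origin: LatLon,
                       destination: LatLon,
                       thresh: float = COORDINATE_DISTANCE_THRESH) -> bool:
    """Reject flights that "started late" or "ended early"."""
    if not path:
        return False
    return (within_threshold(path[0], origin, thresh)
            and within_threshold(path[-1], destination, thresh))


# Hypothetical airport coordinates and flight paths, for illustration only.
origin = (40.0, -75.0)
destination = (42.0, -71.0)
good_path = [(40.05, -75.05), (41.0, -73.0), (41.95, -71.02)]
late_start = [(40.50, -74.40), (41.0, -73.0), (41.95, -71.02)]

good_ok = flight_is_complete(good_path, origin, destination)
late_ok = flight_is_complete(late_start, origin, destination)
```

A per-axis comparison is used rather than a true geographic distance because the text expresses the threshold in terms of latitude or longitude offsets.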