27 changes: 25 additions & 2 deletions rt_predictions/README.md
@@ -11,7 +11,6 @@ Accurate and reliable information should be provided to transit users for journe
* prediction inconsistency - how much are predictions changing from minute to minute as the bus approaches time of arrival?
* prediction reliability and accuracy - are these predictions accurate (when compared to our estimated actual time of arrival)?


## Reliable Prediction Accuracy

The prediction is considered **accurate** if it falls within the bounds of this equation: `-60ln(Time to Prediction+1.3) < Prediction Error < 60ln(Time to Prediction+1.5)`.
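
As a rough illustration, the bounds can be checked with a few lines of Python. This is a sketch rather than the report's implementation: the function name `classify_prediction` is hypothetical, and it assumes that time to prediction is in minutes, that prediction error is in seconds, and that negative errors are late predictions.

```python
import math


def classify_prediction(minutes_to_prediction: float, error_seconds: float) -> str:
    """Label a prediction 'late', 'ontime', or 'early' using the bounds above.

    Assumptions (not confirmed by the README): time to prediction is in
    minutes, prediction error is in seconds, and negative error = late.
    """
    lower = -60 * math.log(minutes_to_prediction + 1.3)
    upper = 60 * math.log(minutes_to_prediction + 1.5)
    if error_seconds <= lower:
        return "late"
    if error_seconds >= upper:
        return "early"
    return "ontime"


# The same 2-minute-late prediction is judged differently 5 vs. 30 minutes out,
# because the bounds widen as the time to prediction grows.
print(classify_prediction(5, -120))
print(classify_prediction(30, -120))
```

Under these assumptions the bounds are roughly ±110 seconds at 5 minutes out and ±207 seconds at 30 minutes out.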
@@ -36,21 +35,45 @@ As the bus approaches each stop, the software is making predictions for when the
* follow the prediction, you will **miss** the bus...this is very bad!
* we want fewer of these kinds of predictions, and would much rather wait for the bus than to miss it

### Reliable Prediction Accuracy Metrics in Report

| Goal | Metric Columns |
|---|---|
| Bus Catch Likelihood<br>75%+ of predictions result in catching the bus | Bus Catch Likelihood<br>% Early / On-Time / Late Predictions |
| Prediction Error<br>Closer to zero, small positive values.<br>Late predictions = negative values = riders miss bus. | Average prediction error (minutes) |
| Prediction Error Variability<br>Variability is the interquartile range (IQR = 75th - 25th percentile).<br>Smaller values = better = more consistent experience for riders using app.<br><br>Ex1: 25th percentile = -5 minutes = a quarter of riders get predictions that are 5 or more minutes late.<br>Ex2: 75th percentile = 3 minutes = a quarter of riders get predictions that are 3 or more minutes early.<br>Ex3: half of riders get predictions between 5 minutes late and 3 minutes early. | 10th, 20th, ..., 90th percentiles<br>Variability = IQR = 75th percentile - 25th percentile<br>Accuracy Loss = 10th percentile / 50th percentile |
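
To make the variability metric concrete, here is a minimal sketch of the IQR calculation using only the standard library. The error values are made up for illustration, and the choice of `method="inclusive"` is an assumption; the report may compute percentiles differently.

```python
import statistics

# Hypothetical prediction errors in minutes (negative = late prediction).
errors = [-5, -2, 0, 1, 3, 4, 6, 8, 10]

# n=4 returns the three quartile cut points: 25th, 50th, and 75th percentiles.
p25, p50, p75 = statistics.quantiles(errors, n=4, method="inclusive")

# Variability = IQR = 75th percentile - 25th percentile.
iqr = p75 - p25

print(f"25th={p25}, 50th={p50}, 75th={p75}, IQR={iqr}")
```

A smaller IQR means the middle half of predictions sit closer together, which is the "more consistent experience" described in the table.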

## Availability and Completeness of Predictions

* This metric is the easiest to achieve. For starters, having information is better than no information.
* For each instance of scheduled stop arrival, there is complete information if there are at least 2 predictions each minute.
* It measures the completeness _within_ the RT data we are capturing, regardless of coverage gaps _across_ dates.
* For each instance of scheduled stop arrival, RT information is complete with at least 2 predictions each minute (every 30 seconds).
* For the 30 minute period before the bus arrives at each stop, each minute is an observation that goes into this calculation (up to 30 observations).
    * This ensures that we have a fairly equal number of observations for each stop and can compare across stops.
* We want to avoid having 30 minutes of predictions for the 1st stop and 60 minutes of predictions for the last stop and comparing metrics that have different denominators.
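
The completeness calculation described above could be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the function name `pct_complete_minutes` is hypothetical, and predictions are represented simply as the number of seconds before arrival at which each one was received.

```python
from collections import Counter


def pct_complete_minutes(seconds_before_arrival: list[float], window_minutes: int = 30) -> float:
    """Share of minutes in the window before arrival that have 2+ predictions."""
    # Bucket each prediction into the minute (0 = final minute before arrival)
    # it falls in, ignoring anything outside the window.
    per_minute = Counter(
        int(s // 60) for s in seconds_before_arrival if 0 <= s < window_minutes * 60
    )
    complete = sum(1 for minute in range(window_minutes) if per_minute[minute] >= 2)
    return complete / window_minutes


# Hypothetical trip: a prediction every 30 seconds for the final 15 minutes,
# but only one prediction per minute for the 15 minutes before that.
example = list(range(0, 900, 30)) + list(range(900, 1800, 60))
print(pct_complete_minutes(example))
```

Fixing the denominator at 30 minutes for every stop is what keeps the metric comparable across stops.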

### Availability and Completeness Metrics in Report

| Goal | Metric Columns |
|---|---|
| 2+ vehicle positions and trip updates messages per minute. | [Trip Updates / Vehicle Positions] Messages per Minute |
| 100% of routes are covered by RT, and 75%+ of trips have RT.<br><br>Out of scheduled trips, how many trips have RT, regardless of completeness?<br>Out of scheduled routes, how many routes have at least 1 trip with RT? | [Trip Updates / Vehicle Positions] % Trips, <br>[Trip Updates / Vehicle Positions] % Routes |
| 90%+ of minutes have predicted arrival information.<br><br>How many minutes have at least 2 messages, in the 30 minutes before the bus arrives? | % Minutes with 2+ Predictions |

## Prediction Inconsistency

* This metric (also called jitter or wobble) captures another aspect of transit user experience. Any change in prediction is counted, so this metric **only has positive values**, but smaller positive values are better.
* If the prediction is changing from minute to minute, a large spread would show up.
* If the prediction is fairly consistent, we would see small spread.
* There is [research](https://www.sciencedirect.com/science/article/abs/pii/S0965856416303494) around how transit users perceive wait time, and that users perceive longer wait times than what is actually experienced. Decreasing the perceived wait time by providing real-time information has positive benefits for user experience.

### Prediction Inconsistency Metrics in Report

| Goal | Metric Columns |
|---|---|
| Less wobbly or jittery predictions, to a point.<br><br>Real-time predictions should reflect traffic conditions and convey<br>updated information to riders, so aiming for zero is not the goal.<br><br>Higher = predictions change more = worse rider experience.<br><br>Lower = predictions are not fluctuating minute to minute = <br>riders trust the real-time arrival information. | Prediction Spread (minutes) |
| Lower padding = riders add less time to prevent missing the bus.<br><br>Riders adjust their behavior to catch the bus, and add time to adjust <br>for receiving late predictions.<br><br>Late predictions (negative prediction error values) become the <br>*time a rider adds to make sure they don't miss the bus next time*, <br>signaling a lack of trust in the information. | Prediction Padding (minutes)<br>Absolute value of the 5th percentile prediction error. |
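
Prediction padding, as defined in the table, is the absolute value of the 5th percentile prediction error. A minimal sketch with made-up errors follows; the sample values and the `method="inclusive"` percentile choice are assumptions for illustration only.

```python
import statistics

# Hypothetical prediction errors in minutes (negative = late prediction).
errors = [
    -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
    0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.5,
    1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 4.0,
]

# n=20 splits the data into 5% steps; the first cut point is the 5th percentile.
p5 = statistics.quantiles(errors, n=20, method="inclusive")[0]

# Padding is how many minutes a rider would add to guard against late predictions.
padding = abs(p5)

print(f"5th percentile = {p5} minutes, padding = {padding} minutes")
```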

## Master Services Agreement
Exhibit H definitions (pg 53 on pdf)

14 changes: 9 additions & 5 deletions rt_predictions/chart_utils_for_operators.py
@@ -20,7 +20,7 @@ def basic_percentiles_line_chart(
"""
chart = (
alt.Chart(df)
.mark_line(point=True)
.mark_line(point=True, interpolate="natural") # this one seems to smooth out the curves
.encode(
x=alt.X(x_col, title="Prediction Error (minutes)"),
y=alt.Y("percentile", title="Percentiles", scale=alt.Scale(domain=[0, 100])),
@@ -36,7 +36,7 @@ def basic_percentiles_line_chart(
return chart


def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
def fig5and6_prediction_error_plots(df: pd.DataFrame, color_col: str = "day_type") -> alt.Chart:
"""
Negative and positive prediction error plots are combined side-by-side as 1 chart.

@@ -47,14 +47,18 @@ def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
Instead of [10, 20, ....90] for percentiles, it should show [90, 80, ...10].
"""
# Make legend selectable
selection = alt.selection_point(fields=["day_type"], bind="legend")
selection = alt.selection_point(fields=[color_col], bind="legend")

neg_errors_chart = basic_percentiles_line_chart(df, x_col="neg_prediction_error_minutes").encode(
neg_errors_chart = basic_percentiles_line_chart(
df, x_col="neg_prediction_error_minutes", color_col=color_col
).encode(
opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
)

pos_errors_chart = basic_percentiles_line_chart(df, x_col="pos_prediction_error_minutes").encode(
pos_errors_chart = basic_percentiles_line_chart(
df, x_col="pos_prediction_error_minutes", color_col=color_col
).encode(
opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
)
81 changes: 69 additions & 12 deletions rt_predictions/operator_report.ipynb
@@ -96,8 +96,8 @@
" \"n_stops\", \"num_stop_times\", \"daily_arrivals\", 'n_days_schedule_and_rt'\n",
"]\n",
"\n",
"vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'\n",
"tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',\n",
"vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] \n",
"tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']\n",
"\n",
"tu_prediction_cols = [\n",
" \"bus_catch_likelihood\", \"pct_tu_complete_minutes\", # both are percents\n",
@@ -232,9 +232,19 @@
"source": [
"## General RT Metrics\n",
"\n",
"Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data. \n",
"\n",
"The metrics from the schedule-RT pairing include and % of schedule trips with vehicle positions and % of scheduled trips with trip updates. These are calculated the same way across both RT data sources.\n",
"\n",
"<span style=\"color:#4477aa\">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>\n",
"\n",
"<span style=\"color:#4477aa\">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>\n"
"Vehicle positions or trip updates per minute is a measure of completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.\n",
"\n",
"<span style=\"color:#4477aa\">**Update Availability Goal 2:** 75%+ of trips have RT and 100% routes are covered by RT.</span>\n",
"\n",
"Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).\n",
"\n",
"Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions)."
]
},
{
@@ -289,10 +299,14 @@
"source": [
"## Prediction Accuracy Metrics\n",
"\n",
"<span style=\"color:#4477aa\">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>\n",
"These metrics are derived entirely from RT trip updates (no comparison with schedule data).\n",
"\n",
"<span style=\"color:#4477aa\">**Update Availability Goal 3:** 90%+ of minutes has predicted arrival information.</span>\n",
"\n",
"<span style=\"color:#4477aa\">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>\n",
"\n",
"On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy), where predictions 5 minutes before must fall within narrower bounds to be considered on-time, compared to predictions 30 minutes out.\n",
"\n",
"<span style=\"color:#4477aa\">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>\n"
]
},
@@ -316,8 +330,10 @@
" ).fmt_percent(columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"], decimals=1)\n",
" .fmt_number(columns=[\"p50\", \"avg_prediction_spread_minutes\", \"prediction_padding_minutes\"], decimals=1)\n",
" .fmt_integer(columns=[\"n_predictions\"])\n",
" .tab_header(title = f\"Trip Update Prediction Accuracy Metrics\", \n",
" subtitle = \"units are in minutes\")\n",
" .tab_header(\n",
" title = f\"Trip Update Prediction Accuracy Metrics\", \n",
" subtitle = \"units are in minutes\"\n",
" )\n",
").pipe(chart_utils.format_great_table)\n",
"\n",
"table.pipe(\n",
@@ -332,7 +348,7 @@
" font_size=8\n",
").pipe(\n",
" gte.gt_hulk_col_numeric, \n",
" columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"],\n",
" columns=[\"pct_tu_complete_minutes\", \"bus_catch_likelihood\"],\n",
" palette=[_color_palette.get_color(\"light_goldenrod\"), \n",
" _color_palette.get_color(\"pastel_peppermint\")],\n",
" domain=[0, 1],\n",
@@ -350,11 +366,50 @@
" \"n_predictions\", \"p50\"\n",
" ]\n",
" }\n",
").cols_move_to_end(columns=[\"n_predictions\"])\n",
").cols_move_to_end(\n",
" columns=[\"n_predictions\"]\n",
").cols_move_to_start(\n",
" columns=[\"pct_tu_complete_minutes\"]\n",
")\n",
"#.pipe(gte.gt_color_box, columns=[\"iqr\"], palette=\"YlOrRd\"), \n",
"# maybe IQR doesn't make sense to color, it'll just be ranked by day_type"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3950f3e-2e99-4da9-bc2d-78f61ab2806b",
"metadata": {},
"outputs": [],
"source": [
"pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)\n",
"\n",
"category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(\n",
" x=alt.X(\n",
" 'day_type',\n",
" title = \"\", sort=[\"Weekday\", \"Saturday\", \"Sunday\"]\n",
" ),\n",
" y=alt.Y('pct', title=\"Percent\"),\n",
" color=alt.Color(\n",
" 'prediction_category:N',\n",
" title=\"Prediction Category\", \n",
" sort=[\"early\", \"ontime\", \"late\"],\n",
" scale=alt.Scale(range=[\n",
" _color_palette.get_color(\"light_cadmium_yellow\"),\n",
" _color_palette.get_color(\"electric_orange\"),\n",
" _color_palette.get_color(\"aquatic\")\n",
" ])\n",
" ),\n",
" column=alt.Column(\"tu_name\", title = \"\"),\n",
" tooltip=[\"tu_name\", \"day_type\", \"prediction_category\", \"pct\"]\n",
").interactive().properties(\n",
" title = \"Predictions by Category\",\n",
" width=150, height = 200\n",
")\n",
"\n",
"category_counts_stacked_bar"
]
},
{
"cell_type": "markdown",
"id": "3033b010-f091-4a32-8df1-38dc7d2fd3af",
@@ -609,9 +664,11 @@
"source": [
"## Route Summary\n",
"\n",
"Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).\n",
"Prediction accuracy varies by routes. The routes shown at the top have high variability. \n",
"\n",
"Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.\n",
"\n",
"* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
"* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
"\n",
"* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.\n",
"\n",
@@ -722,9 +779,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Pyproject Local (use-venv)",
"language": "python",
"name": "python3"
"name": "pyproject_local_kernel_use_venv"
},
"language_info": {
"codemirror_mode": {
57 changes: 51 additions & 6 deletions rt_predictions/operator_report.qmd
@@ -63,8 +63,8 @@ schedule_cols = [
"n_stops", "num_stop_times", "daily_arrivals", 'n_days_schedule_and_rt'
]

vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'
tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',
vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes']
tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']

tu_prediction_cols = [
"bus_catch_likelihood", "pct_tu_complete_minutes", # both are percents
@@ -164,9 +164,19 @@ chart_utils.format_great_table(schedule_table)

## General RT Metrics

Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data.

The metrics from the schedule-RT pairing include % of scheduled trips with vehicle positions and % of scheduled trips with trip updates. These are calculated the same way across both RT data sources.

<span style="color:#4477aa">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>

<span style="color:#4477aa">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>
Vehicle positions or trip updates per minute is a measure of completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.

<span style="color:#4477aa">**Update Availability Goal 2:** 75%+ of trips have RT and 100% routes are covered by RT.</span>

Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).

Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions).
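
The two coverage definitions above can be illustrated with a small, non-executed sketch. The route and trip IDs are hypothetical, and the variable names simply mirror the `pct_tu_trips` and `pct_tu_routes` columns; this is not the code that produces those columns.

```python
# Hypothetical scheduled (route_id, trip_id) pairs; values are illustrative only.
scheduled = {("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")}

# Trips that appeared in RT trip updates with at least 1 message.
trips_with_tu = {"a1", "b1", "b2"}

# % of scheduled trips with trip updates, regardless of completeness.
pct_tu_trips = sum(trip in trips_with_tu for _, trip in scheduled) / len(scheduled)

# A route counts as covered if at least 1 of its scheduled trips had trip updates.
scheduled_routes = {route for route, _ in scheduled}
routes_with_tu = {route for route, trip in scheduled if trip in trips_with_tu}
pct_tu_routes = len(routes_with_tu) / len(scheduled_routes)

print(pct_tu_trips, pct_tu_routes)
```

The same logic would apply to vehicle positions by swapping in the set of trips that appeared in that feed.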

```{python}
rt_table = (
@@ -209,10 +219,14 @@ chart_utils.format_great_table(rt_table, day_type_grouping = True).pipe(

## Prediction Accuracy Metrics

<span style="color:#4477aa">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>
These metrics are derived entirely from RT trip updates (no comparison with schedule data).

<span style="color:#4477aa">**Update Availability Goal 3:** 90%+ of minutes has predicted arrival information.</span>

<span style="color:#4477aa">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>

On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy), where predictions 5 minutes before must fall within narrower bounds to be considered on-time, compared to predictions 30 minutes out.

<span style="color:#4477aa">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>

```{python}
@@ -268,6 +282,35 @@ table.pipe(
# maybe IQR doesn't make sense to color, it'll just be ranked by day_type
```

```{python}
pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)

category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(
x=alt.X(
'day_type',
title = "", sort=["Weekday", "Saturday", "Sunday"]
),
y=alt.Y('pct', title="Percent"),
color=alt.Color(
'prediction_category:N',
title="Prediction Category",
sort=["early", "ontime", "late"],
scale=alt.Scale(range=[
_color_palette.get_color("light_cadmium_yellow"),
_color_palette.get_color("electric_orange"),
_color_palette.get_color("aquatic")
])
),
column=alt.Column("tu_name", title = ""),
tooltip=["tu_name", "day_type", "prediction_category", "pct"]
).interactive().properties(
title = "Predictions by Category",
width=150, height = 200
)

category_counts_stacked_bar
```

## Prediction Error Percentiles

### Distribution of Prediction Errors
@@ -451,9 +494,11 @@ m

## Route Summary

Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).
Prediction accuracy varies by routes. The routes shown at the top have high variability.

Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.

* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.
* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.

* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.
