cal-itp · tiffanychu90 · May 8, 2026 · May 1, 2026
diff --git a/rt_predictions/README.md b/rt_predictions/README.md
@@ -11,7 +11,6 @@ Accurate and reliable information should be provided to transit users for journe
 * prediction inconsistency - how much are predictions changing from minute to minute as the bus approaches time of arrival?
 * prediction reliability and accuracy - are these predictions accurate (when compared to our estimated actual time of arrival)?
 
-
 ## Reliable Prediction Accuracy
 
 The prediction is considered **accurate** if it falls within the bounds of this equation: `-60ln(Time to Prediction+1.3) < Prediction Error < 60ln(Time to Prediction+1.5)`.
@@ -36,21 +35,45 @@ As the bus approaches each stop, the software is making predictions for when the
    * follow the prediction, you will **miss** the bus...this is very bad!
    * we want fewer of these kinds of predictions, and would much rather wait for the bus than to miss it
 
+### Reliable Prediction Accuracy Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| Bus Catch Likelihood<br>75%+ of predictions result in catching the bus | Bus Catch Likelihood<br>% Early / On-Time / Late Predictions |
+| Prediction Error<br>Closer to zero, or small positive values (early predictions).<br>Late predictions = negative values = riders miss the bus. | Average prediction error (minutes) |
+| Prediction Error Variability<br>Variability is the interquartile range (IQR = 75th - 25th percentile).<br>Smaller values = better = more consistent experience for riders using the app.<br><br>Ex1: 25th percentile = -5 minutes = a quarter of riders get predictions that are 5 or more minutes late.<br>Ex2: 75th percentile = 3 minutes = a quarter of riders get predictions that are 3 or more minutes early.<br>Ex3: half of riders get predictions between 5 minutes late and 3 minutes early, so IQR = 3 - (-5) = 8 minutes. | 10th, 20th, ..., 90th percentiles<br>Variability = IQR = 75th percentile - 25th percentile<br>Accuracy Loss = 10th percentile / 50th percentile |
+
 ## Availability and Completeness of Predictions
 
 * This metric is the easiest to achieve. For starters, having information is better than no information.
-* For each instance of scheduled stop arrival, there is complete information if there are at least 2 predictions each minute.
+* It measures the completeness _within_ the RT data we are capturing, regardless of coverage gaps _across_ dates.
+* For each instance of scheduled stop arrival, RT information is complete if there are at least 2 predictions each minute (roughly one every 30 seconds).
 * For the 30 minute period before the bus arrives at each stop, each minute is an observation that goes into this calculation (up to 30 observations).
 * This ensures that we have fairly equal number of observations for each stop and can compare across stops.
    * We want to avoid having 30 minutes of predictions for the 1st stop and 60 minutes of predictions for the last stop and comparing metrics that have different denominators.
 
+### Availability and Completeness Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| 2+ vehicle positions and trip updates messages per minute. | [Trip Updates / Vehicle Positions] Messages per Minute |
+| 100% of routes are covered by RT, and 75%+ of trips have RT.<br><br>Out of scheduled trips, how many trips have RT, regardless of completeness?<br>Out of scheduled routes, how many routes have at least 1 trip with RT? | [Trip Updates / Vehicle Positions] % Trips, <br>[Trip Updates / Vehicle Positions] % Routes |
+| 90%+ of minutes have predicted arrival information.<br><br>In the 30 minutes before the bus arrives, how many minutes have 2 or more messages? | % Minutes with 2+ Predictions |
+
 ## Prediction Inconsistency
 
 * This metric (also called jitter or wobble) captures another aspect of transit user experience. Any change in prediction is counted, so this metric **only has positive values**, but smaller positive values are better.
    * If the prediction is changing from minute to minute, a large spread would show up.
    * If the prediction is fairly consistent, we would see small spread.
 * There is [research](https://www.sciencedirect.com/science/article/abs/pii/S0965856416303494) around how transit users perceive wait time, and that users perceive longer wait times than what is actually experienced. Decreasing the perceived wait time by providing real-time information has positive benefits for user experience.
 
+### Prediction Inconsistency Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| Less wobbly or jittery predictions, to a point.<br><br>Real-time predictions should reflect traffic conditions and convey<br>updated information to riders, so aiming for zero is not the goal.<br><br>Higher = predictions change more = worse rider experience.<br><br>Lower = predictions are not fluctuating minute to minute = <br>riders trust the real-time arrival information. | Prediction Spread (minutes) |
+| Lower padding = riders add less time to prevent missing the bus.<br><br>Riders adjust their behavior to catch the bus, and add time to compensate <br>for late predictions.<br><br>Late predictions (negative prediction error values) become the <br>*time a rider adds to make sure they don't miss the bus next time*, <br>signaling a lack of trust in the information. | Prediction Padding (minutes)<br>Absolute value of the 5th percentile prediction error. |
+
 ## Master Services Agreement
 Exhibit H definitions (pg 53 on pdf)
 

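The availability and completeness metrics can be sketched in the same way. This minimal Python example assumes Unix-second timestamps, uses the full 30-minute window before arrival as the denominator for % minutes with 2+ predictions, and counts a trip as having RT if it appears in at least one RT message (a route has RT if at least 1 of its trips does). `pct_complete_minutes` and `pct_trips_and_routes` are hypothetical helpers, not functions from this repo.

```python
WINDOW_MINUTES = 30  # observation window before each scheduled arrival


def pct_complete_minutes(
    arrival_ts: float, message_ts: list[float], min_messages: int = 2
) -> float:
    """Share of the 30 minutes before arrival with at least `min_messages` messages.

    Assumes Unix-second timestamps and a fixed 30-minute denominator.
    """
    counts = [0] * WINDOW_MINUTES
    for ts in message_ts:
        seconds_before = arrival_ts - ts
        # Ignore messages after arrival or 30+ minutes before it
        if 0 <= seconds_before < WINDOW_MINUTES * 60:
            counts[int(seconds_before // 60)] += 1
    complete = sum(1 for c in counts if c >= min_messages)
    return complete / WINDOW_MINUTES


def pct_trips_and_routes(
    trip_to_route: dict[str, str], rt_trip_ids: set[str]
) -> tuple[float, float]:
    """% of scheduled trips seen in RT, and % of routes with 1+ such trip."""
    trips_with_rt = [trip for trip in trip_to_route if trip in rt_trip_ids]
    routes = set(trip_to_route.values())
    routes_with_rt = {trip_to_route[trip] for trip in trips_with_rt}
    return len(trips_with_rt) / len(trip_to_route), len(routes_with_rt) / len(routes)


arrival = 1_800
every_30s = [arrival - 30 * k for k in range(60)]
print(pct_complete_minutes(arrival, every_30s))  # 1.0: every minute has 2 messages
```

RT trips that are not in the schedule are ignored here, since both percentages use scheduled trips and routes as the denominator.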
diff --git a/rt_predictions/chart_utils_for_operators.py b/rt_predictions/chart_utils_for_operators.py
@@ -20,7 +20,7 @@ def basic_percentiles_line_chart(
     """
     chart = (
         alt.Chart(df)
-        .mark_line(point=True)
+        .mark_line(point=True, interpolate="natural")  # "natural" cubic-spline interpolation smooths the line between percentile points
         .encode(
             x=alt.X(x_col, title="Prediction Error (minutes)"),
             y=alt.Y("percentile", title="Percentiles", scale=alt.Scale(domain=[0, 100])),
@@ -36,7 +36,7 @@ def basic_percentiles_line_chart(
     return chart
 
 
-def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
+def fig5and6_prediction_error_plots(df: pd.DataFrame, color_col: str = "day_type") -> alt.Chart:
     """
     Negative and positive prediction error plots are combined side-by-side as 1 chart.
 
@@ -47,14 +47,18 @@ def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
     Instead of [10, 20, ....90] for percentiles, it should show [90, 80, ...10].
     """
     # Make legend selectable
-    selection = alt.selection_point(fields=["day_type"], bind="legend")
+    selection = alt.selection_point(fields=[color_col], bind="legend")
 
-    neg_errors_chart = basic_percentiles_line_chart(df, x_col="neg_prediction_error_minutes").encode(
+    neg_errors_chart = basic_percentiles_line_chart(
+        df, x_col="neg_prediction_error_minutes", color_col=color_col
+    ).encode(
         opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
         strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
     )
 
-    pos_errors_chart = basic_percentiles_line_chart(df, x_col="pos_prediction_error_minutes").encode(
+    pos_errors_chart = basic_percentiles_line_chart(
+        df, x_col="pos_prediction_error_minutes", color_col=color_col
+    ).encode(
         opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
         strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
     )

diff --git a/rt_predictions/operator_report.ipynb b/rt_predictions/operator_report.ipynb
@@ -96,8 +96,8 @@
     "    \"n_stops\", \"num_stop_times\", \"daily_arrivals\", 'n_days_schedule_and_rt'\n",
     "]\n",
     "\n",
-    "vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'\n",
-    "tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',\n",
+    "vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes']\n",
+    "tu_cols = ['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']\n",
     "\n",
     "tu_prediction_cols = [\n",
     "    \"bus_catch_likelihood\", \"pct_tu_complete_minutes\", # both are percents\n",
@@ -232,9 +232,19 @@
    "source": [
     "## General RT Metrics\n",
     "\n",
+    "Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data.\n",
+    "\n",
+    "The metrics from the schedule-RT pairing are the % of scheduled trips (and routes) with vehicle positions and the % of scheduled trips (and routes) with trip updates. These are calculated the same way for both RT data sources.\n",
+    "\n",
     "<span style=\"color:#4477aa\">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>\n",
     "\n",
-    "<span style=\"color:#4477aa\">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>\n"
+    "Messages per minute (vehicle positions or trip updates) measure completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.\n",
+    "\n",
+    "<span style=\"color:#4477aa\">**Update Availability Goal 2:** 75%+ of trips have RT and 100% of routes are covered by RT.</span>\n",
+    "\n",
+    "Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).\n",
+    "\n",
+    "Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions)."
    ]
   },
   {
@@ -289,10 +299,14 @@
    "source": [
     "## Prediction Accuracy Metrics\n",
     "\n",
-    "<span style=\"color:#4477aa\">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>\n",
+    "These metrics are derived entirely from RT trip updates (no comparison with schedule data).\n",
+    "\n",
+    "<span style=\"color:#4477aa\">**Update Availability Goal 3:** 90%+ of minutes have predicted arrival information.</span>\n",
     "\n",
     "<span style=\"color:#4477aa\">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>\n",
     "\n",
+    "On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy): a prediction made 5 minutes before arrival must fall within narrower bounds to count as on-time than a prediction made 30 minutes out.\n",
+    "\n",
     "<span style=\"color:#4477aa\">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>\n"
    ]
   },
@@ -316,8 +330,10 @@
     "    ).fmt_percent(columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"], decimals=1)\n",
     "    .fmt_number(columns=[\"p50\", \"avg_prediction_spread_minutes\", \"prediction_padding_minutes\"], decimals=1)\n",
     "    .fmt_integer(columns=[\"n_predictions\"])\n",
-    "    .tab_header(title = f\"Trip Update Prediction Accuracy Metrics\", \n",
-    "                subtitle = \"units are in minutes\")\n",
+    "    .tab_header(\n",
+    "        title = \"Trip Update Prediction Accuracy Metrics\",\n",
+    "        subtitle = \"units are in minutes\"\n",
+    "    )\n",
     ").pipe(chart_utils.format_great_table)\n",
     "\n",
     "table.pipe(\n",
@@ -332,7 +348,7 @@
     "    font_size=8\n",
     ").pipe(\n",
     "    gte.gt_hulk_col_numeric, \n",
-    "    columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"],\n",
+    "    columns=[\"pct_tu_complete_minutes\", \"bus_catch_likelihood\"],\n",
     "    palette=[_color_palette.get_color(\"light_goldenrod\"), \n",
     "             _color_palette.get_color(\"pastel_peppermint\")],\n",
     "    domain=[0, 1],\n",
@@ -350,11 +366,50 @@
     "            \"n_predictions\", \"p50\"\n",
     "        ]\n",
     "    }\n",
-    ").cols_move_to_end(columns=[\"n_predictions\"])\n",
+    ").cols_move_to_end(\n",
+    "    columns=[\"n_predictions\"]\n",
+    ").cols_move_to_start(\n",
+    "    columns=[\"pct_tu_complete_minutes\"]\n",
+    ")\n",
     "#.pipe(gte.gt_color_box, columns=[\"iqr\"], palette=\"YlOrRd\"), \n",
     "# maybe IQR doesn't make sense to color, it'll just be ranked by day_type"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b3950f3e-2e99-4da9-bc2d-78f61ab2806b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)\n",
+    "\n",
+    "category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(\n",
+    "    x=alt.X(\n",
+    "        'day_type',\n",
+    "        title = \"\", sort=[\"Weekday\", \"Saturday\", \"Sunday\"]\n",
+    "    ),\n",
+    "    y=alt.Y('pct', title=\"Percent\"),\n",
+    "    color=alt.Color(\n",
+    "        'prediction_category:N',\n",
+    "        title=\"Prediction Category\", \n",
+    "        sort=[\"early\", \"ontime\", \"late\"],\n",
+    "        scale=alt.Scale(range=[\n",
+    "            _color_palette.get_color(\"light_cadmium_yellow\"),\n",
+    "            _color_palette.get_color(\"electric_orange\"),\n",
+    "            _color_palette.get_color(\"aquatic\")\n",
+    "        ])\n",
+    "    ),\n",
+    "    column=alt.Column(\"tu_name\", title = \"\"),\n",
+    "    tooltip=[\"tu_name\", \"day_type\", \"prediction_category\", \"pct\"]\n",
+    ").interactive().properties(\n",
+    "    title = \"Predictions by Category\",\n",
+    "    width=150, height = 200\n",
+    ")\n",
+    "\n",
+    "category_counts_stacked_bar"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "3033b010-f091-4a32-8df1-38dc7d2fd3af",
@@ -609,9 +664,11 @@
    "source": [
     "## Route Summary\n",
     "\n",
-    "Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).\n",
+    "Prediction accuracy varies by route. The routes shown at the top have high variability.\n",
+    "\n",
+    "Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.\n",
     "\n",
-    "* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
+    "* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
     "\n",
     "* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.\n",
     "\n",
@@ -722,9 +779,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Pyproject Local (use-venv)",
    "language": "python",
-   "name": "python3"
+   "name": "pyproject_local_kernel_use_venv"
   },
   "language_info": {
    "codemirror_mode": {

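The new stacked-bar cell expects a long table with `tu_name`, `day_type`, `prediction_category`, and `pct` columns. `prep_operator_data.reshape_prediction_category_counts_to_long` is not part of this diff, so the following is only a sketch of one way to build that shape with `pandas.DataFrame.melt`, assuming hypothetical wide count columns `n_early`, `n_ontime`, and `n_late`. `reshape_category_counts_sketch` is a hypothetical name.

```python
import pandas as pd

# Hypothetical wide count columns -> category labels used in the chart's sort order
CATEGORY_COLS = {"n_early": "early", "n_ontime": "ontime", "n_late": "late"}


def reshape_category_counts_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Wide per-(tu_name, day_type) counts -> long table with a `pct` column."""
    long_df = df.melt(
        id_vars=["tu_name", "day_type"],
        value_vars=list(CATEGORY_COLS),
        var_name="prediction_category",
        value_name="n",
    )
    long_df["prediction_category"] = long_df["prediction_category"].map(CATEGORY_COLS)
    # Share of each category within its operator / day type group
    totals = long_df.groupby(["tu_name", "day_type"])["n"].transform("sum")
    long_df["pct"] = long_df["n"] / totals
    return long_df


wide = pd.DataFrame({
    "tu_name": ["Operator A", "Operator A"],
    "day_type": ["Weekday", "Saturday"],
    "n_early": [20, 5],
    "n_ontime": [70, 10],
    "n_late": [10, 5],
})
print(reshape_category_counts_sketch(wide))
```

`groupby(...).transform("sum")` returns a Series aligned to the original rows, so each category's count is divided by its own group total without a separate merge.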
diff --git a/rt_predictions/operator_report.qmd b/rt_predictions/operator_report.qmd
@@ -63,8 +63,8 @@ schedule_cols = [
     "n_stops", "num_stop_times", "daily_arrivals", 'n_days_schedule_and_rt'
 ]
 
-vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'
-tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',
+vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes']
+tu_cols = ['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']
 
 tu_prediction_cols = [
     "bus_catch_likelihood", "pct_tu_complete_minutes", # both are percents
@@ -164,9 +164,19 @@ chart_utils.format_great_table(schedule_table)
 
 ## General RT Metrics
 
+Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data.
+
+The metrics from the schedule-RT pairing are the % of scheduled trips (and routes) with vehicle positions and the % of scheduled trips (and routes) with trip updates. These are calculated the same way for both RT data sources.
+
 <span style="color:#4477aa">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>
 
-<span style="color:#4477aa">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>
+Messages per minute (vehicle positions or trip updates) measure completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.
+
+<span style="color:#4477aa">**Update Availability Goal 2:** 75%+ of trips have RT and 100% of routes are covered by RT.</span>
+
+Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).
+
+Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions).
 
 ```{python}
 rt_table = (
@@ -209,10 +219,14 @@ chart_utils.format_great_table(rt_table, day_type_grouping = True).pipe(
 
 ## Prediction Accuracy Metrics
 
-<span style="color:#4477aa">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>
+These metrics are derived entirely from RT trip updates (no comparison with schedule data).
+
+<span style="color:#4477aa">**Update Availability Goal 3:** 90%+ of minutes have predicted arrival information.</span>
 
 <span style="color:#4477aa">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>
 
+On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy): a prediction made 5 minutes before arrival must fall within narrower bounds to count as on-time than a prediction made 30 minutes out.
+
 <span style="color:#4477aa">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>
 
 ```{python}
@@ -268,6 +282,35 @@ table.pipe(
 # maybe IQR doesn't make sense to color, it'll just be ranked by day_type
 ```
 
+```{python}
+pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)
+
+category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(
+    x=alt.X(
+        'day_type',
+        title = "", sort=["Weekday", "Saturday", "Sunday"]
+    ),
+    y=alt.Y('pct', title="Percent"),
+    color=alt.Color(
+        'prediction_category:N',
+        title="Prediction Category",
+        sort=["early", "ontime", "late"],
+        scale=alt.Scale(range=[
+            _color_palette.get_color("light_cadmium_yellow"),
+            _color_palette.get_color("electric_orange"),
+            _color_palette.get_color("aquatic")
+        ])
+    ),
+    column=alt.Column("tu_name", title = ""),
+    tooltip=["tu_name", "day_type", "prediction_category", "pct"]
+).interactive().properties(
+    title = "Predictions by Category",
+    width=150, height = 200
+)
+
+category_counts_stacked_bar
+```
+
 ## Prediction Error Percentiles
 
 ### Distribution of Prediction Errors
@@ -451,9 +494,11 @@ m
 
 ## Route Summary
 
-Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).
+Prediction accuracy varies by route. The routes shown at the top have high variability.
+
+Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.
 
-* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.
+* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.
 
 * **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.
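
As a sketch of the route summary logic, the Python below computes the 25th and 75th percentile prediction errors per route, ranks routes by IQR (highest variability first), and flags routes whose 25th percentile is negative. The input shape (a dict of route to prediction errors in minutes) and the `route_variability` name are assumptions; the report's own route table may use a different percentile method.

```python
import statistics


def route_variability(errors_by_route: dict[str, list[float]]) -> list[dict]:
    """Rank routes by IQR of prediction error (minutes), highest first."""
    rows = []
    for route, errors in errors_by_route.items():
        # n=4 returns the 25th, 50th, and 75th percentiles (needs 2+ errors)
        p25, p50, p75 = statistics.quantiles(errors, n=4, method="inclusive")
        rows.append({
            "route": route,
            "p25": p25,
            "p50": p50,
            "p75": p75,
            "iqr": p75 - p25,
            # Negative 25th percentile: roughly a quarter or more of predictions are late
            "late_p25": p25 < 0,
        })
    return sorted(rows, key=lambda row: row["iqr"], reverse=True)


ranked = route_variability({"A": [-6, -2, 0, 2], "B": [-1, 0, 1, 2]})
for row in ranked:
    print(row["route"], row["iqr"], row["late_p25"])
```

Route A ranks first here (IQR 3.5 vs 1.5 minutes), and both routes are flagged because their 25th percentiles fall below zero.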