From fc994148d129f6e8dce5ad399d9d5a03bbbb61f7 Mon Sep 17 00:00:00 2001
From: tiffanychu90 <tiffany.ku@dot.ca.gov>
Date: Fri, 1 May 2026 17:06:11 +0000
Subject: [PATCH 1/5] expand readme with goals from report

---
 rt_predictions/README.md                    | 27 +++++++++++++++++++--
 rt_predictions/chart_utils_for_operators.py | 16 +++++++-----
 2 files changed, 35 insertions(+), 8 deletions(-)
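
Reviewer note (not part of the commit): the README table added in this patch defines Variability, Prediction Padding, and Accuracy Loss in words. A minimal sketch of that arithmetic follows, assuming the percentiles are already computed and expressed in minutes. The three function names are hypothetical and do not exist in `rt_predictions`; the inputs for padding and accuracy loss are illustrative values, not report data.

```python
def variability_iqr(p25: float, p75: float) -> float:
    """Variability = IQR = 75th percentile - 25th percentile."""
    return p75 - p25


def prediction_padding(p5: float) -> float:
    """Padding = absolute value of the 5th percentile prediction error."""
    return abs(p5)


def accuracy_loss(p10: float, p50: float) -> float:
    """Accuracy loss = 10th percentile / 50th percentile."""
    if p50 == 0:
        # The ratio is undefined when the median error is exactly zero.
        raise ValueError("50th percentile is zero; ratio is undefined")
    return p10 / p50


# README example: 25th percentile = -5 minutes, 75th percentile = 3 minutes.
iqr_minutes = variability_iqr(p25=-5.0, p75=3.0)

# Illustrative inputs only (assumed, not taken from the report).
padding_minutes = prediction_padding(p5=-6.5)
loss_ratio = accuracy_loss(p10=4.0, p50=1.0)
```

With the README's example percentiles, half of the predictions span an 8-minute window (5 minutes late to 3 minutes early), which is the IQR.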
diff --git a/rt_predictions/README.md b/rt_predictions/README.md
index 637f08944..26d12a875 100644
--- a/rt_predictions/README.md
+++ b/rt_predictions/README.md
@@ -11,7 +11,6 @@ Accurate and reliable information should be provided to transit users for journe
 * prediction inconsistency - how much are predictions changing from minute to minute as the bus approaches time of arrival?
 * prediction reliability and accuracy - are these predictions accurate (when compared to our estimated actual time of arrival)?
 
-
 ## Reliable Prediction Accuracy
 
 The prediction is considered **accurate** if it falls within the bounds of this equation: `-60ln(Time to Prediction+1.3) < Prediction Error < 60ln(Time to Prediction+1.5)`.
@@ -36,14 +35,31 @@ As the bus approaches each stop, the software is making predictions for when the
    * follow the prediction, you will **miss** the bus...this is very bad!
    * we want fewer of these kinds of predictions, and would much rather wait for the bus than to miss it
 
+### Reliable Prediction Accuracy Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| Bus Catch Likelihood<br>75%+ of predictions result in catching the bus | Bus Catch Likelihood<br>% Early / On-Time / Late Predictions |
+| Prediction Error<br>Closer to zero, small positive values.<br>Late predictions = negative values = riders miss bus. | Average prediction error (minutes) |
+| Prediction Error Variability<br>Variability is the interquartile range (IQR = 75th - 25th percentile).<br>Smaller values = better = more consistent experience for riders using app.<br><br>Ex1: 25th percentile = -5 minutes = a quarter of riders get predictions that are 5 or more minutes late.<br>Ex2: 75th percentile = 3 minutes = a quarter of riders get predictions that are 3 or more minutes early.<br>Ex3: half of riders get predictions between 5 minutes late and 3 minutes early. | 10th, 20th, ..., 90th percentiles<br>Variability = IQR = 75th percentile - 25th percentile<br>Accuracy Loss = 10th percentile / 50th percentile |
+
 ## Availability and Completeness of Predictions
 
 * This metric is the easiest to achieve. For starters, having information is better than no information.
-* For each instance of scheduled stop arrival, there is complete information if there are at least 2 predictions each minute.
+* It measures the completeness _within_ the RT data we are capturing, regardless of coverage gaps _across_ dates.
+* For each instance of scheduled stop arrival, RT information is complete if there are at least 2 predictions each minute (roughly one every 30 seconds).
 * For the 30 minute period before the bus arrives at each stop, each minute is an observation that goes into this calculation (up to 30 observations).
 * This ensures that we have fairly equal number of observations for each stop and can compare across stops.
    * We want to avoid having 30 minutes of predictions for the 1st stop and 60 minutes of predictions for the last stop and comparing metrics that have different denominators.
 
+### Availability and Completeness Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| 2+ vehicle positions and trip updates messages per minute. | [Trip Updates / Vehicle Positions] Messages per Minute |
+| 100% of routes are covered by RT, and 75%+ of trips have RT.<br><br>Out of scheduled trips, how many trips have RT, regardless of completeness?<br>Out of scheduled routes, how many routes have at least 1 trip with RT? | [Trip Updates / Vehicle Positions] % Trips, <br>[Trip Updates / Vehicle Positions] % Routes |
+| 90%+ of minutes have predicted arrival information.<br><br>How many minutes have at least 2 messages, in the 30 minutes before the bus arrives? | % Minutes with 2+ Predictions |
+
 ## Prediction Inconsistency
 
 * This metric (also called jitter or wobble) captures another aspect of transit user experience. Any change in prediction is counted, so this metric **only has positive values**, but smaller positive values are better.
@@ -51,6 +67,13 @@ As the bus approaches each stop, the software is making predictions for when the
    * If the prediction is fairly consistent, we would see small spread.
 * There is [research](https://www.sciencedirect.com/science/article/abs/pii/S0965856416303494) around how transit users perceive wait time, and that users perceive longer wait times than what is actually experienced. Decreasing the perceived wait time by providing real-time information has positive benefits for user experience.
 
+### Prediction Inconsistency Metrics in Report
+
+| Goal | Metric Columns |
+|---|---|
+| Less wobbly or jittery predictions, to a point.<br><br>Real-time predictions should reflect traffic conditions and convey<br>updated information to riders, so aiming for zero is not the goal.<br><br>Higher = predictions change more = worse rider experience.<br><br>Lower = predictions are not fluctuating minute to minute = <br>riders trust the real-time arrival information. | Prediction Spread (minutes) |
+| Lower padding = riders add less time to prevent missing the bus.<br><br>Riders adjust their behavior to catch the bus, and add time to <br>compensate for late predictions.<br><br>Late predictions (negative prediction error values) become the <br>*time a rider adds to make sure they don't miss the bus next time*, <br>signaling a lack of trust in the information. | Prediction Padding (minutes)<br>Absolute value of the 5th percentile prediction error. |
+
 ## Master Services Agreement
 Exhibit H definitions (pg 53 on pdf)
 
diff --git a/rt_predictions/chart_utils_for_operators.py b/rt_predictions/chart_utils_for_operators.py
index dc9a818a2..3a0e3cacb 100644
--- a/rt_predictions/chart_utils_for_operators.py
+++ b/rt_predictions/chart_utils_for_operators.py
@@ -2,10 +2,10 @@
 Chart and map functions for operator report.
 """
 
-import _color_palette
 import altair as alt
 import pandas as pd
 from great_tables import GT
+from gtfs_curator_utils import _color_palette
 
 
 def basic_percentiles_line_chart(
@@ -20,7 +20,7 @@ def basic_percentiles_line_chart(
     """
     chart = (
         alt.Chart(df)
-        .mark_line(point=True)
+        .mark_line(point=True, interpolate="natural")  # this one seems to smooth out the curves
         .encode(
             x=alt.X(x_col, title="Prediction Error (minutes)"),
             y=alt.Y("percentile", title="Percentiles", scale=alt.Scale(domain=[0, 100])),
@@ -36,7 +36,7 @@ def basic_percentiles_line_chart(
     return chart
 
 
-def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
+def fig5and6_prediction_error_plots(df: pd.DataFrame, color_col: str = "day_type") -> alt.Chart:
     """
     Negative and positive prediction error plots are combined side-by-side as 1 chart.
 
@@ -47,14 +47,18 @@ def fig5and6_prediction_error_plots(df: pd.DataFrame) -> alt.Chart:
     Instead of [10, 20, ....90] for percentiles, it should show [90, 80, ...10].
     """
     # Make legend selectable
-    selection = alt.selection_point(fields=["day_type"], bind="legend")
+    selection = alt.selection_point(fields=[color_col], bind="legend")
 
-    neg_errors_chart = basic_percentiles_line_chart(df, x_col="neg_prediction_error_minutes").encode(
+    neg_errors_chart = basic_percentiles_line_chart(
+        df, x_col="neg_prediction_error_minutes", color_col=color_col
+    ).encode(
         opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
         strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
     )
 
-    pos_errors_chart = basic_percentiles_line_chart(df, x_col="pos_prediction_error_minutes").encode(
+    pos_errors_chart = basic_percentiles_line_chart(
+        df, x_col="pos_prediction_error_minutes", color_col=color_col
+    ).encode(
         opacity=alt.when(selection).then(alt.value(1)).otherwise(alt.value(0.2)),
         strokeWidth=alt.when(selection).then(alt.value(2)).otherwise(alt.value(1)),
     )

From 1cd279c033fcdbfa6eecc66f288a5821865dea64 Mon Sep 17 00:00:00 2001
From: tiffanychu90 <tiffany.ku@dot.ca.gov>
Date: Fri, 1 May 2026 17:27:17 +0000
Subject: [PATCH 2/5] expand captions

---
 rt_predictions/operator_report.ipynb |  24 +-
 rt_predictions/operator_report.qmd   | 541 ---------------------------
 2 files changed, 20 insertions(+), 545 deletions(-)
 delete mode 100644 rt_predictions/operator_report.qmd

diff --git a/rt_predictions/operator_report.ipynb b/rt_predictions/operator_report.ipynb
index 934dcdfd1..54c3b469f 100644
--- a/rt_predictions/operator_report.ipynb
+++ b/rt_predictions/operator_report.ipynb
@@ -232,9 +232,19 @@
    "source": [
     "## General RT Metrics\n",
     "\n",
+    "Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data. \n",
+    "\n",
+    "The metrics from the schedule-RT pairing include % of scheduled trips with vehicle positions and % of scheduled trips with trip updates. These are calculated the same way across both RT data sources.\n",
+    "\n",
     "<span style=\"color:#4477aa\">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>\n",
     "\n",
-    "<span style=\"color:#4477aa\">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>\n"
+    "Vehicle positions or trip updates per minute is a measure of completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.\n",
+    "\n",
+    "<span style=\"color:#4477aa\">**Update Availability Goal 2:** 75%+ of trips have RT and 100% of routes are covered by RT.</span>\n",
+    "\n",
+    "Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).\n",
+    "\n",
+    "Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions)."
    ]
   },
   {
@@ -289,10 +299,14 @@
    "source": [
     "## Prediction Accuracy Metrics\n",
     "\n",
-    "<span style=\"color:#4477aa\">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>\n",
+    "These metrics are derived entirely from RT trip updates (no comparison with schedule data).\n",
+    "\n",
+    "<span style=\"color:#4477aa\">**Update Availability Goal 3:** 90%+ of minutes have predicted arrival information.</span>\n",
     "\n",
     "<span style=\"color:#4477aa\">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>\n",
     "\n",
+    "On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy), where predictions made 5 minutes before arrival must fall within narrower bounds to be considered on-time than predictions made 30 minutes out.\n",
+    "\n",
     "<span style=\"color:#4477aa\">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>\n"
    ]
   },
@@ -609,9 +623,11 @@
    "source": [
     "## Route Summary\n",
     "\n",
-    "Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).\n",
+    "Prediction accuracy varies by route. The routes shown at the top have high variability.\n",
+    "\n",
+    "Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.\n",
     "\n",
-    "* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
+    "* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.\n",
     "\n",
     "* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.\n",
     "\n",
diff --git a/rt_predictions/operator_report.qmd b/rt_predictions/operator_report.qmd
deleted file mode 100644
index 35e0e8d57..000000000
--- a/rt_predictions/operator_report.qmd
+++ /dev/null
@@ -1,541 +0,0 @@
----
-title: '{one_month_formatted} Summary'
-jupyter: python3
----
-
-```{python}
-%%capture
-
-import warnings
-warnings.filterwarnings("ignore")
-
-import altair as alt
-import branca.colormap as cm
-import folium
-import pandas as pd
-
-import calitp_data_analysis.magics
-import gt_extras as gte
-
-from great_tables import GT, md
-
-import chart_utils_for_operators as chart_utils
-import prep_operator_data
-import report_utils
-import _color_palette
-from rt_msa_utils import operator_report_month
-
-alt.data_transformers.enable("vegafusion")
-
-one_month = pd.to_datetime(operator_report_month)
-```
-
-```{python}
-#| editable: true
-#| slideshow: {slide_type: ''}
-#| tags: [parameters]
-# parameters cell
-#name = "Redding Trip Updates"
-```
-
-```{python}
-%%capture_parameters
-
-date_format = "%b %Y" # gtfs_digest/_new_operator_report_utils.py
-one_month_formatted = one_month.strftime(date_format)
-
-name, one_month_formatted
-```
-
-
-Generally, we want better transit user experience. Specifically, the performance metrics we can derive from GTFS RT Trip Updates distills into the following objectives:
-
-* Increase prediction reliability and accuracy
-* Increase the availability and completeness of GTFS RT
-* Decrease the inconsistency and fluctuations of predictions
-
-
-```{python}
-operator_cols = ["day_type"]
-
-schedule_cols = [
-    "daily_trips", "daily_service_hours", "n_routes", "n_shapes",
-    "n_stops", "num_stop_times", "daily_arrivals", 'n_days_schedule_and_rt'
-]
-
-vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'
-tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',
-
-tu_prediction_cols = [
-    "bus_catch_likelihood", "pct_tu_complete_minutes", # both are percents
-    "p25", "p75", "iqr", "p50",
-    "n_predictions",
-    "prediction_padding_minutes", "avg_prediction_spread_minutes"
-]
-```
-
-```{python}
-def check_counts_across_quartet(df: pd.DataFrame):
-    url_cols = [f"{s}_base64_url" for s in ["schedule", "vp", "tu"]]
-
-    counts_df = df.groupby(["schedule_name", "vp_name", "tu_name"]).agg({
-        **{c: "nunique" for c in url_cols}
-    }).reset_index()
-
-    # Need to do this for everyone
-    counts_df["max_count"] = counts_df[url_cols].max(axis=1)
-
-    if counts_df.max_count.iloc[0] > 1:
-        print("There were multiple entries for each day_type:")
-        display(counts_df)
-
-    urls_with_most_predictions = (
-        df[url_cols + ["n_predictions"]]
-        .sort_values("n_predictions", ascending=False)
-        .reset_index(drop=True)
-    ).head(1)[url_cols]
-
-    df2 = pd.merge(
-        df,
-        urls_with_most_predictions,
-        on = url_cols,
-        how = "inner"
-    )
-
-    return df2
-```
-
-```{python}
-df = report_utils.import_operator_df(
-    filters = [[
-        ("month_first_day", "==", one_month),
-        ("tu_name", "==", name),
-    ]],
-).pipe(
-    check_counts_across_quartet
-).pipe(
-    prep_operator_data.merge_in_operator_percentiles
-)
-
-schedule_name = df.schedule_name.iloc[0]
-```
-
-```{python}
-# Set variables for color bars used across maps, route dropdown, and great tables
-PREDICTION_ERROR_COLORS =list(_color_palette.PREDICTION_ERROR_COLOR_PALETTE.values())
-PREDICTION_ERROR_INDEX = [-5, -3, -1, 1, 3, 5]
-PREDICTION_ERROR_LEGEND_CAPTION = "minutes (negative = late; positive = early)"
-
-POS_BAR_COLOR = _color_palette.get_color("blueberry")
-NEG_BAR_COLOR = _color_palette.get_color("vivid_cerise")
-```
-
-## Schedule + RT Summary Stats
-
-```{python}
-schedule_table = (
-    GT(df[operator_cols + schedule_cols])
-    .cols_label(
-        daily_trips = "Daily Trips",
-        daily_service_hours = "Daily Service Hours",
-        n_routes = "# Routes",
-        n_shapes = "# Shapes",
-        n_stops = "# Stops",
-        num_stop_times = "Total Scheduled Arrivals",
-        daily_arrivals = "Daily Scheduled Arrivals",
-        n_days_schedule_and_rt = "# days with both RT",
-    ).fmt_integer(
-        columns = [
-            "daily_trips", "n_routes", "n_shapes", "n_stops",
-            "num_stop_times", "daily_arrivals", "n_days_schedule_and_rt"]
-    ).fmt_number(
-        columns = ["daily_service_hours"], decimals=1
-    ).tab_spanner(
-        label="Schedule",
-        columns = schedule_cols
-    ).tab_header(
-        title = "Schedule + RT Summary Metrics",
-        subtitle = f"{one_month_formatted}"
-    )
-)
-
-chart_utils.format_great_table(schedule_table)
-```
-
-## General RT Metrics
-
-<span style="color:#4477aa">**Update Availability Goal 1:** 2+ vehicle positions or trip updates messages per minute.</span>
-
-<span style="color:#4477aa">**Update Availability Goal 2:** 100% routes are covered by RT, and 75%+ of trips have RT.</span>
-
-```{python}
-rt_table = (
-    GT(df[operator_cols + vp_cols + tu_cols])
-    .cols_label(
-        tu_messages_per_minute = "Trip Updates per Minute",
-        pct_tu_trips = "% Trips",
-        pct_tu_routes = "% Routes",
-        vp_messages_per_minute = "Vehicle Positions per Minute",
-        pct_vp_trips = "% Trips",
-        pct_vp_routes = "% Routes",
-    ).fmt_number(
-        columns = ["tu_messages_per_minute", "vp_messages_per_minute"],
-        decimals=1
-    ).fmt_percent(
-        columns=["pct_tu_trips", "pct_tu_routes", "pct_vp_trips", "pct_vp_routes"],
-        decimals=1
-    ).tab_spanner(
-        label="Trip Updates",
-        columns=tu_cols
-    ).tab_spanner(
-        label="Vehicle Positions",
-        columns=vp_cols
-    )
-)
-
-chart_utils.format_great_table(rt_table, day_type_grouping = True).pipe(
-    gte.gt_color_box,
-    columns=["tu_messages_per_minute", "vp_messages_per_minute"],
-    palette="Blues",
-    domain=[1, 3]
-).pipe(
-    gte.gt_hulk_col_numeric,
-    columns=["pct_tu_trips", "pct_tu_routes", "pct_vp_trips", "pct_vp_routes"],
-    palette=["#FFEC8B", "#E5F5E0"], #[light goldenrod1, white alyssum (from greens)]
-    domain=[0, 1],
-    alpha=0.1
-)
-```
-
-## Prediction Accuracy Metrics
-
-<span style="color:#4477aa">**Update Availability Goal:** 90%+ of minutes has predicted arrival information.</span>
-
-<span style="color:#4477aa">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>
-
-<span style="color:#4477aa">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus</span>
-
-```{python}
-table = (
-    GT(df[operator_cols + tu_prediction_cols])
-    .cols_label(
-        pct_tu_complete_minutes = "% Minutes with 2+ Predictions",
-        bus_catch_likelihood = "Bus Catch Likelihood (Early + On-time)",
-        p50 = "Prediction Error",
-        avg_prediction_spread_minutes = "Prediction Spread / Wobble",
-        prediction_padding_minutes = "Prediction Padding",
-        n_predictions = "# Predictions",
-        iqr = "Variability"
-    ).fmt_percent(columns=["bus_catch_likelihood", "pct_tu_complete_minutes"], decimals=1)
-    .fmt_number(columns=["p50", "avg_prediction_spread_minutes", "prediction_padding_minutes"], decimals=1)
-    .fmt_integer(columns=["n_predictions"])
-    .tab_header(title = f"Trip Update Prediction Accuracy Metrics",
-                subtitle = "units are in minutes")
-).pipe(chart_utils.format_great_table)
-
-table.pipe(
-    gte.gt_plt_dumbbell,
-    col1='p25',
-    col2='p75',
-    label = "IQR",
-    num_decimals=1,
-    col1_color=_color_palette.get_color("valentino"),
-    col2_color=_color_palette.get_color("lizard_green"),
-    width=100, height=50, # check these each time
-    font_size=8
-).pipe(
-    gte.gt_hulk_col_numeric,
-    columns=["bus_catch_likelihood", "pct_tu_complete_minutes"],
-    palette=[_color_palette.get_color("light_goldenrod"),
-             _color_palette.get_color("pastel_peppermint")],
-    domain=[0, 1],
-    alpha=0.1
-).pipe(
-    gte.gt_color_box,
-    columns=["p50"],
-    palette=PREDICTION_ERROR_COLORS,
-    domain=[-5, 5]
-).cols_width(
-    cases={
-        c: "10%" for c in [
-            "bus_catch_likelihood", "pct_tu_complete_minutes",
-            "prediction_padding_minutes", "avg_prediction_spread_minutes",
-            "n_predictions", "p50"
-        ]
-    }
-).cols_move_to_end(columns=["n_predictions"])
-#.pipe(gte.gt_color_box, columns=["iqr"], palette="YlOrRd"),
-# maybe IQR doesn't make sense to color, it'll just be ranked by day_type
-```
-
-## Prediction Error Percentiles
-
-### Distribution of Prediction Errors
-
-The 50th percentile is the typical or median rider experience, and it can show that, on average, this transit agency is roughly on-time.
-* If the 10th percentile is fairly close to the 50th percentile, it means that the transit agency is consistent and reliable in its predictions.
-* Extreme values for the 10th percentile would indicate that predictions fluctuate, or, are somewhat unreliable.
-* Steeper lines indicate fairly reliable predictions for the rider.
-
-```{python}
-decile_cols = [
-    "month_first_day", "day_type",
-    "schedule_name", "tu_name",
-    'pos_prediction_error_sec_array', 'pos_prediction_error_sec_percentile_array',
-    'neg_prediction_error_sec_array', 'prediction_error_sec_percentile_array'
-]
-
-operator_deciles_df = report_utils.import_operator_df(
-    filters = [[
-        ("month_first_day", "==", one_month),
-        ("tu_name", "==", name),
-    ]],
-    columns = decile_cols
-).pipe(prep_operator_data.operator_deciles_for_chart)
-```
-
-```{python}
-percentile_chart = chart_utils.fig5and6_prediction_error_plots(operator_deciles_df)
-percentile_chart
-```
-
-### Accuracy Loss
-Ratio of the 10th to 50th percentiles
-
-* Newmark's paper on a small sample of transit agencies suggests that the positive prediction errors typically have ratios of 4.
-* Late predictions (negative prediction errors) have ratios around 3.
-* Steeper lines = less accuracy loss = better
-   * y-axis is percentile (moving from 10th to 50th percentile is moving from upwards on y-axis)
-   * x-axis is error (smaller change along x-axis is less accuracy loss).
-   * less accuracy loss = less change along x-axis, since change along y-axis is constant (10 to 50) = steeper (unintuitive to the normal interpretation of slope!)
-
-```{python}
-operator_cols = ["day_type"]
-percentile_chart_cols = [
-    "pos_p10", "pos_p50", "pos_error_ratio",
-    "neg_p10", "neg_p50", "neg_error_ratio"
-]
-
-mini_df = df[df.month_first_day == one_month][
-    operator_cols + percentile_chart_cols]
-
-# convert the 10th, 50th percentile columns to minutes
-seconds_cols = ["pos_p10", "pos_p50", "neg_p10", "neg_p50"]
-mini_df[seconds_cols] = mini_df[seconds_cols].divide(60).round(2)
-```
-
-```{python}
-mini_p10_p50_table = (
-    GT(mini_df)
-    .cols_label(
-        pos_p10 = "10th percentile ",
-        neg_p10 = "10th percentile",
-        pos_p50 = "50th percentile",
-        neg_p50 = "50th percentile",
-        pos_error_ratio = "Accuracy Loss",
-        neg_error_ratio = "Accuracy Loss",
-    ).fmt_number(
-        columns=["pos_error_ratio", "neg_error_ratio"], decimals=1
-    ).tab_spanner(
-        label="Early Predictions (Positive Prediction Error)",
-        columns=["pos_p10", "pos_p50", "pos_error_ratio"]
-    ).tab_spanner(
-        label="Late Predictions (Negative Prediction Error)",
-        columns = ["neg_p10", "neg_p50", "neg_error_ratio"]
-    )
-    .tab_header(
-        title = "Accuracy Loss = Ratio of 10th to 50th percentile error",
-        subtitle = "units are in minutes (lower = less accuracy loss)"
-    )
-).pipe(
-    gte.gt_color_box,
-    columns=["pos_p10", "pos_p50", "neg_p10", "neg_p50"],
-    palette=PREDICTION_ERROR_COLORS,
-    domain=[-5, 5]
-)
-
-chart_utils.format_great_table(mini_p10_p50_table).pipe(
-    chart_utils.format_great_table,
-    day_type_grouping=False
-)
-```
-
-## Route Map by Priority Criteria
-
-The following layers are available and selectable (if no routes match the criteria, the layer is excluded):
-
-1. **Average prediction error** (minutes) for all routes
-2. Routes with **<90% update completeness**
-   Providing complete real-time information for all routes is the crucial foundation.
-3. **Highly Variable Routes (IQR > 3)** that could benefit from transit-supportive policies (signal priority, bus lanes).
-   The variability in prediction accuracy here could be due to the local traffic conditions.
-4. Routes with **Bus Catch Likelihood (early + on-time accuracy < 75%)**, or late predictions 25% of the time.
-
-```{python}
-route_gdf = report_utils.import_route_df(
-    filters = [[
-        ("month_first_day", "==", one_month),
-        ("schedule_name", "==", schedule_name),
-        ("day_type", "==", "Weekday")
-    ]],
-    columns = [
-        "schedule_name", "tu_name",
-        "route_dir_name",
-        "avg_prediction_error_minutes", "prediction_error_label",
-        "pct_tu_complete_minutes",
-        "iqr", "bus_catch_likelihood",
-        "geometry"
-    ]
-).drop_duplicates().reset_index(drop=True)
-```
-
-```{python}
-# Set conditions for filtering to pick out priority criteria
-condition_completeness = route_gdf.pct_tu_complete_minutes < 0.9
-condition_variability = route_gdf.iqr >= 3
-condition_likelihood = route_gdf.bus_catch_likelihood < 0.75
-```
-
-```{python}
-m = route_gdf.explore(
-    "avg_prediction_error_minutes",
-    tiles = "CartoDB Positron",
-    name = "All Routes",
-    cmap = cm.StepColormap(
-        colors=PREDICTION_ERROR_COLORS, index=PREDICTION_ERROR_INDEX,
-        vmin=-5, vmax=5,
-        tick_labels=PREDICTION_ERROR_INDEX,
-        caption=PREDICTION_ERROR_LEGEND_CAPTION
-    ),
-    marker_kwds={"fill": True},
-    style_kwds={"opacity": 0.5, "fillOpacity": 0.3}
-)
-```
-
-```{python}
-if len(route_gdf[condition_completeness]) > 0:
-    m = route_gdf[condition_completeness].explore(
-        "route_dir_name",
-        m=m,
-        tiles = "CartoDB Positron",
-        name = "< 90% update completeness", # color by route-dir name, same as stop report
-        categorical = True,
-        legend = False,
-    )
-
-if len(route_gdf[condition_variability]) > 0:
-    m = route_gdf[condition_variability].explore(
-        "iqr",
-        m=m,
-        tiles = "CartoDB Positron",
-        name = "High Variability (IQR 3+ minutes) Routes",
-        categorical = False,
-        legend = True,
-        cmap="YlOrRd",
-    )
-
-if len(route_gdf[condition_likelihood]) > 0:
-    m = route_gdf[condition_likelihood].explore(
-        "bus_catch_likelihood",
-        m=m,
-        tiles = "CartoDB Positron",
-        name = "<75% Bus Catch Likelihood",
-        categorical = False,
-        legend = True,
-        cmap="cividis"
-    )
-
-folium.LayerControl().add_to(m)
-m
-```
-
-## Route Summary
-
-Prediction accuracy varies by routes. The routes shown at the top have high variability (high IQRs).
-
-* **High variability = high IQRs**: local traffic conditions mat confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.
-
-* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.
-
-   *Interpretation*: A value of -5 means that one quarter of riders miss the bus by 5 minutes.
-
-* **scaled IQR**: IQR adjusted so predictions closer to the bus arrival are weighted more. Predictions 5 minutes out are more important than predictions 25 minutes out.
-
-```{python}
-route_iqr_df = report_utils.import_route_df(
-    filters = [[
-        ("tu_name", "==", name),
-        ("month_first_day", "==", one_month),
-        ("day_type", "==", "Weekday")
-    ]],
-    columns= [
-        "route_dir_name",
-        "avg_prediction_error_minutes",
-        "n_predictions",
-        "p25", "p75",
-        "iqr",
-        "scaled_p25", "scaled_p75",
-        # does putting IQR help in interpretation?
-    ]
-).sort_values("iqr", ascending=False)
-```
-
-```{python}
-route_iqr_table = (
-    GT(route_iqr_df)
-    .cols_label(
-        route_dir_name = "Route-Direction",
-        n_predictions = "# Predictions",
-        iqr = "Variability",
-        avg_prediction_error_minutes = "Prediction Error (minutes)",
-    ).fmt_integer(["n_predictions"])
-    .tab_header(
-        title = "Route Summary Metrics",
-        subtitle = md(
-            """
-            High IQR = variability -> focus on service reliability through transit planning and policies.
-            Variability could be due to local traffic conditions.
-            Negative 25th percentiles = riders miss bus."""
-        )
-    ).pipe(chart_utils.format_great_table, day_type_grouping=False)
-)
-
-route_iqr_table.pipe(
-    gte.gt_plt_dumbbell,
-    col1='p25',
-    col2='p75',
-    label = "IQR (minutes)",
-    num_decimals=1,
-    width=200, height=50,
-    col1_color=_color_palette.get_color("valentino"),
-    col2_color=_color_palette.get_color("lizard_green"),
-    font_size=8
-).pipe(
-    gte.gt_plt_dumbbell,
-    col1="scaled_p25",
-    col2='scaled_p75',
-    label='scaled IQR',
-    num_decimals=3,
-    width=200, height=50,
-    col1_color=_color_palette.get_color("valentino"),
-    col2_color=_color_palette.get_color("lizard_green"),
-    font_size=8
-).pipe(
-    gte.gt_color_box,
-    columns=["avg_prediction_error_minutes"],
-    palette=PREDICTION_ERROR_COLORS,
-    domain=[-5, 5]
-).pipe(
-    gte.gt_color_box,
-    columns=["iqr"],
-    palette="YlOrRd",
-).cols_width(
-    cases={
-        "avg_prediction_error_minutes": "10%",
-        "n_predictions": "10%",
-        "iqr": "10%"
-    }
-).cols_align("center").cols_align(
-    "left", columns = "route_dir_name"
-).cols_move_to_end(columns=["n_predictions"])
-```

From dde5372d9e3394ac382887401e00a1f637bae027 Mon Sep 17 00:00:00 2001
From: tiffanychu90 <tiffany.ku@dot.ca.gov>
Date: Fri, 1 May 2026 18:07:54 +0000
Subject: [PATCH 3/5] add stacked bar for prediction category counts

---
 rt_predictions/chart_utils_for_operators.py |   2 +-
 rt_predictions/operator_report.ipynb        |  39 +-
 rt_predictions/operator_report.qmd          | 586 ++++++++++++++++++++
 rt_predictions/prep_operator_data.py        |  22 +-
 4 files changed, 645 insertions(+), 4 deletions(-)
 create mode 100644 rt_predictions/operator_report.qmd

diff --git a/rt_predictions/chart_utils_for_operators.py b/rt_predictions/chart_utils_for_operators.py
index 3a0e3cacb..e4fd424b7 100644
--- a/rt_predictions/chart_utils_for_operators.py
+++ b/rt_predictions/chart_utils_for_operators.py
@@ -2,10 +2,10 @@
 Chart and map functions for operator report.
 """
 
+import _color_palette
 import altair as alt
 import pandas as pd
 from great_tables import GT
-from gtfs_curator_utils import _color_palette
 
 
 def basic_percentiles_line_chart(
diff --git a/rt_predictions/operator_report.ipynb b/rt_predictions/operator_report.ipynb
index 54c3b469f..aa2bfed69 100644
--- a/rt_predictions/operator_report.ipynb
+++ b/rt_predictions/operator_report.ipynb
@@ -96,8 +96,8 @@
     "    \"n_stops\", \"num_stop_times\", \"daily_arrivals\", 'n_days_schedule_and_rt'\n",
     "]\n",
     "\n",
-    "vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] #'daily_vp_trips'\n",
-    "tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes'] #'daily_tu_trips',\n",
+    "vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes'] \n",
+    "tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']\n",
     "\n",
     "tu_prediction_cols = [\n",
     "    \"bus_catch_likelihood\", \"pct_tu_complete_minutes\", # both are percents\n",
@@ -369,6 +369,41 @@
     "# maybe IQR doesn't make sense to color, it'll just be ranked by day_type"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b3950f3e-2e99-4da9-bc2d-78f61ab2806b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)\n",
+    "\n",
+    "category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(\n",
+    "    x=alt.X(\n",
+    "        'day_type',\n",
+    "        title = \"\", sort=[\"Weekday\", \"Saturday\", \"Sunday\"]\n",
+    "    ),\n",
+    "    y=alt.Y('pct', title=\"Percent\"),\n",
+    "    color=alt.Color(\n",
+    "        'prediction_category:N',\n",
+    "        title=\"Prediction Category\", \n",
+    "        sort=[\"early\", \"ontime\", \"late\"],\n",
+    "        scale=alt.Scale(range=[\n",
+    "            _color_palette.get_color(\"light_cadmium_yellow\"),\n",
+    "            _color_palette.get_color(\"electric_orange\"),\n",
+    "            _color_palette.get_color(\"aquatic\")\n",
+    "        ])\n",
+    "    ),\n",
+    "    column=alt.Column(\"tu_name\", title = \"\"),\n",
+    "    tooltip=[\"tu_name\", \"day_type\", \"prediction_category\", \"pct\"]\n",
+    ").interactive().properties(\n",
+    "    title = \"Predictions by Category\",\n",
+    "    width=150, height = 200\n",
+    ")\n",
+    "\n",
+    "category_counts_stacked_bar"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "3033b010-f091-4a32-8df1-38dc7d2fd3af",
diff --git a/rt_predictions/operator_report.qmd b/rt_predictions/operator_report.qmd
new file mode 100644
index 000000000..1229bb48f
--- /dev/null
+++ b/rt_predictions/operator_report.qmd
@@ -0,0 +1,586 @@
+---
+title: '{one_month_formatted} Summary'
+jupyter: python3
+---
+
+```{python}
+%%capture
+
+import warnings
+warnings.filterwarnings("ignore")
+
+import altair as alt
+import branca.colormap as cm
+import folium
+import pandas as pd
+
+import calitp_data_analysis.magics
+import gt_extras as gte
+
+from great_tables import GT, md
+
+import chart_utils_for_operators as chart_utils
+import prep_operator_data
+import report_utils
+import _color_palette
+from rt_msa_utils import operator_report_month
+
+alt.data_transformers.enable("vegafusion")
+
+one_month = pd.to_datetime(operator_report_month)
+```
+
+```{python}
+#| editable: true
+#| slideshow: {slide_type: ''}
+#| tags: [parameters]
+# parameters cell
+#name = "Redding Trip Updates"
+```
+
+```{python}
+%%capture_parameters
+
+date_format = "%b %Y" # gtfs_digest/_new_operator_report_utils.py
+one_month_formatted = one_month.strftime(date_format)
+
+name, one_month_formatted
+```
+
+
+Generally, we want a better transit user experience. Specifically, the performance metrics we can derive from GTFS RT Trip Updates distill into the following objectives:
+
+* Increase prediction reliability and accuracy
+* Increase the availability and completeness of GTFS RT
+* Decrease the inconsistency and fluctuations of predictions
+
+
+```{python}
+operator_cols = ["day_type"]
+
+schedule_cols = [
+    "daily_trips", "daily_service_hours", "n_routes", "n_shapes",
+    "n_stops", "num_stop_times", "daily_arrivals", 'n_days_schedule_and_rt'
+]
+
+vp_cols = ['vp_messages_per_minute', 'pct_vp_trips', 'pct_vp_routes']
+tu_cols =['tu_messages_per_minute', 'pct_tu_trips', 'pct_tu_routes']
+
+tu_prediction_cols = [
+    "bus_catch_likelihood", "pct_tu_complete_minutes", # both are percents
+    "p25", "p75", "iqr", "p50",
+    "n_predictions",
+    "prediction_padding_minutes", "avg_prediction_spread_minutes"
+]
+```
+
+```{python}
+def check_counts_across_quartet(df: pd.DataFrame):
+    url_cols = [f"{s}_base64_url" for s in ["schedule", "vp", "tu"]]
+
+    counts_df = df.groupby(["schedule_name", "vp_name", "tu_name"]).agg({
+        **{c: "nunique" for c in url_cols}
+    }).reset_index()
+
+    # Need to do this for everyone
+    counts_df["max_count"] = counts_df[url_cols].max(axis=1)
+
+    if counts_df.max_count.max() > 1:
+        print("There were multiple entries for each day_type:")
+        display(counts_df)
+
+    urls_with_most_predictions = (
+        df[url_cols + ["n_predictions"]]
+        .sort_values("n_predictions", ascending=False)
+        .reset_index(drop=True)
+    ).head(1)[url_cols]
+
+    df2 = pd.merge(
+        df,
+        urls_with_most_predictions,
+        on = url_cols,
+        how = "inner"
+    )
+
+    return df2
+```
+
+```{python}
+df = report_utils.import_operator_df(
+    filters = [[
+        ("month_first_day", "==", one_month),
+        ("tu_name", "==", name),
+    ]],
+).pipe(
+    check_counts_across_quartet
+).pipe(
+    prep_operator_data.merge_in_operator_percentiles
+)
+
+schedule_name = df.schedule_name.iloc[0]
+```
+
+```{python}
+# Set variables for color bars used across maps, route dropdown, and great tables
+PREDICTION_ERROR_COLORS =list(_color_palette.PREDICTION_ERROR_COLOR_PALETTE.values())
+PREDICTION_ERROR_INDEX = [-5, -3, -1, 1, 3, 5]
+PREDICTION_ERROR_LEGEND_CAPTION = "minutes (negative = late; positive = early)"
+
+POS_BAR_COLOR = _color_palette.get_color("blueberry")
+NEG_BAR_COLOR = _color_palette.get_color("vivid_cerise")
+```
+
+## Schedule + RT Summary Stats
+
+```{python}
+schedule_table = (
+    GT(df[operator_cols + schedule_cols])
+    .cols_label(
+        daily_trips = "Daily Trips",
+        daily_service_hours = "Daily Service Hours",
+        n_routes = "# Routes",
+        n_shapes = "# Shapes",
+        n_stops = "# Stops",
+        num_stop_times = "Total Scheduled Arrivals",
+        daily_arrivals = "Daily Scheduled Arrivals",
+        n_days_schedule_and_rt = "# Days with Schedule and RT",
+    ).fmt_integer(
+        columns = [
+            "daily_trips", "n_routes", "n_shapes", "n_stops",
+            "num_stop_times", "daily_arrivals", "n_days_schedule_and_rt"]
+    ).fmt_number(
+        columns = ["daily_service_hours"], decimals=1
+    ).tab_spanner(
+        label="Schedule",
+        columns = schedule_cols
+    ).tab_header(
+        title = "Schedule + RT Summary Metrics",
+        subtitle = f"{one_month_formatted}"
+    )
+)
+
+chart_utils.format_great_table(schedule_table)
+```
+
+## General RT Metrics
+
+Vehicle positions and trip updates are distinct RT data sources, and each can be paired with GTFS schedule data.
+
+The metrics from the schedule-RT pairing include the % of scheduled trips with vehicle positions and the % of scheduled trips with trip updates. These are calculated the same way across both RT data sources.
+
+<span style="color:#4477aa">**Update Availability Goal 1:** 2+ vehicle position or trip update messages per minute.</span>
+
+Vehicle positions or trip updates per minute is a measure of completeness *within* the RT data we are capturing, regardless of coverage gaps *across* dates.
+
+<span style="color:#4477aa">**Update Availability Goal 2:** 75%+ of trips have RT and 100% of routes are covered by RT.</span>
+
+Out of scheduled trips, how many trips have RT, regardless of completeness? If the trip appeared in RT trip updates with at least 1 message, the trip counts as having RT trip updates (similarly for vehicle positions).
+
+Out of scheduled routes, how many routes have at least 1 trip with RT? If at least 1 trip for that route had RT trip updates, that route counts as having RT trip updates (similarly for vehicle positions).
+
+```{python}
+rt_table = (
+    GT(df[operator_cols + vp_cols + tu_cols])
+    .cols_label(
+        tu_messages_per_minute = "Trip Updates per Minute",
+        pct_tu_trips = "% Trips",
+        pct_tu_routes = "% Routes",
+        vp_messages_per_minute = "Vehicle Positions per Minute",
+        pct_vp_trips = "% Trips",
+        pct_vp_routes = "% Routes",
+    ).fmt_number(
+        columns = ["tu_messages_per_minute", "vp_messages_per_minute"],
+        decimals=1
+    ).fmt_percent(
+        columns=["pct_tu_trips", "pct_tu_routes", "pct_vp_trips", "pct_vp_routes"],
+        decimals=1
+    ).tab_spanner(
+        label="Trip Updates",
+        columns=tu_cols
+    ).tab_spanner(
+        label="Vehicle Positions",
+        columns=vp_cols
+    )
+)
+
+chart_utils.format_great_table(rt_table, day_type_grouping = True).pipe(
+    gte.gt_color_box,
+    columns=["tu_messages_per_minute", "vp_messages_per_minute"],
+    palette="Blues",
+    domain=[1, 3]
+).pipe(
+    gte.gt_hulk_col_numeric,
+    columns=["pct_tu_trips", "pct_tu_routes", "pct_vp_trips", "pct_vp_routes"],
+    palette=["#FFEC8B", "#E5F5E0"], #[light goldenrod1, white alyssum (from greens)]
+    domain=[0, 1],
+    alpha=0.1
+)
+```
+
+## Prediction Accuracy Metrics
+
+These metrics are derived entirely from RT trip updates (no comparison with schedule data).
+
+<span style="color:#4477aa">**Update Availability Goal 3:** 90%+ of minutes have predicted arrival information.</span>
+
+<span style="color:#4477aa">**Bus Catch Likelihood Goal:** 75%+ of predictions result in catching the bus.</span>
+
+On-time predictions use [this definition](https://analysis.dds.dot.ca.gov/rt_operator_metrics/#reliable-prediction-accuracy), where predictions made 5 minutes before arrival must fall within narrower bounds to be considered on-time than predictions made 30 minutes out.
+
+<span style="color:#4477aa">**Prediction Error Goal:** Closer to zero or smaller positive values (early predictions). Late predictions = negative values = riders miss bus.</span>
+
+```{python}
+table = (
+    GT(df[operator_cols + tu_prediction_cols])
+    .cols_label(
+        pct_tu_complete_minutes = "% Minutes with 2+ Predictions",
+        bus_catch_likelihood = "Bus Catch Likelihood (Early + On-time)",
+        p50 = "Prediction Error",
+        avg_prediction_spread_minutes = "Prediction Spread / Wobble",
+        prediction_padding_minutes = "Prediction Padding",
+        n_predictions = "# Predictions",
+        iqr = "Variability"
+    ).fmt_percent(columns=["bus_catch_likelihood", "pct_tu_complete_minutes"], decimals=1)
+    .fmt_number(columns=["p50", "avg_prediction_spread_minutes", "prediction_padding_minutes"], decimals=1)
+    .fmt_integer(columns=["n_predictions"])
+    .tab_header(title = f"Trip Update Prediction Accuracy Metrics",
+                subtitle = "units are in minutes")
+).pipe(chart_utils.format_great_table)
+
+table.pipe(
+    gte.gt_plt_dumbbell,
+    col1='p25',
+    col2='p75',
+    label = "IQR",
+    num_decimals=1,
+    col1_color=_color_palette.get_color("valentino"),
+    col2_color=_color_palette.get_color("lizard_green"),
+    width=100, height=50, # check these each time
+    font_size=8
+).pipe(
+    gte.gt_hulk_col_numeric,
+    columns=["bus_catch_likelihood", "pct_tu_complete_minutes"],
+    palette=[_color_palette.get_color("light_goldenrod"),
+             _color_palette.get_color("pastel_peppermint")],
+    domain=[0, 1],
+    alpha=0.1
+).pipe(
+    gte.gt_color_box,
+    columns=["p50"],
+    palette=PREDICTION_ERROR_COLORS,
+    domain=[-5, 5]
+).cols_width(
+    cases={
+        c: "10%" for c in [
+            "bus_catch_likelihood", "pct_tu_complete_minutes",
+            "prediction_padding_minutes", "avg_prediction_spread_minutes",
+            "n_predictions", "p50"
+        ]
+    }
+).cols_move_to_end(columns=["n_predictions"])
+#.pipe(gte.gt_color_box, columns=["iqr"], palette="YlOrRd"),
+# maybe IQR doesn't make sense to color, it'll just be ranked by day_type
+```
+
+```{python}
+pct_category_df = prep_operator_data.reshape_prediction_category_counts_to_long(df)
+
+category_counts_stacked_bar = alt.Chart(pct_category_df).mark_bar().encode(
+    x=alt.X(
+        'day_type',
+        title = "", sort=["Weekday", "Saturday", "Sunday"]
+    ),
+    y=alt.Y('pct', title="Percent"),
+    color=alt.Color(
+        'prediction_category:N',
+        title="Prediction Category",
+        sort=["early", "ontime", "late"],
+        scale=alt.Scale(range=[
+            _color_palette.get_color("light_cadmium_yellow"),
+            _color_palette.get_color("electric_orange"),
+            _color_palette.get_color("aquatic")
+        ])
+    ),
+    column=alt.Column("tu_name", title = ""),
+    tooltip=["tu_name", "day_type", "prediction_category", "pct"]
+).interactive().properties(
+    title = "Predictions by Category",
+    width=150, height = 200
+)
+
+category_counts_stacked_bar
+```
+
+## Prediction Error Percentiles
+
+### Distribution of Prediction Errors
+
+The 50th percentile is the typical or median rider experience, and it can show that, on average, this transit agency is roughly on-time.
+* If the 10th percentile is fairly close to the 50th percentile, it means that the transit agency is consistent and reliable in its predictions.
+* Extreme values for the 10th percentile would indicate that predictions fluctuate or are somewhat unreliable.
+* Steeper lines indicate fairly reliable predictions for the rider.
+
+```{python}
+decile_cols = [
+    "month_first_day", "day_type",
+    "schedule_name", "tu_name",
+    'pos_prediction_error_sec_array', 'pos_prediction_error_sec_percentile_array',
+    'neg_prediction_error_sec_array', 'prediction_error_sec_percentile_array'
+]
+
+operator_deciles_df = report_utils.import_operator_df(
+    filters = [[
+        ("month_first_day", "==", one_month),
+        ("tu_name", "==", name),
+    ]],
+    columns = decile_cols
+).pipe(prep_operator_data.operator_deciles_for_chart)
+```
+
+```{python}
+percentile_chart = chart_utils.fig5and6_prediction_error_plots(operator_deciles_df)
+percentile_chart
+```
+
+### Accuracy Loss
+Ratio of the 10th to 50th percentiles
+
+* Newmark's paper on a small sample of transit agencies suggests that the positive prediction errors typically have ratios of 4.
+* Late predictions (negative prediction errors) have ratios around 3.
+* Steeper lines = less accuracy loss = better
+   * y-axis is percentile (moving from the 10th to the 50th percentile is moving upwards on the y-axis)
+   * x-axis is error (smaller change along x-axis is less accuracy loss).
+   * less accuracy loss = less change along x-axis, since change along y-axis is constant (10 to 50) = steeper (counterintuitive relative to the usual interpretation of slope!)
+
+```{python}
+operator_cols = ["day_type"]
+percentile_chart_cols = [
+    "pos_p10", "pos_p50", "pos_error_ratio",
+    "neg_p10", "neg_p50", "neg_error_ratio"
+]
+
+mini_df = df[df.month_first_day == one_month][
+    operator_cols + percentile_chart_cols].copy()
+
+# convert the 10th, 50th percentile columns to minutes
+seconds_cols = ["pos_p10", "pos_p50", "neg_p10", "neg_p50"]
+mini_df[seconds_cols] = mini_df[seconds_cols].divide(60).round(2)
+```
+
+```{python}
+mini_p10_p50_table = (
+    GT(mini_df)
+    .cols_label(
+        pos_p10 = "10th percentile ",
+        neg_p10 = "10th percentile",
+        pos_p50 = "50th percentile",
+        neg_p50 = "50th percentile",
+        pos_error_ratio = "Accuracy Loss",
+        neg_error_ratio = "Accuracy Loss",
+    ).fmt_number(
+        columns=["pos_error_ratio", "neg_error_ratio"], decimals=1
+    ).tab_spanner(
+        label="Early Predictions (Positive Prediction Error)",
+        columns=["pos_p10", "pos_p50", "pos_error_ratio"]
+    ).tab_spanner(
+        label="Late Predictions (Negative Prediction Error)",
+        columns = ["neg_p10", "neg_p50", "neg_error_ratio"]
+    )
+    .tab_header(
+        title = "Accuracy Loss = Ratio of 10th to 50th percentile error",
+        subtitle = "units are in minutes (lower = less accuracy loss)"
+    )
+).pipe(
+    gte.gt_color_box,
+    columns=["pos_p10", "pos_p50", "neg_p10", "neg_p50"],
+    palette=PREDICTION_ERROR_COLORS,
+    domain=[-5, 5]
+)
+
+chart_utils.format_great_table(mini_p10_p50_table).pipe(
+    chart_utils.format_great_table,
+    day_type_grouping=False
+)
+```
+
+## Route Map by Priority Criteria
+
+The following layers are available and selectable (if no routes match the criteria, the layer is excluded):
+
+1. **Average prediction error** (minutes) for all routes
+2. Routes with **<90% update completeness**
+   Providing complete real-time information for all routes is the crucial foundation.
+3. **Highly Variable Routes (IQR of 3+ minutes)** that could benefit from transit-supportive policies (signal priority, bus lanes).
+   The variability in prediction accuracy here could be due to the local traffic conditions.
+4. Routes with **Bus Catch Likelihood < 75%** (early + on-time predictions), i.e., late predictions more than 25% of the time.
+
+```{python}
+route_gdf = report_utils.import_route_df(
+    filters = [[
+        ("month_first_day", "==", one_month),
+        ("schedule_name", "==", schedule_name),
+        ("day_type", "==", "Weekday")
+    ]],
+    columns = [
+        "schedule_name", "tu_name",
+        "route_dir_name",
+        "avg_prediction_error_minutes", "prediction_error_label",
+        "pct_tu_complete_minutes",
+        "iqr", "bus_catch_likelihood",
+        "geometry"
+    ]
+).drop_duplicates().reset_index(drop=True)
+```
+
+```{python}
+# Set conditions for filtering to pick out priority criteria
+condition_completeness = route_gdf.pct_tu_complete_minutes < 0.9
+condition_variability = route_gdf.iqr >= 3
+condition_likelihood = route_gdf.bus_catch_likelihood < 0.75
+```
+
+```{python}
+m = route_gdf.explore(
+    "avg_prediction_error_minutes",
+    tiles = "CartoDB Positron",
+    name = "All Routes",
+    cmap = cm.StepColormap(
+        colors=PREDICTION_ERROR_COLORS, index=PREDICTION_ERROR_INDEX,
+        vmin=-5, vmax=5,
+        tick_labels=PREDICTION_ERROR_INDEX,
+        caption=PREDICTION_ERROR_LEGEND_CAPTION
+    ),
+    marker_kwds={"fill": True},
+    style_kwds={"opacity": 0.5, "fillOpacity": 0.3}
+)
+```
+
+```{python}
+if len(route_gdf[condition_completeness]) > 0:
+    m = route_gdf[condition_completeness].explore(
+        "route_dir_name",
+        m=m,
+        tiles = "CartoDB Positron",
+        name = "< 90% update completeness", # color by route-dir name, same as stop report
+        categorical = True,
+        legend = False,
+    )
+
+if len(route_gdf[condition_variability]) > 0:
+    m = route_gdf[condition_variability].explore(
+        "iqr",
+        m=m,
+        tiles = "CartoDB Positron",
+        name = "High Variability (IQR 3+ minutes) Routes",
+        categorical = False,
+        legend = True,
+        cmap="YlOrRd",
+    )
+
+if len(route_gdf[condition_likelihood]) > 0:
+    m = route_gdf[condition_likelihood].explore(
+        "bus_catch_likelihood",
+        m=m,
+        tiles = "CartoDB Positron",
+        name = "<75% Bus Catch Likelihood",
+        categorical = False,
+        legend = True,
+        cmap="cividis"
+    )
+
+folium.LayerControl().add_to(m)
+m
+```
+
+## Route Summary
+
+Prediction accuracy varies by route. Routes are sorted by variability, with the most variable routes shown at the top.
+
+Variability is measured by the interquartile range (IQR), which is the difference between the 75th percentile and 25th percentile prediction errors.
+
+* **High variability = high IQRs**: local traffic conditions may confound the prediction algorithm. For these routes, a focus on improving service reliability through additional infrastructure (signal priority, bus lanes), or other transit planning and policies could be explored.
+
+* **Negative 25th percentiles**: riders miss the bus (late predictions). These routes may benefit from service reliability improvements for riders.
+
+   *Interpretation*: A 25th percentile of -5 means that one quarter of riders get predictions that are 5 or more minutes late, so they miss the bus.
+
+* **scaled IQR**: IQR adjusted so predictions closer to the bus arrival are weighted more. Predictions 5 minutes out are more important than predictions 25 minutes out.
+
+```{python}
+route_iqr_df = report_utils.import_route_df(
+    filters = [[
+        ("tu_name", "==", name),
+        ("month_first_day", "==", one_month),
+        ("day_type", "==", "Weekday")
+    ]],
+    columns= [
+        "route_dir_name",
+        "avg_prediction_error_minutes",
+        "n_predictions",
+        "p25", "p75",
+        "iqr",
+        "scaled_p25", "scaled_p75",
+        # does putting IQR help in interpretation?
+    ]
+).sort_values("iqr", ascending=False)
+```
+
+```{python}
+route_iqr_table = (
+    GT(route_iqr_df)
+    .cols_label(
+        route_dir_name = "Route-Direction",
+        n_predictions = "# Predictions",
+        iqr = "Variability",
+        avg_prediction_error_minutes = "Prediction Error (minutes)",
+    ).fmt_integer(["n_predictions"])
+    .tab_header(
+        title = "Route Summary Metrics",
+        subtitle = md(
+            """
+            High IQR = high variability -> focus on service reliability through transit planning and policies.
+            Variability could be due to local traffic conditions.
+            Negative 25th percentiles = riders miss bus."""
+        )
+    ).pipe(chart_utils.format_great_table, day_type_grouping=False)
+)
+
+route_iqr_table.pipe(
+    gte.gt_plt_dumbbell,
+    col1='p25',
+    col2='p75',
+    label = "IQR (minutes)",
+    num_decimals=1,
+    width=200, height=50,
+    col1_color=_color_palette.get_color("valentino"),
+    col2_color=_color_palette.get_color("lizard_green"),
+    font_size=8
+).pipe(
+    gte.gt_plt_dumbbell,
+    col1="scaled_p25",
+    col2='scaled_p75',
+    label='scaled IQR',
+    num_decimals=3,
+    width=200, height=50,
+    col1_color=_color_palette.get_color("valentino"),
+    col2_color=_color_palette.get_color("lizard_green"),
+    font_size=8
+).pipe(
+    gte.gt_color_box,
+    columns=["avg_prediction_error_minutes"],
+    palette=PREDICTION_ERROR_COLORS,
+    domain=[-5, 5]
+).pipe(
+    gte.gt_color_box,
+    columns=["iqr"],
+    palette="YlOrRd",
+).cols_width(
+    cases={
+        "avg_prediction_error_minutes": "10%",
+        "n_predictions": "10%",
+        "iqr": "10%"
+    }
+).cols_align("center").cols_align(
+    "left", columns = "route_dir_name"
+).cols_move_to_end(columns=["n_predictions"])
+```
diff --git a/rt_predictions/prep_operator_data.py b/rt_predictions/prep_operator_data.py
index b3ce1afdf..abb2ffd16 100644
--- a/rt_predictions/prep_operator_data.py
+++ b/rt_predictions/prep_operator_data.py
@@ -163,9 +163,29 @@ def merge_in_operator_percentiles(df: pd.DataFrame) -> pd.DataFrame:
             day_type_sorted=df1.day_type.map(report_utils.DAYTYPE_ORDER_DICT),
         )
         .rename(columns={"prediction_padding": "prediction_padding_minutes"})
-        .drop(columns=["pct_predictions_early", "pct_predictions_ontime"] + array_cols)
+        .drop(columns=array_cols)
         .sort_values(["month_first_day", "schedule_name", "tu_name", "day_type_sorted"])
         .reset_index(drop=True)
     )
 
     return df1
+
+
+def reshape_prediction_category_counts_to_long(df: pd.DataFrame) -> pd.DataFrame:
+    """
+    Get distribution of early / on-time / late,
+    turn columns into long df to plot with altair stacked bar chart.
+    """
+    df2 = df.melt(
+        id_vars=["tu_name", "day_type"],
+        value_vars=["pct_predictions_early", "pct_predictions_ontime", "pct_predictions_late"],
+        var_name="prediction_category",
+        value_name="pct",
+    )
+
+    df2 = df2.assign(
+        prediction_category=df2.prediction_category.str.replace("pct_predictions_", ""),
+        pct=df2.pct * 100,  # scale this to match the axis without adjusting it in chart
+    )
+
+    return df2

From d0576e7d501147d01f6c87a39d9ae022ec0b5732 Mon Sep 17 00:00:00 2001
From: tiffanychu90 <tiffany.ku@dot.ca.gov>
Date: Fri, 8 May 2026 15:47:46 +0000
Subject: [PATCH 4/5] switch order of goals for bus catch & pct completeness

---
 rt_predictions/operator_report.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rt_predictions/operator_report.ipynb b/rt_predictions/operator_report.ipynb
index aa2bfed69..9dac4acc7 100644
--- a/rt_predictions/operator_report.ipynb
+++ b/rt_predictions/operator_report.ipynb
@@ -346,7 +346,7 @@
     "    font_size=8\n",
     ").pipe(\n",
     "    gte.gt_hulk_col_numeric, \n",
-    "    columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"],\n",
+    "    columns=[\"pct_tu_complete_minutes\", \"bus_catch_likelihood\"],\n",
     "    palette=[_color_palette.get_color(\"light_goldenrod\"), \n",
     "             _color_palette.get_color(\"pastel_peppermint\")],\n",
     "    domain=[0, 1],\n",

From c123396de578f3f9cebd3d1ddc334b4423a8eae9 Mon Sep 17 00:00:00 2001
From: tiffanychu90 <tiffany.ku@dot.ca.gov>
Date: Fri, 8 May 2026 16:02:48 +0000
Subject: [PATCH 5/5] switch order using cols_move_to_start

---
 rt_predictions/operator_report.ipynb | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/rt_predictions/operator_report.ipynb b/rt_predictions/operator_report.ipynb
index 9dac4acc7..a2337baf2 100644
--- a/rt_predictions/operator_report.ipynb
+++ b/rt_predictions/operator_report.ipynb
@@ -330,8 +330,10 @@
     "    ).fmt_percent(columns=[\"bus_catch_likelihood\", \"pct_tu_complete_minutes\"], decimals=1)\n",
     "    .fmt_number(columns=[\"p50\", \"avg_prediction_spread_minutes\", \"prediction_padding_minutes\"], decimals=1)\n",
     "    .fmt_integer(columns=[\"n_predictions\"])\n",
-    "    .tab_header(title = f\"Trip Update Prediction Accuracy Metrics\", \n",
-    "                subtitle = \"units are in minutes\")\n",
+    "    .tab_header(\n",
+    "        title = f\"Trip Update Prediction Accuracy Metrics\", \n",
+    "        subtitle = \"units are in minutes\"\n",
+    "    )\n",
     ").pipe(chart_utils.format_great_table)\n",
     "\n",
     "table.pipe(\n",
@@ -364,7 +366,11 @@
     "            \"n_predictions\", \"p50\"\n",
     "        ]\n",
     "    }\n",
-    ").cols_move_to_end(columns=[\"n_predictions\"])\n",
+    ").cols_move_to_end(\n",
+    "    columns=[\"n_predictions\"]\n",
+    ").cols_move_to_start(\n",
+    "    columns=[\"pct_tu_complete_minutes\"]\n",
+    ")\n",
     "#.pipe(gte.gt_color_box, columns=[\"iqr\"], palette=\"YlOrRd\"), \n",
     "# maybe IQR doesn't make sense to color, it'll just be ranked by day_type"
    ]
@@ -773,9 +779,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Pyproject Local (use-venv)",
    "language": "python",
-   "name": "python3"
+   "name": "pyproject_local_kernel_use_venv"
   },
   "language_info": {
    "codemirror_mode": {