
[Labelling Health] Labelling Health Report — 2026-06-03 #455

@mnkiefer

Summary

  • Overall status: mixed
  • The prediction pipeline is active and applying label changes, but the correction feedback loop has been silent for over 6 weeks (last correction signal dates from 2026-04-18). Daily summary counts did not parse into structured fields, limiting automated trend comparison.

Key Metrics

| Metric | Value |
|---|---|
| Discussions reviewed, last 7 days | ~13 (inferred from summary bodies) |
| Label changes applied, last 7 days | ~9 (inferred from summary bodies) |
| Change rate, last 7 days | ~69% (9 / 13) |
| Previous 7-day window data | None available (no older summaries loaded) |
| Correction-intake (Collect Corrections) runs, last 7 days | 30 runs: 14 succeeded, 16 cancelled/failed |
| Predict Labels runs, last 7 days | 4 (all succeeded) |
| Persisted prediction snapshots | 0 |
| Open correction signals | 0 |
| Correction signals created, last 7 days | 0 |
| Correction signals created, last 30 days | 0 |

Note: The reviewed and changed fields in the daily summaries did not parse into the health data's structured fields (both show null). Counts above are manually extracted from summary body text. Trend comparison with the previous 7-day window is not possible — no summaries from that period were included.
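
The inferred figures in the table can be reproduced from the three daily summaries listed under "Recent daily summary issues" (#453, #450, #445). The following is a minimal Python sketch, assuming the body-extracted counts are used whenever the structured fields are null. The `DailySummary` type and its field names are hypothetical and do not reflect the health data's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailySummary:
    # Hypothetical record; field names are illustrative, not the real schema.
    issue: int
    reviewed: Optional[int]   # structured field (null in this cycle)
    changed: Optional[int]    # structured field (null in this cycle)
    reviewed_body: int        # count read manually from the summary body
    changed_body: int         # count read manually from the summary body


def change_rate(summaries):
    """Return (reviewed, changed, rate), preferring structured fields when set."""
    reviewed = sum(
        s.reviewed if s.reviewed is not None else s.reviewed_body
        for s in summaries
    )
    changed = sum(
        s.changed if s.changed is not None else s.changed_body
        for s in summaries
    )
    rate = changed / reviewed if reviewed else None
    return reviewed, changed, rate


# Values taken from the daily summary table in this report.
summaries = [
    DailySummary(453, None, None, 11, 7),
    DailySummary(450, None, None, 1, 1),
    DailySummary(445, None, None, 1, 1),
]

reviewed, changed, rate = change_rate(summaries)
print(reviewed, changed, f"{rate:.0%}")  # 13 9 69%
```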


Correction Pressure

No new correction signals have been created in the last 30 days. All 335 historical signals are closed. The newest signal (#404) dates from 2026-04-18 — over six weeks ago.
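
The length of the gap can be checked with simple date arithmetic. This is only a sketch; it assumes the report date in the issue title (2026-06-03) is the reference point.

```python
from datetime import date

report_date = date(2026, 6, 3)   # date in the report title (assumed reference point)
last_signal = date(2026, 4, 18)  # newest correction signal (#404)

gap_days = (report_date - last_signal).days
print(gap_days, gap_days // 7)  # 46 6  (46 days, i.e. 6 full weeks)
```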

Historical label pressure (across all 335 closed signals) was concentrated in:

Historical label distribution (all closed signals)
| Label | Signal Count |
|---|---|
| Copilot | 56 |
| Copilot in GitHub | 42 |
| GitHub Education | 38 |
| bug | 38 |
| Other Features and Feedback | 20 |
| question | 14 |
| Apps API and Webhooks | 10 |
| Profile | 10 |
| Mobile | 10 |
| Product Feedback | 9 |

Current snapshot-backed prediction pressure cannot be computed: no prediction snapshots are persisted and no truth diffs exist. This means underprediction and overprediction breakdowns are unavailable for this cycle.

The ~69% label-change rate observed in the last 7 days is notable, although it rests on only 13 reviewed discussions, 11 of them from a single summary (#453). Without correction signals or snapshot truth comparisons, it is not possible to determine whether these changes reflect genuine improvement or systematic over-labelling.


Open Instruction Debt

The correction backlog is at zero — all 335 signals are closed and all 5 correction parent intake issues are closed. No new signals have been filed in over 45 days.

This could mean:

  1. The labelling system is performing well and human reviewers are not finding errors worth flagging.
  2. The correction collection pipeline (Collect Corrections) is not surfacing errors — only 14 of 30 runs in the last 7 days succeeded; 16 were cancelled or failed.
  3. Discussions are not being reviewed by humans at a rate that generates correction volume.

The correction backlog is not stale — it is empty — but the absence of new signals over 45 days is itself a signal worth investigating.
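
To make the second explanation easier to track over time, the run counts can be reduced to a success rate and compared against a threshold. The sketch below uses the counts from this report; the 90% threshold is a hypothetical value chosen for illustration, not a documented target of the pipeline.

```python
def success_rate(succeeded, total):
    """Fraction of runs that succeeded, or None when there were no runs."""
    if total == 0:
        return None
    return succeeded / total


# Collect Corrections, last 7 days (counts from this report).
rate = success_rate(succeeded=14, total=30)

MIN_SUCCESS_RATE = 0.9  # hypothetical alert threshold, not taken from the report
needs_attention = rate is not None and rate < MIN_SUCCESS_RATE

print(f"{rate:.1%}", needs_attention)  # 46.7% True
```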


Recommendations

  1. Investigate why daily summary reviewed/changed counts do not parse into structured fields. Three consecutive summaries returned null for both fields despite having numeric data in their bodies. This blocks automated health trending and should be fixed in the summary parsing step.

  2. Investigate the Collect Corrections run failure rate. Only 14 of 30 runs in the last 7 days completed successfully; 16 were cancelled or failed. If correction collection is unreliable, new human corrections may be silently dropped.

  3. Verify that human correction signals are expected to be absent. If no new correction parent issues have been opened since April 2026, confirm whether the community review process is still active. If it has paused, the health signal from correction pressure will remain artificially zero.

  4. Enable prediction snapshot persistence. Zero snapshots are stored, which prevents snapshot-backed truth diff analysis. Without this, overprediction and underprediction pressure by label cannot be computed and adaptation mode has no deterministic artifact to act on.
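
For recommendation 1, one possible stopgap is a tolerant text fallback that recovers the counts from the summary body when the structured fields are null. The sketch below is illustrative only: the summary template is not shown in this report, so the body wording in `sample`, the pattern, and the `parse_counts` helper are all assumptions.

```python
import re

# Hypothetical wording: a keyword followed within a few non-digit characters by
# a number, e.g. "Discussions reviewed: 11". The real template may differ.
COUNT_PATTERN = re.compile(r"\b(reviewed|changed)\b\D{0,20}?(\d+)", re.IGNORECASE)


def parse_counts(body):
    """Return the first 'reviewed' and 'changed' counts found, or None for each."""
    counts = {"reviewed": None, "changed": None}
    for name, value in COUNT_PATTERN.findall(body):
        key = name.lower()
        if counts[key] is None:  # keep only the first match per field
            counts[key] = int(value)
    return counts


sample = "Discussions reviewed: 11\nLabels changed: 7"  # made-up body text
print(parse_counts(sample))  # {'reviewed': 11, 'changed': 7}
```

Returning `None` rather than `0` for a missing field keeps "not parsed" distinguishable from "nothing reviewed", which is the distinction the null structured fields currently obscure.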


Recent daily summary issues
| Issue | Date | Reviewed (body) | Changed (body) |
|---|---|---|---|
| #453 | 2026-06-02 | 11 | 7 |
| #450 | 2026-05-31 | 1 | 1 |
| #445 | 2026-05-27 | 1 | 1 |

Open correction signal breakdown

No open correction signals. All 335 signals are closed. Last signal (#404) was closed in April 2026.

Recent workflow run references
| Workflow | Run | Status |
|---|---|---|
| Predict Labels | §6 | success |
| Collect Corrections | §40 | success |
| Review Health | §8 | in_progress |

References

  • §6 — Predict Labels, most recent run
  • §40 — Collect Corrections, most recent successful run
  • §8 — Review Health, current run

Generated by Review Health · sonnet46 1.1M

  • expires on Jul 3, 2026, 4:49 AM UTC
