Scope orphaned-event cleanup to recent ingest window to eliminate 8-hour ES CPU spikes #2279

Draft
Copilot wants to merge 3 commits into main from copilot/investigate-cluster-cpu-spikes

Copilot AI commented May 31, 2026

The orphaned-data cleanup job runs every 8 hours. Until now, each run scanned the full event history for events whose stack, project, or organization is missing, which drove periodic Elasticsearch CPU spikes. This change limits orphaned-event detection to events created in the last 3 days (the same practical submission window), reducing both query fan-out and delete scope.

  • What changed

    • Added a fixed lookback window in CleanupOrphanedDataJob:
      • OrphanedEventLookback = TimeSpan.FromDays(3)
    • Applied the cutoff to all orphaned-event passes:
      • stack-based orphan detection
      • project-based orphan detection
      • organization-based orphan detection
  • Query/deletion behavior update

    • Cardinality + terms aggregations now run only on events with CreatedUtc >= cutoff.
    • DeleteByQuery now uses a bool/filter combining:
      • missing parent IDs (terms)
      • recent-event cutoff (date_range on CreatedUtc)
  • Regression coverage

    • Added CanDeleteOnlyRecentOrphanedEventsByStack to verify:
      • recent orphaned events are removed
      • older orphaned events remain untouched

```csharp
var cutoff = _timeProvider.GetUtcNow().UtcDateTime.Subtract(TimeSpan.FromDays(3));

await _elasticClient.DeleteByQueryAsync<PersistentEvent>(r => r.Query(q => q.Bool(b => b
    .Filter(
        f => f.Terms(t => t.Field(e => e.StackId).Terms(missingStackIds)),
        f => f.DateRange(d => d.Field(e => e.CreatedUtc).GreaterThanOrEquals(cutoff))))));
```
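
To make the intended selection rule concrete, the following is a minimal in-memory sketch of what the combined filter matches, and of what `CanDeleteOnlyRecentOrphanedEventsByStack` is described as verifying. It does not touch Elasticsearch and is not the job's actual code: `EventDoc`, `OrphanCleanupSketch`, and `SelectForDeletion` are hypothetical names introduced only for illustration, and the 3-day value is taken from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PersistentEvent: only the fields the filter reads.
public sealed record EventDoc(string Id, string StackId, DateTime CreatedUtc);

public static class OrphanCleanupSketch
{
    // Assumed to mirror the OrphanedEventLookback value named in the description.
    public static readonly TimeSpan OrphanedEventLookback = TimeSpan.FromDays(3);

    // Returns the IDs of events that satisfy both filter clauses:
    // the stack ID is in the missing set AND CreatedUtc >= cutoff.
    public static List<string> SelectForDeletion(
        IEnumerable<EventDoc> events,
        ISet<string> missingStackIds,
        DateTime utcNow)
    {
        DateTime cutoff = utcNow.Subtract(OrphanedEventLookback);
        return events
            .Where(e => missingStackIds.Contains(e.StackId) && e.CreatedUtc >= cutoff)
            .Select(e => e.Id)
            .ToList();
    }
}
```

Under this model, an orphaned event created one day before `utcNow` is selected, while an orphaned event created ten days before `utcNow` is left alone, as is any recent event whose stack still exists. Older orphans are therefore out of the job's reach unless they are handled by some other path.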

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Copilot AI linked an issue Jun 1, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits June 1, 2026 00:22
Co-authored-by: niemyjski <1020579+niemyjski@users.noreply.github.com>
Co-authored-by: niemyjski <1020579+niemyjski@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Investigate cluster CPU spikes during orphan data cleanup" to "Scope orphaned-event cleanup to recent ingest window to eliminate 8-hour ES CPU spikes" Jun 1, 2026
Copilot AI requested a review from niemyjski June 1, 2026 00:28

Development

Successfully merging this pull request may close these issues.

Investigate cluster CPU spikes

3 participants