Skip to content

fix: dismiss reliability - write lock, batched UPDATE, transaction, timeout (#718)#733

Open
HannahVernon wants to merge 14 commits intoerikdarlingdata:devfrom
HannahVernon:feature/dismiss-reliability-718
Open

fix: dismiss reliability - write lock, batched UPDATE, transaction, timeout (#718)#733
HannahVernon wants to merge 14 commits intoerikdarlingdata:devfrom
HannahVernon:feature/dismiss-reliability-718

Conversation

@HannahVernon
Copy link
Copy Markdown
Contributor

@HannahVernon HannahVernon commented Mar 27, 2026

What does this PR do?

Addresses write lock, transaction wrapping, and batch updates from the issue #718 improvement list. This makes alert dismissal reliable under concurrent archival and prevents UI freezes.

Key changes:

  • Write lock for dismiss operationsDismissAlertsAsync and DismissAllVisibleAlertsAsync now acquire an exclusive write lock instead of a read lock, preventing race conditions with archival/compaction
  • Batched UPDATEDismissAlertsAsync sends a single UPDATE ... WHERE (alert_time, server_id, metric_name) IN (VALUES ...) instead of looping, reducing round-trips and lock hold time
  • Transaction wrappingDismissAlertsAsync uses BEGIN/COMMIT/ROLLBACK for all-or-nothing semantics
  • Timeout on write lockAcquireWriteLock(TimeSpan?) now supports an optional timeout (5 seconds for UI paths) to prevent indefinite UI freeze when archival holds the lock
  • User-friendly error handling — both dismiss button handlers catch TimeoutException and show a "database busy, try again" message instead of silently failing
  • LockedConnection cleanup — updated class documentation and field naming to reflect that it handles both read and write locks

Which component(s) does this affect?

  • Full Dashboard
  • Lite Dashboard
  • Lite Tests
  • SQL collection scripts
  • CLI Installer
  • GUI Installer
  • Documentation

How was this tested?

  • 7 new xUnit v3 tests in DismissReliabilityTests.cs covering:
    • Batched UPDATE dismisses multiple alerts in a single statement
    • Correct affected count when some alerts are already dismissed
    • Transaction rollback restores undismissed state
    • Transaction commit persists dismissed state
    • Write lock exclusivity blocks concurrent acquisition
    • Timeout throws TimeoutException instead of blocking indefinitely
    • DismissAll UPDATE works correctly under write lock
  • Full test suite: 225 tests, all passing
  • Build: 0 errors, 0 new warnings (pre-existing CS8602 only)

Checklist

  • I have read the contributing guide
  • My code builds with zero warnings (dotnet build -c Debug)
  • I have tested my changes against at least one SQL Server version
  • I have not introduced any hardcoded credentials or server names

HannahVernon and others added 13 commits March 10, 2026 17:08
- Parse detail_text to extract Database, Query Text, and Wait Type
  when using 'Mute This Alert' from alert history (both editions)
- Add PopulateFromDetailText() to AlertMuteContext for structured
  field extraction from the label: value format
- Add 'Default expiration for new mute rules' dropdown to Settings
  in both editions (1 hour, 24 hours, 7 days, Never; default 24h)
- MuteRuleDialog now selects the configured default expiration
  instead of always defaulting to 'Never'
- Persist setting as mute_rule_default_expiration in settings.json
  (Lite) and preferences.json (Dashboard)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The XML doc claimed Job Name extraction but the parser did not
implement it. Add the missing branch in both Dashboard and Lite
editions so the behavior matches the documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Collapse newlines in Truncate/TruncateText so detail_text fields
  stay single-line in the label: value format
- Handle multi-line query values in PopulateFromDetailText by
  accumulating continuation lines until the next indented field
- Recognize variant query labels (Blocked Query, Blocking Query,
  Victim SQL) in addition to Query

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explain that the field is a case-insensitive substring match and
suggest entering a distinctive fragment like a table or procedure name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eout

- DuckDbInitializer.AcquireWriteLock() now accepts optional TimeSpan timeout
  to prevent indefinite blocking when archival holds the lock
- DismissAlertsAsync uses exclusive write lock + single batched UPDATE with
  VALUES list + BEGIN/COMMIT/ROLLBACK transaction for all-or-nothing semantics
- DismissAllVisibleAlertsAsync uses exclusive write lock (was read lock)
- OpenWriteConnectionAsync uses 5-second timeout to prevent UI freeze
- UI dismiss handlers catch TimeoutException with friendly retry message
- LockedConnection updated to document both read and write lock usage
- 7 new tests covering batched UPDATE, transaction commit/rollback,
  write lock exclusivity, and timeout behavior (225 total, all passing)

Addresses items #4, #5, erikdarlingdata#11 from issue erikdarlingdata#718 improvement list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@erikdarlingdata
Copy link
Copy Markdown
Owner

Hey Hannah — we merged #729, #730, and #734 into dev. This PR (#733) needs a rebase onto current dev to reconcile the batched UPDATE/write lock/transaction changes with the sidecar fallback from #730 and the structured telemetry from #734. The dismiss methods in LocalDataService.AlertHistory.cs have changed significantly since this branch was last synced.

Specifically, DismissAlertsAsync now has the sidecar INSERT fallback and structured logging — the batched UPDATE rewrite here needs to preserve both of those. Same for DismissAllVisibleAlertsAsync with the sidecar bulk INSERT.

No rush — the existing dismiss flow works correctly, this PR adds reliability improvements on top. Thanks for the great work on this series!

# Conflicts:
#	Lite/Services/LocalDataService.AlertHistory.cs
@HannahVernon
Copy link
Copy Markdown
Contributor Author

@erikdarlingdata this is now rebased

@HannahVernon
Copy link
Copy Markdown
Contributor Author

Batched UPDATE — DismissAlertsAsync sends a single UPDATE ... WHERE (alert_time, server_id, metric_name) IN (VALUES ...)

In SQL Server, I'd frown on using an IN (...) construct, but I'm not sure what options we have for DuckDB; perhaps batching in stages if there are more than 'x' number of alerts being dismissed. Not sure what 'x' should be; perhaps that should be tested.

@erikdarlingdata
Copy link
Copy Markdown
Owner

Agreed in principle, but in practice the dismiss count is going to be small — there's a low chance anyone will dismiss more than 100 alerts in a go. If it ever becomes a concern we can chunk it, but not worth the complexity right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants