fix: dismiss reliability - write lock, batched UPDATE, transaction, timeout (#718) #733
HannahVernon wants to merge 14 commits into erikdarlingdata:dev
- Parse detail_text to extract Database, Query Text, and Wait Type when using 'Mute This Alert' from alert history (both editions)
- Add PopulateFromDetailText() to AlertMuteContext for structured field extraction from the label: value format
- Add 'Default expiration for new mute rules' dropdown to Settings in both editions (1 hour, 24 hours, 7 days, Never; default 24h)
- MuteRuleDialog now selects the configured default expiration instead of always defaulting to 'Never'
- Persist setting as mute_rule_default_expiration in settings.json (Lite) and preferences.json (Dashboard)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The XML doc claimed Job Name extraction but the parser did not implement it. Add the missing branch in both Dashboard and Lite editions so the behavior matches the documentation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Collapse newlines in Truncate/TruncateText so detail_text fields stay single-line in the label: value format
- Handle multi-line query values in PopulateFromDetailText by accumulating continuation lines until the next indented field
- Recognize variant query labels (Blocked Query, Blocking Query, Victim SQL) in addition to Query
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
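For readers unfamiliar with the label: value layout, here is a rough sketch of the continuation-line handling described in this commit. It is not the PR's PopulateFromDetailText: the class name DetailTextSketch, the Parse method, and the decision to treat any line without a known label as a query continuation are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; the real parser lives in AlertMuteContext.
public static class DetailTextSketch
{
    // Labels named in the commit messages; the real parser may recognize more.
    private static readonly string[] QueryLabels = { "Query", "Blocked Query", "Blocking Query", "Victim SQL" };
    private static readonly string[] OtherLabels = { "Database", "Wait Type", "Job Name" };

    public static Dictionary<string, string> Parse(string detailText)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        bool inQuery = false; // true while continuation lines are being collected

        foreach (string rawLine in detailText.Split('\n'))
        {
            string line = rawLine.Trim(); // also removes a trailing '\r'
            if (line.Length == 0) continue;

            int colon = line.IndexOf(':');
            string label = colon > 0 ? line.Substring(0, colon).Trim() : "";

            if (IsKnown(QueryLabels, label))
            {
                // Assumption: every query variant is stored under one "Query" key.
                fields["Query"] = line.Substring(colon + 1).Trim();
                inQuery = true;
            }
            else if (IsKnown(OtherLabels, label))
            {
                fields[label] = line.Substring(colon + 1).Trim();
                inQuery = false;
            }
            else if (inQuery)
            {
                // Continuation of a multi-line query: join with a single space.
                fields["Query"] = (fields["Query"] + " " + line).Trim();
            }
        }
        return fields;
    }

    private static bool IsKnown(string[] labels, string label) =>
        Array.Exists(labels, l => l.Equals(label, StringComparison.OrdinalIgnoreCase));
}
```

With this sketch, a detail text whose Query value spans two lines ends up as one space-joined string, and parsing moves on as soon as the next known label appears.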
Explain that the field is a case-insensitive substring match and suggest entering a distinctive fragment like a table or procedure name.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Feature/alert muting part 2
…eout
- DuckDbInitializer.AcquireWriteLock() now accepts optional TimeSpan timeout to prevent indefinite blocking when archival holds the lock
- DismissAlertsAsync uses exclusive write lock + single batched UPDATE with VALUES list + BEGIN/COMMIT/ROLLBACK transaction for all-or-nothing semantics
- DismissAllVisibleAlertsAsync uses exclusive write lock (was read lock)
- OpenWriteConnectionAsync uses 5-second timeout to prevent UI freeze
- UI dismiss handlers catch TimeoutException with friendly retry message
- LockedConnection updated to document both read and write lock usage
- 7 new tests covering batched UPDATE, transaction commit/rollback, write lock exclusivity, and timeout behavior (225 total, all passing)
Addresses items #4, #5, #11 from issue #718 improvement list.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
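To make the timeout behaviour concrete, here is a minimal sketch of a write lock that either blocks indefinitely or throws TimeoutException. It is not the DuckDbInitializer code: the WriteLockGate class is hypothetical, and using ReaderWriterLockSlim underneath is an assumption, since the commit message does not say which primitive the project uses.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the lock side of DuckDbInitializer; not the PR's code.
public sealed class WriteLockGate
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // With no timeout this blocks until the lock is free; with a timeout it
    // throws TimeoutException once the wait has elapsed.
    public IDisposable AcquireWriteLock(TimeSpan? timeout = null)
    {
        if (timeout is null)
        {
            _lock.EnterWriteLock();
        }
        else if (!_lock.TryEnterWriteLock(timeout.Value))
        {
            throw new TimeoutException(
                $"Could not acquire the write lock within {timeout.Value.TotalSeconds:0.#} s.");
        }
        return new Releaser(_lock);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly ReaderWriterLockSlim _owner;
        private int _released;

        public Releaser(ReaderWriterLockSlim owner) => _owner = owner;

        // Safe to call more than once; only the first call exits the lock.
        public void Dispose()
        {
            if (Interlocked.Exchange(ref _released, 1) == 0)
            {
                _owner.ExitWriteLock();
            }
        }
    }
}
```

A UI handler would wrap the call in try/catch for TimeoutException and show the retry message. One caveat: ReaderWriterLockSlim must be released on the thread that acquired it, so an async code path would need a different primitive (SemaphoreSlim, for example).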
Hey Hannah, we merged #729, #730, and #734 into dev. This PR (#733) needs a rebase onto current dev to reconcile the batched UPDATE/write lock/transaction changes with the sidecar fallback from #730 and the structured telemetry from #734. The dismiss methods in Specifically, No rush: the existing dismiss flow works correctly, and this PR adds reliability improvements on top. Thanks for the great work on this series!
# Conflicts:
#	Lite/Services/LocalDataService.AlertHistory.cs
@erikdarlingdata this is now rebased
In SQL Server, I'd frown on using an
Agreed in principle, but in practice the dismiss count is going to be small; there's a low chance anyone will dismiss more than 100 alerts in one go. If it ever becomes a concern we can chunk it, but it's not worth the complexity right now.
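For reference, if chunking ever did become necessary, it could be as small as the sketch below. DismissBatching and InBatches are hypothetical names, and the batch size of 100 in the usage note only mirrors the figure mentioned in this thread, not a measured limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; nothing like this exists in the PR.
public static class DismissBatching
{
    // Splits keys into consecutive batches of at most batchSize items.
    // Validation runs when enumeration starts, because this is an iterator method.
    public static IEnumerable<IReadOnlyList<T>> InBatches<T>(IReadOnlyList<T> keys, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < keys.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, keys.Count - start);
            var batch = new List<T>(count);
            for (int i = 0; i < count; i++)
            {
                batch.Add(keys[start + i]);
            }
            yield return batch;
        }
    }
}
```

Calling InBatches(keys, 100) on 250 keys yields batches of 100, 100, and 50. Each batch would become its own UPDATE, and all of them would need to run inside the same transaction to keep the all-or-nothing behaviour.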
What does this PR do?
Addresses write lock, transaction wrapping, and batch updates from the issue #718 improvement list. This makes alert dismissal reliable under concurrent archival and prevents UI freezes.
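As an illustration of the batched update and transaction wrapping, here is a sketch built only on System.Data.Common. It is not the PR's DismissAlertsAsync. The three key columns come from the Key changes list, but the table name alert_history, the dismissed column, the AlertKey record, the int type for server_id, and the "?" placeholder style are all assumptions. It also uses DbTransaction rather than literal BEGIN/COMMIT/ROLLBACK statements.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

// Hypothetical key type; the column names are from the PR, the types are guesses.
public readonly record struct AlertKey(DateTime AlertTime, int ServerId, string MetricName);

public static class DismissSketch
{
    // Builds one UPDATE that covers every key instead of one statement per alert.
    // "alert_history" and "dismissed" are assumed names, not taken from the PR.
    public static (string Sql, object[] Args) BuildBatchedUpdate(IReadOnlyList<AlertKey> keys)
    {
        if (keys.Count == 0) throw new ArgumentException("At least one key is required.", nameof(keys));

        string rows = string.Join(", ", Enumerable.Repeat("(?, ?, ?)", keys.Count));
        string sql = "UPDATE alert_history SET dismissed = TRUE " +
                     "WHERE (alert_time, server_id, metric_name) IN (VALUES " + rows + ")";
        object[] args = keys
            .SelectMany(k => new object[] { k.AlertTime, k.ServerId, k.MetricName })
            .ToArray();
        return (sql, args);
    }

    // Runs the statement in a transaction: commit on success, roll back on any failure.
    public static int DismissAll(DbConnection connection, IReadOnlyList<AlertKey> keys)
    {
        var (sql, args) = BuildBatchedUpdate(keys);
        using DbTransaction tx = connection.BeginTransaction();
        try
        {
            using DbCommand cmd = connection.CreateCommand();
            cmd.Transaction = tx;
            cmd.CommandText = sql;
            foreach (object arg in args)
            {
                DbParameter p = cmd.CreateParameter(); // positional, so no name is set
                p.Value = arg;
                cmd.Parameters.Add(p);
            }
            int affected = cmd.ExecuteNonQuery();
            tx.Commit();
            return affected;
        }
        catch
        {
            tx.Rollback();
            throw;
        }
    }
}
```

Placeholder syntax and parameter typing are provider-specific, so the statement text may need adjusting for the actual driver. The point is only the shape: one round-trip, one transaction, and the lock held for a single statement.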
Key changes:
- DismissAlertsAsync and DismissAllVisibleAlertsAsync now acquire an exclusive write lock instead of a read lock, preventing race conditions with archival/compaction
- DismissAlertsAsync sends a single UPDATE ... WHERE (alert_time, server_id, metric_name) IN (VALUES ...) instead of looping, reducing round-trips and lock hold time
- DismissAlertsAsync uses BEGIN/COMMIT/ROLLBACK for all-or-nothing semantics
- AcquireWriteLock(TimeSpan?) now supports an optional timeout (5 seconds for UI paths) to prevent indefinite UI freeze when archival holds the lock
- UI dismiss handlers catch TimeoutException and show a "database busy, try again" message instead of silently failing
Which component(s) does this affect?
How was this tested?
DismissReliabilityTests.cs covering:
- TimeoutException instead of blocking indefinitely
Checklist
dotnet build -c Debug)