[BUG] DocLevelMonitorQueries: inverted condition causes query index to be deleted and recreated on every monitor execution #2153

@thecodingshrimp

What is the bug?
In DocLevelMonitorQueries.kt (~line 500), the condition controlling when to re-fetch the write index for the query index alias uses != instead of ==, causing the re-fetch path to fire on every monitor execution rather than only in the backwards-compatibility case.

// BROKEN (current): fires ALWAYS because the stored concrete index name
// (e.g. ".opensearch-sap-pre-packaged-rules-queries-000001") always differs from the alias name
if (targetQueryIndex != monitor.dataSources.queryIndex && monitor.deleteQueryIndexInEveryRun == true)

// CORRECT: fires only when the alias name itself was mistakenly stored in metadata (legacy case)
if (targetQueryIndex == monitor.dataSources.queryIndex && monitor.deleteQueryIndexInEveryRun == true)

Every time the condition fires unnecessarily, getWriteIndexNameForAlias is called and metadata is rewritten, triggering a delete+recreate of the backing query index. This generates 6–10 MergeSchedulerConfig log lines per node per cycle.
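The difference between the two conditions can be checked in isolation. The following is a minimal sketch, not the plugin's code: `DataSources` and `Monitor` are simplified stand-ins that model only the two fields the condition reads, the function names `refetchCurrent` and `refetchProposed` are hypothetical, and the alias name is an assumption derived by dropping the `-000001` suffix from the concrete index name quoted above.

```kotlin
// Simplified stand-ins for the real alerting types (assumption: only these two fields matter here).
data class DataSources(val queryIndex: String)
data class Monitor(val dataSources: DataSources, val deleteQueryIndexInEveryRun: Boolean?)

// Condition as currently written, reported in this issue as inverted.
fun refetchCurrent(targetQueryIndex: String, monitor: Monitor): Boolean =
    targetQueryIndex != monitor.dataSources.queryIndex && monitor.deleteQueryIndexInEveryRun == true

// Condition as proposed in this issue.
fun refetchProposed(targetQueryIndex: String, monitor: Monitor): Boolean =
    targetQueryIndex == monitor.dataSources.queryIndex && monitor.deleteQueryIndexInEveryRun == true

fun main() {
    val alias = ".opensearch-sap-pre-packaged-rules-queries" // assumed alias name
    val concrete = "$alias-000001"                            // concrete backing index
    val monitor = Monitor(DataSources(alias), deleteQueryIndexInEveryRun = true)

    // Normal case: metadata stores the concrete index name.
    println(refetchCurrent(concrete, monitor))  // true  (re-fetch fires on every run)
    println(refetchProposed(concrete, monitor)) // false (no re-fetch)

    // Legacy case: metadata stored the alias name itself.
    println(refetchCurrent(alias, monitor))     // false
    println(refetchProposed(alias, monitor))    // true  (re-fetch fires once, as intended)
}
```

With the flag set, the two predicates are exact opposites, so the current form fires in precisely the cases where the proposed form stays quiet.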

How can one reproduce the bug?

  1. Deploy OpenSearch with the Security Analytics plugin and enable chained findings monitors (which set deleteQueryIndexInEveryRun=true).
  2. Run any doc-level monitor backed by the default query index alias.
  3. Observe log output: each monitor execution produces a burst of MergeSchedulerConfig log entries from the query index being deleted and recreated.
  4. At scale (3 master nodes, many detectors), this produces 23,000+ log lines/minute.

What is the expected behavior?
The write index name should be re-fetched from the alias only in the backwards-compatibility case, that is, when the metadata stored the alias name itself rather than the concrete backing index name. Under normal operation the stored concrete index name differs from the alias, so the condition should evaluate to false and no delete+recreate should occur.
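One way to pin this expectation down is to count alias lookups across repeated runs. The sketch below is a hypothetical model rather than the actual `DocLevelMonitorQueries` logic: `QueryIndexResolver`, its `resolve` method, the `"logs"` source key, and the alias name are all assumptions, and the private `getWriteIndexNameForAlias` here is a counting stand-in that never touches cluster state. It does not model the delete+recreate itself, only how often the re-fetch path is entered.

```kotlin
// Hypothetical model of the re-fetch decision with the proposed `==` condition.
class QueryIndexResolver(private val alias: String, private val writeIndex: String) {
    var aliasLookups = 0
        private set

    // Stand-in for getWriteIndexNameForAlias: counts calls instead of reading cluster state.
    private fun getWriteIndexNameForAlias(): String {
        aliasLookups++
        return writeIndex
    }

    // Re-fetches only when the stored name is the alias itself (legacy metadata).
    fun resolve(
        metadata: MutableMap<String, String>,
        source: String,
        deleteQueryIndexInEveryRun: Boolean?
    ): String {
        var target = metadata.getValue(source)
        if (target == alias && deleteQueryIndexInEveryRun == true) {
            target = getWriteIndexNameForAlias()
            metadata[source] = target // metadata now holds the concrete name
        }
        return target
    }
}

fun main() {
    val alias = ".opensearch-sap-pre-packaged-rules-queries" // assumed alias name
    val writeIndex = "$alias-000001"

    // Normal metadata: five runs, no alias lookups expected.
    val normalResolver = QueryIndexResolver(alias, writeIndex)
    val normal = mutableMapOf("logs" to writeIndex)
    repeat(5) { normalResolver.resolve(normal, "logs", true) }
    println(normalResolver.aliasLookups) // 0

    // Legacy metadata: the first run repairs it, later runs skip the lookup.
    val legacyResolver = QueryIndexResolver(alias, writeIndex)
    val legacy = mutableMapOf("logs" to alias)
    repeat(5) { legacyResolver.resolve(legacy, "logs", true) }
    println(legacyResolver.aliasLookups) // 1
}
```

In this model the legacy case corrects itself after a single run, which matches the intent described above: the re-fetch is a one-time migration step, not part of steady-state execution.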

What is your host/environment?

  • OS: Linux
  • Version: OpenSearch 3.x
  • Plugins: Security Analytics plugin with chained findings monitors enabled

Do you have any additional context?
This bug compounds with a related issue in opensearch-project/security-analytics: the chained_findings monitor is created with deleteQueryIndexInEveryRun=true, which is the trigger that activates the broken condition path. Both fixes are required to fully eliminate the log storm. The security-analytics fix prevents the flag from being set unnecessarily; this fix corrects the inverted logic that acts on the flag.