Skip to content

Fix audit log writing errors for rollover-enabled alias indices#5900

Open
pCastq wants to merge 10 commits intoopensearch-project:mainfrom
pCastq:bug-#5878
Open

Fix audit log writing errors for rollover-enabled alias indices#5900
pCastq wants to merge 10 commits intoopensearch-project:mainfrom
pCastq:bug-#5878

Conversation

@pCastq
Copy link
Contributor

@pCastq pCastq commented Jan 12, 2026

Description

[Bug fix, Enhancement , Test fix]

the audit log cannot be written to an index that already exists when using an alias with a rollover policy. This causes errors and prevents audit events from being stored.

  • Old behavior:
    When an audit log index already exists under an alias, OpenSearch throws an error and does not insert audit events.
  • New behavior:
    The audit log now detects if the target index already exists and inserts new events into it instead of failing.

Issues Resolved

[#5878 + integration test]

Testing

  • Integration tests confirming that rollover policies are respected and no errors occur when writing to an existing index.

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@pCastq
Copy link
Contributor Author

pCastq commented Jan 12, 2026

The part of integration test is to hard.
(I've been racking my brains for three days trying to figure out the right parts to use)

We need separate test classes because they test distinct code paths that require different cluster configurations.
InternalOpenSearchSinkIntegrationTest -> validates the default date-based index creation (metadata.hasIndex branch), while
InternalOpenSearchSinkIntegrationTest_AuditAlias -> validates write alias support (metadata.hasAlias branch).
These configurations are mutually exclusive and cannot coexist in a single @ClassRule cluster.

Cleaning indices between tests would require cluster restarts or expensive index deletion, significantly slowing CI/CD pipelines. Instead, we use delta-based assertions (before/after counts) that ensure test isolation without cleanup overhead.

And there are other stuff that need to look carefully throught the object
For example when we use LocalCluster, if we use a public Builder audit(AuditConfiguration auditConfiguration) method , we using always TestRuleAuditLogSink and have some pre-setting or fixed setting...

There are a lot of comment and javadoc.

Copy link
Collaborator

@nibix nibix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. I left a couple of comments.

Could you also please:

  • Check the CI errors regarding code hygiene? maybe it is just a missing spotlessApply
  • Write a changelog entry
  • Change the name of this PR to something more descriptive

Thank you :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need that here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not quite sure if we need anonymous auth?

pCastq and others added 5 commits January 16, 2026 00:07
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Removed unnecessary comments from audit log integration tests.

Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>
Signed-off-by: pCastq <131659139+pCastq@users.noreply.github.com>
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
@pCastq
Copy link
Contributor Author

pCastq commented Jan 16, 2026

I intentionally did not unify these tests with InternalAuditLogTest because they validate different responsibilities and operate under fundamentally different assumptions.

InternalAuditLogTest is a cluster-level smoke test: it verifies that, in a fully secured single-node cluster, the audit index is created, shards are allocated, and the index reaches green health. Its purpose is infrastructure readiness, not application logic.

The tests in this PR instead focus on InternalOpenSearchSink behavior: index creation when absent, alias detection, and write routing for date-based vs alias-based configurations. They explicitly cover sink-level code paths that InternalAuditLogTest does not exercise at all.

Unifying the tests would also force a much heavier configuration (admin users, HTTP Basic auth, compliance logging, transport events). This would introduce unnecessary overhead, extra audit noise, and additional failure modes, without increasing coverage for the sink logic being tested.

Keeping these tests separate preserves clarity, focus, and maintainability. Each test suite remains aligned with its specific goal, and failures remain easy to diagnose. Combining them would add complexity without providing real value, in my opinion.

However , as @nibix suggest, I use Awaitility instead of Thread.sleep() because audit events are generated asynchronously allows the test to wait until a meaningful functional condition is met (i.e. a new audit event becomes visible) and to fail deterministically after a bounded timeout.

This approach makes the test more robust, avoids flakiness in CI, and ensures that we only proceed once the audit data has actually been indexed and refreshed.

@nibix nibix changed the title Fix bug #5878 Fix audit log writing errors for rollover-enabled alias indices Jan 20, 2026
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

si prega di preferire l'uso dell'inglese :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why this is something related to the OpenSearch sink? Isnt this a behavior that occurs at a higher level?

* test created a new event.</p>
*/
@Test
public void testAuditDocumentsViaAliasContainMandatoryFields() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a test which covers the alias logic? It seems to be this covers logic from a higher level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test validates the end-to-end write path through the alias branch of createIndexIfAbsent(). The other two tests verify that the alias is recognized (testRecognizesAuditTargetAsWriteAlias) and that documents are counted (testWritesEventsToAliasSuccessfully), but neither inspects the document content. This test closes the loop by confirming that a document written through the alias pipeline — createIndexIfAbsent() → prepareIndex(aliasName) → execute() — arrives in the backing index structurally intact with all mandatory fields and correct values. Without it, we'd know documents are delivered but not that they're correct.

Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Signed-off-by: Pietro Paolo Castagna <PietroPaolo.Castagna@gmail.com>
Copy link
Collaborator

@nibix nibix Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 seconds is actually the default timeout constraint, so this can be dropped. Same holds for the poll interval.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also use default timeout and poll interval here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hamcrest way of doing such assertions would be

assertThat(auditDoc, hasKey("@timestamp"));
assertThat(auditDoc, hasKey("audit_rest_request_method"));
assertThat(auditDoc, hasKey("audit_rest_request_path"));

This will yield much richer assertion errors in the case of test failures.

Copy link
Collaborator

@nibix nibix Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Objects.requireNonNull is redundant, with our without you will get a NPE in case getTotalHits() returns null.

@nibix
Copy link
Collaborator

nibix commented Jan 30, 2026

Thank you, looks much better! I have added a few minor comments. Also there are still a few leftover comments from the previous round. Could you please have a look at these?

@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.74%. Comparing base (0dcc1c3) to head (f7abda7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5900      +/-   ##
==========================================
- Coverage   73.75%   73.74%   -0.01%     
==========================================
  Files         439      439              
  Lines       26884    26894      +10     
  Branches     3980     3983       +3     
==========================================
+ Hits        19828    19833       +5     
- Misses       5168     5177       +9     
+ Partials     1888     1884       -4     
Files with missing lines Coverage Δ
...security/auditlog/sink/InternalOpenSearchSink.java 100.00% <100.00%> (ø)

... and 11 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments