fix: health indicator #4636

Open
pablocarle wants to merge 13 commits into v3.x.x from reboot/fix/startup-message-multi

@pablocarle

Contributor

Description

In a multi-service deployment, the health indicator can print the API Mediation Layer started message (ZWEAM001I) before the instance's Discovery and ZAAS services are available, especially in slow environments.
This PR fixes that by verifying the number of registered Discovery and ZAAS instances.
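
As a rough illustration of the count-based idea, the sketch below treats the layer as started only when every registered Gateway has at least as many Discovery and ZAAS instances alongside it. This is one possible reading of the fix, not the PR's actual code: the class `StartupCountCheck`, the `lookup` function (a stand-in for `discoveryClient.getInstances(...)`), and the service IDs `gateway`, `discovery`, and `zaas` are all hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only; not the class changed in this PR.
final class StartupCountCheck {
    private StartupCountCheck() {}

    // 'lookup' stands in for discoveryClient.getInstances(serviceId).
    // The service IDs below are assumed placeholders for CoreService.*.getServiceId().
    static boolean allCoreServicesRegistered(Function<String, List<String>> lookup) {
        int gatewayCount = count(lookup, "gateway");
        int discoveryCount = count(lookup, "discovery");
        int zaasCount = count(lookup, "zaas");

        // Assumption: each Gateway instance expects a Discovery and a ZAAS counterpart.
        return gatewayCount > 0
                && discoveryCount >= gatewayCount
                && zaasCount >= gatewayCount;
    }

    // Treats a missing (null) registry entry as zero instances.
    private static int count(Function<String, List<String>> lookup, String serviceId) {
        List<String> instances = lookup.apply(serviceId);
        return instances == null ? 0 : instances.size();
    }
}
```

With two Gateways, two Discovery instances, and only one ZAAS registered, the check returns false, so the started message would be held back until the second ZAAS appears.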

Type of change

  • fix: Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project
  • PR title conforms to the Commit Message Structure Guideline
  • I have commented my code, particularly in hard-to-understand areas. In JS I provided JSDoc
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • The Java tests in the area I was working on leverage @Nested annotations
  • Any dependent changes have been merged and published in downstream modules

Pablo Carle added 5 commits May 18, 2026 13:55
Signed-off-by: Pablo Carle <pablo.carle@broadcom.com>
pablocarle and others added 5 commits May 19, 2026 10:04
Signed-off-by: Pablo Carle <pablo.carle@broadcom.com>
@EvaJavornicka EvaJavornicka moved this from New to In Progress in API Mediation Layer Backlog Management May 20, 2026
var zaasUp = !this.discoveryClient.getInstances(CoreService.ZAAS.getServiceId()).isEmpty();

var gatewayCount = this.discoveryClient.getInstances(CoreService.GATEWAY.getServiceId()).size();
var discoveryCount = this.discoveryClient.getInstances(CoreService.DISCOVERY.getServiceId()).size();

Contributor

Would it be better to read the expected number of instances from the configuration? The check itself does not validate against an exact count.

Also, without validating individual instances, how do we know whether only the local instances are up or all instances in the HA setup?
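
To make the suggestion concrete, here is a minimal sketch of comparing registered instance counts against an expected count taken from configuration. Everything in it is an assumption for illustration: the class `ConfiguredCountCheck`, the method name, and the idea that the configuration can be reduced to a map of service ID to expected count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; the real configuration model may differ.
final class ConfiguredCountCheck {
    private ConfiguredCountCheck() {}

    // Returns the IDs of services that have fewer registered instances than configured,
    // sorted alphabetically so the result is deterministic.
    static List<String> servicesBelowExpected(Map<String, Integer> expected,
                                              Map<String, Integer> registered) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(expected).entrySet()) {
            // A service absent from the registry counts as zero instances.
            int actual = registered.getOrDefault(entry.getKey(), 0);
            if (actual < entry.getValue()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

An empty result would mean every configured instance has registered. A non-empty result names the services still missing, which could also be useful in a log message.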

Contributor Author

Yes, that makes sense. This is a partial fix for a specific scenario in which the second instance would print the API Mediation Layer ready message before Discovery and/or ZAAS are available.
I guess the question is more general. It's true that this does not cover scenarios with three or more instances. It's also true that the message in the second instance is correct even if only one Discovery and/or ZAAS is available.

Contributor Author

I will try to refactor it.

balhar-jakub previously approved these changes Jun 2, 2026

@balhar-jakub balhar-jakub left a comment

Member

Just a note: there is a scenario in which the HA setup is functionally correct even though not all instances are fully set up.

As this scenario is atypical, if it is supported at all, I believe we are OK with approving and merging the PR.

The scenario explanation

If an instance crashes during startup and never registers (in an HA scenario, an instance means, for example, one ZAAS service on one LPAR out of three), the counts will never match and onFullyUp() will never be called. The "started" message is never published. There is no timeout, no retry limit, and no fallback.

Potential suggestion: add a time-based fallback. If the counts haven't converged after a configurable timeout (e.g., 5 minutes), log a warning and publish anyway.
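
The time-based fallback could look roughly like the sketch below. It is not code from this PR: the class `StartupGate`, its `Decision` enum, and the `evaluate` method are hypothetical names, and the caller is assumed to poll it with the current time and the result of the count comparison.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a timeout fallback; names and shape are assumptions.
final class StartupGate {
    enum Decision { WAIT, PUBLISH, PUBLISH_WITH_WARNING, ALREADY_PUBLISHED }

    private final Instant startedAt;
    private final Duration timeout;
    private boolean published;

    StartupGate(Instant startedAt, Duration timeout) {
        this.startedAt = startedAt;
        this.timeout = timeout;
    }

    // Decides what to do on each poll. The started message is published at most once:
    // normally when the counts match, or with a warning once the timeout has elapsed.
    synchronized Decision evaluate(boolean countsMatch, Instant now) {
        if (published) {
            return Decision.ALREADY_PUBLISHED;
        }
        if (countsMatch) {
            published = true;
            return Decision.PUBLISH;
        }
        if (Duration.between(startedAt, now).compareTo(timeout) >= 0) {
            published = true;
            return Decision.PUBLISH_WITH_WARNING;
        }
        return Decision.WAIT;
    }
}
```

Passing the current time in as a parameter keeps the decision easy to unit test without sleeping, which may also help with the coverage requirement on new code.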

@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Quality Gate failed

Failed conditions
75.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@balhar-jakub balhar-jakub self-requested a review June 5, 2026 08:10
@balhar-jakub balhar-jakub dismissed their stale review June 5, 2026 08:11

There will be additional work on this PR that will need full review afterwards

Projects

Status: In Progress

4 participants