@gmarciani (Contributor) commented Jan 26, 2026

Description of changes

Emit the ClustermgtdHeartbeat metric as a heartbeat signal for the clustermgtd daemon.
The metric is emitted with two dimensions: ClusterName and InstanceId.
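A minimal sketch of what one heartbeat data point could look like, shaped as the MetricDatum dict that the boto3 CloudWatch client accepts in put_metric_data(MetricData=[...]). The helper name build_heartbeat_datum, the Value of 1.0 and the Unit "Count" are assumptions for illustration, not taken from the PR.

```python
from typing import Any, Dict


def build_heartbeat_datum(cluster_name: str, instance_id: str) -> Dict[str, Any]:
    """Build one CloudWatch MetricDatum for the clustermgtd heartbeat (hypothetical helper)."""
    return {
        "MetricName": "ClustermgtdHeartbeat",
        # The two dimensions described in this PR.
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "InstanceId", "Value": instance_id},
        ],
        # Assumption: one data point with value 1 per completed iteration.
        "Value": 1.0,
        "Unit": "Count",
    }
```

Publishing it would then be a call such as cloudwatch.put_metric_data(Namespace=..., MetricData=[datum]); the namespace is left open because the PR does not state it.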

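The metric is intentionally emitted at the end of the clustermgtd loop so that it reflects the actual health of the daemon.
If it were emitted at the beginning of the iteration, an iteration that fails partway through would still have produced a heartbeat, so monitoring based on the metric would be open to false negatives (a failing daemon would look healthy).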
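A minimal sketch of the ordering argument, using hypothetical callables work (one clustermgtd iteration) and emit_heartbeat; it assumes the loop catches and logs iteration failures, which the PR does not state. With emission at the end, a failed iteration produces no heartbeat; emitting at the start reports a heartbeat for that same failed iteration.

```python
import logging
from typing import Callable

logger = logging.getLogger("clustermgtd-sketch")


def run_iteration(
    work: Callable[[], None],
    emit_heartbeat: Callable[[], None],
    emit_at_end: bool = True,
) -> None:
    """Run one hypothetical daemon iteration and emit the heartbeat at start or end."""
    if not emit_at_end:
        # Emitted before we know whether the iteration will succeed.
        emit_heartbeat()
    try:
        work()
    except Exception:
        logger.exception("clustermgtd iteration failed")
        # A failed iteration never reaches the end-of-loop emission below.
        return
    if emit_at_end:
        # Reached only when the whole iteration completed.
        emit_heartbeat()
```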

This PR depends on the permissions added in aws/aws-parallelcluster#7209

Tests

  • Unit tests (extended to cover the current changes)
  • Manually verified by creating a cluster and checking that (i) the metric is pushed and (ii) when the metric push fails (simulated by manually removing the permission to push it), the clustermgtd iteration still completes normally.
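A minimal sketch of a best-effort push that keeps a metric failure from breaking the daemon loop. The function safe_emit is hypothetical; put_metric_data is passed in so it can be bound to boto3.client("cloudwatch").put_metric_data in practice or to a fake in tests.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("clustermgtd-sketch")


def safe_emit(
    put_metric_data: Callable[..., Any],
    namespace: str,
    metric_data: List[Dict[str, Any]],
) -> bool:
    """Push metrics, never letting a failure propagate into the daemon loop."""
    try:
        put_metric_data(Namespace=namespace, MetricData=metric_data)
        return True
    except Exception as e:
        # e.g. an AccessDenied error when the instance role lacks the permission.
        logger.warning("Unable to publish heartbeat metric: %s", e)
        return False
```

The broad except is a deliberate choice in this sketch: the heartbeat is best-effort, so any publishing error is logged and the iteration continues, which is the behavior the manual test checks.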

References

  1. The maximum length of a metric name is 255 characters. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_MetricDatum.html
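A minimal sketch of a guard against the length limit cited in the reference, for example in a unit test over all metric names the daemon emits. The constant and function names are hypothetical, and the minimum length of 1 also comes from the MetricDatum API reference.

```python
MAX_METRIC_NAME_LENGTH = 255  # per the CloudWatch MetricDatum API reference


def validate_metric_name(name: str) -> str:
    """Return the name unchanged, or raise if CloudWatch would reject its length."""
    if not 1 <= len(name) <= MAX_METRIC_NAME_LENGTH:
        raise ValueError(
            f"metric name must be 1-{MAX_METRIC_NAME_LENGTH} characters, got {len(name)}"
        )
    return name
```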

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani force-pushed the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch 2 times, most recently from 8b3ee75 to b9d50bb on January 26, 2026 22:21
@gmarciani marked this pull request as ready for review January 26, 2026 22:26
@gmarciani requested review from a team as code owners January 26, 2026 22:26
@gmarciani force-pushed the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch 3 times, most recently from 9983ec3 to af6495c on January 27, 2026 17:34
@gmarciani force-pushed the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch from af6495c to 165c3c1 on January 27, 2026 17:44
@himani2411 (Contributor) left a comment

LGTM!

@gmarciani merged commit ebb742f into aws:develop Jan 27, 2026
12 checks passed
@gmarciani deleted the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch January 27, 2026 22:54
@gmarciani restored the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch January 27, 2026 22:54
@gmarciani deleted the wip/mgiacomo/3150/clustermgtd-metrics-0126-1 branch January 28, 2026 19:22