Skip to content

Fixes #525 - Add checks to health subcommand#521

Merged
ehelms merged 1 commit into
theforeman:masterfrom
qcjames53:master
Jun 11, 2026
Merged

Fixes #525 - Add checks to health subcommand#521
ehelms merged 1 commit into
theforeman:masterfrom
qcjames53:master

Conversation

@qcjames53

@qcjames53 qcjames53 commented May 26, 2026

Copy link
Copy Markdown
Contributor

Why are you introducing these changes? (Problem description, related links). What are the changes introduced in this pull request?

This PR expands the existing foremanctl health subcommand to reach equivalency with the existing foreman-maintain health check tool.

The following tests are run on a default health check in foreman-maintain:

  • services_up
  • server_ping
  • foreman_tasks/not_paused
  • foreman/facts_names
  • foreman_proxy/check_tftp_storage
  • foreman_proxy/verify_dhcp_config_syntax
  • puppet/verify_no_empty_cacert_requests
  • system_registration

I've implemented these five checks:

  • health_services_up - Checks all currently implemented quadlet services are running (with different configs supported)
  • health_server_ping - Can we ping the foreman server?
  • health_foreman_tasks_not_paused - Are there paused foreman tasks?
  • health_facts_count - Does a host exist with more facts than health_max_fact_values_per_host?
  • check_duplicate_permissions - Are permissions duplicated.

I've skipped these checks (reasoning provided):

  • foreman_proxy/check_tftp_storage - No foreman proxy support yet.
  • foreman_proxy/verify_dhcp_config_syntax - Ditto.
  • puppet/verify_no_empty_cacert_requests - Puppet plugins for foreman and smart proxy are not yet compatible with foremanctl; work is underway to add support in the future.
  • system_registration - This self-registration concern isn't valid for an upstream centos9 container, to the best of my knowledge. Please double check this reasoning :)

How to test this pull request

Steps to reproduce:

  1. Spin up an environment and register a host.
  2. Run foremanctl health and verify all checks pass.
  3. vagrant ssh quadlet and manually stop a service. Rerun the check and confirm health_services_up fails. Restart the service.
  4. Close the server firewall and confirm health_server_ping fails. Change the config back afterwards.
  5. Start a sync (or similar) and pause the job. confirm health_foreman_tasks_not_paused fails.
  6. Lower the health_max_fact_values_per_host value in src/vars/health.yml to below the number of facts of your host and confirm health_facts_count fails.
  7. Run the unit tests and confirm they work. ./forge test first to generate needed files then run pytest tests/health_test.py.
  8. Create duplicate permissions in the foreman db and verify the health check fails. Run with --fix and see that the duplicates are cleared.

Checklist

  • Tests added/updated (if applicable)
  • Documentation updated (if applicable) <- Didn't see anything relevant to modify

@qcjames53 qcjames53 force-pushed the master branch 3 times, most recently from c9ad8d0 to 7da240c Compare May 26, 2026 15:17
@qcjames53

Copy link
Copy Markdown
Contributor Author

None of these test failures seem related. FWIW I cannot get the full unit test suite to run locally without issue either. My unit test passes, though :)

@ehelms

ehelms commented May 26, 2026

Copy link
Copy Markdown
Member

@qcjames53 Before I review in-depth, can you review #431? This is a previous contribution that has made overlapping progress and addresses some of what would be questions I have for this PR. Ideally, we would build on #431

@qcjames53

Copy link
Copy Markdown
Contributor Author

Of course, Eric. #431 wasn't on our team's radar at all so I'll need some time to rectify. I'll reach out to make sure we're not duplicating more effort than this.

@qcjames53 qcjames53 changed the title Add health subcommand to check health of a deployed system Fixes #525 - Add checks to health subcommand May 27, 2026
@qcjames53

Copy link
Copy Markdown
Contributor Author

This PR should now be up to par. I rebased on master and added a few additional checks to the existing foremanctl health subcommand (as well as a unit test). Let me know if you think we need negative-case testing for this.

@qcjames53

Copy link
Copy Markdown
Contributor Author

The newest push adds a flag to ignore errored out foreman tasks (for automation mainly). I've also removed health checks from the GH automation since the unit test checks it instead.

@qcjames53 qcjames53 force-pushed the master branch 3 times, most recently from 4bee8e3 to 8aeb0a1 Compare May 28, 2026 15:35
@qcjames53

Copy link
Copy Markdown
Contributor Author

The latest force push handles db auth correctly for external dbs.

Comment thread src/playbooks/health/metadata.obsah.yaml Outdated
Comment thread src/vars/health.yml Outdated
Comment thread src/roles/check_services/tasks/main.yaml Outdated
Comment thread src/roles/check_foreman_tasks/tasks/main.yaml
@qcjames53

Copy link
Copy Markdown
Contributor Author

I just pushed some changes to add --fix for duplicate permissions to help out Aneta's docs PR. We'd have to drop the sections on fixing duplicate permissions otherwise and it was an easy add.

New unit tests added to cover this and --ignore-foreman-tasks-check.

@qcjames53

Copy link
Copy Markdown
Contributor Author

Newest push should take care of linting.

Comment thread src/roles/check_duplicate_permissions/tasks/main.yaml
Comment thread src/roles/check_services/tasks/main.yaml Outdated
Comment thread src/roles/check_host_facts_count/tasks/main.yaml Outdated
@qcjames53 qcjames53 force-pushed the master branch 2 times, most recently from 54a44d2 to 1334070 Compare June 5, 2026 16:30
@qcjames53

Copy link
Copy Markdown
Contributor Author

@ehelms @ekohl I just pushed changes that should address both of your concerns. I've feature gated check_foreman_tasks and check_foreman_api properly and added unit testing. I've also changed check_host_facts to return the IDs of all hosts that exceed the threshold. I've updated documentation where neccessary.

@qcjames53 qcjames53 force-pushed the master branch 4 times, most recently from 665eaa9 to 8b44806 Compare June 5, 2026 20:53
Comment thread AGENTS.md
Comment on lines +171 to +188
Developer docs:
- [Check roles](docs/developer/checks.md) - How to integrate check roles; update as checks are created/modified
- [Container Image Builds](docs/developer/container-image-builds.md) - Info on image naming, registries
- [Deployment Architecture](docs/developer/deployment.md)
- [Development Environment](docs/developer/development-environment.md) - Dev environment setup with Foreman from source
- [How to Add a Feature](docs/developer/how-to-add-a-feature.md) - End-to-end feature development
- [Playbooks and Roles](docs/developer/playbooks-and-roles.md) - Playbook structure, naming, metadata
- [Testing](docs/developer/testing.md) - Additional info on test infrastructure, fixtures, patterns

User docs:
- [Certificates](docs/user/certificates.md) - Overview of certificate sources
- [Parameters](docs/user/parameters.md) - Map of Foreman installation parameters; update as parameters are created/modified

- [CONTRIBUTING](CONTRIBUTING.md) - How to contribute
- [Development](DEVELOPMENT.md) - Foremanctl development overview
- [IOP](docs/iop.md) - Overview of insights on premise
- [Migration Guide](docs/migration-guide.md) - Migrating from foreman-installer to foremanctl
- [Release](RELEASE.md) - Info on Foremanctl releases No newline at end of file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but perhaps it's easier to add mkdocs and instruct the agent on how to use that instead. Just like https://github.com/theforeman/foreman-infra/tree/master/docs generates https://theforeman.github.io/foreman-infra/.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a big upgrade. I didn't want to modify the docs TOO much for a PR where sweeping changes are obviously out of scope.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion: building and trying to maintain a static site as the source of truth introduces overhead. Instead of reading locally, agents now have to query over the network. If you are making changes in a PR to any documentation and trying to consume those updates, you have to generate a static site. Additionally, agents then also have to parse HTML to find the relevant bits rather than just quickly parsing local markdown.

@ehelms

ehelms commented Jun 8, 2026

Copy link
Copy Markdown
Member

@qcjames53 a reboot should fix the iop tests

@qcjames53

Copy link
Copy Markdown
Contributor Author

Thanks Eric. I rebased and pushed to rerun CI. @ekohl would you like to review this PR again or are you okay with current changes?

@ehelms ehelms mentioned this pull request Jun 9, 2026
2 tasks
@qcjames53 qcjames53 requested a review from ehelms June 9, 2026 17:43
@qcjames53

qcjames53 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Ewoud uncovered https://projects.theforeman.org/issues/38465 which can cause duplicate permissions. Around 4:30 today I will re-add the duplicate permissions check for this release and include instructions in the markdown file to remove this check once the above issue is addressed.

Edit: I'm not planning on adding "--fix" since that's silly to carry over. It will be fail-only and the onus is on the user to clean up the duplicate permissions. Let me know if that's not the ideal approach.

@qcjames53

Copy link
Copy Markdown
Contributor Author

I've created #551 to track migrating these db queries to authenticated API calls.

I've also re-added the check_duplicate_permissions role (error detection only, no --fix) and documented that it needs to stay until https://projects.theforeman.org/issues/38465 is addressed.

@qcjames53 qcjames53 force-pushed the master branch 2 times, most recently from 8ab3582 to b6d7987 Compare June 10, 2026 18:49
@qcjames53

Copy link
Copy Markdown
Contributor Author

I force pushed to re-run automation (previous failure was unrelated to my changes).

@ehelms ehelms left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EL10 failure is unrelated, based on other tests I am going to merge.

@ehelms ehelms merged commit 5454b1a into theforeman:master Jun 11, 2026
38 of 47 checks passed
- "../../vars/flavors/{{ flavor }}.yml"
- "../../vars/base.yaml"
- "../../vars/database.yml"
- "../../vars/health.yml"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file does not exist.

changed_when: false

- name: Check for duplicate permissions
when: not (health_fix | default(false) | bool)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is health_fix doing here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants