INTEROP-8976, INTEROP-8979: Token expiry alerts and per-rule Slack notifications #274

Draft
amp-rh wants to merge 5 commits into RedHatQE:main from amp-rh:interop-8976/token-rotation-alerts

@amp-rh amp-rh commented Apr 7, 2026

Summary

Two related features unified under a shared Slack notification infrastructure.

Token expiry alerts (INTEROP-8976)

Jira Cloud does not support programmatic API token rotation. This PR adds tooling for scheduled expiry monitoring with manual rotation:

  • check-token-expiry.py: Reads expires_at from the Vault KV secret and posts Slack alerts at 30, 14, 7, 3, and 1 days before expiry. Exits with code 2 if the token has already expired. Posts via SlackClient.post_webhook() from the shared Slack infrastructure.
  • prow-periodic.yaml: Reference Prow periodic job config (daily at 08:00 UTC). To be merged into openshift/release once approved.
  • docs/vault-schema.md: Full Vault KV secret schema documenting all 10 fields (5 managed by rotation, 5 that must not be modified). Warns against vault kv put in favor of vault kv patch.
  • docs/rotation-runbook.md: Step-by-step manual rotation procedure with staging verification, rollback, and troubleshooting. Tested against stage-redhat.atlassian.net.
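
The threshold logic described for check-token-expiry.py can be sketched roughly as follows. This is an illustration only, not the script from this PR: the function names are hypothetical, the YYYY-MM-DD form of expires_at is assumed, and treating the expiry date itself as already expired is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

ALERT_DAYS = (30, 14, 7, 3, 1)  # thresholds named in the PR description
EXIT_OK = 0
EXIT_EXPIRED = 2  # the PR states exit code 2 when the token has expired


def days_until_expiry(expires_at: str, today: date) -> int:
    """Parse an ISO 8601 date (YYYY-MM-DD assumed) and return whole days remaining."""
    return (date.fromisoformat(expires_at) - today).days


def evaluate(expires_at: str, today: date) -> Tuple[int, Optional[str]]:
    """Return (exit_code, alert_message); the message is None when no alert is due."""
    remaining = days_until_expiry(expires_at, today)
    if remaining <= 0:
        # Assumption: the expiry date itself counts as expired.
        return EXIT_EXPIRED, f"Jira API token expired on {expires_at}"
    if remaining in ALERT_DAYS:
        return EXIT_OK, f"Jira API token expires in {remaining} day(s) ({expires_at})"
    return EXIT_OK, None
```

Because the reference job runs once a day, matching exact day counts is enough in this sketch; a job that can skip days would probably want "at or below the next threshold" logic instead.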

Per-rule Slack notifications (INTEROP-8979)

Adds an optional slack_channel field to firewatch config rules. When it is set, firewatch sends a Slack notification after creating or updating a Jira issue; an empty or absent value skips the notification. The special value !default reads the channel from $FIREWATCH_DEFAULT_SLACK_CHANNEL.
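
A minimal sketch of how the field could be resolved, assuming a rule is available as a plain dict. The helper name resolve_slack_channel is hypothetical (the PR's own method is Rule._get_slack_channel, whose body is not shown here), and skipping the notification when the environment variable is unset is an assumption.

```python
import os
from typing import Optional

DEFAULT_MARKER = "!default"
DEFAULT_ENV_VAR = "FIREWATCH_DEFAULT_SLACK_CHANNEL"


def resolve_slack_channel(rule: dict) -> Optional[str]:
    """Return the channel to notify, or None when the notification is skipped."""
    value = rule.get("slack_channel")
    if not isinstance(value, str) or not value.strip():
        return None  # absent or empty: skip notification
    value = value.strip()
    if value == DEFAULT_MARKER:
        # Assumption: an unset or empty environment variable also means "skip".
        return os.environ.get(DEFAULT_ENV_VAR, "").strip() or None
    return value
```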

Changed files:

  • src/objects/rule.py: slack_channel field with !default / env var pattern
  • src/objects/slack_base.py: post_webhook() static method for Slack incoming webhooks
  • src/commands/report.py: --slack-bot-token and --slack-webhook-url CLI options
  • src/objects/configuration.py: Stores Slack credentials
  • src/report/report.py: _notify_slack(), _slack_new_issue(), _slack_duplicate(), _slack_success() hooks

Config example:

{
  "failure_rules": [
    {
      "step": "install",
      "failure_type": "pod_failure",
      "classification": "Infrastructure",
      "jira_project": "LPINTEROP",
      "slack_channel": "#ocp-ci-firewatch-tool"
    }
  ]
}

Shared infrastructure

Both features post to Slack via SlackClient.post_webhook(). The token-expiry script previously had its own slack_post() function; it now uses the shared method. Errors propagate to callers: the expiry script surfaces them as exit codes while Report notifications log and continue.
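
The error-handling split can be illustrated with a small stdlib-only sketch. It is not the SlackClient code from this PR: the free function post_webhook below merely stands in for the static method, the two wrapper names are hypothetical, and the exit code 1 for a failed delivery is an assumption (the PR only specifies code 2 for an expired token).

```python
import json
import logging
import urllib.request
from typing import Callable

logger = logging.getLogger("firewatch-sketch")


def post_webhook(webhook_url: str, text: str) -> None:
    """POST a JSON payload to a Slack incoming webhook; raises on any failure."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError/URLError on failure; nothing is swallowed here.
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def notify_or_exit_code(send: Callable[[], None]) -> int:
    """Expiry-script style: turn a delivery failure into a non-zero exit code."""
    try:
        send()
    except Exception as exc:
        logger.error("Slack alert failed: %s", exc)
        return 1  # hypothetical value; the PR does not state which code is used
    return 0


def notify_and_continue(send: Callable[[], None]) -> bool:
    """Report style: log the failure and let the report run carry on."""
    try:
        send()
    except Exception as exc:
        logger.warning("Slack notification failed, continuing: %s", exc)
        return False
    return True
```

A caller would pass something like `lambda: post_webhook(url, "message")` as `send`. Keeping the posting function free of try/except is what lets each caller choose its own policy.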

Vault schema change

The Vault secret at kv/selfservice/firewatch-tool/jira-credentials contains 10 fields. Five are managed during token rotation:

| Field | Purpose |
| --- | --- |
| email | Jira account email (firewatch@redhat.com) |
| access_token | Production Jira API token |
| access_token_msi | Copy of access_token (kept in sync) |
| access_token_stage | Staging-only token (rotated independently) |
| expires_at | ISO 8601 date for alert script (firewatch itself does not read this) |

The remaining 5 fields (account credentials, secretsync config) must not be modified during rotation. Always use vault kv patch, never vault kv put, to avoid destroying these fields.
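
Why patch is safer than put can be shown with a toy model of the two operations on a dict. This only models the semantics (put writes a new version containing just the supplied fields, patch merges into the existing data) and does not call Vault. The token values are placeholders, and unmanaged_field is a hypothetical stand-in for the five fields that must not be modified.

```python
def kv_put(current: dict, new_fields: dict) -> dict:
    """Model of `vault kv put`: the new version contains only the supplied fields."""
    return dict(new_fields)


def kv_patch(current: dict, new_fields: dict) -> dict:
    """Model of `vault kv patch`: supplied fields are merged into the existing data."""
    merged = dict(current)
    merged.update(new_fields)
    return merged


secret = {
    "email": "firewatch@redhat.com",
    "access_token": "old-token",
    "access_token_msi": "old-token",
    "access_token_stage": "stage-token",
    "expires_at": "2026-04-30",
    "unmanaged_field": "must-survive",  # hypothetical stand-in for the unmanaged fields
}
rotation = {
    "access_token": "new-token",
    "access_token_msi": "new-token",
    "expires_at": "2027-04-30",
}

after_put = kv_put(secret, rotation)      # email, staging token, unmanaged fields are gone
after_patch = kv_patch(secret, rotation)  # only the three rotated fields change
```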

Doc updates from staging walkthrough

The rotation runbook and vault schema were validated against the Jira staging environment (stage-redhat.atlassian.net). Key fixes applied:

  • Changed vault kv put to vault kv patch (a put would destroy 7 undocumented fields)
  • Documented all 10 actual Vault fields, not just 3
  • Added staging verification step before CI
  • Noted that API tokens are account-scoped (work across all Atlassian Cloud instances)
  • Specified vault login -method=oidc for Vault authentication
  • Added access_token_msi to the rotation patch command

Test plan

  • Review check-token-expiry.py for correctness (Vault read, date parsing, shared Slack webhook)
  • Verify prow-periodic.yaml is valid Prow config
  • Review docs/vault-schema.md (all 10 fields documented, patch not put)
  • Review docs/rotation-runbook.md (staging verification, OIDC auth, patch command)
  • Review Slack notification hooks in report.py
  • Review slack_channel field parsing in rule.py
  • After merge: deploy Prow periodic job, configure slack_channel in team configs

Relates: https://issues.redhat.com/browse/INTEROP-8976
Relates: https://issues.redhat.com/browse/INTEROP-8979

Jira Cloud cannot programmatically create or rotate API tokens, so
fully automated rotation is not possible with basic auth. Instead,
add tooling for scheduled expiry monitoring with manual rotation:

- check-token-expiry.py: standalone script that reads expires_at from
  Vault and posts Slack alerts at 30/14/7/3/1 days before expiry
- prow-periodic.yaml: reference Prow periodic job config (daily 08:00 UTC)
- docs/vault-schema.md: Vault KV secret schema with new expires_at field
- docs/rotation-runbook.md: step-by-step manual rotation procedure

Relates: https://issues.redhat.com/browse/INTEROP-8976
Made-with: Cursor

openshift-ci bot commented Apr 7, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

amp-rh added 3 commits April 8, 2026 11:40
Add optional slack_channel field to firewatch config rules. When set,
firewatch sends a Slack notification after creating a new Jira issue,
updating a duplicate, or filing a success story. Absent or empty value
skips notification. Supports !default to read from env var.

Changes:
- Rule: add _get_slack_channel with !default/$FIREWATCH_DEFAULT_SLACK_CHANNEL
- SlackClient: add post_webhook static method for webhook-based posting
- Configuration: accept slack_bot_token and slack_webhook_url params
- Report: send Slack after issue creation, duplicate comment, success
- CLI: add --slack-bot-token and --slack-webhook-url to report command

Relates: https://issues.redhat.com/browse/INTEROP-8979
Made-with: Cursor

check-token-expiry.py now uses SlackClient.post_webhook() instead of
a local slack_post() function. post_webhook() propagates errors so
callers can decide how to handle failures: the token-expiry script
surfaces them as exit codes while Report._notify_slack() logs and
continues.

Made-with: Cursor

- Add types-requests to mypy pre-commit additional_dependencies
- Fix bare URLs and emphasis-as-heading in rotation-runbook.md and
  vault-schema.md
- Accept ruff-format changes and add mypy type: ignore on dict lookups

Made-with: Cursor

@amp-rh amp-rh changed the title from "INTEROP-8976: Add Jira API token expiry alerts and rotation runbook" to "INTEROP-8976, INTEROP-8979: Token expiry alerts and per-rule Slack notifications" Apr 8, 2026

Findings from staging walkthrough:
- vault kv put destroys 7 undocumented fields (secretsync config,
  account credentials, staging token); switch to vault kv patch
- Document all 10 actual Vault secret fields, not just 3
- Add staging verification step (stage-redhat.atlassian.net)
- Note that API tokens are account-scoped, not instance-scoped
- Specify vault login -method=oidc for authentication
- Include access_token_msi in rotation patch command

Made-with: Cursor

@amp-rh amp-rh marked this pull request as ready for review April 10, 2026 14:52
@amp-rh amp-rh marked this pull request as draft April 10, 2026 14:56

amp-rh commented Apr 13, 2026

/test all
