summary: better handle job cancellation [WIP]#155
Conversation
move logic about jobs failure or success
inside the summary job.
Sometimes if cacnellation happend too late,
the summary will still run even that we stated
```
if: ${{ !cancelled() }}
```
this patch handles those rare cases and votes=0
we could skip vote at all, but I like this for now
so we can debug better for near future.
Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com>
| run: | | ||
| vote="-1" | ||
| message="Build failed. " | ||
|
|
||
| # Combine results from jobs and autorun_post.py | ||
| if [[ "${{ inputs.result }}" == "success" && \ | ||
| "${{ needs.merge_outputs.outputs.result }}" == "success" ]]; then | ||
| set -x | ||
| if [[ "${{ inputs.result_common }}" == "cancelled" || "${{ inputs.result_hpe }}" == "cancelled" ]]; then | ||
| vote="0" |
There was a problem hiding this comment.
For cases where the common-tests were cancelled, leaving a message on the patch would be confusing for the submitter. Please remember that the actual result could be posted much later, leaving folks unsure how to proceed with such message.
There was a problem hiding this comment.
I think it would be nice to know that job was cancelled. we can post more actionable message here,
for example due to timeout or due to a push of new version of the patch for clarity.
at least for now posting always something could be useful for debugging and then we can re-consider to lower the verbosity once we confident with the long term of this or it ends up being too noisy...
There was a problem hiding this comment.
We already do not post anything for any of cancel conditions in https://github.com/spdk/spdk-ci/blob/main/.github/scripts/parse_gerrit_webhook.sh.
There is nothing actionable that a patch submitter can or should do. All GH cancellations are just result of patch state during handling of Gerrit event.
| if [[ "${{ inputs.result }}" == "success" && \ | ||
| "${{ needs.merge_outputs.outputs.result }}" == "success" ]]; then | ||
| set -x | ||
| if [[ "${{ inputs.result_common }}" == "cancelled" || "${{ inputs.result_hpe }}" == "cancelled" ]]; then |
There was a problem hiding this comment.
This condition will evaluate true when a job within either workflow times out. For such case we still need to post a failure.
I'm not sure how to differ the cancelled state due to concurrency and timeout. The autorun and pkgdep jobs would have to change to somehow track that.
There was a problem hiding this comment.
we can use REST API according to documentation to determine specifically timeouts
| # Summary should be last job, please make sure to include yours in 'needs'. | ||
| summary: | ||
| name: Job summary | ||
| if: ${{ !cancelled() }} |
There was a problem hiding this comment.
Would adding check for needs.common.result != 'cancelled' work here ?
We know that !cancelled() works for patch_set_status which consistently cancels the gerrit-webhook-handler.yml workflow, skipping summary. I'm guessing that due to common and hpe are reusable workflows, they are not caught with the same condition.
[note the timeout concern below]
move logic about jobs failure or success
inside the summary job.
Sometimes if cacnellation happend too late,
the summary will still run even that we stated
this patch handles those rare cases and votes=0
we could skip vote at all, but I like this for now so we can debug better for near future.