Tune AnarchyRunner scaling for burst workloads#188

Open
marcosmamorim wants to merge 1 commit into main from anarchyrunner-tunning

@marcosmamorim

Contributor

Reduce these settings so runners reach maxReplicas faster under load:
  scaleUpDelay (5m -> 30s)
  scaleUpThreshold (20 -> 5)
  scalingCheckInterval (1m -> 15s)
Increase consecutiveFailureLimit (10 -> 20) to reduce pod churn during mixed-failure workloads.

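
To see how the three scaling settings combine, here is a back-of-the-envelope sketch. It assumes a hypothetical scaling rule, not AnarchyRunner's actual controller logic: every `scalingCheckInterval` the controller looks at the backlog, and if the number of pending runs is at least `scaleUpThreshold` and `scaleUpDelay` has elapsed since the previous scale-up, it adds exactly one replica. The function name, the `pending` backlog parameter, and the 1 to 10 replica range are illustrative assumptions.

```python
from typing import Optional


def time_to_max_replicas(min_replicas: int, max_replicas: int,
                         scale_up_delay: int, check_interval: int,
                         pending: int, threshold: int) -> Optional[int]:
    """Seconds of sustained load until max_replicas is reached, or None.

    Hypothetical model (not AnarchyRunner's real controller): every
    check_interval seconds, if `pending` runs >= threshold and at least
    scale_up_delay seconds have passed since the last scale-up, add one replica.
    """
    if check_interval <= 0:
        raise ValueError("check_interval must be positive")
    if pending < threshold:
        return None  # a constant backlog below the threshold never triggers scaling
    replicas = min_replicas
    t = 0
    last_scale_up: Optional[int] = None
    while replicas < max_replicas:
        t += check_interval  # advance to the next scaling check
        if last_scale_up is None or t - last_scale_up >= scale_up_delay:
            replicas += 1  # assumed: one replica added per eligible check
            last_scale_up = t
    return t


# Old vs. new values from this PR (delay and interval in seconds),
# with a hypothetical 1 -> 10 replica range.
settings = [("old", 300, 60, 20), ("new", 30, 15, 5)]
for label, delay, interval, threshold in settings:
    for pending in (8, 25):
        print(label, pending,
              time_to_max_replicas(1, 10, delay, interval, pending, threshold))
```

Under this assumed model, the old settings need 2460 s (41 minutes) to go from 1 to 10 replicas with a backlog of 25 and never scale with a backlog of 8, while the new settings reach 10 replicas in 255 s in both cases. A real controller may add several replicas per check, so treat the numbers as intuition only.
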
Comment thread on helm/values.yaml:

 runners:
   default:
-    consecutiveFailureLimit: 10
+    consecutiveFailureLimit: 20
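
The limit changed here can be read as a per-pod failure counter. The sketch below assumes, without checking the AnarchyRunner source, that the counter resets on any successful run and that the pod is replaced once the counter reaches `consecutiveFailureLimit`; `RunnerPod` and `runs_until_replaced` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunnerPod:
    """Hypothetical per-pod failure tracker (assumed semantics)."""
    consecutive_failure_limit: int
    consecutive_failures: int = 0

    def record(self, succeeded: bool) -> bool:
        """Record one run; return True once the pod should be replaced."""
        if succeeded:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.consecutive_failure_limit


def runs_until_replaced(results: Iterable[bool], limit: int) -> Optional[int]:
    """1-based index of the run that triggers replacement, or None."""
    pod = RunnerPod(consecutive_failure_limit=limit)
    for i, ok in enumerate(results, start=1):
        if pod.record(ok):
            return i
    return None


broken = [False] * 30                         # pod stuck failing every run
mixed = [False] * 15 + [True] + [False] * 15  # long failure streaks split by one success
for limit in (10, 20):
    print(limit, runs_until_replaced(broken, limit), runs_until_replaced(mixed, limit))
```

This prints `10 10 10` and `20 20 None`: with a limit of 20 the mixed pattern no longer causes a replacement, but a pod that fails every run now survives 20 runs instead of 10.
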

Collaborator

I'm not sure about increasing the consecutive failure limit. I'll still approve, but I'm not sure this will actually help. We've seen cases where runner pods get into weird states and start failing every run, and the faster those are replaced, the better. But we haven't looked deeply into that or how often it happens.
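
The trade-off raised here can be roughly quantified under a simplifying assumption: each run fails independently with probability p. The expected number of runs before a healthy pod hits N consecutive failures is then (1 - p^N) / ((1 - p) p^N), while a pod that fails every run is replaced after exactly N runs. The sketch below uses a hypothetical 50% failure rate, not a measured figure, and `expected_runs_to_replacement` is an illustrative name.

```python
def expected_runs_to_replacement(p_fail: float, limit: int) -> float:
    """Expected runs until `limit` consecutive failures occur.

    Assumes each run fails independently with probability p_fail,
    0 < p_fail < 1 (waiting time for a streak of length `limit`).
    """
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return (1 - p_fail**limit) / ((1 - p_fail) * p_fail**limit)


# Hypothetical healthy pod on a workload where half of all runs fail
# for reasons unrelated to the pod itself.
for limit in (10, 20):
    print(limit, expected_runs_to_replacement(0.5, limit))
```

With p = 0.5, raising the limit from 10 to 20 moves the expected runs before a spurious replacement from 2046 to about 2.1 million, at the cost of a stuck pod failing 10 extra runs. Real failures are often correlated, so this is intuition rather than a prediction.
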
