fix(localdns): reduce startup polling interval #8534
Pull request overview
Reduces localdns startup polling intervals from 1s to 0.1s so the node bootstrap path waits less when CoreDNS comes up quickly, while preserving the existing wall-clock timeout semantics. The PID-file wait now counts in tenths of a second (10× the timeout), and the readiness call-site bumps maxattempts from 60 to 600 to match the 60s timeout at the finer interval.
Changes:
- Introduce `LOCALDNS_POLL_INTERVAL_SECONDS=0.1` and rework `start_localdns` to use tenth-second counters so `START_LOCALDNS_TIMEOUT=10` still yields a ~10s ceiling.
- Switch `wait_for_localdns_ready`'s sleep to 0.1s and update the production call-site to `wait_for_localdns_ready 600 60`.
- Add ShellSpec tests asserting both loops sleep with `0.1`.
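The tenth-second counting described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in `localdns.sh`: `wait_for_file` and `POLL_INTERVAL_SECONDS` are hypothetical names, and it assumes a `sleep` that accepts fractional seconds (as GNU coreutils does).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a tenth-second polling loop; names are illustrative only.
POLL_INTERVAL_SECONDS=0.1

# wait_for_file PATH TIMEOUT_SECONDS
# Polls every 0.1s until PATH exists or roughly TIMEOUT_SECONDS have elapsed.
wait_for_file() {
  local path="$1" timeout_seconds="$2"
  # Count in tenths of a second so the wall-clock ceiling stays the same.
  local max_ticks=$(( timeout_seconds * 10 ))
  local ticks=0
  while [ ! -f "${path}" ]; do
    if [ "${ticks}" -ge "${max_ticks}" ]; then
      return 1
    fi
    sleep "${POLL_INTERVAL_SECONDS}"
    ticks=$(( ticks + 1 ))
  done
  return 0
}
```

With a timeout of 10, the loop allows 100 polls of 0.1s each, so the ceiling stays at roughly 10s while a file that appears early is noticed within about a tenth of a second.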
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/localdns.sh | Faster polling in start_localdns/wait_for_localdns_ready plus updated max-attempt arg at the call-site. |
| spec/parts/linux/cloud-init/artifacts/localdns_spec.sh | New ShellSpec cases asserting the 0.1s sleep interval in both helper loops. |
2a76993 to ca945a9
Two things on the polling-interval drop:
Spec tests look good for verifying the interval value gets passed; consider adding one that asserts the resulting wall-clock timeout (mocked clock) so a future regression of the interval/attempts coupling fails the test rather than slipping through.
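One way to pin the interval/attempts coupling is to replace `sleep` with a test double and assert on the simulated total. The sketch below uses plain bash rather than ShellSpec, and `poll_until_ready`, `check_ready`, and the counters are hypothetical stand-ins, not functions from `localdns.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a readiness loop; not the function under review.
poll_until_ready() {
  local max_attempts="$1" interval="$2"
  local attempt=0
  while [ "${attempt}" -lt "${max_attempts}" ]; do
    if check_ready; then
      return 0
    fi
    sleep "${interval}"
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Test doubles: the service never becomes ready, and sleep only records its calls.
check_ready() { return 1; }
SLEEP_CALLS=0
SLEEP_ARG=""
sleep() {
  SLEEP_CALLS=$(( SLEEP_CALLS + 1 ))
  SLEEP_ARG="$1"
}

poll_until_ready 600 0.1 || true

# Every call uses the same interval here, so total wait = calls * interval.
SLEPT_TOTAL=$(awk -v calls="${SLEEP_CALLS}" -v step="${SLEEP_ARG}" \
  'BEGIN { printf "%.1f", calls * step }')

if [ "${SLEPT_TOTAL}" != "60.0" ]; then
  echo "expected 60.0s of simulated waiting, got ${SLEPT_TOTAL}s" >&2
  exit 1
fi
echo "simulated wait: ${SLEPT_TOTAL}s"
```

Because the assertion is on the product of attempts and interval, changing either value without the other makes the check fail, and no real time is spent sleeping.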
@yewmsft Addressed in the follow-up commits on this branch. The polling constants are now split by purpose (
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
parts/linux/cloud-init/artifacts/localdns.sh:479
- This error message says the readiness poll interval is invalid, but calculate_max_poll_attempts can also fail due to an invalid timeout_duration value (or missing awk). Consider broadening the wording and/or logging the timeout + interval values to aid diagnosis.
    max_attempts=$(calculate_max_poll_attempts "${timeout_duration}" "${LOCALDNS_READY_POLL_INTERVAL_SECONDS}") || {
        echo "Invalid localdns readiness poll interval configuration."
        return 1
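A broader message along the lines suggested in this comment might look like the sketch below. It is only an illustration: `resolve_max_attempts` and `stub_compute_attempts` are hypothetical names (the stub stands in for the real calculation helper), and writing the message to stderr is a choice made here so that it does not mix with the captured result.

```shell
#!/usr/bin/env bash
# Hypothetical stub standing in for the real attempts calculation.
stub_compute_attempts() {
  awk -v t="$1" -v i="$2" \
    'BEGIN { if (t < 0 || i <= 0) exit 1; printf "%d\n", t / i + 0.5 }'
}

# Prints the attempt count, or logs both inputs when the calculation fails.
resolve_max_attempts() {
  local timeout_duration="$1" poll_interval="$2" max_attempts
  # Declare first, assign second, so the exit status of $(...) is not masked by `local`.
  max_attempts=$(stub_compute_attempts "${timeout_duration}" "${poll_interval}") || {
    echo "Could not compute localdns poll attempts (timeout='${timeout_duration}', interval='${poll_interval}'); check both values and that awk is available." >&2
    return 1
  }
  echo "${max_attempts}"
}
```

Logging both values covers the cases the comment mentions: a bad timeout, a bad interval, or a missing `awk` all surface with enough context to tell them apart.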
    awk -v timeout="${timeout_duration}" -v interval="${poll_interval_seconds}" '
    BEGIN {
        if (timeout < 0 || interval <= 0) {
            exit 1
        }

    local attempts=0
    local max_attempts
    max_attempts=$(calculate_max_poll_attempts "${START_LOCALDNS_TIMEOUT}" "${LOCALDNS_PID_POLL_INTERVAL_SECONDS}") || {
        echo "Invalid localdns PID poll interval configuration."
Summary
Testing