Skip to content

fix(localdns): reduce startup polling interval#8534

Open
jingwenw15 wants to merge 5 commits into
mainfrom
jingwenwu/localdns-sleep-interval-only
Open

fix(localdns): reduce startup polling interval#8534
jingwenw15 wants to merge 5 commits into
mainfrom
jingwenwu/localdns-sleep-interval-only

Conversation

@jingwenw15
Copy link
Copy Markdown
Member

@jingwenw15 jingwenw15 commented May 19, 2026

Summary

  • reduce localdns PID-file polling from 1s to 0.1s while preserving the existing timeout semantics
  • reduce localdns readiness polling from 1s to 0.1s and scale the production call-site attempts accordingly
  • add focused ShellSpec coverage for the 0.1-second polling interval in both helper loops

Testing

  • shellspec --shell bash spec/parts/linux/cloud-init/artifacts/localdns_spec.sh:668
  • shellspec --shell bash spec/parts/linux/cloud-init/artifacts/localdns_spec.sh:733
Screenshot 2026-05-20 at 10 17 15 AM Reduces startup time by around 0.6s on average.

Copilot AI review requested due to automatic review settings May 19, 2026 01:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reduces localdns startup polling intervals from 1s to 0.1s so the node bootstrap path waits less when CoreDNS comes up quickly, while preserving the existing wall-clock timeout semantics. The PID-file wait now counts in tenths of a second (10× the timeout), and the readiness call-site bumps maxattempts from 60 to 600 to match the 60s timeout at the finer interval.

Changes:

  • Introduce LOCALDNS_POLL_INTERVAL_SECONDS=0.1 and rework start_localdns to use tenth-second counters so START_LOCALDNS_TIMEOUT=10 still yields a ~10s ceiling.
  • Switch wait_for_localdns_ready's sleep to 0.1s and update the production call-site to wait_for_localdns_ready 600 60.
  • Add ShellSpec tests asserting both loops sleep with 0.1.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
parts/linux/cloud-init/artifacts/localdns.sh Faster polling in start_localdns/wait_for_localdns_ready plus updated max-attempt arg at the call-site.
spec/parts/linux/cloud-init/artifacts/localdns_spec.sh New ShellSpec cases asserting the 0.1s sleep interval in both helper loops.

@jingwenw15 jingwenw15 force-pushed the jingwenwu/localdns-sleep-interval-only branch from 2a76993 to ca945a9 Compare May 19, 2026 23:57
@yewmsft
Copy link
Copy Markdown
Member

yewmsft commented May 20, 2026

Two things on the polling-interval drop:

  1. Hardcoded constant in a shared file. LOCALDNS_POLL_INTERVAL_SECONDS=0.1 sits at the top of localdns.sh and is used by both start_localdns and wait_for_localdns_ready. If anyone later tunes one site, they'll silently retune the other. Consider giving each call site its own constant (LOCALDNS_PID_POLL_INTERVAL, LOCALDNS_READY_POLL_INTERVAL) so future changes are explicit about scope.

  2. wait_for_localdns_ready 600 60 is now coupled to the interval. The first arg is MAX_ATTEMPTS, not seconds — at 0.1s/attempt × 600 attempts you get the same 60s wall clock the old 60 60 invocation produced, but only by coincidence. If LOCALDNS_POLL_INTERVAL_SECONDS changes again, this call site silently breaks the timeout. Either compute attempts from a wall-clock budget (attempts = budget_seconds / interval) or rename the arg / refactor to take a duration.

Spec tests look good for verifying the interval value gets passed; consider adding one that asserts the resulting wall-clock timeout (mocked clock) so a future regression of the interval/attempts coupling fails the test rather than slipping through.

Copilot AI review requested due to automatic review settings May 20, 2026 23:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread parts/linux/cloud-init/artifacts/localdns.sh
@jingwenw15
Copy link
Copy Markdown
Member Author

@yewmsft Addressed in the follow-up commits on this branch. The polling constants are now split by purpose (LOCALDNS_PID_POLL_INTERVAL_SECONDS and LOCALDNS_READY_POLL_INTERVAL_SECONDS), readiness now takes a real duration via LOCALDNS_READY_TIMEOUT_SECONDS instead of a hand-derived 600 attempt count, and the ShellSpec coverage now includes mocked-clock timeout behavior. I also added a derived max-attempt bound for readiness so the loop still terminates if the system clock does not advance as expected during early boot.

Copilot AI review requested due to automatic review settings May 21, 2026 02:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

parts/linux/cloud-init/artifacts/localdns.sh:479

  • This error message says the readiness poll interval is invalid, but calculate_max_poll_attempts can also fail due to an invalid timeout_duration value (or missing awk). Consider broadening the wording and/or logging the timeout + interval values to aid diagnosis.
    max_attempts=$(calculate_max_poll_attempts "${timeout_duration}" "${LOCALDNS_READY_POLL_INTERVAL_SECONDS}") || {
        echo "Invalid localdns readiness poll interval configuration."
        return 1

Comment on lines +77 to +81
awk -v timeout="${timeout_duration}" -v interval="${poll_interval_seconds}" '
BEGIN {
if (timeout < 0 || interval <= 0) {
exit 1
}
local attempts=0
local max_attempts
max_attempts=$(calculate_max_poll_attempts "${START_LOCALDNS_TIMEOUT}" "${LOCALDNS_PID_POLL_INTERVAL_SECONDS}") || {
echo "Invalid localdns PID poll interval configuration."
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants