Skip to content

feat: stage duration tracking for progress estimation#197

Closed
RyanD66 wants to merge 1 commit intosethdford:mainfrom
RyanD66:feature/stage-duration-tracking
Closed

feat: stage duration tracking for progress estimation#197
RyanD66 wants to merge 1 commit intosethdford:mainfrom
RyanD66:feature/stage-duration-tracking

Conversation

@RyanD66
Copy link
Copy Markdown

@RyanD66 RyanD66 commented Mar 2, 2026

Summary

Add stage duration history tracking for pipeline progress estimation as requested in issue #180.

Changes

  • Added scripts/lib/stage-duration.sh with functions for:

    • load_stage_durations() - Load historical stage durations from JSON file
    • save_stage_duration() - Save stage duration to history (keeps last 20)
    • get_avg_stage_duration() - Get average duration for a stage
    • estimate_remaining_time() - Estimate remaining time based on pending stages
    • build_progress_display() - Format progress string like "Stage 3/12 (build) — ~32min remaining, $2.47 spent"
  • Stage durations stored in .claude/stage-durations.json

Acceptance Criteria

  • Store stage duration history in .claude/stage-durations.json
  • Functions for ETA estimation based on historical data
  • Progress display formatting

Note: Full integration with sw-pipeline.sh and sw-status.sh will require sourcing this library in those scripts.

Summary by CodeRabbit

  • New Features
    • Added pipeline stage duration history tracking to estimate remaining completion time based on historical performance
    • Progress display now shows current stage, estimated remaining time, and accumulated pipeline costs
    • System automatically learns from past runs to improve timing accuracy and provide better estimates

- Add stage-durations.json storage for historical stage durations
- Add load_stage_durations, save_stage_duration, get_avg_stage_duration functions
- Add estimate_remaining_time for ETA calculation
- Add build_progress_display for progress string formatting

Addresses issue #180: Pipeline Execution Progress Estimation
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

A new Bash script introducing functions to track and utilize historical stage durations for estimating pipeline progress. Includes JSON-based persistence, duration averaging, remaining time calculation, and formatted progress display output.

Changes

Cohort / File(s) Summary
Stage Duration Tracking
scripts/lib/stage-duration.sh
New library script providing functions to load/save stage duration history from JSON, compute average durations per stage, estimate remaining pipeline time, format durations into human-readable strings, and build progress display strings combining stage info, time estimates, and cost tracking.

Sequence Diagram

sequenceDiagram
    participant PM as Pipeline Manager
    participant SDS as Stage Duration Script
    participant FS as File System
    participant Display as Progress Display

    PM->>SDS: load_stage_durations()
    SDS->>FS: read stage-durations.json
    FS-->>SDS: duration history
    SDS-->>PM: stage history object

    PM->>PM: execute current stage
    
    PM->>SDS: save_stage_duration(stage_id, duration)
    SDS->>FS: update stage-durations.json
    FS-->>SDS: ack
    
    PM->>SDS: get_avg_stage_duration(stage_id)
    SDS-->>PM: average duration
    
    PM->>SDS: estimate_remaining_time(current, completed, total)
    SDS->>SDS: sum averages of remaining stages
    SDS-->>PM: estimated time remaining
    
    PM->>SDS: build_progress_display(...)
    SDS-->>PM: formatted progress string
    
    PM->>Display: show progress string
    Display-->>PM: ack
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 Hop along, dear pipeline track,
With durations saved and brought back,
History whispers how long stages stay,
Estimates guide us through the fray,
Progress blooms in seconds' display! 📊

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description covers the summary, changes, and acceptance criteria, but lacks sections for Test Plan and Shipwright Standards Checklist required by the template. Add Test Plan section with testing details and Shipwright Standards Checklist to verify Bash 3.2 compatibility, jq usage, atomic writes, and error handling compliance.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main addition: a new Bash script for tracking stage durations to enable pipeline progress estimation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
scripts/lib/stage-duration.sh (1)

30-34: Unused jq variable $s and potential race condition.

  1. The variable $s defined on line 31 is never used in the expression.
  2. Concurrent pipeline runs could cause lost updates due to read-modify-write without locking.
♻️ Simplified jq expression
-    stage_history=$(echo "$current" | jq --arg stage "$stage_id" --arg dur "$duration_sec" \
-        '($stage | if . == null then {} else . end) as $s | 
-         if has($stage) then .[$stage] += [$dur | tonumber] | .[$stage] = .[$stage][-20:] 
-         else .[$stage] = [$dur | tonumber] 
-         end' 2>/dev/null || echo "{}")
+    stage_history=$(echo "$current" | jq --arg stage "$stage_id" --argjson dur "$duration_sec" \
+        '.[$stage] = ((.[$stage] // []) + [$dur])[-20:]' 2>/dev/null || echo "{}")

For the race condition, consider using a lockfile if concurrent executions are expected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/stage-duration.sh` around lines 30 - 34, The jq expression that
builds stage_history is declaring an unused variable $s and is fragile to
concurrent runs; replace the complex expression with a simpler one that omits
the unused ($s) binding and performs: if has($stage) then append the numeric
duration and trim to the last 20 entries else create a new array with the
numeric duration (use $stage, $dur/$duration_sec and $current to locate this
logic), and to prevent lost updates wrap the read-modify-write around a
lightweight lock (e.g., a lockfile around the code that reads $current and
writes stage_history) so concurrent pipeline runs cannot clobber each other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lib/stage-duration.sh`:
- Around line 46-50: The jq expression that computes avg uses ($stage | length)
which measures the stage_id string length instead of the durations array; update
the jq filter in the avg assignment (where current, stage_id and avg are
referenced) to check the array length with (.[$stage] | length) — e.g., change
the conditional to 'if has($stage) and (.[$stage] | length) > 0 then ...' so the
reduce and division only run when the durations array exists and is non-empty.
- Around line 56-94: The function estimate_remaining_time currently misparses
stage IDs and completed_stages, uses an unused parameter, and relies on bc;
update it as follows: remove the unused parameter total_stages (or stop
accepting it) and use jq -r to produce raw stage IDs (replace jq '.stages[] |
.id' with jq -r '.stages[].id' and ensure the default all_stages is an empty
list when PIPELINE_CONFIG is absent); parse completed_stages (which is a JSON
array string) by piping it to jq -r '.[]' (not treating it as a heredoc) to
iterate completed IDs correctly; and replace bc with a portable float-summing
approach (e.g., use awk to accumulate get_avg_stage_duration results) when
computing total_avg; refer to the function name estimate_remaining_time and the
helper get_avg_stage_duration to find and update the logic.
- Around line 124-131: The call to estimate_remaining_time currently passes a
hardcoded "[]" causing incorrect remaining time because completed stages should
be derived from stage_index; change the logic so
estimate_remaining_time("$current_stage", "$stage_index", "$total_stages") (or
the equivalent API your estimate_remaining_time expects) is used rather than the
literal "[]", and keep using format_duration_short for formatting; also ensure
get_stage_description is available by adding a requirement at the top of the
script (or sourcing) so pipeline-state.sh is sourced before using
get_stage_description, and add a brief guard/comment stating that
pipeline-state.sh must be sourced first to satisfy the dependency.
- Around line 104-113: The comparisons fail if secs is a float; truncate secs to
an integer before doing the -lt tests and arithmetic. Add a local integer like
secs_int derived from secs (e.g., secs_int=${secs%%.*} with a fallback to 0 if
empty) and replace all uses of secs in the if/elif/else tests and subsequent
calculations (mins/hours arithmetic and echo formatting) with secs_int so
integer-only comparisons and arithmetic are used in the block.

---

Nitpick comments:
In `@scripts/lib/stage-duration.sh`:
- Around line 30-34: The jq expression that builds stage_history is declaring an
unused variable $s and is fragile to concurrent runs; replace the complex
expression with a simpler one that omits the unused ($s) binding and performs:
if has($stage) then append the numeric duration and trim to the last 20 entries
else create a new array with the numeric duration (use $stage,
$dur/$duration_sec and $current to locate this logic), and to prevent lost
updates wrap the read-modify-write around a lightweight lock (e.g., a lockfile
around the code that reads $current and writes stage_history) so concurrent
pipeline runs cannot clobber each other.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9f1cb8 and 4832dc0.

📒 Files selected for processing (1)
  • scripts/lib/stage-duration.sh

Comment on lines +46 to +50
avg=$(echo "$current" | jq --arg stage "$stage_id" \
'if has($stage) and ($stage | length) > 0 then
(reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length)
else null
end' 2>/dev/null || echo "null")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Bug: Checking length of stage ID string instead of duration array.

($stage | length) evaluates the length of the stage ID string, not the duration array. This should be .[$stage] | length to check if the array has entries.

🐛 Proposed fix
     avg=$(echo "$current" | jq --arg stage "$stage_id" \
-        'if has($stage) and ($stage | length) > 0 then 
+        'if has($stage) and (.[$stage] | length) > 0 then 
             (reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length) 
          else null 
          end' 2>/dev/null || echo "null")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
avg=$(echo "$current" | jq --arg stage "$stage_id" \
'if has($stage) and ($stage | length) > 0 then
(reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length)
else null
end' 2>/dev/null || echo "null")
avg=$(echo "$current" | jq --arg stage "$stage_id" \
'if has($stage) and (.[$stage] | length) > 0 then
(reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length)
else null
end' 2>/dev/null || echo "null")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/stage-duration.sh` around lines 46 - 50, The jq expression that
computes avg uses ($stage | length) which measures the stage_id string length
instead of the durations array; update the jq filter in the avg assignment
(where current, stage_id and avg are referenced) to check the array length with
(.[$stage] | length) — e.g., change the conditional to 'if has($stage) and
(.[$stage] | length) > 0 then ...' so the reduce and division only run when the
durations array exists and is non-empty.

Comment on lines +56 to +94
estimate_remaining_time() {
local current_stage="$1"
local completed_stages="$2" # JSON array of completed stage IDs
local total_stages="$3"

local total_avg=0
local remaining_count=0

# Get all stages from pipeline config
local all_stages
all_stages=$(echo '{"stages":[]}' | jq '.stages')
if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then
all_stages=$(jq '.stages[] | .id' "$PIPELINE_CONFIG" 2>/dev/null || echo '[]')
fi

# Calculate remaining stages
local remaining_stages=()
while IFS= read -r stage; do
[[ -z "$stage" ]] && continue
local found=0
while IFS= read -r comp; do
[[ "$comp" == "$stage" ]] && found=1
done <<< "$completed_stages"
if [[ $found -eq 0 ]]; then
remaining_stages+=("$stage")
fi
done <<< "$all_stages"

# Sum average durations for remaining stages
for stage in "${remaining_stages[@]}"; do
local avg
avg=$(get_avg_stage_duration "$stage")
if [[ "$avg" != "null" && -n "$avg" ]]; then
total_avg=$(echo "$total_avg + $avg" | bc 2>/dev/null || echo "$total_avg")
fi
done

echo "$total_avg"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Multiple issues in stage filtering and arithmetic.

  1. Unused parameter: total_stages (line 59) is never used.

  2. Quoted stage IDs: jq '.stages[] | .id' outputs quoted strings (e.g., "build"). Comparisons will fail against unquoted values. Use -r flag for raw output.

  3. Broken completed_stages parsing: Line 76-78 iterates over $completed_stages as a heredoc, but it's a JSON array string. It won't parse correctly—the entire array becomes one line.

  4. bc dependency: Line 89 uses bc which may not be available on all systems.

🐛 Proposed fix
 estimate_remaining_time() {
     local current_stage="$1"
     local completed_stages="$2"  # JSON array of completed stage IDs
-    local total_stages="$3"
     
     local total_avg=0
-    local remaining_count=0
     
     # Get all stages from pipeline config
     local all_stages
-    all_stages=$(echo '{"stages":[]}' | jq '.stages')
     if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then
-        all_stages=$(jq '.stages[] | .id' "$PIPELINE_CONFIG" 2>/dev/null || echo '[]')
+        all_stages=$(jq -r '.stages[].id' "$PIPELINE_CONFIG" 2>/dev/null)
+    else
+        all_stages=""
     fi
     
     # Calculate remaining stages
     local remaining_stages=()
     while IFS= read -r stage; do
         [[ -z "$stage" ]] && continue
-        local found=0
-        while IFS= read -r comp; do
-            [[ "$comp" == "$stage" ]] && found=1
-        done <<< "$completed_stages"
-        if [[ $found -eq 0 ]]; then
+        # Check if stage is in completed_stages JSON array
+        if ! echo "$completed_stages" | jq -e --arg s "$stage" 'index($s) != null' >/dev/null 2>&1; then
             remaining_stages+=("$stage")
         fi
     done <<< "$all_stages"
     
     # Sum average durations for remaining stages
     for stage in "${remaining_stages[@]}"; do
         local avg
         avg=$(get_avg_stage_duration "$stage")
         if [[ "$avg" != "null" && -n "$avg" ]]; then
-            total_avg=$(echo "$total_avg + $avg" | bc 2>/dev/null || echo "$total_avg")
+            total_avg=$(awk "BEGIN {print $total_avg + $avg}")
         fi
     done
     
     echo "$total_avg"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/stage-duration.sh` around lines 56 - 94, The function
estimate_remaining_time currently misparses stage IDs and completed_stages, uses
an unused parameter, and relies on bc; update it as follows: remove the unused
parameter total_stages (or stop accepting it) and use jq -r to produce raw stage
IDs (replace jq '.stages[] | .id' with jq -r '.stages[].id' and ensure the
default all_stages is an empty list when PIPELINE_CONFIG is absent); parse
completed_stages (which is a JSON array string) by piping it to jq -r '.[]' (not
treating it as a heredoc) to iterate completed IDs correctly; and replace bc
with a portable float-summing approach (e.g., use awk to accumulate
get_avg_stage_duration results) when computing total_avg; refer to the function
name estimate_remaining_time and the helper get_avg_stage_duration to find and
update the logic.

Comment on lines +104 to +113
if [[ "$secs" -lt 60 ]]; then
echo "~${secs}s"
elif [[ "$secs" -lt 3600 ]]; then
local mins=$((secs / 60))
echo "~${mins}min"
else
local hours=$((secs / 3600))
local mins=$(((secs % 3600) / 60))
echo "~${hours}h${mins}m"
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Integer comparisons will fail on float input.

$secs may contain decimal values from averaged durations (e.g., 127.5). Bash's -lt requires integers and will error on floats.

🐛 Proposed fix - truncate to integer
 format_duration_short() {
     local secs="$1"
     if [[ -z "$secs" || "$secs" == "0" || "$secs" == "null" ]]; then
         echo "~0s"
         return
     fi
+    
+    # Truncate to integer for comparisons
+    secs=${secs%.*}
     
     if [[ "$secs" -lt 60 ]]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/stage-duration.sh` around lines 104 - 113, The comparisons fail
if secs is a float; truncate secs to an integer before doing the -lt tests and
arithmetic. Add a local integer like secs_int derived from secs (e.g.,
secs_int=${secs%%.*} with a fallback to 0 if empty) and replace all uses of secs
in the if/elif/else tests and subsequent calculations (mins/hours arithmetic and
echo formatting) with secs_int so integer-only comparisons and arithmetic are
used in the block.

Comment on lines +124 to +131
local remaining_sec
remaining_sec=$(estimate_remaining_time "$current_stage" "[]" "$total_stages")

local remaining_fmt
remaining_fmt=$(format_duration_short "$remaining_sec")

local stage_name
stage_name=$(get_stage_description "$current_stage" 2>/dev/null || echo "$current_stage")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hardcoded empty array and missing function dependency.

  1. Line 125: Passing "[]" ignores actual progress—the estimate always sums all stages. Should derive completed stages from stage_index.

  2. Line 131: get_stage_description is defined in pipeline-state.sh but this script doesn't source it. Callers must source both files in the correct order.

🐛 Proposed fix for completed stages logic
 build_progress_display() {
     local current_stage="$1"
     local stage_index="$2"
     local total_stages="$3"
     local elapsed_sec="$4"
     local cost_usd="$5"
     
+    # Build completed stages array from pipeline config based on stage_index
+    local completed_stages="[]"
+    if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then
+        completed_stages=$(jq -c --argjson idx "$((stage_index - 1))" \
+            '[.stages[:$idx] | .[].id]' "$PIPELINE_CONFIG" 2>/dev/null || echo "[]")
+    fi
+    
     local remaining_sec
-    remaining_sec=$(estimate_remaining_time "$current_stage" "[]" "$total_stages")
+    remaining_sec=$(estimate_remaining_time "$current_stage" "$completed_stages")

Consider adding a comment or guard at the top of this file indicating that pipeline-state.sh must be sourced first.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/stage-duration.sh` around lines 124 - 131, The call to
estimate_remaining_time currently passes a hardcoded "[]" causing incorrect
remaining time because completed stages should be derived from stage_index;
change the logic so estimate_remaining_time("$current_stage", "$stage_index",
"$total_stages") (or the equivalent API your estimate_remaining_time expects) is
used rather than the literal "[]", and keep using format_duration_short for
formatting; also ensure get_stage_description is available by adding a
requirement at the top of the script (or sourcing) so pipeline-state.sh is
sourced before using get_stage_description, and add a brief guard/comment
stating that pipeline-state.sh must be sourced first to satisfy the dependency.

@RyanD66 RyanD66 closed this by deleting the head repository Apr 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant