feat: stage duration tracking for progress estimation#197
feat: stage duration tracking for progress estimation#197RyanD66 wants to merge 1 commit intosethdford:mainfrom
Conversation
- Add stage-durations.json storage for historical stage durations - Add load_stage_durations, save_stage_duration, get_avg_stage_duration functions - Add estimate_remaining_time for ETA calculation - Add build_progress_display for progress string formatting Addresses issue #180: Pipeline Execution Progress Estimation
📝 WalkthroughWalkthroughA new Bash script introducing functions to track and utilize historical stage durations for estimating pipeline progress. Includes JSON-based persistence, duration averaging, remaining time calculation, and formatted progress display output. Changes
Sequence DiagramsequenceDiagram
participant PM as Pipeline Manager
participant SDS as Stage Duration Script
participant FS as File System
participant Display as Progress Display
PM->>SDS: load_stage_durations()
SDS->>FS: read stage-durations.json
FS-->>SDS: duration history
SDS-->>PM: stage history object
PM->>PM: execute current stage
PM->>SDS: save_stage_duration(stage_id, duration)
SDS->>FS: update stage-durations.json
FS-->>SDS: ack
PM->>SDS: get_avg_stage_duration(stage_id)
SDS-->>PM: average duration
PM->>SDS: estimate_remaining_time(current, completed, total)
SDS->>SDS: sum averages of remaining stages
SDS-->>PM: estimated time remaining
PM->>SDS: build_progress_display(...)
SDS-->>PM: formatted progress string
PM->>Display: show progress string
Display-->>PM: ack
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
scripts/lib/stage-duration.sh (1)
30-34: Unused jq variable$sand potential race condition.
- The variable
$sdefined on line 31 is never used in the expression.- Concurrent pipeline runs could cause lost updates due to read-modify-write without locking.
♻️ Simplified jq expression
- stage_history=$(echo "$current" | jq --arg stage "$stage_id" --arg dur "$duration_sec" \ - '($stage | if . == null then {} else . end) as $s | - if has($stage) then .[$stage] += [$dur | tonumber] | .[$stage] = .[$stage][-20:] - else .[$stage] = [$dur | tonumber] - end' 2>/dev/null || echo "{}") + stage_history=$(echo "$current" | jq --arg stage "$stage_id" --argjson dur "$duration_sec" \ + '.[$stage] = ((.[$stage] // []) + [$dur])[-20:]' 2>/dev/null || echo "{}")For the race condition, consider using a lockfile if concurrent executions are expected.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/lib/stage-duration.sh` around lines 30 - 34, The jq expression that builds stage_history is declaring an unused variable $s and is fragile to concurrent runs; replace the complex expression with a simpler one that omits the unused ($s) binding and performs: if has($stage) then append the numeric duration and trim to the last 20 entries else create a new array with the numeric duration (use $stage, $dur/$duration_sec and $current to locate this logic), and to prevent lost updates wrap the read-modify-write around a lightweight lock (e.g., a lockfile around the code that reads $current and writes stage_history) so concurrent pipeline runs cannot clobber each other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/lib/stage-duration.sh`:
- Around line 46-50: The jq expression that computes avg uses ($stage | length)
which measures the stage_id string length instead of the durations array; update
the jq filter in the avg assignment (where current, stage_id and avg are
referenced) to check the array length with (.[$stage] | length) — e.g., change
the conditional to 'if has($stage) and (.[$stage] | length) > 0 then ...' so the
reduce and division only run when the durations array exists and is non-empty.
- Around line 56-94: The function estimate_remaining_time currently misparses
stage IDs and completed_stages, uses an unused parameter, and relies on bc;
update it as follows: remove the unused parameter total_stages (or stop
accepting it) and use jq -r to produce raw stage IDs (replace jq '.stages[] |
.id' with jq -r '.stages[].id' and ensure the default all_stages is an empty
list when PIPELINE_CONFIG is absent); parse completed_stages (which is a JSON
array string) by piping it to jq -r '.[]' (not treating it as a heredoc) to
iterate completed IDs correctly; and replace bc with a portable float-summing
approach (e.g., use awk to accumulate get_avg_stage_duration results) when
computing total_avg; refer to the function name estimate_remaining_time and the
helper get_avg_stage_duration to find and update the logic.
- Around line 124-131: The call to estimate_remaining_time currently passes a
hardcoded "[]" causing incorrect remaining time because completed stages should
be derived from stage_index; change the logic so
estimate_remaining_time("$current_stage", "$stage_index", "$total_stages") (or
the equivalent API your estimate_remaining_time expects) is used rather than the
literal "[]", and keep using format_duration_short for formatting; also ensure
get_stage_description is available by adding a requirement at the top of the
script (or sourcing) so pipeline-state.sh is sourced before using
get_stage_description, and add a brief guard/comment stating that
pipeline-state.sh must be sourced first to satisfy the dependency.
- Around line 104-113: The comparisons fail if secs is a float; truncate secs to
an integer before doing the -lt tests and arithmetic. Add a local integer like
secs_int derived from secs (e.g., secs_int=${secs%%.*} with a fallback to 0 if
empty) and replace all uses of secs in the if/elif/else tests and subsequent
calculations (mins/hours arithmetic and echo formatting) with secs_int so
integer-only comparisons and arithmetic are used in the block.
---
Nitpick comments:
In `@scripts/lib/stage-duration.sh`:
- Around line 30-34: The jq expression that builds stage_history is declaring an
unused variable $s and is fragile to concurrent runs; replace the complex
expression with a simpler one that omits the unused ($s) binding and performs:
if has($stage) then append the numeric duration and trim to the last 20 entries
else create a new array with the numeric duration (use $stage,
$dur/$duration_sec and $current to locate this logic), and to prevent lost
updates wrap the read-modify-write around a lightweight lock (e.g., a lockfile
around the code that reads $current and writes stage_history) so concurrent
pipeline runs cannot clobber each other.
| avg=$(echo "$current" | jq --arg stage "$stage_id" \ | ||
| 'if has($stage) and ($stage | length) > 0 then | ||
| (reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length) | ||
| else null | ||
| end' 2>/dev/null || echo "null") |
There was a problem hiding this comment.
Bug: Checking length of stage ID string instead of duration array.
($stage | length) evaluates the length of the stage ID string, not the duration array. This should be .[$stage] | length to check if the array has entries.
🐛 Proposed fix
avg=$(echo "$current" | jq --arg stage "$stage_id" \
- 'if has($stage) and ($stage | length) > 0 then
+ 'if has($stage) and (.[$stage] | length) > 0 then
(reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length)
else null
end' 2>/dev/null || echo "null")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| avg=$(echo "$current" | jq --arg stage "$stage_id" \ | |
| 'if has($stage) and ($stage | length) > 0 then | |
| (reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length) | |
| else null | |
| end' 2>/dev/null || echo "null") | |
| avg=$(echo "$current" | jq --arg stage "$stage_id" \ | |
| 'if has($stage) and (.[$stage] | length) > 0 then | |
| (reduce .[$stage][] as $item (0; . + $item)) / (.[$stage] | length) | |
| else null | |
| end' 2>/dev/null || echo "null") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/lib/stage-duration.sh` around lines 46 - 50, The jq expression that
computes avg uses ($stage | length) which measures the stage_id string length
instead of the durations array; update the jq filter in the avg assignment
(where current, stage_id and avg are referenced) to check the array length with
(.[$stage] | length) — e.g., change the conditional to 'if has($stage) and
(.[$stage] | length) > 0 then ...' so the reduce and division only run when the
durations array exists and is non-empty.
| estimate_remaining_time() { | ||
| local current_stage="$1" | ||
| local completed_stages="$2" # JSON array of completed stage IDs | ||
| local total_stages="$3" | ||
|
|
||
| local total_avg=0 | ||
| local remaining_count=0 | ||
|
|
||
| # Get all stages from pipeline config | ||
| local all_stages | ||
| all_stages=$(echo '{"stages":[]}' | jq '.stages') | ||
| if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then | ||
| all_stages=$(jq '.stages[] | .id' "$PIPELINE_CONFIG" 2>/dev/null || echo '[]') | ||
| fi | ||
|
|
||
| # Calculate remaining stages | ||
| local remaining_stages=() | ||
| while IFS= read -r stage; do | ||
| [[ -z "$stage" ]] && continue | ||
| local found=0 | ||
| while IFS= read -r comp; do | ||
| [[ "$comp" == "$stage" ]] && found=1 | ||
| done <<< "$completed_stages" | ||
| if [[ $found -eq 0 ]]; then | ||
| remaining_stages+=("$stage") | ||
| fi | ||
| done <<< "$all_stages" | ||
|
|
||
| # Sum average durations for remaining stages | ||
| for stage in "${remaining_stages[@]}"; do | ||
| local avg | ||
| avg=$(get_avg_stage_duration "$stage") | ||
| if [[ "$avg" != "null" && -n "$avg" ]]; then | ||
| total_avg=$(echo "$total_avg + $avg" | bc 2>/dev/null || echo "$total_avg") | ||
| fi | ||
| done | ||
|
|
||
| echo "$total_avg" | ||
| } |
There was a problem hiding this comment.
Multiple issues in stage filtering and arithmetic.
-
Unused parameter:
total_stages(line 59) is never used. -
Quoted stage IDs:
jq '.stages[] | .id'outputs quoted strings (e.g.,"build"). Comparisons will fail against unquoted values. Use-rflag for raw output. -
Broken completed_stages parsing: Line 76-78 iterates over
$completed_stagesas a heredoc, but it's a JSON array string. It won't parse correctly—the entire array becomes one line. -
bcdependency: Line 89 usesbcwhich may not be available on all systems.
🐛 Proposed fix
estimate_remaining_time() {
local current_stage="$1"
local completed_stages="$2" # JSON array of completed stage IDs
- local total_stages="$3"
local total_avg=0
- local remaining_count=0
# Get all stages from pipeline config
local all_stages
- all_stages=$(echo '{"stages":[]}' | jq '.stages')
if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then
- all_stages=$(jq '.stages[] | .id' "$PIPELINE_CONFIG" 2>/dev/null || echo '[]')
+ all_stages=$(jq -r '.stages[].id' "$PIPELINE_CONFIG" 2>/dev/null)
+ else
+ all_stages=""
fi
# Calculate remaining stages
local remaining_stages=()
while IFS= read -r stage; do
[[ -z "$stage" ]] && continue
- local found=0
- while IFS= read -r comp; do
- [[ "$comp" == "$stage" ]] && found=1
- done <<< "$completed_stages"
- if [[ $found -eq 0 ]]; then
+ # Check if stage is in completed_stages JSON array
+ if ! echo "$completed_stages" | jq -e --arg s "$stage" 'index($s) != null' >/dev/null 2>&1; then
remaining_stages+=("$stage")
fi
done <<< "$all_stages"
# Sum average durations for remaining stages
for stage in "${remaining_stages[@]}"; do
local avg
avg=$(get_avg_stage_duration "$stage")
if [[ "$avg" != "null" && -n "$avg" ]]; then
- total_avg=$(echo "$total_avg + $avg" | bc 2>/dev/null || echo "$total_avg")
+ total_avg=$(awk "BEGIN {print $total_avg + $avg}")
fi
done
echo "$total_avg"
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/lib/stage-duration.sh` around lines 56 - 94, The function
estimate_remaining_time currently misparses stage IDs and completed_stages, uses
an unused parameter, and relies on bc; update it as follows: remove the unused
parameter total_stages (or stop accepting it) and use jq -r to produce raw stage
IDs (replace jq '.stages[] | .id' with jq -r '.stages[].id' and ensure the
default all_stages is an empty list when PIPELINE_CONFIG is absent); parse
completed_stages (which is a JSON array string) by piping it to jq -r '.[]' (not
treating it as a heredoc) to iterate completed IDs correctly; and replace bc
with a portable float-summing approach (e.g., use awk to accumulate
get_avg_stage_duration results) when computing total_avg; refer to the function
name estimate_remaining_time and the helper get_avg_stage_duration to find and
update the logic.
| if [[ "$secs" -lt 60 ]]; then | ||
| echo "~${secs}s" | ||
| elif [[ "$secs" -lt 3600 ]]; then | ||
| local mins=$((secs / 60)) | ||
| echo "~${mins}min" | ||
| else | ||
| local hours=$((secs / 3600)) | ||
| local mins=$(((secs % 3600) / 60)) | ||
| echo "~${hours}h${mins}m" | ||
| fi |
There was a problem hiding this comment.
Integer comparisons will fail on float input.
$secs may contain decimal values from averaged durations (e.g., 127.5). Bash's -lt requires integers and will error on floats.
🐛 Proposed fix - truncate to integer
format_duration_short() {
local secs="$1"
if [[ -z "$secs" || "$secs" == "0" || "$secs" == "null" ]]; then
echo "~0s"
return
fi
+
+ # Truncate to integer for comparisons
+ secs=${secs%.*}
if [[ "$secs" -lt 60 ]]; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/lib/stage-duration.sh` around lines 104 - 113, The comparisons fail
if secs is a float; truncate secs to an integer before doing the -lt tests and
arithmetic. Add a local integer like secs_int derived from secs (e.g.,
secs_int=${secs%%.*} with a fallback to 0 if empty) and replace all uses of secs
in the if/elif/else tests and subsequent calculations (mins/hours arithmetic and
echo formatting) with secs_int so integer-only comparisons and arithmetic are
used in the block.
| local remaining_sec | ||
| remaining_sec=$(estimate_remaining_time "$current_stage" "[]" "$total_stages") | ||
|
|
||
| local remaining_fmt | ||
| remaining_fmt=$(format_duration_short "$remaining_sec") | ||
|
|
||
| local stage_name | ||
| stage_name=$(get_stage_description "$current_stage" 2>/dev/null || echo "$current_stage") |
There was a problem hiding this comment.
Hardcoded empty array and missing function dependency.
-
Line 125: Passing
"[]"ignores actual progress—the estimate always sums all stages. Should derive completed stages fromstage_index. -
Line 131:
get_stage_descriptionis defined inpipeline-state.shbut this script doesn't source it. Callers must source both files in the correct order.
🐛 Proposed fix for completed stages logic
build_progress_display() {
local current_stage="$1"
local stage_index="$2"
local total_stages="$3"
local elapsed_sec="$4"
local cost_usd="$5"
+ # Build completed stages array from pipeline config based on stage_index
+ local completed_stages="[]"
+ if [[ -n "${PIPELINE_CONFIG:-}" && -f "$PIPELINE_CONFIG" ]]; then
+ completed_stages=$(jq -c --argjson idx "$((stage_index - 1))" \
+ '[.stages[:$idx] | .[].id]' "$PIPELINE_CONFIG" 2>/dev/null || echo "[]")
+ fi
+
local remaining_sec
- remaining_sec=$(estimate_remaining_time "$current_stage" "[]" "$total_stages")
+ remaining_sec=$(estimate_remaining_time "$current_stage" "$completed_stages")Consider adding a comment or guard at the top of this file indicating that pipeline-state.sh must be sourced first.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/lib/stage-duration.sh` around lines 124 - 131, The call to
estimate_remaining_time currently passes a hardcoded "[]" causing incorrect
remaining time because completed stages should be derived from stage_index;
change the logic so estimate_remaining_time("$current_stage", "$stage_index",
"$total_stages") (or the equivalent API your estimate_remaining_time expects) is
used rather than the literal "[]", and keep using format_duration_short for
formatting; also ensure get_stage_description is available by adding a
requirement at the top of the script (or sourcing) so pipeline-state.sh is
sourced before using get_stage_description, and add a brief guard/comment
stating that pipeline-state.sh must be sourced first to satisfy the dependency.
Summary
Add stage duration history tracking for pipeline progress estimation as requested in issue #180.
Changes
Added
scripts/lib/stage-duration.shwith functions for:load_stage_durations()- Load historical stage durations from JSON filesave_stage_duration()- Save stage duration to history (keeps last 20)get_avg_stage_duration()- Get average duration for a stageestimate_remaining_time()- Estimate remaining time based on pending stagesbuild_progress_display()- Format progress string like "Stage 3/12 (build) — ~32min remaining, $2.47 spent"Stage durations stored in
.claude/stage-durations.jsonAcceptance Criteria
Note: Full integration with sw-pipeline.sh and sw-status.sh will require sourcing this library in those scripts.
Summary by CodeRabbit