fix(docker): fix healthcheck false negatives and harden idempotency of migration 0062 #737

Closed
NieiR wants to merge 1 commit into ding113:dev from NieiR:fix/deploy-healthcheck-idempotent-migration

@NieiR NieiR commented Feb 8, 2026

Summary

Fix Docker healthcheck false-negatives on node:20-slim (which lacks curl), harden migration 0062 with idempotent DDL, and optimize the Dockerfile to use the Next.js standalone output.

Problem

Two reproducible issues exist on the dev branch:

  1. Healthcheck always unhealthy - The docker-compose.yaml healthcheck uses curl, but the runtime image (node:20-slim) does not include curl, so the app container is perpetually marked unhealthy.
  2. Non-idempotent migration - drizzle/0062_aromatic_taskmaster.sql uses a bare ADD COLUMN which fails if the column already exists, breaking re-runs or environments where the column was added out-of-band.

Related PRs:

Solution

1. Healthcheck: replace curl with Node.js fetch

Use node -e "fetch(...)" instead of curl so the healthcheck works without extra binaries in the slim image.
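
As a rough illustration of the probe logic described above, the following sketch maps a fetch result to a healthcheck exit code. The names HEALTH_URL, exitCodeFor, and probe are hypothetical and do not appear in the PR; the endpoint path and port are taken from the description, and the injectable fetchImpl parameter exists only so the logic can be exercised without a running server.

```javascript
// Hypothetical names for illustration; only the endpoint and port come from the PR.
const HEALTH_URL = "http://127.0.0.1:3000/api/actions/health";

// Map an HTTP response to an exit code: 0 for a 2xx status, 1 otherwise.
function exitCodeFor(response) {
  return response.ok ? 0 : 1;
}

// Resolve to an exit code. Network errors (for example, the server is not
// listening yet) count as unhealthy rather than crashing the probe.
async function probe(url, fetchImpl = globalThis.fetch) {
  try {
    return exitCodeFor(await fetchImpl(url));
  } catch {
    return 1;
  }
}
```

In an actual healthcheck command the result would be handed to the process, for example `probe(HEALTH_URL).then((code) => process.exit(code))`, which is equivalent in spirit to the one-liner in docker-compose.yaml. The global fetch is assumed to be available, which holds for Node.js 18 and later.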

2. Migration: ADD COLUMN IF NOT EXISTS

A one-clause change that makes the migration idempotent per PostgreSQL best practice: with IF NOT EXISTS, PostgreSQL skips the column if it already exists instead of raising an error.

3. Dockerfile: switch to Next.js standalone output

  • Copy .next/standalone + .next/static instead of the full .next + node_modules, significantly reducing image size.
  • Add --mount=type=cache,target=/app/.next/cache for faster rebuilds.
  • Align the runtime port to 3000 (matching the docker-compose.yaml mapping).
  • Copy VERSION file into the image.
  • Run node server.js directly instead of node node_modules/.bin/next start.

Changes

Core Changes

| File | Change |
| --- | --- |
| docker-compose.yaml | Replace curl-based healthcheck with node -e "fetch(...)" |
| drizzle/0062_aromatic_taskmaster.sql | ADD COLUMN -> ADD COLUMN IF NOT EXISTS |
| Dockerfile | Switch to standalone output, cache .next/cache, align port to 3000 |

Detailed Dockerfile diff

  • Build stage: add BuildKit cache mount for .next/cache
  • Runner stage: ENV PORT=3000 / EXPOSE 3000 (was 8080)
  • Copy layer: .next/standalone + .next/static + VERSION instead of full .next + node_modules
  • Entrypoint: CMD ["node", "server.js"] instead of CMD ["node", "node_modules/.bin/next", "start"]

Breaking Changes

None. All changes are infrastructure-only:

  • No API protocol or business logic changes.
  • No request/response format changes.
  • The idempotent migration enhancement only reduces failure probability; it does not alter data.

Testing

Automated Tests

  • No automated tests added (infrastructure-only changes to Dockerfile, Compose, and SQL migration).

Manual Testing

  1. docker compose up - container starts and transitions from starting to healthy
  2. Re-run migration on an environment where the gemini_google_search_preference column already exists - no error
  3. GET /api/actions/health returns 200

Risk Assessment

Low risk:

  • Healthcheck change only affects container liveness probing mechanism
  • Migration change follows standard PostgreSQL idempotent DDL practice
  • Dockerfile change does not affect application code paths, only build/runtime artifact organization

Checklist

  • Code follows project conventions
  • Self-review completed
  • No business logic changes
  • Verified locally: container healthy, migration idempotent, health endpoint returns 200

Description enhanced by Claude AI

coderabbitai bot commented Feb 8, 2026

Walkthrough

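This pull request adjusts the project's build, deployment, and database configuration. The Dockerfile updates the caching strategy, the application port (from 8080 to 3000), how the Next.js build output is copied, and the start command. docker-compose.yaml changes the healthcheck mechanism, and the drizzle migration script gains a conditional clause.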

Changes

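| Cohort / File(s) | Summary |
| --- | --- |
| Docker Build Configuration: Dockerfile | Enabled a Next.js build cache mount, changed the application port to 3000, changed the copy strategy to use .next/standalone and .next/static, and changed the start command from next start to node server.js. |
| Container Orchestration: docker-compose.yaml | Replaced the curl command with a Node.js fetch-based healthcheck and adjusted the exit-code logic to return 0 on HTTP OK and 1 on failure. |
| Database Migration: drizzle/0062_aromatic_taskmaster.sql | Added an IF NOT EXISTS clause to the ALTER TABLE statement to prevent an error when adding an already existing column to the providers table. |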

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

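  • Support zeabur deployment #679: both pull requests modify the project's Dockerfile build and runtime steps (stage layout, files copied from .next, drizzle handling, exposed port, and start command), so they are related.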
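
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly states that it fixes healthcheck false negatives and hardens the idempotency of migration 0062, which matches the changes exactly. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The PR description details the problems addressed, the specific changes, the scope of impact, verification notes, and a risk assessment, consistent with the provided code changes. |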

@gemini-code-assist

Summary of Changes

Hello @NieiR, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

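This pull request addresses two key problems in the Docker environment: healthcheck false negatives caused by the base image lacking curl, and the non-idempotent Drizzle database migration script. Replacing the healthcheck logic with a native Node.js fetch call and adding an idempotency guard to the migration significantly improves robustness. The Dockerfile also receives several optimizations, including a build cache, Next.js standalone mode, and a unified port, which reduce the image size and improve build and runtime efficiency.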

Highlights

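  • Fix Docker healthcheck false negatives: The healthcheck changes from a curl-dependent shell command to the native Node.js fetch API, so the application is no longer wrongly marked unhealthy because the base image lacks the tool.
  • Harden Drizzle migration idempotency: The ALTER TABLE statement in drizzle/0062_aromatic_taskmaster.sql gains IF NOT EXISTS, so the migration does not fail by trying to create a column that already exists.
  • Optimize Dockerfile build and runtime: Adds a build cache for bun run build to speed up builds, unifies the application port to 3000, and uses the Next.js standalone output mode to shrink the Docker image and streamline startup.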
Changelog
  • Dockerfile
    • Added --mount=type=cache,target=/app/.next/cache to the bun run build command to make use of the build cache.
    • Changed the PORT environment variable from 8080 to 3000 and exposed port 3000 accordingly.
    • Adjusted the COPY instructions to copy .next/standalone from the builder stage to the root directory and .next/static to .next/static, no longer copying the entire .next directory and node_modules.
    • Added a copy of the VERSION file.
    • Changed the CMD command from node node_modules/.bin/next start to node server.js to fit Next.js standalone mode.
  • docker-compose.yaml
    • Changed the healthcheck configuration of the app service, replacing the test command ["CMD-SHELL", "curl -f http://localhost:3000/api/actions/health || exit 1"] with a healthcheck based on node -e "fetch(...)".
  • drizzle/0062_aromatic_taskmaster.sql
    • Added IF NOT EXISTS to the statement ALTER TABLE "providers" ADD COLUMN "gemini_google_search_preference" varchar(20);, turning it into ALTER TABLE "providers" ADD COLUMN IF NOT EXISTS "gemini_google_search_preference" varchar(20); to make the operation idempotent.
Activity
  • There is currently no comment or review activity related to this pull request.

@github-actions github-actions bot added the bug and area:deployment labels Feb 8, 2026
@github-actions github-actions bot added the size/XS label Feb 8, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

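The changes in this PR are great. They address two problems: the Docker healthcheck false negatives caused by the missing curl, and the idempotency of the database migration via ADD COLUMN IF NOT EXISTS. The Dockerfile optimizations (build cache and Next.js standalone output) are also commendable and effectively improve build efficiency and reduce image size. Overall, these changes improve the application's stability and deployment efficiency. I found only one spot in docker-compose.yaml that could be simplified slightly.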

"CMD",
"node",
"-e",
"fetch('http://' + (process.env.HOSTNAME || '127.0.0.1') + ':3000/api/actions/health').then((r)=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))",

medium

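This native Node.js healthcheck script is clever and solves the problem of the base image not shipping curl. However, the URL construction 'http://' + (process.env.HOSTNAME || '127.0.0.1') + ... seems somewhat complex. In a Docker healthcheck, the service inside the container is reachable via 127.0.0.1 or localhost. Using 127.0.0.1 directly is simpler and more explicit, and avoids an unnecessary environment variable read and string concatenation. Consider simplifying it.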

          "fetch('http://127.0.0.1:3000/api/actions/health').then((r)=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))",

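To make the difference between the two variants concrete, here is a small sketch of how the original URL expression resolves. The helper name healthUrl and the HOSTNAME value app-container are made up for illustration; only the URL expression itself comes from the healthcheck in the PR.

```javascript
// Hypothetical helper: rebuilds the URL the way the original healthcheck did,
// but takes the environment as a parameter so it can be tried with any values.
function healthUrl(env) {
  const host = env.HOSTNAME || "127.0.0.1";
  return "http://" + host + ":3000/api/actions/health";
}

// With HOSTNAME unset, the fallback already yields the loopback URL.
console.log(healthUrl({}));
// prints "http://127.0.0.1:3000/api/actions/health"

// With HOSTNAME set (placeholder value), the probe targets that name instead.
console.log(healthUrl({ HOSTNAME: "app-container" }));
// prints "http://app-container:3000/api/actions/health"
```

Hard-coding 127.0.0.1, as the suggestion does, removes the dependency on whatever value HOSTNAME happens to hold inside the container.
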

NieiR commented Feb 8, 2026

The related fixes have been merged into the branch for PR #701, so I am closing this PR to avoid duplicate review.

@NieiR NieiR closed this Feb 8, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Feb 8, 2026
COPY --from=builder /app/package.json ./package.json
COPY --from=builder /app/drizzle ./drizzle
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static

[Critical] [LOGIC-BUG] Missing .next/server copy breaks Server Actions

Why this is a problem: The old Dockerfile copied the entire .next directory (COPY --from=builder /app/.next ./.next), which included .next/server containing Server Action manifests and action ID resolution files. The new standalone approach copies only .next/standalone and .next/static, but omits .next/server. Without it, Next.js cannot resolve Server Action IDs at runtime.

This project uses Server Actions extensively (15+ action modules in src/actions/). The production deploy/Dockerfile handles this correctly at lines 54-55:

# Server Actions live inside .next/server; copy it or Next.js cannot resolve action IDs.
COPY --from=build --chown=node:node /app/.next/server ./.next/server

Suggested fix:

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/.next/server ./.next/server
COPY --from=builder /app/drizzle ./drizzle
COPY --from=builder /app/VERSION ./VERSION

@github-actions github-actions bot left a comment

Code Review Summary

This PR fixes the Docker healthcheck to use Node's built-in fetch instead of curl (which is unavailable in node:20-slim), makes the migration SQL idempotent, and switches the root Dockerfile to use Next.js standalone output. The healthcheck and migration changes are sound. However, the standalone Dockerfile conversion is missing a critical COPY directive that will break Server Actions at runtime.

PR Size: XS

  • Lines changed: 26
  • Files changed: 3

Issues Found

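| Category | Critical | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Logic/Bugs | 1 | 0 | 0 | 0 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 0 | 0 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Simplification | 0 | 0 | 0 | 0 |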

Critical Issues (Must Fix)

  1. Dockerfile:24 - Missing .next/server COPY directive. The old Dockerfile copied the entire .next directory, which included .next/server (Server Action manifests and action ID resolution files). The new standalone approach only copies .next/standalone and .next/static, omitting .next/server. The production deploy/Dockerfile already handles this correctly with an explicit copy and explanatory comment. Without this fix, all Server Actions (15+ modules) will fail to resolve at runtime.

Review Coverage

  • Logic and correctness
  • Security (OWASP Top 10)
  • Error handling
  • Type safety
  • Documentation accuracy
  • Test coverage
  • Code clarity

Automated review by Claude AI
