Skip to content

fix: Switch worker image build from GCC to Clang-17#150

Open
20001020ycx wants to merge 1 commit intoy-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:fix/use-clang17-for-worker-image
Open

fix: Switch worker image build from GCC to Clang-17#150
20001020ycx wants to merge 1 commit intoy-scope:release-0.297-edge-10-clp-connectorfrom
20001020ycx:fix/use-clang17-for-worker-image

Conversation

@20001020ycx
Copy link
Copy Markdown

@20001020ycx 20001020ycx commented Mar 6, 2026

Description

GCC (all tested versions: 11, 13) generates incorrect code in Release mode (-O3) for proxygen's HTTP session/transport handling, causing SIGSEGV at address 0x0 during worker-to-coordinator announcement.

The crash occurs in ResponseHandler::setTransaction() when calling txn_->getTransport().getCodec().getProtocol(). This is the same class of bug as prestodb#22995, which was partially fixed upstream (PR prestodb#23531) but only for the createTransaction() code path.

Clang does not exhibit this codegen issue, which is why Meta/Facebook (who use Clang internally) never encountered it. The upstream CI e2e tests also pass because they build with GCC 12 on CentOS, which happens to not trigger this specific bug.

Changes:

  • Install clang-17 and lld-17 in the Ubuntu dependency image
  • Set CC/CXX to clang-17 and use lld-17 linker in the runtime image build step

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • Chores
    • Updated build environment configurations to use clang-17 compiler and lld-17 linker with optimized linker flags across container images.

GCC (all tested versions: 11, 13) generates incorrect code in Release
mode (-O3) for proxygen's HTTP session/transport handling, causing
SIGSEGV at address 0x0 during worker-to-coordinator announcement.

The crash occurs in ResponseHandler::setTransaction() when calling
txn_->getTransport().getCodec().getProtocol(). This is the same class
of bug as prestodb#22995, which was partially fixed upstream
(PR prestodb#23531) but only for the createTransaction() code path.

Clang does not exhibit this codegen issue, which is why Meta/Facebook
(who use Clang internally) never encountered it. The upstream CI e2e
tests also pass because they build with GCC 12 on CentOS, which
happens to not trigger this specific bug.

Changes:
- Install clang-17 and lld-17 in the Ubuntu dependency image
- Set CC/CXX to clang-17 and use lld-17 linker in the runtime
  image build step
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

The changes introduce clang-17 and lld-17 compiler toolchain into the Docker build environment. The dependency Dockerfile installs these compiler tools, while the runtime Dockerfile configures CMake to use clang-17 as the C and C++ compiler and lld-17 as the linker.

Changes

Cohort / File(s) Summary
Docker dependency setup
presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile
Added a new RUN instruction to install clang-17 and lld-17 packages via apt-get with no-install-recommends flag.
Docker build configuration
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
Modified CMake configuration to set CC and CXX environment variables to clang-17 and append linker flags for CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS to use lld-17.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description provides context on the GCC codegen issue and the switch to Clang-17, but does not follow the required template structure with sections like Motivation/Context, Impact, Test Plan, and Release Notes. Restructure the description to match the repository template: add clear Motivation/Context, Impact, Test Plan sections, and include a Release Notes section (or explicitly state NO RELEASE NOTE is required).
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: switching the worker image build from GCC to Clang-17, which aligns with the changeset modifications to Dockerfiles.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`:
- Around line 31-33: The Dockerfile hard-codes CC=/usr/bin/clang-17,
CXX=/usr/bin/clang++-17 and EXTRA_CMAKE_FLAGS with -fuse-ld=lld-17 which breaks
when the CentOS dependency image only provides clang-15; either ensure the
CentOS dependency image installs clang-17 and lld-17, or make the
compiler/linker selection conditional: detect which compilers/linkers exist at
build time (e.g., test for /usr/bin/clang-17 and /usr/bin/lld-17) and fall back
to clang-15/appropriate lld or set alternatives accordingly before exporting
CC/CXX and setting EXTRA_CMAKE_FLAGS so CC, CXX, and the -fuse-ld flag match the
installed toolchain.
- Around line 31-33: The Dockerfile currently unconditionally exports CC/CXX to
clang-17 and appends clang-specific EXTRA_CMAKE_FLAGS which overrides the
earlier gcc-toolset-14 setup used when PRESTO_ENABLE_CUDF=ON; move the clang
exports and the EXTRA_CMAKE_FLAGS addition into a conditional branch that runs
only when PRESTO_ENABLE_CUDF is OFF (or otherwise not ON), or refactor the
compiler selection so that when PRESTO_ENABLE_CUDF=ON the script sources
gcc-toolset-14 and keeps CC/CXX pointing to GCC, and when PRESTO_ENABLE_CUDF is
OFF it sets CC=/usr/bin/clang-17, CXX=/usr/bin/clang++-17 and appends the lld-17
linker flags to EXTRA_CMAKE_FLAGS.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f63e4fff-b467-4ff2-a2b6-9997a702f499

📥 Commits

Reviewing files that changed from the base of the PR and between dea6fee and 0cb4521.

📒 Files selected for processing (2)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
  • presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile

Comment on lines +31 to +33
export CC=/usr/bin/clang-17 && \
export CXX=/usr/bin/clang++-17 && \
EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld-17 -DCMAKE_SHARED_LINKER_FLAGS=-fuse-ld=lld-17" \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== prestissimo-runtime defaults and hard-coded toolchain =="
nl -ba presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '13,35p'

echo
echo "== centos dependency image toolchain setup =="
nl -ba presto-native-execution/scripts/dockerfiles/centos-dependency.dockerfile | sed -n '13,39p'

echo
echo "== ubuntu dependency image toolchain setup =="
nl -ba presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile | sed -n '47,51p'

echo
echo "== docker-compose build args for centos-native-runtime =="
nl -ba presto-native-execution/docker-compose.yml | sed -n '48,60p'

Repository: y-scope/presto

Length of output: 177


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== prestissimo-runtime defaults and hard-coded toolchain =="
cat -n presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '13,35p'

echo
echo "== centos dependency image toolchain setup =="
cat -n presto-native-execution/scripts/dockerfiles/centos-dependency.dockerfile | sed -n '13,39p'

echo
echo "== ubuntu dependency image toolchain setup =="
cat -n presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile | sed -n '47,51p'

echo
echo "== docker-compose build args for centos-native-runtime =="
cat -n presto-native-execution/docker-compose.yml | sed -n '48,60p'

Repository: y-scope/presto

Length of output: 3718


Hard-coded clang-17 and lld-17 in runtime Dockerfile incompatible with default CentOS dependency image.

The prestissimo-runtime.dockerfile defaults to presto/prestissimo-dependency:centos9 (line 13) and does not override it in docker-compose.yml. However, lines 31–33 hard-code /usr/bin/clang-17, /usr/bin/clang++-17, and -fuse-ld=lld-17, while the CentOS dependency image installs only clang-15 via install_clang15. The Ubuntu dependency image explicitly installs clang-17 and lld-17. This version mismatch causes the default CentOS build to fail before reaching the Presto compilation.

Update the CentOS dependency image to install clang-17 and lld-17, or make the compiler/linker selection conditional on the chosen dependency image.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`
around lines 31 - 33, The Dockerfile hard-codes CC=/usr/bin/clang-17,
CXX=/usr/bin/clang++-17 and EXTRA_CMAKE_FLAGS with -fuse-ld=lld-17 which breaks
when the CentOS dependency image only provides clang-15; either ensure the
CentOS dependency image installs clang-17 and lld-17, or make the
compiler/linker selection conditional: detect which compilers/linkers exist at
build time (e.g., test for /usr/bin/clang-17 and /usr/bin/lld-17) and fall back
to clang-15/appropriate lld or set alternatives accordingly before exporting
CC/CXX and setting EXTRA_CMAKE_FLAGS so CC, CXX, and the -fuse-ld flag match the
installed toolchain.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== compiler selection logic around PRESTO_ENABLE_CUDF =="
nl -ba presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '29,34p'

echo
echo "== PRESTO_ENABLE_CUDF usage in build configuration =="
rg -n -C2 'PRESTO_ENABLE_CUDF' presto-native-execution/docker-compose.yml presto-native-execution/scripts

Repository: y-scope/presto

Length of output: 174


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== compiler selection logic around PRESTO_ENABLE_CUDF =="
cat -n presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '29,34p'

echo
echo "== PRESTO_ENABLE_CUDF usage in build configuration =="
rg -n -C2 'PRESTO_ENABLE_CUDF' presto-native-execution/docker-compose.yml presto-native-execution/scripts

Repository: y-scope/presto

Length of output: 2064


The CUDF-specific GCC compiler setup is overridden unconditionally by clang-17.

Line 30 conditionally unsets CC/CXX and sources gcc-toolset-14 for PRESTO_ENABLE_CUDF=ON builds, but lines 31–32 immediately export clang-17 regardless. Since docker-compose.yml passes -DPRESTO_ENABLE_CUDF=${GPU:-OFF}, the CUDF path is reachable but the GCC compiler choice becomes ineffective. The clang exports must be moved into a conditional branch (when CUDF is OFF) or the entire compiler selection logic requires refactoring to avoid overriding the CUDF-specific compiler stack.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`
around lines 31 - 33, The Dockerfile currently unconditionally exports CC/CXX to
clang-17 and appends clang-specific EXTRA_CMAKE_FLAGS which overrides the
earlier gcc-toolset-14 setup used when PRESTO_ENABLE_CUDF=ON; move the clang
exports and the EXTRA_CMAKE_FLAGS addition into a conditional branch that runs
only when PRESTO_ENABLE_CUDF is OFF (or otherwise not ON), or refactor the
compiler selection so that when PRESTO_ENABLE_CUDF=ON the script sources
gcc-toolset-14 and keeps CC/CXX pointing to GCC, and when PRESTO_ENABLE_CUDF is
OFF it sets CC=/usr/bin/clang-17, CXX=/usr/bin/clang++-17 and appends the lld-17
linker flags to EXTRA_CMAKE_FLAGS.

@jackluo923 jackluo923 self-assigned this Mar 6, 2026
Copy link
Copy Markdown
Member

@jackluo923 jackluo923 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not do this since gcc is used in quite a few companies such as Uber. They explicitly forbid the use of clang because of other it crashes velox due to other issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants