fix: Switch worker image build from GCC to Clang-17#150
fix: Switch worker image build from GCC to Clang-17#15020001020ycx wants to merge 1 commit intoy-scope:release-0.297-edge-10-clp-connectorfrom
Conversation
GCC (all tested versions: 11, 13) generates incorrect code in Release mode (-O3) for proxygen's HTTP session/transport handling, causing SIGSEGV at address 0x0 during worker-to-coordinator announcement. The crash occurs in ResponseHandler::setTransaction() when calling txn_->getTransport().getCodec().getProtocol(). This is the same class of bug as prestodb#22995, which was partially fixed upstream (PR prestodb#23531) but only for the createTransaction() code path. Clang does not exhibit this codegen issue, which is why Meta/Facebook (who use Clang internally) never encountered it. The upstream CI e2e tests also pass because they build with GCC 12 on CentOS, which happens to not trigger this specific bug. Changes: - Install clang-17 and lld-17 in the Ubuntu dependency image - Set CC/CXX to clang-17 and use lld-17 linker in the runtime image build step
📝 WalkthroughWalkthroughThe changes introduce clang-17 and lld-17 compiler toolchain into the Docker build environment. The dependency Dockerfile installs these compiler tools, while the runtime Dockerfile configures CMake to use clang-17 as the C and C++ compiler and lld-17 as the linker. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`:
- Around line 31-33: The Dockerfile hard-codes CC=/usr/bin/clang-17,
CXX=/usr/bin/clang++-17 and EXTRA_CMAKE_FLAGS with -fuse-ld=lld-17 which breaks
when the CentOS dependency image only provides clang-15; either ensure the
CentOS dependency image installs clang-17 and lld-17, or make the
compiler/linker selection conditional: detect which compilers/linkers exist at
build time (e.g., test for /usr/bin/clang-17 and /usr/bin/lld-17) and fall back
to clang-15/appropriate lld or set alternatives accordingly before exporting
CC/CXX and setting EXTRA_CMAKE_FLAGS so CC, CXX, and the -fuse-ld flag match the
installed toolchain.
- Around line 31-33: The Dockerfile currently unconditionally exports CC/CXX to
clang-17 and appends clang-specific EXTRA_CMAKE_FLAGS which overrides the
earlier gcc-toolset-14 setup used when PRESTO_ENABLE_CUDF=ON; move the clang
exports and the EXTRA_CMAKE_FLAGS addition into a conditional branch that runs
only when PRESTO_ENABLE_CUDF is OFF (or otherwise not ON), or refactor the
compiler selection so that when PRESTO_ENABLE_CUDF=ON the script sources
gcc-toolset-14 and keeps CC/CXX pointing to GCC, and when PRESTO_ENABLE_CUDF is
OFF it sets CC=/usr/bin/clang-17, CXX=/usr/bin/clang++-17 and appends the lld-17
linker flags to EXTRA_CMAKE_FLAGS.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f63e4fff-b467-4ff2-a2b6-9997a702f499
📒 Files selected for processing (2)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfilepresto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile
| export CC=/usr/bin/clang-17 && \ | ||
| export CXX=/usr/bin/clang++-17 && \ | ||
| EXTRA_CMAKE_FLAGS="${EXTRA_CMAKE_FLAGS} -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld-17 -DCMAKE_SHARED_LINKER_FLAGS=-fuse-ld=lld-17" \ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== prestissimo-runtime defaults and hard-coded toolchain =="
nl -ba presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '13,35p'
echo
echo "== centos dependency image toolchain setup =="
nl -ba presto-native-execution/scripts/dockerfiles/centos-dependency.dockerfile | sed -n '13,39p'
echo
echo "== ubuntu dependency image toolchain setup =="
nl -ba presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile | sed -n '47,51p'
echo
echo "== docker-compose build args for centos-native-runtime =="
nl -ba presto-native-execution/docker-compose.yml | sed -n '48,60p'Repository: y-scope/presto
Length of output: 177
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== prestissimo-runtime defaults and hard-coded toolchain =="
cat -n presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '13,35p'
echo
echo "== centos dependency image toolchain setup =="
cat -n presto-native-execution/scripts/dockerfiles/centos-dependency.dockerfile | sed -n '13,39p'
echo
echo "== ubuntu dependency image toolchain setup =="
cat -n presto-native-execution/scripts/dockerfiles/ubuntu-22.04-dependency.dockerfile | sed -n '47,51p'
echo
echo "== docker-compose build args for centos-native-runtime =="
cat -n presto-native-execution/docker-compose.yml | sed -n '48,60p'Repository: y-scope/presto
Length of output: 3718
Hard-coded clang-17 and lld-17 in runtime Dockerfile incompatible with default CentOS dependency image.
The prestissimo-runtime.dockerfile defaults to presto/prestissimo-dependency:centos9 (line 13) and does not override it in docker-compose.yml. However, lines 31–33 hard-code /usr/bin/clang-17, /usr/bin/clang++-17, and -fuse-ld=lld-17, while the CentOS dependency image installs only clang-15 via install_clang15. The Ubuntu dependency image explicitly installs clang-17 and lld-17. This version mismatch causes the default CentOS build to fail before reaching the Presto compilation.
Update the CentOS dependency image to install clang-17 and lld-17, or make the compiler/linker selection conditional on the chosen dependency image.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`
around lines 31 - 33, The Dockerfile hard-codes CC=/usr/bin/clang-17,
CXX=/usr/bin/clang++-17 and EXTRA_CMAKE_FLAGS with -fuse-ld=lld-17 which breaks
when the CentOS dependency image only provides clang-15; either ensure the
CentOS dependency image installs clang-17 and lld-17, or make the
compiler/linker selection conditional: detect which compilers/linkers exist at
build time (e.g., test for /usr/bin/clang-17 and /usr/bin/lld-17) and fall back
to clang-15/appropriate lld or set alternatives accordingly before exporting
CC/CXX and setting EXTRA_CMAKE_FLAGS so CC, CXX, and the -fuse-ld flag match the
installed toolchain.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== compiler selection logic around PRESTO_ENABLE_CUDF =="
nl -ba presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '29,34p'
echo
echo "== PRESTO_ENABLE_CUDF usage in build configuration =="
rg -n -C2 'PRESTO_ENABLE_CUDF' presto-native-execution/docker-compose.yml presto-native-execution/scriptsRepository: y-scope/presto
Length of output: 174
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== compiler selection logic around PRESTO_ENABLE_CUDF =="
cat -n presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile | sed -n '29,34p'
echo
echo "== PRESTO_ENABLE_CUDF usage in build configuration =="
rg -n -C2 'PRESTO_ENABLE_CUDF' presto-native-execution/docker-compose.yml presto-native-execution/scriptsRepository: y-scope/presto
Length of output: 2064
The CUDF-specific GCC compiler setup is overridden unconditionally by clang-17.
Line 30 conditionally unsets CC/CXX and sources gcc-toolset-14 for PRESTO_ENABLE_CUDF=ON builds, but lines 31–32 immediately export clang-17 regardless. Since docker-compose.yml passes -DPRESTO_ENABLE_CUDF=${GPU:-OFF}, the CUDF path is reachable but the GCC compiler choice becomes ineffective. The clang exports must be moved into a conditional branch (when CUDF is OFF) or the entire compiler selection logic requires refactoring to avoid overriding the CUDF-specific compiler stack.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`
around lines 31 - 33, The Dockerfile currently unconditionally exports CC/CXX to
clang-17 and appends clang-specific EXTRA_CMAKE_FLAGS which overrides the
earlier gcc-toolset-14 setup used when PRESTO_ENABLE_CUDF=ON; move the clang
exports and the EXTRA_CMAKE_FLAGS addition into a conditional branch that runs
only when PRESTO_ENABLE_CUDF is OFF (or otherwise not ON), or refactor the
compiler selection so that when PRESTO_ENABLE_CUDF=ON the script sources
gcc-toolset-14 and keeps CC/CXX pointing to GCC, and when PRESTO_ENABLE_CUDF is
OFF it sets CC=/usr/bin/clang-17, CXX=/usr/bin/clang++-17 and appends the lld-17
linker flags to EXTRA_CMAKE_FLAGS.
jackluo923
left a comment
There was a problem hiding this comment.
We should not do this since gcc is used in quite a few companies such as Uber. They explicitly forbid the use of clang because of other it crashes velox due to other issues.
Description
GCC (all tested versions: 11, 13) generates incorrect code in Release mode (-O3) for proxygen's HTTP session/transport handling, causing SIGSEGV at address 0x0 during worker-to-coordinator announcement.
The crash occurs in ResponseHandler::setTransaction() when calling txn_->getTransport().getCodec().getProtocol(). This is the same class of bug as prestodb#22995, which was partially fixed upstream (PR prestodb#23531) but only for the createTransaction() code path.
Clang does not exhibit this codegen issue, which is why Meta/Facebook (who use Clang internally) never encountered it. The upstream CI e2e tests also pass because they build with GCC 12 on CentOS, which happens to not trigger this specific bug.
Changes:
Checklist
breaking change.
Validation performed
Summary by CodeRabbit