Conversation
b7bbf03 to
1bccd37
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive pooling operations support to the ckTileEngine, implementing 2D and 3D pooling with configurable reduction operations (max, min, average). The implementation follows the existing GEMM module architecture with individual kernel compilation for improved build parallelism and testing capabilities.
Key Changes:
- Added pooling kernel generation system with Python-based instance builder
- Implemented benchmark infrastructure for pooling operations
- Created configuration system for kernel parameters and GPU target selection
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tile_engine/ops/pooling/pool_profiler.hpp | Profiler class for benchmarking pooling kernels with verification and performance metrics |
| tile_engine/ops/pooling/pool_instance_builder.py | Python script to generate C++ kernel instances from JSON configurations |
| tile_engine/ops/pooling/pool_common.hpp | Common type traits and data type definitions for pooling operations |
| tile_engine/ops/pooling/pool_benchmark_single.cpp | Single kernel benchmark executable with argument parsing and verification |
| tile_engine/ops/pooling/pool_benchmark.py | Python orchestration script for running and analyzing benchmark sweeps |
| tile_engine/ops/pooling/pool_benchmark.hpp | Core data structures and enums for pooling benchmarks |
| tile_engine/ops/pooling/configs/default_config.json | Default configuration for block sizes, warp configs, and traits |
| tile_engine/ops/pooling/README.md | Comprehensive documentation for building, running, and configuring pooling benchmarks |
| tile_engine/ops/pooling/CMakeLists.txt | CMake build system for generating and compiling individual pooling kernels |
| tile_engine/ops/gemm/README.md | Added missing gpu_target argument to example command |
| tile_engine/ops/commons/test_benchmark.sh | Extended test script to support both GEMM and pooling benchmarks |
| tile_engine/ops/CMakeLists.txt | Added pooling subdirectory to build system |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Sz": stride_z, | ||
| "Sy": stride_y, | ||
| "Sx": stride_x, | ||
| "pool_dim": pool_dim, |
There was a problem hiding this comment.
The parameter 'pool_dim' is included in the params dictionary but is not a valid command-line parameter for the benchmark executables according to pool_benchmark_single.cpp (which doesn't define a 'pool_dim' argument). This will be silently ignored or could cause errors. The pool dimension is determined by the kernel itself (POOL_DIM constant), not passed as an argument.
| "pool_dim": pool_dim, |
| --config_json configs/user_provided_config.json \ | ||
| --gen_all_individual | ||
| --gen_all_individual \ | ||
| --gpu_target gfx942 |
There was a problem hiding this comment.
Gemm TileEngine target is just gfx942?
And this change is related anyway to the adding pooling to TileEngine?
There was a problem hiding this comment.
Great comment. TileEngine supports on gfx90a and gfx1_ series.
| "block_n": { | ||
| "values": [1] |
There was a problem hiding this comment.
If block_n is always 1 for pooling, does it make sense to hard code it in instance_builder or can theoretically select other values?
There was a problem hiding this comment.
Block_n could also be 2 in pooling.
There was a problem hiding this comment.
I have a general question about the instance builder. In the GEMM tile engine, there’s a gemm_validation_utils.py module that is used by gemm_instance_builder to generate only valid instances that can later be compiled. However, it seems that for pooling, there isn’t a similar validation step. and, some checks are performed directly within pool_instance_builder. Would it make sense to adopt a similar structure for these tile engine operations to keep things same?
|
Error importing due to merge conflicts – please reopen the PR on ROCm/rocm-libraries |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| "block_n": { | ||
| "values": [1] |
There was a problem hiding this comment.
Block_n could also be 2 in pooling.
| --config_json configs/user_provided_config.json \ | ||
| --gen_all_individual | ||
| --gen_all_individual \ | ||
| --gpu_target gfx942 |
There was a problem hiding this comment.
Great comment. TileEngine supports on gfx90a and gfx1_ series.
| # Copyright (c) Advanced Micro Devices, Inc., or its affiliates. | ||
| # SPDX-License-Identifier: MIT | ||
|
|
||
| set(POOL_DATATYPE "fp16;fp32" CACHE STRING "List of datatypes for Pool (semicolon-separated)") |
There was a problem hiding this comment.
We could also add fp8 here.
| string(REPLACE "_" ";" trait_parts ${trait}) | ||
| list(GET trait_parts 0 output_index) | ||
| list(GET trait_parts 1 propagate_nan) | ||
| list(GET trait_parts 2 pool_dim) |
There was a problem hiding this comment.
We have both the pool_dim in command line and the json input. Could we unify that and only keep 1? I suggest we only keep it in json.
| #include "ck_tile/host/reference/reference_pool.hpp" | ||
| #include "pool_benchmark.hpp" | ||
|
|
||
| class PoolProfiler |
There was a problem hiding this comment.
Could I know what is the difference between the pool_profiler and pool_benchmark_single.cpp? I didn't see the compilation of the pool_profiler.hpp.
Proposed changes
Add pooling to ckTileEngine
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed files