Skip to content

Comments

[CK_TILE] Add pooling to ckTileEngine#3416

Open
aledudek wants to merge 11 commits intodevelopfrom
ckTileEnginePooling
Open

[CK_TILE] Add pooling to ckTileEngine#3416
aledudek wants to merge 11 commits intodevelopfrom
ckTileEnginePooling

Conversation

@aledudek
Copy link
Contributor

Proposed changes

Add pooling to ckTileEngine

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

@aledudek aledudek force-pushed the ckTileEnginePooling branch from b7bbf03 to 1bccd37 Compare December 16, 2025 10:40
spolifroni-amd
spolifroni-amd previously approved these changes Dec 17, 2025
@aosewski aosewski requested a review from Copilot December 18, 2025 10:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive pooling operations support to the ckTileEngine, implementing 2D and 3D pooling with configurable reduction operations (max, min, average). The implementation follows the existing GEMM module architecture with individual kernel compilation for improved build parallelism and testing capabilities.

Key Changes:

  • Added pooling kernel generation system with Python-based instance builder
  • Implemented benchmark infrastructure for pooling operations
  • Created configuration system for kernel parameters and GPU target selection

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tile_engine/ops/pooling/pool_profiler.hpp Profiler class for benchmarking pooling kernels with verification and performance metrics
tile_engine/ops/pooling/pool_instance_builder.py Python script to generate C++ kernel instances from JSON configurations
tile_engine/ops/pooling/pool_common.hpp Common type traits and data type definitions for pooling operations
tile_engine/ops/pooling/pool_benchmark_single.cpp Single kernel benchmark executable with argument parsing and verification
tile_engine/ops/pooling/pool_benchmark.py Python orchestration script for running and analyzing benchmark sweeps
tile_engine/ops/pooling/pool_benchmark.hpp Core data structures and enums for pooling benchmarks
tile_engine/ops/pooling/configs/default_config.json Default configuration for block sizes, warp configs, and traits
tile_engine/ops/pooling/README.md Comprehensive documentation for building, running, and configuring pooling benchmarks
tile_engine/ops/pooling/CMakeLists.txt CMake build system for generating and compiling individual pooling kernels
tile_engine/ops/gemm/README.md Added missing gpu_target argument to example command
tile_engine/ops/commons/test_benchmark.sh Extended test script to support both GEMM and pooling benchmarks
tile_engine/ops/CMakeLists.txt Added pooling subdirectory to build system

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"Sz": stride_z,
"Sy": stride_y,
"Sx": stride_x,
"pool_dim": pool_dim,
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter 'pool_dim' is included in the params dictionary but is not a valid command-line parameter for the benchmark executables according to pool_benchmark_single.cpp (which doesn't define a 'pool_dim' argument). This will be silently ignored or could cause errors. The pool dimension is determined by the kernel itself (POOL_DIM constant), not passed as an argument.

Suggested change
"pool_dim": pool_dim,

Copilot uses AI. Check for mistakes.
--config_json configs/user_provided_config.json \
--gen_all_individual
--gen_all_individual \
--gpu_target gfx942
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gemm TileEngine target is just gfx942?
And this change is related anyway to the adding pooling to TileEngine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment. TileEngine supports on gfx90a and gfx1_ series.

Comment on lines +6 to +7
"block_n": {
"values": [1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If block_n is always 1 for pooling, does it make sense to hard code it in instance_builder or can theoretically select other values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block_n could also be 2 in pooling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a general question about the instance builder. In the GEMM tile engine, there’s a gemm_validation_utils.py module that is used by gemm_instance_builder to generate only valid instances that can later be compiled. However, it seems that for pooling, there isn’t a similar validation step. and, some checks are performed directly within pool_instance_builder. Would it make sense to adopt a similar structure for these tile engine operations to keep things same?

@ammallya
Copy link

ammallya commented Feb 3, 2026

Error importing due to merge conflicts – please reopen the PR on ROCm/rocm-libraries

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment on lines +6 to +7
"block_n": {
"values": [1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block_n could also be 2 in pooling.

--config_json configs/user_provided_config.json \
--gen_all_individual
--gen_all_individual \
--gpu_target gfx942
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment. TileEngine supports on gfx90a and gfx1_ series.

# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

set(POOL_DATATYPE "fp16;fp32" CACHE STRING "List of datatypes for Pool (semicolon-separated)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also add fp8 here.

string(REPLACE "_" ";" trait_parts ${trait})
list(GET trait_parts 0 output_index)
list(GET trait_parts 1 propagate_nan)
list(GET trait_parts 2 pool_dim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have both the pool_dim in command line and the json input. Could we unify that and only keep 1? I suggest we only keep it in json.

#include "ck_tile/host/reference/reference_pool.hpp"
#include "pool_benchmark.hpp"

class PoolProfiler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I know what is the difference between the pool_profiler and pool_benchmark_single.cpp? I didn't see the compilation of the pool_profiler.hpp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants