improve tutorial kernel by SimeonEhrig · Pull Request #34 · psychocoderHPC/alpaka3

SimeonEhrig · 2026-05-07T12:16:30Z

No description provided.

coderabbitai · 2026-05-07T12:16:46Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


SimeonEhrig · 2026-05-07T12:23:46Z

-Launching the Kernel
--------------------
-
-On the host side, the pattern is straightforward:
-
-1. Allocate buffers on the compute device.
-
-  .. literalinclude:: ../../snippets/example/050_kernel.cpp
-    :language: cpp
-    :start-after: BEGIN-TUTORIAL-allocateBuffers
-    :end-before: END-TUTORIAL-allocateBuffers
-    :dedent:
-
-2. Copy input data to the device.
-
-  .. literalinclude:: ../../snippets/example/050_kernel.cpp
-    :language: cpp
-    :start-after: BEGIN-TUTORIAL-copyToDevice
-    :end-before: END-TUTORIAL-copyToDevice
-    :dedent:
-
-3. Choose a frame specification.
-
-  .. literalinclude:: ../../snippets/example/050_kernel.cpp
-    :language: cpp
-    :start-after: BEGIN-TUTORIAL-kernelFrameSpec
-    :end-before: END-TUTORIAL-kernelFrameSpec
-    :dedent:
-
-4. Enqueue the kernel.
+.. literalinclude:: ../../snippets/example/050_kernel.cpp
+  :language: cpp
+  :start-after: BEGIN-TUTORIAL-kernelStructure
+  :end-before: END-TUTORIAL-kernelStructure
+  :dedent:


I'm not against showing some code around the kernel launch, but I would keep it in one code block because that is much more readable. The only problem is that we cannot define a code region inside another code region without the outer region displaying the inner BEGIN and END markers.

    // BEGIN-TUTORIAL-kernelaround
    // Copy input data to the device.
    onHost::memcpy(queue, lhsBuffer, lhs);
    onHost::memcpy(queue, rhsBuffer, rhs);
    // ...
    // BEGIN-TUTORIAL-kernelLaunch
    // Let alpaka calculate a well-functioning `frameSpec` for you.
    // This assumes that you are using `onAcc::makeIdxMap` in the kernel.
    onHost::concepts::FrameSpec auto frameSpec = onHost::getFrameSpec<int>(device, Vec{numElements});
    // Create a kernel object and enqueue it along with the `frameSpec` and kernel arguments.
    // Depending on how many tasks are still in the queue, the kernel may be executed immediately or after a delay.
    queue.enqueue(frameSpec, KernelBundle{VectorAddKernel{}, resultBuffer, lhsBuffer, rhsBuffer});
    // If you use a non-blocking queue, the kernel runs asynchronously with respect to the host.
    // To synchronize the kernel, you must call `onHost::wait(queue)`.
    // onHost::wait(queue);
    // END-TUTORIAL-kernelLaunch
    // Copy the result back and wait for completion before reading it.
    onHost::memcpy(queue, result, resultBuffer);
    onHost::wait(queue);
    // END-TUTORIAL-kernelaround

If we display the code region from BEGIN-TUTORIAL-kernelaround to END-TUTORIAL-kernelaround, the rendered code will also show the comments BEGIN-TUTORIAL-kernelLaunch and END-TUTORIAL-kernelLaunch from the source code.
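To make the nested-marker problem concrete, here is a minimal sketch of a region extractor that hides inner markers. It is only an illustration: `extractRegion` and `trim` are hypothetical helpers, not part of Sphinx or alpaka, and the sketch assumes that every marker sits alone on a line in the form `// BEGIN-TUTORIAL-<name>` or `// END-TUTORIAL-<name>`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: strip leading and trailing blanks from a line.
std::string trim(std::string const& s)
{
    auto const first = s.find_first_not_of(" \t\r");
    if(first == std::string::npos)
        return {};
    auto const last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper, not part of Sphinx or alpaka.
// Returns the lines between "// BEGIN-TUTORIAL-<region>" and "// END-TUTORIAL-<region>"
// and drops every other tutorial marker line found in between.
std::string extractRegion(std::string const& source, std::string const& region)
{
    std::string const beginMark = "// BEGIN-TUTORIAL-" + region;
    std::string const endMark = "// END-TUTORIAL-" + region;

    std::istringstream input{source};
    std::string line;
    std::string result;
    bool inside = false;

    while(std::getline(input, line))
    {
        std::string const stripped = trim(line);
        if(!inside)
        {
            // Skip everything until the requested region starts.
            inside = (stripped == beginMark);
            continue;
        }
        if(stripped == endMark)
            break;
        // A marker of a nested region: hide it from the output.
        bool const isNestedMarker
            = stripped.rfind("// BEGIN-TUTORIAL-", 0) == 0 || stripped.rfind("// END-TUTORIAL-", 0) == 0;
        if(isNestedMarker)
            continue;
        result += line;
        result += '\n';
    }
    return result;
}
```

Calling `extractRegion(source, "kernelLaunch")` on such a file would yield only the inner block, while asking for the outer region would yield the surrounding code without the inner marker comments. A filter like this would have to run as a preprocessing step before the documentation build, which is extra tooling compared to copying the code under different markers.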

I can bring it back, but that takes a bit of work. Also, the whole source code is shown at the end of the page to provide the context, and we already assume that people read this code to understand the context on the earlier pages. Therefore I think it is not necessary.

That's why it is split into sections. It is a workaround for the fact that we cannot handle nested includes.
This can be revisited later and is IMO not mandatory.

From a technical perspective, that makes sense. But if we want to have this documentation, I suggest simply copying the code and setting different markers. The maintenance overhead is small.

ok kept yours

SimeonEhrig · 2026-05-07T12:26:41Z

-Choosing the Correct Frame Specification
----------------------------------------
-
-For a first implementation, frame selection should be boring.
-The host chooses how much work is grouped into one frame, and the kernel then iterates over the valid data indices assigned to it.
-
-  .. literalinclude:: ../../snippets/example/050_kernel.cpp
-    :language: cpp
-    :start-after: BEGIN-TUTORIAL-kernelFrameSpec
-    :end-before: END-TUTORIAL-kernelFrameSpec
-    :dedent:
-
-Rules of thumb:
-
- ``onHost::getFrameSpec<T>(device, extents)`` is the easiest way to get a reasonable first frame specification.
- Start with simple sizes. For 1D kernels, something around ``128`` to ``256`` elements per frame is usually a reasonable first try.
- When you have multiple dimensions, prefer more work in the fastest varying dimension, which is usually ``x``.
- Use a compile-time ``CVec`` frame extent when the kernel benefits from knowing the frame size at compile time.


I removed the whole section because it is not easy to explain how to select a good frame specification without explaining a lot of details. In my opinion, it is very frustrating when the tutorial tells you that you have to select a good frame specification but does not explain how. And I think onHost::getFrameSpec does a good job here.

improve tutorial kernel

a2d1d9f

SimeonEhrig mentioned this pull request May 7, 2026

add tutorial kernel section alpaka-group/alpaka3#508

Merged

SimeonEhrig wants to merge 1 commit into psychocoderHPC:topic-tutorial_II from SimeonEhrig:improveTutorialKernel
