improve tutorial kernel #34

Open
SimeonEhrig wants to merge 1 commit into psychocoderHPC:topic-tutorial_II from SimeonEhrig:improveTutorialKernel

@SimeonEhrig

No description provided.

@coderabbitai

coderabbitai Bot commented May 7, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Comment on lines -36 to +30
Launching the Kernel
--------------------

On the host side, the pattern is straightforward:

1. Allocate buffers on the compute device.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-allocateBuffers
   :end-before: END-TUTORIAL-allocateBuffers
   :dedent:

2. Copy input data to the device.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-copyToDevice
   :end-before: END-TUTORIAL-copyToDevice
   :dedent:

3. Choose a frame specification.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-kernelFrameSpec
   :end-before: END-TUTORIAL-kernelFrameSpec
   :dedent:

4. Enqueue the kernel.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-kernelStructure
   :end-before: END-TUTORIAL-kernelStructure
   :dedent:

Author

I'm not against showing some code around the kernel launch, but I would keep it in one code block because that is much more readable. The only problem is that we cannot define a code region inside another code region without the outer region also displaying the inner BEGIN and END markers.

// BEGIN-TUTORIAL-kernelaround
// Copy input data to the device.
onHost::memcpy(queue, lhsBuffer, lhs);
onHost::memcpy(queue, rhsBuffer, rhs);
// ...

// BEGIN-TUTORIAL-kernelLaunch
// Let alpaka calculate a well-functioning `frameSpec` for you.
// This assumes that you are using `onAcc::makeIdxMap` in the kernel.
onHost::concepts::FrameSpec auto frameSpec = onHost::getFrameSpec<int>(device, Vec{numElements});

// Create a kernel object and enqueue it along with the `frameSpec` and kernel arguments.
// Depending on how many tasks are still in the queue, the kernel may be executed immediately or after a delay.
queue.enqueue(frameSpec, KernelBundle{VectorAddKernel{}, resultBuffer, lhsBuffer, rhsBuffer});
// If you use a non-blocking queue, the kernel runs asynchronously with respect to the host.
// To synchronize the kernel, you must call `onHost::wait(queue)`.
// onHost::wait(queue);
// END-TUTORIAL-kernelLaunch

// Copy the result back and wait for completion before reading it.
onHost::memcpy(queue, result, resultBuffer);
onHost::wait(queue);
// END-TUTORIAL-kernelaround

If we display the code region from BEGIN-TUTORIAL-kernelaround to END-TUTORIAL-kernelaround, the rendered code will also show the comments BEGIN-TUTORIAL-kernelLaunch and END-TUTORIAL-kernelLaunch from the source code.
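The marker problem can be sketched in plain C++. This is only an illustration, assuming that `:start-after:` and `:end-before:` select the lines between the first line containing the start string and the next line containing the end string. `extractRegion` and `nestedExample` are hypothetical helpers, not part of Sphinx or alpaka, and the source lines are shortened stand-ins for the tutorial code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Sphinx API): mimics the assumed behaviour of
// `:start-after:` / `:end-before:` by returning the lines strictly between
// the first line containing `startMarker` and the next line containing
// `endMarker`.
std::vector<std::string> extractRegion(
    std::vector<std::string> const& lines,
    std::string const& startMarker,
    std::string const& endMarker)
{
    std::vector<std::string> region;
    bool inside = false;
    for(auto const& line : lines)
    {
        if(!inside)
        {
            // Skip everything up to and including the start marker line.
            if(line.find(startMarker) != std::string::npos)
                inside = true;
            continue;
        }
        // Stop at the end marker line without emitting it.
        if(line.find(endMarker) != std::string::npos)
            break;
        region.push_back(line);
    }
    return region;
}

// Shortened, hypothetical stand-in for a source file with nested markers.
std::vector<std::string> nestedExample()
{
    return {
        "// BEGIN-TUTORIAL-kernelaround",
        "onHost::memcpy(queue, lhsBuffer, lhs);",
        "// BEGIN-TUTORIAL-kernelLaunch",
        "queue.enqueue(frameSpec, kernelBundle);",
        "// END-TUTORIAL-kernelLaunch",
        "onHost::wait(queue);",
        "// END-TUTORIAL-kernelaround"};
}
```

Extracting the outer `kernelaround` region from `nestedExample()` returns five lines, two of which are the inner `kernelLaunch` marker comments. Extracting the inner region returns only the enqueue line.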

I can bring it back, but that takes a bit of work. Also, the whole source code is shown at the end of the page to provide the context, and on the previous pages we already assume that people read this code to understand the context. Therefore I think it is not necessary.

Owner

That's why it is split into sections. It is a workaround for the fact that we cannot handle nested includes.
This can be revisited later and is IMO not mandatory.

@SimeonEhrig SimeonEhrig May 11, 2026

Author

From a technical perspective, that makes sense. But if we want to keep this documentation, I suggest simply copying the code and setting different markers. The maintenance overhead is small.

Owner

ok kept yours

Comment on lines -104 to -121
Choosing the Correct Frame Specification
----------------------------------------

For a first implementation, frame selection should be boring.
The host chooses how much work is grouped into one frame, and the kernel then iterates over the valid data indices assigned to it.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-kernelFrameSpec
   :end-before: END-TUTORIAL-kernelFrameSpec
   :dedent:

Rules of thumb:

- ``onHost::getFrameSpec<T>(device, extents)`` is the easiest way to get a reasonable first frame specification.
- Start with simple sizes. For 1D kernels, something around ``128`` to ``256`` elements per frame is usually a reasonable first try.
- When you have multiple dimensions, prefer more work in the fastest varying dimension, which is usually ``x``.
- Use a compile-time ``CVec`` frame extent when the kernel benefits from knowing the frame size at compile time.
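As a rough illustration of what a frame size means for the amount of work, the following plain C++ sketch splits a 1D range into frames and visits only the valid indices in each frame. It does not use alpaka. `numFramesFor` and `frameWiseAdd` are hypothetical names, and the sequential loop only stands in for what a kernel would do in parallel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of frames needed to cover `numElements` when
// each frame holds `frameExtent` elements (ceiling division).
// Precondition: frameExtent > 0.
constexpr std::size_t numFramesFor(std::size_t numElements, std::size_t frameExtent)
{
    return (numElements + frameExtent - 1) / frameExtent;
}

// Sequential stand-in for a frame-based vector addition: the outer loop walks
// the frames, the inner loop walks only the valid indices of that frame, so
// the last, partially filled frame never reads or writes out of bounds.
std::vector<int> frameWiseAdd(
    std::vector<int> const& lhs,
    std::vector<int> const& rhs,
    std::size_t frameExtent)
{
    std::size_t const numElements = std::min(lhs.size(), rhs.size());
    std::size_t const numFrames = numFramesFor(numElements, frameExtent);
    std::vector<int> result(numElements);
    for(std::size_t frame = 0; frame < numFrames; ++frame)
    {
        std::size_t const begin = frame * frameExtent;
        std::size_t const end = std::min(begin + frameExtent, numElements);
        for(std::size_t i = begin; i < end; ++i)
            result[i] = lhs[i] + rhs[i];
    }
    return result;
}
```

With 1000 elements and a frame extent of 256, `numFramesFor` yields 4 frames, and the last frame covers only the remaining 232 valid indices.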

Author

I removed the whole section because it is not easy to explain how to select a good frame specification without explaining a lot of details. In my opinion, it is very frustrating when the tutorial tells you that you have to select a good frame specification but does not explain how. And I think onHost::getFrameSpec does a good job at this point.
