improve tutorial kernel #34
Launching the Kernel
--------------------

On the host side, the pattern is straightforward:

1. Allocate buffers on the compute device.

   .. literalinclude:: ../../snippets/example/050_kernel.cpp
      :language: cpp
      :start-after: BEGIN-TUTORIAL-allocateBuffers
      :end-before: END-TUTORIAL-allocateBuffers
      :dedent:

2. Copy input data to the device.

   .. literalinclude:: ../../snippets/example/050_kernel.cpp
      :language: cpp
      :start-after: BEGIN-TUTORIAL-copyToDevice
      :end-before: END-TUTORIAL-copyToDevice
      :dedent:

3. Choose a frame specification.

   .. literalinclude:: ../../snippets/example/050_kernel.cpp
      :language: cpp
      :start-after: BEGIN-TUTORIAL-kernelFrameSpec
      :end-before: END-TUTORIAL-kernelFrameSpec
      :dedent:

4. Enqueue the kernel.

   .. literalinclude:: ../../snippets/example/050_kernel.cpp
      :language: cpp
      :start-after: BEGIN-TUTORIAL-kernelStructure
      :end-before: END-TUTORIAL-kernelStructure
      :dedent:
I'm not against showing some code around the kernel launch, but I would keep it in one code block because that is much more readable. The only problem is that we cannot define a code region inside another code region without the outer region also displaying the inner BEGIN and END markers.
// BEGIN-TUTORIAL-kernelaround
// Copy input data to the device.
onHost::memcpy(queue, lhsBuffer, lhs);
onHost::memcpy(queue, rhsBuffer, rhs);
// ...
// BEGIN-TUTORIAL-kernelLaunch
// Let alpaka calculate a well-functioning `frameSpec` for you.
// This assumes that you are using `onAcc::makeIdxMap` in the kernel.
onHost::concepts::FrameSpec auto frameSpec = onHost::getFrameSpec<int>(device, Vec{numElements});
// Create a kernel object and enqueue it along with the `frameSpec` and kernel arguments.
// Depending on how many tasks are still in the queue, the kernel may be executed immediately or after a delay.
queue.enqueue(frameSpec, KernelBundle{VectorAddKernel{}, resultBuffer, lhsBuffer, rhsBuffer});
// If you use a non-blocking queue, the kernel runs asynchronously with respect to the host.
// To synchronize the kernel, you must call `onHost::wait(queue)`.
// onHost::wait(queue);
// END-TUTORIAL-kernelLaunch
// Copy the result back and wait for completion before reading it.
onHost::memcpy(queue, result, resultBuffer);
onHost::wait(queue);
// END-TUTORIAL-kernelaround

If we display the code region from BEGIN-TUTORIAL-kernelaround to END-TUTORIAL-kernelaround, the rendered code will also show the comments BEGIN-TUTORIAL-kernelLaunch and END-TUTORIAL-kernelLaunch from the source code.
I can bring it back, but that takes a bit of work. Also, the whole source code is shown at the end of the page to provide the context, and on the preceding pages we already assume that people read this code to understand the context. Therefore I think it is not necessary.
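For illustration only, the nested-marker problem described above could in principle be handled by a small preprocessing step that extracts a region and drops any other marker lines inside it. The sketch below is not part of this PR and is not an existing Sphinx option; the helper name `extract_region` and the assumption that every marker sits alone on a `//` comment line are hypothetical.

```python
import re

# Assumed marker convention, taken from the snippet in this thread:
# a comment line of the form "// BEGIN-TUTORIAL-<name>" or "// END-TUTORIAL-<name>".
MARKER = re.compile(r"^\s*//\s*(BEGIN|END)-TUTORIAL-\w+\s*$")


def extract_region(source: str, name: str) -> str:
    """Return the lines between the BEGIN and END markers of `name`,
    skipping any nested marker lines found in between."""
    begin = f"// BEGIN-TUTORIAL-{name}"
    end = f"// END-TUTORIAL-{name}"
    inside = False
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not inside:
            # Ignore everything until the requested region starts.
            if stripped == begin:
                inside = True
            continue
        if stripped == end:
            break
        if MARKER.match(line):
            continue  # a marker of some other (nested) region
        kept.append(line)
    return "\n".join(kept)


example = """\
// BEGIN-TUTORIAL-kernelaround
onHost::memcpy(queue, lhsBuffer, lhs);
// BEGIN-TUTORIAL-kernelLaunch
queue.enqueue(frameSpec, KernelBundle{VectorAddKernel{}, resultBuffer, lhsBuffer, rhsBuffer});
// END-TUTORIAL-kernelLaunch
onHost::wait(queue);
// END-TUTORIAL-kernelaround
"""

outer = extract_region(example, "kernelaround")
inner = extract_region(example, "kernelLaunch")
print(outer)
```

Whether such a step is worth the extra build complexity compared with splitting the snippet or duplicating it under different markers is a judgment call for the maintainers.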
That's why it is split into sections. It is a workaround for the fact that we cannot handle nested includes.
This can be revisited later and is IMO not mandatory.
From a technical perspective, that makes sense. But if we want to have this documentation, I suggest simply copying the code and setting different markers. The maintenance overhead is small.
Choosing the Correct Frame Specification
----------------------------------------

For a first implementation, frame selection should be boring.
The host chooses how much work is grouped into one frame, and the kernel then iterates over the valid data indices assigned to it.

.. literalinclude:: ../../snippets/example/050_kernel.cpp
   :language: cpp
   :start-after: BEGIN-TUTORIAL-kernelFrameSpec
   :end-before: END-TUTORIAL-kernelFrameSpec
   :dedent:

Rules of thumb:

- ``onHost::getFrameSpec<T>(device, extents)`` is the easiest way to get a reasonable first frame specification.
- Start with simple sizes. For 1D kernels, something around ``128`` to ``256`` elements per frame is usually a reasonable first try.
- When you have multiple dimensions, prefer more work in the fastest varying dimension, which is usually ``x``.
- Use a compile-time ``CVec`` frame extent when the kernel benefits from knowing the frame size at compile time.
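To make the 1D rule of thumb concrete, here is a rough arithmetic sketch of how a frame size translates into a frame count. It assumes, as a simplification, that a frame groups a fixed number of consecutive elements and that the last frame may be only partly filled. The function names are hypothetical and do not correspond to any alpaka API.

```python
def frames_needed(num_elements: int, frame_extent: int) -> int:
    """Number of frames required to cover `num_elements` (ceiling division)."""
    if num_elements < 0 or frame_extent <= 0:
        raise ValueError("num_elements must be >= 0 and frame_extent must be > 0")
    return (num_elements + frame_extent - 1) // frame_extent


def valid_in_last_frame(num_elements: int, frame_extent: int) -> int:
    """Number of valid data indices that fall into the last frame."""
    if num_elements == 0:
        return 0
    remainder = num_elements % frame_extent
    return frame_extent if remainder == 0 else remainder


# Compare the two suggested starting sizes for 1000 elements.
for extent in (128, 256):
    print(extent, frames_needed(1000, extent), valid_in_last_frame(1000, extent))
```

Under these assumptions, 1000 elements need 8 frames of 128 (the last one holding 104 valid elements) or 4 frames of 256 (the last one holding 232), which is why the kernel has to restrict itself to the valid data indices.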
I removed the whole section because it is not easy to explain how to select a good frame specification without explaining a lot of details. In my opinion, it is super frustrating when the tutorial tells you that you have to select a good frame specification but does not explain how. And I think onHost::getFrameSpec does a good job at this point.
No description provided.