refactor(trace): pass scalar Trace internal args by value and unify naming#861
refactor(trace): pass scalar Trace internal args by value and unify naming#861IvanaGyro wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the CPU and GPU trace internal functions by changing the parameter passing of cheap-to-copy types (such as bool and cytnx_uint64) from const references to pass-by-value, as well as standardizing parameter casing. The review feedback highlights several instances of unused parameters in functions like _trace_nd, _trace_nd_gpu, and Trace_internal_b, and suggests commenting out or omitting their names to prevent potential compiler warnings.
| void _trace_nd(Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, cytnx_uint64 Nelem, | ||
| const std::vector<cytnx_uint64> &accu, | ||
| const std::vector<cytnx_uint64> &remain_rank_id, | ||
| const std::vector<cytnx_int64> &shape, const cytnx_uint64 &ax1, | ||
| const cytnx_uint64 &ax2) { | ||
| const std::vector<cytnx_int64> &shape, cytnx_uint64 ax1, cytnx_uint64 ax2) { |
There was a problem hiding this comment.
The parameters Nelem, accu, remain_rank_id, and shape are completely unused in the _trace_nd function (since the direct indexing loop is commented out). To prevent compiler warnings (such as -Wunused-parameter), we can comment out or omit the parameter names in the function definition while keeping the signature intact for the dispatchers.
| void _trace_nd(Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, cytnx_uint64 Nelem, | |
| const std::vector<cytnx_uint64> &accu, | |
| const std::vector<cytnx_uint64> &remain_rank_id, | |
| const std::vector<cytnx_int64> &shape, const cytnx_uint64 &ax1, | |
| const cytnx_uint64 &ax2) { | |
| const std::vector<cytnx_int64> &shape, cytnx_uint64 ax1, cytnx_uint64 ax2) { | |
| void _trace_nd(Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, cytnx_uint64 /*Nelem*/, | |
| const std::vector<cytnx_uint64> &/*accu*/, | |
| const std::vector<cytnx_uint64> &/*remain_rank_id*/, | |
| const std::vector<cytnx_int64> &/*shape*/, cytnx_uint64 ax1, cytnx_uint64 ax2) { |
There was a problem hiding this comment.
This is resolved by the stacked design rather than this commit. In #862, _trace_nd is replaced entirely by a single TraceImpl<T> that uses every parameter it derives (it computes the strided base offset and reduces the diagonal in place), so there is no commented-out body and no unused parameter to silence. This PR (#861) is only the signature/naming refactor of the pre-existing Contract-based bodies; on its own branch I omitted the unused names in the stub/2d paths (see f894e02), and the ND body that still had the commented-out loop is rewritten in the next PR.
Generated by Claude Code
| void Trace_internal_b(bool is_2d, Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, | ||
| cytnx_uint64 Nelem, const std::vector<cytnx_uint64> &accu, | ||
| const std::vector<cytnx_uint64> &remain_rank_id, | ||
| const std::vector<cytnx_int64> &shape, const cytnx_uint64 &ax1, | ||
| const cytnx_uint64 &ax2) { | ||
| const std::vector<cytnx_int64> &shape, cytnx_uint64 ax1, | ||
| cytnx_uint64 ax2) { |
There was a problem hiding this comment.
The Trace_internal_b function only throws an error and does not use any of its parameters. Commenting out or omitting the parameter names prevents potential -Wunused-parameter compiler warnings.
| void Trace_internal_b(bool is_2d, Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, | |
| cytnx_uint64 Nelem, const std::vector<cytnx_uint64> &accu, | |
| const std::vector<cytnx_uint64> &remain_rank_id, | |
| const std::vector<cytnx_int64> &shape, const cytnx_uint64 &ax1, | |
| const cytnx_uint64 &ax2) { | |
| const std::vector<cytnx_int64> &shape, cytnx_uint64 ax1, | |
| cytnx_uint64 ax2) { | |
| void Trace_internal_b(bool /*is_2d*/, Tensor &/*out*/, const Tensor &/*Tn*/, cytnx_uint64 /*Ndiag*/, | |
| cytnx_uint64 /*Nelem*/, const std::vector<cytnx_uint64> &/*accu*/, | |
| const std::vector<cytnx_uint64> &/*remain_rank_id*/, | |
| const std::vector<cytnx_int64> &/*shape*/, cytnx_uint64 /*ax1*/, | |
| cytnx_uint64 /*ax2*/) { |
There was a problem hiding this comment.
Done in f894e02 — Trace_internal_b (the bool stub that only raises an unsupported-type error) now has all parameter names commented out: void Trace_internal_b(Tensor & /*out*/, const Tensor & /*Tn*/, const bool /*is_2d*/, ...), so -Wunused-parameter stays quiet while the shared dispatch signature is preserved.
Generated by Claude Code
| void _trace_nd_gpu(Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, cytnx_uint64 Nelem, | ||
| const std::vector<cytnx_uint64> &accu, | ||
| const std::vector<cytnx_uint64> &remain_rank_id, | ||
| const std::vector<cytnx_int64> &shape, const cytnx_uint64 &ax1, | ||
| const cytnx_uint64 &ax2) { | ||
| const std::vector<cytnx_int64> &shape, cytnx_uint64 ax1, cytnx_uint64 ax2) { |
There was a problem hiding this comment.
The parameters Nelem, accu, remain_rank_id, and shape are completely unused in the _trace_nd_gpu function. Commenting out or omitting the parameter names in the function definition prevents potential -Wunused-parameter compiler warnings while keeping the signature intact for the dispatchers.
void _trace_nd_gpu(Tensor &out, const Tensor &Tn, cytnx_uint64 Ndiag, cytnx_uint64 /*Nelem*/,
const std::vector<cytnx_uint64> &/*accu*/,
const std::vector<cytnx_uint64> &/*remain_rank_id*/,
const std::vector<cytnx_int64> &/*shape*/, cytnx_uint64 ax1, cytnx_uint64 ax2) {
There was a problem hiding this comment.
Same as the CPU case: in this PR (#861) the GPU _trace_nd_gpu is still the Contract-based body and does read its parameters, so there's nothing unused here yet. The next PR (#850) replaces it with the TraceNdKernel launcher, where the host helper now decodes the layout on the device and uses every derived value — no unused parameter remains. The stub cuTrace_internal_b does get its names commented out in f894e02.
Generated by Claude Code
IvanaGyro
left a comment
There was a problem hiding this comment.
(Review written by Claude on behalf of @IvanaGyro.)
LGTM. This is a behavior-preserving mechanical refactor: scalar/bool args go from const& to by-value (the right call for cytnx_uint64/bool — passing a primitive by const-ref is pure overhead), and the tn/ndiag/nelem names are unified to Tn/Ndiag/Nelem to match the rest of the family. The CPU .cpp/.hpp, GPU .cu/.hpp, and the Tracefunc_oii typedef all move in lockstep, so the dispatch table stays type-consistent.
No new tests needed — the trace bodies are unchanged and the existing *UniTensorTest.Trace / gpu_Trace suites cover the surface. Verified the cumulative stack (through #850) builds and all Trace tests pass on both openblas-cuda and mkl-cuda.
One nit, non-blocking: the PR title here is the sig-refactor, but note the GPU body is still the Contract-based implementation at this point in the stack — the actual CUDA kernels arrive in #850. That's fine for a stacked series; just flagging so the commit message isn't read as "GPU kernels landed here."
dd601ed to
92ae99d
Compare
d85fd93 to
1e86067
Compare
…aming The Trace internal dispatchers (Trace_internal_*, cuTrace_internal_*) and the Tracefunc_oii function-pointer typedef took their scalar arguments (is_2d, Ndiag, Nelem, ax1, ax2) by const reference. Passing scalars by const reference adds an indirection without benefit; the convention in this codebase is to take small parameters by value. Vector and Tensor arguments remain by const reference. Several dispatchers also named the same parameters inconsistently (tn, ndiag, nelem in the integer-typed overloads versus Tn, Ndiag, Nelem elsewhere). Unify on Tn/Ndiag/Nelem across every overload so the signatures read uniformly. This commit is scoped to the Trace API surface; other linalg_internal typedefs are left untouched. Co-authored-by: Claude <noreply@anthropic.com>
92ae99d to
f894e02
Compare
|
Acting on your "merge into #862 and #850" — done. Rebased the chain so #861 is dropped: Closing #861 now. Generated by Claude Code |
Summary
The Trace internal dispatchers (
Trace_internal_*,cuTrace_internal_*) and theTracefunc_oiifunction-pointer typedef took their scalar arguments (is_2d,Ndiag,Nelem,ax1,ax2) by const reference. Passing scalars by const reference adds an indirection without benefit; the convention in this codebase is to take small parameters by value. Vector and Tensor arguments stay by const reference.Several dispatchers also named the same parameters inconsistently (
tn,ndiag,nelemin the integer-typed overloads versusTn,Ndiag,Nelemelsewhere). Unify onTn/Ndiag/Nelemacross every overload so the signatures read uniformly.Scope is strictly the Trace API surface: only
Tracefunc_oiiand the eleven Trace dispatcher signatures change. Otherlinalg_internaltypedefs are left untouched.Why split
This is a pure mechanical refactor that's easy to review on its own; landing it first keeps the follow-up "compute Trace in the container layer" PR focused on the algorithm rather than mixing in cosmetic signature churn.
Test plan
openblas-cpu: full suite 955/955 passed (includes fix(sum): use pairwise summation for floating-point linalg::Sum #849's re-enabled Sum-accuracy tests + newStrideViewtests from the stride_view PR; no Trace behavior change so the existingDenseUniTensorTest.Traceand friends still pass).mkl-cpu: full suite 955/955 passed.pre-commit(clang-format-14 + EOF + whitespace) clean.Draft — opened for review.
Generated by Claude Code