Questions/suggestions on performance comparisons in the paper #10

@mreineck

It would be great if some information could be added to the SHT performance comparison section (IV-B2) to make it easier to assess the relative performance of the various implementations.

  • How was ducc installed on the testing machine? Did you compile it from source, which is the recommended method (pip3 install --no-binary ducc0 --user ducc0), or did you use a precompiled binary wheel? Your target CPU supports AVX-512, whereas the portable binary wheel can only support SSE2, so this makes a large performance difference.
  • What was the relation between lmax and nside in the SHT performance tests?
  • Is there a strong reason to use only 20 of the CPU's cores for ducc? A node with an AMD EPYC 9454 should have at least 48 CPU cores, and using it to its full capability seems fair, given that the employed GPU is in a whole different ballpark price-wise.
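
To make the second question concrete, here is a minimal sketch of the quantities involved, assuming HEALPix maps (12·nside² pixels) and a triangular a_lm layout with mmax = lmax unless stated otherwise. The three lmax conventions listed are just common choices used for illustration, not necessarily what the paper used, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def healpix_npix(nside: int) -> int:
    """Number of pixels in a HEALPix map with the given nside."""
    return 12 * nside * nside


def alm_count(lmax: int, mmax: Optional[int] = None) -> int:
    """Number of (l, m) coefficients with 0 <= m <= mmax and m <= l <= lmax."""
    if mmax is None:
        mmax = lmax
    return (mmax + 1) * (lmax + 1) - mmax * (mmax + 1) // 2


# Common band-limit conventions (assumed for illustration only).
LMAX_CONVENTIONS = {
    "lmax = nside": lambda nside: nside,
    "lmax = 2*nside": lambda nside: 2 * nside,
    "lmax = 3*nside - 1": lambda nside: 3 * nside - 1,
}


def summarize(nside: int) -> Dict[str, Tuple[int, int, int]]:
    """Map each convention to (lmax, number of a_lm, number of pixels)."""
    npix = healpix_npix(nside)
    return {
        name: (rule(nside), alm_count(rule(nside)), npix)
        for name, rule in LMAX_CONVENTIONS.items()
    }


if __name__ == "__main__":
    for name, (lmax, n_alm, npix) in summarize(1024).items():
        print(f"{name}: lmax={lmax}, n_alm={n_alm}, npix={npix}")
```

Since the cost of an SHT grows with both the band limit and the map resolution, the same nside can correspond to quite different workloads depending on which relation was used, which is why stating it would help when comparing timings.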
