Improve performance of TPC-DS q72 #622

@andygrove

What is the problem the feature request solves?

I ran our TPC-DS-derived benchmark locally at sf=100. q72 shows the largest regression in absolute terms (seconds rather than percentage): it was 754 seconds (about 12.5 minutes) slower with Comet enabled. Spark took 1.1 hours, and Comet took 1.3 hours.

These numbers come from a single run of all 99 queries with Spark, followed by a single run with Comet enabled.
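
To illustrate why ranking by seconds and ranking by percentage can point at different queries, here is a minimal sketch. The function name `regressions` and the query names and timings are hypothetical and are not measurements from this benchmark run.

```python
def regressions(baseline, candidate):
    """Return (query, delta_seconds, delta_percent) tuples,
    sorted so the largest absolute slowdown comes first."""
    rows = []
    for query, base in baseline.items():
        delta = candidate[query] - base
        rows.append((query, delta, 100.0 * delta / base))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical per-query timings in seconds (illustrative only).
spark_s = {"qA": 10.0, "qB": 40.0, "qC": 900.0}
comet_s = {"qA": 16.0, "qB": 30.0, "qC": 1000.0}

by_seconds = regressions(spark_s, comet_s)
by_percent = sorted(by_seconds, key=lambda row: row[2], reverse=True)

for query, delta, pct in by_seconds:
    print(f"{query}: {delta:+.1f} s ({pct:+.1f}%)")
```

In this made-up data the long-running query dominates the ranking by seconds, while a short query has the largest percentage slowdown, which is why the metric matters when picking what to investigate first.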

Comet does not currently support the many sort-merge joins in the query, so Comet is only performing the initial file scans, filters, and exchanges (and sometimes sorts) before transitioning back to Spark for the joins.
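
One rough way to see where a plan falls back to Spark is to split the operators in the physical plan text by name. The sketch below assumes that Comet operators appear with a `Comet` prefix in the plan output; the helper `split_operators` and the plan text are hypothetical and heavily simplified, not the actual q72 plan.

```python
import re


def split_operators(plan_text):
    """Split operator names in a plan dump into (comet, spark) lists.

    Assumption: Comet operators are prefixed with "Comet"; everything
    else is treated as a regular Spark operator.
    """
    comet, spark = [], []
    for line in plan_text.splitlines():
        match = re.search(r"[A-Za-z]\w*", line)  # first identifier on the line
        if not match:
            continue
        name = match.group(0)
        (comet if name.startswith("Comet") else spark).append(name)
    return comet, spark


# Hypothetical, simplified plan fragment (not taken from a real run).
plan = """
SortMergeJoin
:- ColumnarToRow
:  +- CometSort
:     +- CometExchange
:        +- CometFilter
:           +- CometScan
+- Sort
   +- ColumnarToRow
      +- CometExchange
         +- CometScan
"""

comet_ops, spark_ops = split_operators(plan)
print("Comet:", comet_ops)
print("Spark:", spark_ops)
```

In this fragment the scans, filter, exchanges, and one of the sorts run in Comet, while the join and the other sort run in Spark after a columnar-to-row transition.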

This issue is for discussing possible solutions to avoid this regression.

Describe the potential solution

No response

Additional context

No response

Labels: enhancement (New feature or request)