Optimize parquet chunk downloading strategy #10

@rdettai

The parquet table currently downloads each column chunk individually. When a query uses a large proportion of the columns and the file contains many row groups, this results in many small downloads.

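A strategy could be implemented to group the downloads of column chunks when:

  • the chunks are close to each other in the file
  • download parallelism is already high enough (running multiple downloads in parallel increases the total bandwidth, so merging should not reduce the number of concurrent downloads too far)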
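
To make the idea concrete, here is a minimal sketch of such a grouping step, not an implementation from this repository. It assumes each column chunk is described by a `(start, length)` byte range and that ranges do not overlap; the function name `plan_downloads` and the parameters `max_gap` and `min_downloads` are hypothetical and would need tuning against real storage latency and bandwidth.

```python
from typing import List, Tuple

# (start offset, length) of a column chunk in the file, in bytes.
Range = Tuple[int, int]


def plan_downloads(chunks: List[Range], max_gap: int, min_downloads: int) -> List[Range]:
    """Group nearby column chunks into fewer, larger downloads.

    Hypothetical sketch: two neighbouring chunks are merged only if the gap
    between them is at most `max_gap` bytes, and merging stops once the plan
    would drop to `min_downloads` requests, so parallelism is preserved.
    """
    ranges = sorted(chunks)
    if not ranges:
        return []

    # Gap (in bytes) between each chunk and the next one in file order.
    gaps = []
    for i in range(len(ranges) - 1):
        end = ranges[i][0] + ranges[i][1]
        gaps.append((ranges[i + 1][0] - end, i))

    # Pick the boundaries to merge, closest chunks first.
    merge_after = set()
    n_downloads = len(ranges)
    for gap, i in sorted(gaps):
        if gap > max_gap or n_downloads <= min_downloads:
            break
        merge_after.add(i)
        n_downloads -= 1

    # Build the merged ranges in file order.
    merged = [ranges[0]]
    for i in range(1, len(ranges)):
        start, length = ranges[i]
        if (i - 1) in merge_after:
            prev_start, prev_length = merged[-1]
            end = max(prev_start + prev_length, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


if __name__ == "__main__":
    chunks = [(0, 100), (100, 50), (1000, 200), (1210, 40), (5000, 10)]
    for start, length in plan_downloads(chunks, max_gap=16, min_downloads=2):
        print(f"download bytes {start}..{start + length}")
```

A merged range also fetches the bytes that sit in the gap between chunks, so `max_gap` trades wasted bandwidth against the per-request overhead it saves.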

Labels: enhancement