Improve way parquet metadata size is handled #9

@rdettai

The Parquet metadata size is not known from the catalog. Issuing a dedicated request just to read the footer, which contains the metadata size, would also be quite inefficient. This is why the first request currently downloads the last 1MB of the file and hopes that the entire metadata falls within this range:

  • on the one hand, 1MB is fairly large and the download duration is not negligible
  • on the other hand, Parquet metadata can be larger than that for files with many row groups

Solutions might be:

  • reduce the default size to 256KB and implement logic that fetches the rest of the metadata if it didn't fit in the initial download
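
A minimal sketch of what that fallback could look like, in Python for brevity. It relies on the standard Parquet footer layout for unencrypted files (the file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic). The names `read_metadata_bytes`, `fetch_range` and `tail_size` are hypothetical and not taken from this codebase; `fetch_range(start, length)` stands in for whatever ranged download call is available.

```python
import struct

FOOTER_SIZE = 8  # 4-byte little-endian metadata length + b"PAR1" magic
DEFAULT_TAIL_SIZE = 256 * 1024  # initial guess proposed in this issue


def read_metadata_bytes(fetch_range, file_size, tail_size=DEFAULT_TAIL_SIZE):
    """Return the raw (still Thrift-encoded) Parquet metadata bytes.

    `fetch_range(start, length)` is a hypothetical callable that returns
    `length` bytes of the file starting at offset `start`.
    """
    if file_size < FOOTER_SIZE:
        raise ValueError("file too small to be a Parquet file")

    # First request: speculatively download the end of the file.
    tail_size = max(FOOTER_SIZE, min(tail_size, file_size))
    tail_start = file_size - tail_size
    tail = fetch_range(tail_start, tail_size)

    # Only the plain (unencrypted) footer is handled in this sketch.
    if tail[-4:] != b"PAR1":
        raise ValueError("missing PAR1 magic at end of file")
    (metadata_len,) = struct.unpack("<I", tail[-8:-4])

    metadata_start = file_size - FOOTER_SIZE - metadata_len
    if metadata_start < 0:
        raise ValueError("metadata length exceeds file size")

    if metadata_start >= tail_start:
        # The whole metadata block was inside the speculative download.
        offset = metadata_start - tail_start
        return tail[offset:-FOOTER_SIZE]

    # Second request: fetch only the bytes that are still missing.
    missing = fetch_range(metadata_start, tail_start - metadata_start)
    return missing + tail[:-FOOTER_SIZE]
```

With this shape the common case still costs a single request, and the worst case costs two, with the second one covering only the part of the metadata that the first download missed.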

Labels: bug (Something isn't working)