
2. basic ops #13

@ashlinrichardson

Basic operations to support minimal data quality assessment, make life more livable, and make data-science SWAT-team deployments easier and more effective, all in the context of large tabular data:

  • path normalization for interop between environments (classify each path's format by OS and translate it to the native format)
  • data type detection: nominal, numeric, date, geo
  • date detection and format validation
  • matching data dictionaries against data files
  • data dictionary normalization, plus recovery from multiline cells
  • metadata: field search and description search, with support for fuzzy matching
  • semantic matching
  • autodetection and application of human-readable lookups present in other tables
  • flat-file parsing (all sets)
  • dataset identification and integration
  • redundant-record detection (large data)
  • lossless data compression
  • windowing for multitemporal analysis
  • low-memory (large data) sorting, including but not limited to sorting by date
  • no requirement for a specific install location
  • let users select which version of the data to use
  • parse and filter the largest files without being limited by available RAM
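For data type detection and date format validation, one simple approach is to test each non-empty value in a column against per-type checks and pick the first type that covers most values. This is a sketch under stated assumptions: the accepted date formats, the `lat,lon` pattern for geo values and the 95% threshold are all illustrative choices, and `detect_type` / `invalid_dates` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed date formats; a real tool would make these configurable
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")
# Assumed geo representation: "lat,lon" in decimal degrees
LATLON = re.compile(r"^\s*(-?\d{1,2}(?:\.\d+)?)\s*,\s*(-?\d{1,3}(?:\.\d+)?)\s*$")


def is_numeric(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False


def is_date(s: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(s, fmt)
            return True
        except ValueError:
            pass
    return False


def is_geo(s: str) -> bool:
    m = LATLON.match(s)
    if not m:
        return False
    lat, lon = float(m.group(1)), float(m.group(2))
    return -90 <= lat <= 90 and -180 <= lon <= 180


def detect_type(values, threshold=0.95) -> str:
    """Return 'date', 'geo', 'numeric' or 'nominal' for a column of strings."""
    vals = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not vals:
        return "nominal"
    # Dates and geo pairs are checked before plain numbers
    for name, check in (("date", is_date), ("geo", is_geo), ("numeric", is_numeric)):
        if sum(1 for v in vals if check(v)) / len(vals) >= threshold:
            return name
    return "nominal"


def invalid_dates(values, fmt="%Y-%m-%d"):
    """List (index, value) pairs that do not parse as a real date in `fmt`."""
    bad = []
    for i, v in enumerate(values):
        try:
            datetime.strptime(v.strip(), fmt)
        except ValueError:
            bad.append((i, v))
    return bad
```

`strptime` also rejects impossible calendar dates such as `2024-02-30`, so the validation step catches more than just wrong formats.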
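For matching a data dictionary against a data file, a minimal sketch compares the file's header with the dictionary's field list case-insensitively and then suggests near matches for the leftovers using the standard-library `difflib.get_close_matches`. The function name `match_header_to_dictionary` and the 0.8 similarity cutoff are assumptions for illustration.

```python
import difflib


def _norm(name: str) -> str:
    # Compare field names case-insensitively, ignoring surrounding whitespace
    return name.strip().upper()


def match_header_to_dictionary(header, dictionary_fields, cutoff=0.8):
    """Compare a file's header row with the field list from a data dictionary."""
    dict_by_norm = {_norm(f): f for f in dictionary_fields}
    file_by_norm = {_norm(h): h for h in header}

    matched = sorted(dict_by_norm[k] for k in dict_by_norm if k in file_by_norm)
    only_in_file = [h for k, h in file_by_norm.items() if k not in dict_by_norm]
    only_in_dictionary = [f for k, f in dict_by_norm.items() if k not in file_by_norm]

    # For each unexplained file column, suggest the closest unmatched dictionary field
    unmatched_keys = [_norm(f) for f in only_in_dictionary]
    suggestions = {}
    for h in only_in_file:
        close = difflib.get_close_matches(_norm(h), unmatched_keys, n=1, cutoff=cutoff)
        if close:
            suggestions[h] = dict_by_norm[close[0]]

    return {
        "matched": matched,
        "only_in_file": only_in_file,
        "only_in_dictionary": only_in_dictionary,
        "suggestions": suggestions,
    }
```

Suggestions are only drawn from dictionary fields that are not already matched, so a typo such as `postl_cd` is paired with `POSTAL_CD` rather than with a field the file already has.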
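For field and description search with fuzzy matching, a rough sketch scores each dictionary entry by the best `difflib.SequenceMatcher` ratio between the query and either the field name or any single word of its description. It assumes the data dictionary is already loaded as a `{field: description}` mapping; `fuzzy_search`, the cutoff and the result limit are hypothetical choices. Semantic matching (synonyms, meaning) would need a different technique and is not attempted here.

```python
import difflib


def fuzzy_search(query, entries, cutoff=0.6, limit=5):
    """Rank data-dictionary entries {field: description} by fuzzy similarity to `query`."""
    q = query.lower()
    scored = []
    for field, desc in entries.items():
        # Similarity to the field name itself
        name_score = difflib.SequenceMatcher(None, q, field.lower()).ratio()
        # Best similarity to any single word of the description
        desc_score = max(
            (difflib.SequenceMatcher(None, q, w).ratio() for w in desc.lower().split()),
            default=0.0,
        )
        score = max(name_score, desc_score)
        if score >= cutoff:
            scored.append((score, field))
    # Highest score first; ties broken alphabetically for stable output
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [field for _, field in scored[:limit]]
```

A usage example, with an illustrative dictionary: `fuzzy_search("birht_date", {"BIRTH_DATE": "Date of birth of the client", "POSTAL_CD": "Postal code of residence"})` ranks `BIRTH_DATE` first despite the transposed letters.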
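For parsing and filtering files larger than RAM (and recovering quoted multiline cells), the standard-library `csv` reader already streams one record at a time and keeps newlines that sit inside quoted fields. A minimal sketch, assuming comma-separated input with a header row; `filter_rows` is a hypothetical name.

```python
import csv


def filter_rows(lines, predicate):
    """Stream rows from an iterable of CSV lines, yielding only those that pass `predicate`.

    Only the current record is held in memory, so file size is not limited by RAM.
    Quoted fields that span several physical lines are reassembled by the csv module.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if predicate(row):
            yield row
```

In practice the input would be a file opened with `open(path, newline="", encoding="utf-8")`; the `csv` documentation recommends `newline=""` precisely so that embedded newlines in quoted cells are interpreted correctly.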
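For redundant-record detection on large data, one common approach is to stream the records and keep only a fixed-size hash of each distinct record rather than the record itself. This sketch assumes "redundant" means exact duplicates after trimming whitespace; `find_duplicate_lines` is a hypothetical name, and BLAKE2b with a 16-byte digest is an illustrative choice.

```python
import hashlib


def find_duplicate_lines(lines, normalize=str.strip):
    """Yield (line_number, line) for every record already seen earlier in the stream.

    Memory grows with the number of distinct records, but each one costs only a
    16-byte digest instead of the full row.
    """
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        key = hashlib.blake2b(normalize(line).encode("utf-8"), digest_size=16).digest()
        if key in seen:
            yield lineno, line
        else:
            seen.add(key)
```

If even the digest set is too large, the same idea can be applied per partition: route each record to one of several temporary files by a hash prefix, then deduplicate each file separately.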
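For low-memory sorting (e.g. by date), a classic external merge sort works: sort fixed-size chunks in memory, write each sorted run to a temporary file, then lazily merge the runs with `heapq.merge`. A sketch, assuming one record per line and an ISO `YYYY-MM-DD` date in a known column; `external_sort` and `iso_date_key` are hypothetical names, and `chunk_size` controls peak memory.

```python
import heapq
import itertools
import os
import tempfile
from datetime import datetime


def iso_date_key(line, column=0, sep=","):
    """Sort key: parse a YYYY-MM-DD date from the given column (assumed layout)."""
    return datetime.strptime(line.split(sep)[column].strip(), "%Y-%m-%d")


def external_sort(lines, key, chunk_size=100_000):
    """Yield lines sorted by `key`, holding at most `chunk_size` lines in memory per run."""
    run_paths = []
    it = iter(lines)
    try:
        while True:
            # Ensure every line ends with a newline so runs concatenate cleanly
            chunk = [l if l.endswith("\n") else l + "\n" for l in itertools.islice(it, chunk_size)]
            if not chunk:
                break
            chunk.sort(key=key)
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
                f.writelines(chunk)
            run_paths.append(path)
        files = [open(p, encoding="utf-8", newline="") for p in run_paths]
        try:
            # k-way merge that reads each run file lazily, line by line
            yield from heapq.merge(*files, key=key)
        finally:
            for f in files:
                f.close()
    finally:
        for p in run_paths:
            os.remove(p)
```

With a header row, the header would be read off first and written separately; the sketch assumes data lines only.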
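For windowing in multitemporal analysis, the simplest variant is tumbling (non-overlapping, fixed-width) windows: each date maps to the start of its window by floor-dividing its day offset from an origin. This sketch assumes records arrive as `(date, value)` pairs and fit in memory; `window_start` and `tumbling_windows` are hypothetical names, and sliding (overlapping) windows would be a different design.

```python
from collections import defaultdict
from datetime import date, timedelta


def window_start(d: date, origin: date, width_days: int) -> date:
    # Floor division assigns every date (even before the origin) to exactly one window
    k = (d - origin).days // width_days
    return origin + timedelta(days=k * width_days)


def tumbling_windows(records, width_days, origin=date(1970, 1, 1)):
    """Group (date, value) pairs into consecutive, non-overlapping windows."""
    buckets = defaultdict(list)
    for d, value in records:
        buckets[window_start(d, origin, width_days)].append(value)
    # Return windows in chronological order
    return dict(sorted(buckets.items()))
```

Pinning `origin` (e.g. to the start of a fiscal year) keeps window boundaries identical across datasets, which matters when comparing several time series window by window.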
