
NPY file format for regular matrix data #21

@Lestropie

This idea was going to result in #15 getting peppered with repetitive comments, so I'm going to write it here separately instead.

TRX currently has novel handling of matrix dimensions and data type for its various data files, achieved via file names. Looking through the code in #15, I also see what looks like a novel enumeration / single-character encoding of data type. This risks creating a novel solution to a problem for which many solutions already exist.
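For comparison, here is a minimal Python sketch of what a reader has to do under the file-name approach. It assumes names of the form <name>[.<ncols>].<dtype> (e.g. positions.3.float16, offsets.uint64). The helpers parse_trx_filename and rows_from_size and the token list are hypothetical illustrations, not code from #15 or any TRX library. Note that the row count has to be derived from the file size, and nothing in this pattern records byte order.

```python
import os
import re

# Illustrative subset of data type tokens (an assumption, not the full TRX list).
DTYPE_TOKENS = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
}


def parse_trx_filename(path):
    """Split e.g. 'positions.3.float16' into ('positions', 3, 'float16').

    Hypothetical helper assuming '<name>[.<ncols>].<dtype>'; ncols defaults to 1.
    """
    base = os.path.basename(path)
    parts = base.split(".")
    if len(parts) == 2:
        name, dtype = parts
        ncols = 1
    elif len(parts) == 3:
        name, ncols_token, dtype = parts
        ncols = int(ncols_token)
    else:
        raise ValueError(f"unexpected file name: {base!r}")
    if dtype not in DTYPE_TOKENS:
        raise ValueError(f"unknown data type token: {dtype!r}")
    return name, ncols, dtype


def rows_from_size(nbytes, ncols, dtype):
    """The row count is not in the name, so derive it from the file size."""
    itemsize = int(re.search(r"\d+$", dtype).group()) // 8  # bits -> bytes
    nrows, remainder = divmod(nbytes, ncols * itemsize)
    if remainder:
        raise ValueError("file size is not a whole number of rows")
    return nrows
```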

The NPY format provides an established solution to these issues: matrix dimensions and data type (including endianness) are encoded in the file header as a Python dictionary literal. I recently implemented C++ support for that format myself in MRtrix3/mrtrix3#2437. Using NPY within the higher-order TRX format would be fairly trivial for Python, in particular making it possible to read and write the data with no dependence on TRX libraries; for other languages, the overhead would be no greater than that demanded by the current specification.

Potential downsides are that properties such as matrix dimensionality, size and data type would no longer be visible from a filesystem view (though they could easily be seen just using head), and that memory-mapping implementations would need to support loading from a non-zero offset into a file (which shouldn't be difficult; it's a common operation). But the upside of not reinventing the wheel may more than offset that.
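To give a sense of how small the parsing overhead is, here is a minimal stdlib-only Python sketch that writes and reads a version 1.0 NPY file without NumPy or any TRX library. The helper names (write_npy_v1, read_npy_header, read_npy_matrix) and the small type-code table are hypothetical illustrations; the sketch assumes a 2-D, C-order matrix and handles only a handful of data types. The header is the dictionary literal described above, and the matrix data is read from a non-zero offset within a memory map.

```python
import ast
import mmap
import struct

MAGIC = b"\x93NUMPY"
# Assumed subset of NPY type codes mapped to struct format characters.
STRUCT_CODES = {"f2": "e", "f4": "f", "f8": "d", "u1": "B", "u4": "I", "u8": "Q", "i4": "i"}


def write_npy_v1(path, rows, descr="<f4"):
    """Write a 2-D list of numbers as a version 1.0 NPY file in C order."""
    n_rows, n_cols = len(rows), len(rows[0])
    header = "{'descr': '%s', 'fortran_order': False, 'shape': (%d, %d), }" % (
        descr, n_rows, n_cols)
    # Preamble = magic (6) + version (2) + header length (2) = 10 bytes.
    # Pad with spaces and end with '\n' so the data starts on a 64-byte boundary.
    header += " " * (-(10 + len(header) + 1) % 64) + "\n"
    byteorder = descr[0] if descr[0] in "<>" else "="
    flat = [value for row in rows for value in row]
    data = struct.pack(f"{byteorder}{len(flat)}{STRUCT_CODES[descr[1:]]}", *flat)
    with open(path, "wb") as f:
        f.write(MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)))
        f.write(header.encode("latin1"))
        f.write(data)


def read_npy_header(buf):
    """Return (header dict, byte offset of the data) for an NPY buffer."""
    if buf[:6] != MAGIC:
        raise ValueError("not an NPY file")
    major = buf[6]
    if major == 1:
        (header_len,) = struct.unpack_from("<H", buf, 8)
        start = 10
    elif major in (2, 3):
        (header_len,) = struct.unpack_from("<I", buf, 8)
        start = 12
    else:
        raise ValueError(f"unsupported NPY version {major}")
    encoding = "utf-8" if major == 3 else "latin1"
    text = bytes(buf[start:start + header_len]).decode(encoding)
    return ast.literal_eval(text), start + header_len


def read_npy_matrix(path):
    """Memory-map an NPY file and read a 2-D C-order matrix from the data offset."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header, offset = read_npy_header(mm)
        descr = header["descr"]
        if header["fortran_order"]:
            raise ValueError("this sketch only handles C order")
        n_rows, n_cols = header["shape"]  # assumes exactly two dimensions
        byteorder = descr[0] if descr[0] in "<>" else "="
        fmt = f"{byteorder}{n_rows * n_cols}{STRUCT_CODES[descr[1:]]}"
        # The matrix data begins at a non-zero offset inside the mapping.
        flat = struct.unpack_from(fmt, mm, offset)
    rows = [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    return header["shape"], descr, offset, rows
```

In practice, Python users would simply call np.save and np.load(..., mmap_mode="r"), and numpy.memmap accepts an offset argument directly; the hand-rolled reader only illustrates roughly what an implementation in another language would need to do.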
