Description
This idea was going to result in #15 getting peppered with repetitive comments, so I'm going to write it here separately instead.
TRX currently uses its own scheme for recording the matrix dimensions and data type of its various data files, encoding them in the file names. Looking through the code in #15, I also see what looks like a custom enumeration / single-character encoding of data types. This may be inventing a new solution to a problem for which many solutions already exist.
The NPY format provides an established solution to these issues. Matrix dimensions and data type (including endianness) are encoded in the file header as part of a dictionary literal. I recently implemented C++ support for that format in MRtrix3/mrtrix3#2437. Using this file format within the higher-order TRX format would be fairly trivial for Python, in particular allowing data to be read and written with no dependence on TRX libraries; for other languages the overhead would be no greater than that demanded by the current specification.
Potential downsides are that matrix dimensionality / size and data type would no longer be visible from a filesystem view (though they could be seen pretty easily just by using head), and that memory-mapping implementations would need to support loading from a non-zero offset into a file (which shouldn't be difficult; it's a common operation). But the upside of not reinventing the wheel may more than offset that.
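To make the two points above concrete (shape and data type living in the header, and memory-mapping from a non-zero offset), here is a rough Python sketch using NumPy's own `numpy.lib.format` helpers. The file name `positions.npy`, the 5x3 float32 array, and the helper `read_npy_header` are purely illustrative assumptions and not part of the TRX specification; the sketch also only handles NPY versions 1.0 and 2.0.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(path):
    """Hypothetical helper: return (shape, fortran_order, dtype, data_offset)."""
    with open(path, "rb") as fp:
        version = npy_format.read_magic(fp)  # consumes the magic string and version bytes
        if version == (1, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_1_0(fp)
        elif version == (2, 0):
            shape, fortran_order, dtype = npy_format.read_array_header_2_0(fp)
        else:
            raise ValueError(f"unsupported NPY version {version}")
        # The file position is now at the first byte of raw array data.
        return shape, fortran_order, dtype, fp.tell()


# Illustrative payload (an assumption): 5 points x 3 coordinates, little-endian float32.
positions = np.arange(15, dtype="<f4").reshape(5, 3)
path = os.path.join(tempfile.mkdtemp(), "positions.npy")
np.save(path, positions)

# Shape and dtype are recovered from the header, not from the file name.
shape, fortran_order, dtype, offset = read_npy_header(path)

# Memory-map the raw data, skipping the header via a non-zero offset.
mapped = np.memmap(
    path,
    dtype=dtype,
    mode="r",
    shape=shape,
    offset=offset,
    order="F" if fortran_order else "C",
)
```

In plain Python none of this is strictly needed, since `np.load(path, mmap_mode="r")` does the same header parsing and offset handling internally; the explicit version is only meant to show roughly what an implementation in another language would have to do.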