file listings

Example proposed output, note x_parameterSchema

{
    "HAPI": "3.2",
    "x_createdAt": "2017-02-21T17:27Z",
    "modificationDate": "2026-01-01T00:00Z",
    "x_parameterSchema": "list>fileList>jpgFileList",
    "parameters": [
        {
            "length": 20,
            "name": "Time",
            "type": "isotime",
            "x_format": "$Y-$m-$dT$H:$M:$SZ",
            "fill": null,
            "units": "UTC",
            "timeStampLocation" : "begin"
        },
        {
            "description": "Picture of the creek, unmodified",
            "fill": null,
            "name": "fileURI",
            "length": 26,
            "type": "string",
            "units": null,
            "stringType": {
                "uri": {
                    "base": "https://cottagesystems.com/data/hapi/pics/",
                    "mediaType": "image/jpeg"
                }
            }
        },
        {
            "description": "File modification time",
            "name": "modificationDate",
            "type": "isotime",
            "fill": null,
            "x_format": "$Y-$m-$dT$H:$MZ",
            "length": 17,
            "units": "UTC"
        },
        {
            "description": "File size in kilobytes",
            "name": "fileSize",
            "fill": null,
            "type": "integer",
            "units": "KiB"
        }
    ],
    "sampleStartDate": "2023-01-01T00:00Z",
    "sampleStopDate": "2023-02-01T00:00Z",
    "startDate": "2022-11-01T00:00Z",
    "stopDate": "2026-03-06T00:00Z",
    "cadence": "PT10M",
    "status": {
        "code": 1200,
        "message": "OK"
    }
}

One issue is how to deal with the units on the file size. We could use IEEE units, which seem to be similar to (the same as?) what is used in VO units and astropy units: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9714443

Message sent 2026-04-06 to HAPI dev mailing list with status update:

For a summary of where we are now: We would like there to be a schema to indicate that a HAPI response is a listing of files that are available as URIs. (We did not provide this or encourage it so far because we don’t want providers just offering a file listing and saying they made their data available via HAPI.) If people do list files using HAPI, we would prefer that they all use the same format, so that it becomes possible to interpret file listings interoperably from any HAPI service. Therefore, we will offer a schema, that if followed, will allow clients to: a) know that they are getting a file listing, and b) be able to interpret such a listing from any server with computer precision using a single client.

The most basic file listing will be a HAPI dataset that has only 2 required columns:

a time column as the first column (required by HAPI for any dataset); for a file listing, this represents the start time of the data in the file
filename as a URI; this is a string column that has a special string sub-type of URI (this URI sub-type is part of the existing HAPI spec as of version 3.2) with a link to the file. See here for URI string types: https://github.com/hapi-server/data-specification/blob/master/hapi-3.2.0/HAPI-data-access-spec-3.2.0.md#3616-the-stringtype-object

There can be optional elements after this for: file size, end time of data in the file, file modification time, file creation time, last file access time, and checksum. If any of these items are included, there are constraints that must be followed for them to be recognized by HAPI. Following any of these optional but constrained items, a dataset may include any number of other, additional columns relevant for these files, such as wavelength, frequency range, observed target, DOI, image type, quality flag, data version, processing level, etc. HAPI does not place any restriction on the number or structure of these additional columns. They just need to be valid HAPI parameters. Any “x_” items in these parameters are of course allowed, as always.
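
As a rough illustration of the two required columns described above, here is a minimal sketch of a client-side check on an info response. The function name check_basic_file_listing is made up for this example, and the rules it applies (first parameter is isotime; some string parameter carries a "uri" stringType, either as the plain string "uri" or as an object with a "uri" key) are our reading of the draft, not a finalized part of the spec.

```python
def check_basic_file_listing(info: dict) -> list:
    """Return a list of problems; an empty list means the required columns look right.

    Hypothetical helper: the rules below follow the draft described in these notes.
    """
    problems = []
    params = info.get("parameters", [])
    if len(params) < 2:
        return ["a file listing needs at least a time column and a URI column"]

    # The first column must be the time column (true for any HAPI dataset).
    if params[0].get("type") != "isotime":
        problems.append("first parameter must have type 'isotime'")

    def is_uri_column(param: dict) -> bool:
        string_type = param.get("stringType")
        # Assumption: stringType is either the string "uri" or an object with a "uri" key.
        has_uri = string_type == "uri" or (
            isinstance(string_type, dict) and "uri" in string_type
        )
        return param.get("type") == "string" and has_uri

    if not any(is_uri_column(p) for p in params[1:]):
        problems.append("no string parameter with a 'uri' stringType was found")
    return problems
```

An empty result only says the two required columns are present; it says nothing about the optional, constrained columns (file size, checksum, and so on).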

2026-04-20

Discussion about fileSize:

JavaScript does not even have integers, so what should size be? Pandering to JSON and JavaScript is hard since it doesn't have integers (or comments!)
Current thinking: use double and recommend that it be shown as an integer with as full precision as possible so that you get the exact value; if you are above 2GB (more digits than fits in double)
JavaScript: may lose precision for integers larger than 9007199254740991 (2^53 - 1)
see this binary presentation converter: https://www.binaryconvert.com/result_double.html
If an integer is in this range: +/- 9,007,199,254,740,991 then write it out exactly, and this value will be represented exactly as a double
Discussed and abandoned: We could suggest that people add their own x_exactFileSize as a clandestine long that is actually a JSON string, such as "123456789012345" (quotes make it a string to JSON, and then it requires special parsing, like a BigInt)
What about making fileSize a string?
Will summarize and clean this up tomorrow.
This is useful to show that most file sizes (much bigger than 2GB) would be precisely represented: https://www.binaryconvert.com/convert_double.html
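
To make the precision point concrete, here is a small sketch that checks whether an integer byte count survives a round trip through an IEEE 754 double (Python's float is a double, the same representation JavaScript uses for numbers). The name survives_double is just for this example.

```python
# Largest integer below which every integer is exactly representable as a double.
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991

def survives_double(n: int) -> bool:
    """True if converting n to a double and back gives n again."""
    return int(float(n)) == n

print(survives_double(2 * 1024**3))       # a 2 GiB file size -> True
print(survives_double(MAX_SAFE_INTEGER))  # -> True
print(survives_double(2**53 + 1))         # -> False (rounds to 2**53)
```

Some integers above 2^53 - 1 still round-trip (even ones, for example), so the safe range is the guarantee, not the exact boundary of what happens to work.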

See also: https://github.com/hapi-server/data-specification/issues/218

Sample info response for a file listing

{
   "HAPI": "3.3",
   "status": { "code": 1200, "message": "OK"},
   "$schema": "https://hapi-server.org/schemas/HAPI-3.2.json#info-fileListing",
   "startDate": "1998-001Z",
   "stopDate" : "2017-100Z",
   "parameters": [
       { "name": "time",
         "type": "isotime",
         "units": "UTC",
         "fill": null,
         "length": 24 },
       { "name": "fileURI",
         "type": "string",
         "stringType": {"uri": { "base": "https://sample.com/listing", "mediaType": "image/fits" } },
         "fill": null,
         "description": "solar images at 580 nm",
         "label": "filename"},
       { "name": "checksum",
         "type": "string",
         "length": 32,
         "stringType": {"checksum": { "algorithm": "md5" } },
         "fill": null,
         "description": "pre-calculated checksum using MD5 algorithm"},
       { "name": "stopDate",
         "type": "isotime",
         "length": 24,
         "units": "UTC",
         "fill": null,
         "description": "end date and time when the image was taken; integration times range from 10s to 30s",
         "label": "image stop date"}
   ]
}
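
A minimal sketch of how a client might turn a fileURI value from the data into a full URL using the "base" from the info response above. The joining rule here (append the value to the base with exactly one slash between them, and leave values that are already absolute alone) is an assumption for illustration; the stringType section of the spec should be checked for the actual rule. The function name and the sample filename are hypothetical.

```python
def resolve_file_uri(param: dict, value: str) -> str:
    """Combine a data value with the optional uri base of its parameter.

    Assumption: base and value are joined with a single slash.
    """
    string_type = param.get("stringType")
    base = ""
    if isinstance(string_type, dict):
        base = string_type.get("uri", {}).get("base", "")
    # No base given, or the value is already an absolute URI: use it as-is.
    if not base or "://" in value:
        return value
    return base.rstrip("/") + "/" + value.lstrip("/")


file_uri_param = {
    "name": "fileURI",
    "type": "string",
    "stringType": {"uri": {"base": "https://sample.com/listing", "mediaType": "image/fits"}},
}
# "1998/img_001.fits" is a made-up value, not from a real server.
print(resolve_file_uri(file_uri_param, "1998/img_001.fits"))
```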

How to handle duration of files and events

How should we handle the fact that event lists and file listings involve content that has an intrinsic time range? Regular HAPI data content has each row associated with a point in time, at least with respect to the query for data.

We decided to keep the query mechanism and rules the same, and will just add a statement about the need to expand a query time range to include potential edge cases, something like: Because event lists and file listings refer to items with implied durations, a HAPI query for items in this kind of list may need to be expanded, since the query will return only items whose start time falls in the query range. If a server wants to communicate a duration, the stopDate should be used.
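
A sketch of what the widened query looks like from the client side, under two assumptions that are not in the notes: the client knows an upper bound on item duration (max_duration), and each row carries a stop time. The first step mimics what a server returns for the widened range (items whose start time is in range); the second step is the client-side filter that keeps only items overlapping the range actually wanted.

```python
from datetime import datetime, timedelta

def select_overlapping(rows, start, stop, max_duration):
    """Return the (item_start, item_stop) rows that overlap [start, stop).

    max_duration is an assumed upper bound on how long any item lasts.
    """
    widened_start = start - max_duration
    # Step 1: what a server would return for the widened query
    # (only items whose start time falls in the query range).
    returned = [r for r in rows if widened_start <= r[0] < stop]
    # Step 2: drop items that ended at or before the original start.
    return [r for r in returned if r[1] > start]


t = lambda hh, mm: datetime(2023, 1, 1, hh, mm)
rows = [
    (t(0, 10), t(0, 20)),  # ends before the range
    (t(0, 50), t(1, 10)),  # starts before the range but overlaps it
    (t(1, 30), t(1, 40)),  # entirely inside the range
    (t(2, 0), t(2, 10)),   # starts at the (exclusive) stop
]
for row in select_overlapping(rows, t(1, 0), t(2, 0), timedelta(minutes=30)):
    print(row[0].isoformat(), row[1].isoformat())
```

Without the widening, the item that starts at 00:50 would be missed even though part of it falls inside the requested range.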

How to handle duplicate times in file listings or event lists

Repeated time tags are allowed in fileListing or eventList data schemas. Equivalently, we could say that time values must never decrease.

We just noticed that the HAPI spec never actually states that HAPI times must only ever increase. So we need to add that to the spec! The definitions for "monotonically increasing" vary, so we will avoid that language. The spec should say that, for regular datasets, time values can only ever increase, with no duplicates.
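
The two rules can be stated unambiguously in code. This is a sketch only; the function names are ours, and the time format is assumed to be the full-seconds form used in the first example on this page.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed format for this example

def parse_times(values):
    """Parse ISO 8601 strings in the assumed format into datetimes."""
    return [datetime.strptime(v, TIME_FORMAT) for v in values]

def is_strictly_increasing(times):
    """Rule for regular datasets: each time is later than the one before."""
    return all(a < b for a, b in zip(times, times[1:]))

def is_non_decreasing(times):
    """Relaxed rule for file listings and event lists: repeats are allowed."""
    return all(a <= b for a, b in zip(times, times[1:]))


times = parse_times([
    "2023-01-01T00:00:00Z",
    "2023-01-01T00:00:00Z",  # repeated time tag
    "2023-01-01T00:10:00Z",
])
print(is_strictly_increasing(times))  # False
print(is_non_decreasing(times))       # True
```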

Comments on case and capitalization

Three places where we have specific capitalization:

http query parameters: we use snake case, such as include_parameters
camelCase everywhere else
AlertCamelCase for the name of the first column, the Time parameter (sort of, since it's only one word)

Defining the schema for what the parameters are

Like the unitsSchema and coordinateSystemSchema, we will use parameterSchema as the keyword.

Other options: datasetSchema - this means keywords outside the parameters have extra requirements

Could datasetSchema be an array? So far, these potential values are envisioned:

Should it just be called "dataType"? Do we need to worry about other usage of "schema"? We have "stringType" already.

AI summary for 2026-04-29

Meeting Summary for HAPI FileListings - working meeting

Quick recap

The team established file size representation standards and discussed document formatting requirements for file sizes, algorithm names, and schema conventions. They explored design considerations for event lists and file listings, including the handling of time ranges and the potential for shared base classes. The group concluded by addressing documentation updates regarding time tags and data schemas, while also discussing server-side data constraints and API query requirements.

Next steps

Jon to update documentation to allow repeated time tags for event and file listings
Jon to modify data section to state "Data time values must be monotonically increasing unless specified otherwise"
Bob to update verifier to check for "not decreasing" instead of "increasing" for time tags
Jon to add keyword in info response to indicate parameter schema for file and event listings
Team to discourage but allow file listings that mix multiple datasets
Team to remove "file" from file URI and file size parameter names
Team to maintain consistent capitalization across API, metadata, and parameter names

Summary

File Size Representation Standards

The team discussed file size representation standards, focusing on units and formatting. They agreed that units should always be in bytes, with values formatted as integers, unless the file size exceeds 9 petabytes, in which case exponential notation should be used. Bob emphasized that the verifier would enforce these rules, while Jon captured the decisions in the wiki page.

Document Formatting and Schema Standards

The team discussed and refined text in a document, focusing on formatting and content related to file sizes, algorithm names, and schema conventions. They agreed to modify certain phrases, including replacing specific numbers with tilde symbols and adjusting the placement of parenthetical statements. Jeremy suggested listing algorithm names directly in the document to avoid external dependencies, and the team decided to include examples for clarity. They also discussed the handling of user-added columns and emphasized the importance of encouraging users to develop their own schemas for additional columns while maintaining consistency for file listings.

Event and File Listing Design

Jon and Jeremy discussed the design of a base class for event lists and file listings. They debated whether to have a common base class for both, but ultimately decided against it due to the different requirements of events and file listings. They also discussed the minimal requirements for a VO event, noting that the current structure includes RA and DEC information which may not be relevant for all use cases.

Event and File Listing Challenges

The team discussed the challenges of listing events and file listings, particularly regarding the need for descriptions and the handling of time ranges. They explored the possibility of using a base class for file listings as a subclass of events lists, but Jeremy expressed concerns about the implications of this change. The group agreed that additional API flags might be necessary to better communicate the nature of different types of listings, such as file listings, event listings, or availability listings. They also considered relaxing the rule about HAPI servers returning exact start and stop times for certain datasets.

Event Data Schema Overlap Discussion

The team discussed handling event listings and data schemas, focusing on how to manage overlapping events and time ranges. Bob proposed allowing the start and stop parameters in API queries to refer to different columns than the primary time index, which would better accommodate events with multiple stop times. Jeremy suggested that some events might be described by ranges rather than instances, and Jon agreed that the API should be flexible enough to handle both cases. The team also considered the potential server load of implementing complex queries and agreed that widening queries to include overlapping data might be a simpler solution.

Server Data Constraints and Time Handling

The team discussed implementing constraints on server-side data, particularly focusing on handling time-based queries and data with intrinsic duration. They agreed to maintain the current approach of requiring widened time ranges for event lists and file listings, rather than adding server complications. The group also debated whether to allow non-monotonic data, with Jeremy suggesting that allowing repetitions could be safe while Jon raised concerns about potential hash key issues. They concluded that while duplicate times would be allowed, they must be accompanied by stop dates to describe durations.

Documentation Updates for Time Tags

The team discussed changes to the documentation regarding time tags and data schemas. They agreed to allow repeated time tags in event lists and file listings, though discouraged for file listings. They decided to use "dataset type" instead of "dataset schema" to describe the overall constraints for a dataset. The team also aligned on using consistent capitalization for parameter names, with "time" as an exception to maintain consistency with existing usage. They left open the question of how to declare a particular dataset type in the info response.

AI-generated content may be inaccurate or misleading. Always check for accuracy.

2026-05-04

For identifying an info response, we are settling on datasetSchema, since there may be requirements levied on dataset-level options. For example, if we add a schema for ground-based datasets, we could require the location or geoLocation.

List of potential schemas:

fileListing
FAIR
eventList
groundMagnetometer (and other possible measurement types)
spaceMagnetometer

What about multiple schemas in one info response? A file listing schema could

info response:

{
   "HAPI": "3.3",
   "status": { "code": 1200, "message": "OK"},
   "startDate": "1998-001Z",
   "stopDate" : "2017-100Z",
   "coordinateSystemSchema" : { "schemaName": "SPASE", "schemaURI": "TBD"},
   "datasetSchema": { "fileListing" },
   "datasetSchema": { "fileListing": { "parameters" : { "startDate" : "startDate", "uri": "uri" } } },

   "datasetSchema": { "fileListing": { "parameters" : { "startDate" : "Time", "uri": "fileURL" },
                                        "dataset":    { "geoLocation" : "x_placeAsLatAndLon" } } },
   # or just give the mappings
   "datasetSchema": { "fileListing": { "startDate" : "Time", "uri": "fileURL" } },
   # for ground magnetometers
   "datasetSchema": { "groundMagnetometer": { "vectorField" : "field" } },
   "datasetSchema": { "groundMagnetometer": { "vectorField" : [ "bx", "by", "bz"] } },
   "datasetSchema": { "groundMagnetometer": { "vectorFieldBaselineSubtracted" : "fieldBGSubtr" } },
   "datasetSchema": { "groundMagnetometer": { "vectorFieldBaselineSubtracted" : [ "bx_b", "by_b", "bz_b"] } },

   "datasetSchema": { "FAIR": { "dataset": {"licenseURL": "x_APL_license_URL" } } },
   "datasetSchema": { "FAIR": { "dataset": {"licenseURL": "x_license_URL" } } }
   # we concluded this is not appropriate (people should use the right HAPI keyword names!)
}

After discussion, we realized that HAPI server creators control the dataset-level keyword names, so we should not allow people to use non-HAPI-compliant dataset keywords. I.e., we don't want separate parameters and dataset elements, just the parameters one, so we don't need a title for it.

So just this then:

{
   "HAPI": "3.3",
   "status": { "code": 1200, "message": "OK"},
   "startDate": "1998-001Z",
   "stopDate" : "2017-100Z",
   "coordinateSystemSchema" : { "schemaName": "SPASE", "schemaURI": "TBD"},
   "datasetSchema": "fileListing",
   "datasetSchema": { "fileListing": {} },  # allowed?, but same as above
   "datasetSchema": { "fileListing": { "parameterMap" : { "startDate" : "Time", "uri": "fileURL" } } },
   # or for data:
   "datasetSchema": { "groundMagnetometer": { "vectorField" : "field" } },
   "datasetSchema": { "groundMagnetometer": { "vectorField" : [ "bx", "by", "bz"] } },
   "datasetSchema": { "groundMagnetometer": { "vectorFieldBaselineSubtracted" : "fieldBGSubtr" } },
   "datasetSchema": { "groundMagnetometer": { "vectorFieldBaselineSubtracted" : [ "bx_b", "by_b", "bz_b"] } },
}
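
A sketch of how a client could accept the alternative forms listed above: the plain string, the object with empty options, and the object with a parameterMap. This reflects the draft as discussed here, not a settled design, and parse_dataset_schema is a hypothetical name.

```python
def parse_dataset_schema(value):
    """Return (schema_name, options) for the string or single-key object form.

    Draft behavior: "fileListing" and {"fileListing": {}} are treated the same.
    """
    if isinstance(value, str):
        return value, {}
    if isinstance(value, dict) and len(value) == 1:
        name, options = next(iter(value.items()))
        return name, dict(options or {})
    raise ValueError("datasetSchema must be a string or an object with one key")


name, options = parse_dataset_schema(
    {"fileListing": {"parameterMap": {"startDate": "Time", "uri": "fileURL"}}}
)
# Look up which dataset parameter plays the "uri" role; assume the role name
# itself is the parameter name when no map is given.
uri_column = options.get("parameterMap", {}).get("uri", "uri")
print(name, uri_column)  # fileListing fileURL
```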
