
cdflib.cdfread.CDF() reads illegal paths built from legal paths with characters appended to it #328

@ErikPGJ


cdflib can, at least under some circumstances, read a CDF file via an illegal path constructed by appending characters to the end of a legal path.

>>> import cdflib
>>> a = cdflib.cdfread.CDF('/media/erjo/juice/datasets/2023/07/13/JUICE_L1a_RPWI-LF-SID7_20230713T043826_V02.cdfINVALID')
>>> a.cdf_info()
CDFInfo(CDF=PosixPath('/tmp/tmp1btjyu9h.cdf'), Version='3.9.0', Encoding=6, Majority='Row_major', rVariables=[], zVariables=['Epoch', 'SCET', 'TIME_RELATIVE', 'HW_SWITCHES_1', 'HW_SWITCHES_2', 'ARTEFACTS', 'COMPONENT_MASK', 'SNAPSHOT_NUMBER', 'SAMPLING_RATE', 'N_SAMPLES', 'SEQ_COUNTER', 'DATA'], Attributes=[{'Acknowledgement': 'Global'}, {'Data_type': 'Global'}, {'Data_version': 'Global'}, {'Dataset_ID': 'Global'}, {'Descriptor': 'Global'}, {'Discipline': 'Global'}, {'DOI': 'Global'}, {'Generated_by': 'Global'}, {'Generated_with_software': 'Global'}, {'Generation_date': 'Global'}, {'Generation_time_UTC': 'Global'}, {'git_log_message_DC': 'Global'}, {'git_log_message_HF': 'Global'}, {'git_log_message_LF': 'Global'}, {'git_log_message_LP': 'Global'}, {'git_log_message_MB': 'Global'}, {'git_log_message_MM': 'Global'}, {'git_log_message_PL': 'Global'}, {'HTTP_LINK': 'Global'}, {'Instrument_type': 'Global'}, {'LINK_TEXT': 'Global'}, {'LINK_TITLE': 'Global'}, {'Loaded_SPICE_kernels': 'Global'}, {'Local_TM_source_files': 'Global'}, {'Logical_file_id': 'Global'}, {'Logical_source': 'Global'}, {'Logical_source_description': 'Global'}, {'Mission_group': 'Global'}, {'Parents': 'Global'}, {'PDS_collection_id': 'Global'}, {'PDS_start_time': 'Global'}, {'PDS_stop_time': 'Global'}, {'PI_affiliation': 'Global'}, {'PI_name': 'Global'}, {'Project': 'Global'}, {'RPWI_FSW_version': 'Global'}, {'Rules_of_use': 'Global'}, {'SDUS_updates': 'Global'}, {'Skeleton_version': 'Global'}, {'Software_version': 'Global'}, {'Source_name': 'Global'}, {'Spacecraft_clock_to_TT2000_time_conversion_linear_approximation_epoch': 'Global'}, {'Spacecraft_clock_to_TT2000_time_conversion_type': 'Global'}, {'spase_DatasetResourceID': 'Global'}, {'CATDESC': 'Variable'}, {'DISPLAY_TYPE': 'Variable'}, {'FIELDNAM': 'Variable'}, {'FILLVAL': 'Variable'}, {'FORMAT': 'Variable'}, {'LABLAXIS': 'Variable'}, {'MONOTON': 'Variable'}, {'TIME_BASE': 'Variable'}, {'UNITS': 'Variable'}, {'VALIDMIN': 'Variable'}, {'VALIDMAX': 
'Variable'}, {'VAR_NOTES': 'Variable'}, {'VAR_TYPE': 'Variable'}, {'DEPEND_0': 'Variable'}], Copyright='\nCommon Data Format (CDF)\nhttps://cdf.gsfc.nasa.gov\nSpace Physics Data Facility\nNASA/Goddard Space Flight Center\nGreenbelt, Maryland 20771 USA\n(User support: gsfc-cdf-support@lists.nasa.gov)\n', Checksum=True, Num_rdim=0, rDim_sizes=[], Compressed=True, LeapSecondUpdate=None)
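
One possible explanation, offered here as an assumption and not confirmed anywhere in this report: if the loader, after failing to find the given path, retries with the path's suffix replaced by `.cdf`, then everything after the last dot in the file name would be discarded and the legal path recovered. The sketch below only demonstrates the standard pathlib behaviour involved. The path in it is hypothetical, and whether cdflib actually performs such a retry would need to be checked in cdflib/cdfread.py.

```python
from pathlib import PurePosixPath

# Hypothetical paths, for illustration only.
legal = PurePosixPath("/data/example_V02.cdf")
illegal = PurePosixPath(str(legal) + "INVALID")

# pathlib treats everything from the last dot of the file name as the suffix.
print(illegal.suffix)

# Replacing that suffix with ".cdf" recovers the legal path.
fallback = illegal.with_suffix(".cdf")
print(fallback == legal)

# Prepended characters are not part of the suffix, so they would survive.
prepended = PurePosixPath("INVALID" + str(legal)).with_suffix(".cdf")
print(prepended == legal)
```

Under this assumption, appended characters that contain no further dot vanish in the retry, while prepended characters do not.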

This behavior has been observed with

  • cdflib 1.3.9 and 1.3.3.
  • Python 3.12.10 and 3.11.14.

There seems to be an upper bound on how many characters one can append to a legal path before an error is raised. The following is the smallest number of extra characters (210) I could add that triggers an error for this particular example.

>>> s = '/media/erjo/juice/datasets/2023/07/13/JUICE_L1a_RPWI-LF-SID7_20230713T043826_V02.cdf' + 'A'*210
>>> len(s)
294
>>> a = cdflib.cdfread.CDF(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nonhome_data/work_files/JUICE/pipeline_code/normal/rpwi_pipeline_venv/lib/python3.11/site-packages/cdflib/cdfread.py", line 90, in __init__
    if not path.is_file():
           ^^^^^^^^^^^^^^
  File "/nonstd_installs/pyenv/versions/3.11.14/lib/python3.11/pathlib.py", line 1267, in is_file
    return S_ISREG(self.stat().st_mode)
                   ^^^^^^^^^^^
  File "/nonstd_installs/pyenv/versions/3.11.14/lib/python3.11/pathlib.py", line 1013, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 74] Bad message: '/media/erjo/juice/datasets/2023/07/13/JUICE_L1a_RPWI-LF-SID7_20230713T043826_V02.cdfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
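
The length at which the error appears may be related to a limit on the length of a single file name. Many Linux file systems limit a file name to 255 bytes; that this applies here is an assumption, since the report does not identify the file system. The file name above is 46 characters long, so 210 extra characters is exactly the point at which it reaches 256. A minimal check of that arithmetic, using only the path from this report:

```python
import os.path

DIRECTORY = "/media/erjo/juice/datasets/2023/07/13/"
FILE_NAME = "JUICE_L1a_RPWI-LF-SID7_20230713T043826_V02.cdf"


def lengths(n_extra):
    """Return (full path length, file name length) with n_extra 'A's appended."""
    s = DIRECTORY + FILE_NAME + "A" * n_extra
    return len(s), len(os.path.basename(s))


print(lengths(209))  # (293, 255)
print(lengths(210))  # (294, 256)
```

This does not explain why the error is reported as Errno 74 (Bad message); it only shows that the threshold coincides with a 256-character file name.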

Prepending characters to a legal path, by contrast, does not work.

>>> a = cdflib.cdfread.CDF('INVALID/media/erjo/juice/datasets/2023/07/13/JUICE_L1a_RPWI-LF-SID7_20230713T043826_V02.cdf')
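
Until the behaviour is changed, a caller who needs an illegal path to be rejected could check the exact path before handing it to cdflib. This is a minimal caller-side sketch, not part of cdflib; the helper name `require_exact_file` is hypothetical.

```python
from pathlib import Path


def require_exact_file(path):
    """Return path as a Path if it names an existing regular file, else raise."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    return p


# Intended use (not executed here, since it needs cdflib and a real CDF file):
#     a = cdflib.cdfread.CDF(require_exact_file(some_path))
```

For an over-long file name, `is_file()` itself may raise `OSError`, as in the traceback above, so the call still fails instead of silently reading a different file.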
