Releases: glamod/cdm_reader_mapper

v2.4.1

16 Apr 10:39
c796a22

2.4.1 (2026-04-16)

Contributor to this version: @ludwiglierhammer

New features and enhancements

  • mdf_mapper / cdm_mapper: add new project CMEMS for drifting iridium buoy data (PR/405)
  • mdf_mapper / cdm_mapper: new parameter "separator" to define filename separator while reading and writing files (PR/414)

Breaking changes

  • cdm_mapper: update element names in MAROB CDM mapping tables (PR/393)
  • cdm_mapper.utils.mapping_functions: change default MAROB datetime string format to "%Y-%m-%dT%H:%M:%S" (PR/393)
  • cdm_mapper: keep pd.NA values and do not convert them to strings (PR/414)
  • test_data: load parquet files instead of csv files (GH/410, PR/414)
  • mdf_mapper / cdm_mapper: default file name extension is "pq" while reading and writing files (PR/414)
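
The new default MAROB datetime format listed above is an ISO 8601 style pattern. The following is a minimal sketch using only the Python standard library to show what strings in that format look like; the sample timestamp is made up for illustration and is not taken from MAROB data, and the snippet does not call any cdm_reader_mapper function.

```python
from datetime import datetime

# Default format string quoted in the release note.
MAROB_FORMAT = "%Y-%m-%dT%H:%M:%S"

# Hypothetical sample value, for illustration only.
raw = "2024-03-05T14:30:00"

# Parse the string into a datetime object.
parsed = datetime.strptime(raw, MAROB_FORMAT)
print(parsed.year, parsed.month, parsed.day, parsed.hour)

# Formatting with the same pattern reproduces the original string.
roundtrip = parsed.strftime(MAROB_FORMAT)
print(roundtrip)
```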

Bug fixes

  • duplicates: do not change data types when updating quality flags and history description (PR/408)
  • mdf_reader: decode data to "utf-8" to avoid misleading file encoding (PR/414)
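
As a generic illustration of why explicit decoding matters (this is not the mdf_reader implementation, and the sample text is hypothetical), the sketch below decodes the same bytes once with a mismatched codec and once as UTF-8.

```python
# Hypothetical raw bytes, as they might arrive when a file is read in binary mode.
raw = "Lübeck".encode("utf-8")

# Decoding with a mismatched codec does not fail, but yields garbled text.
wrong = raw.decode("latin-1")

# Decoding explicitly as UTF-8 restores the original text.
right = raw.decode("utf-8")

print(wrong)
print(right)
```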

Internal changes

  • cdm_mapper.utils.mapping_functions: delete function convert_to_decimal (PR/393)

v2.4.0

01 Apr 09:12
57670bb

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Jan Marius Willruth (@JanWillruth)

New features and enhancements

  • cdm_mapper.utils.mapping_functions: Add function gdac_pressure in anticipation of moving conversion steps to the mapper in the future. (PR/350)
  • mdf_reader/cdm_mapper: optionally, convert data types to strings when reading and writing data from/to disk (PR/401)
  • mdf_reader/cdm_mapper: optionally, convert data types from strings when reading and writing data from/to disk (PR/401)

Breaking changes

  • mdf_reader: Update and rename GDAC variable names (schemas/gdac) and code tables (codes/gdac) to align with current standards. (GH/341, PR/350)

  • cdm_mapper: (PR/350)

    • Update and rename GDAC variable names in tables/gdac.
    • Fix gdac_latitude and gdac_longitude (utils/mapping_functions.py) not being used in observations.json
  • mdf_reader/cdm_mapper: use parquet as default instead of csv when reading and writing data from/to disk (PR/401)

  • cdm_mapper: do not convert data types to strings while mapping to the CDM (GH/398, PR/401)

  • cdm_mapper: set default decimal_places from 0 to 1 for location_accuracy, report_time_accuracy, station_speed and station_course (PR/401)

Bug fixes

  • cdm_mapper.mapper.map_model: write data columns to df._attrs instead of df.attrs to avoid crashing class methods (GH/390, PR/391)
  • cdm_mapper.utils.mapping_functions: Change method_b in mapping_functions.py to work with both str and int. (PR/350)
  • cdm_mapper.map_models: write columns directly as an attribute to result to avoid crashing further DataFrame methods (GH/394, PR/397)

v2.3.0

12 Mar 11:25
5b917e4

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Jan Marius Willruth (@JanWillruth)

New features and enhancements

  • mdf_reader.read_data now supports chunking (PR/360)

  • read and write both parquet and feather files including new parameter data_format (GH/353, PR/363):

    • mdf_reader.read_data,
    • mdf_reader.write_data
    • cdm_mapper.read_tables
    • cdm_mapper.write_tables
  • introduce ParquetStreamReader to replace pd.parsers.io.TextfileReader (GH/8, PR/348)

  • cdm_reader.map_model now supports both pd.DataFrame and ParquetStreamReader as output (PR/348)

  • common.replace_columns now supports both pd.DataFrame and ParquetStreamReader as output (PR/348)

  • cdm_mapper.utils.mapping_functions: new mapping function convert_to_decimal (PR/370)

  • test_data: add MAROB test data (PR/370)

  • mdf_reader.read_data: new parameter "delimiter" (PR/370)

  • cdm_mapper.map_model's output now has attribute "attrs" where columns are stored (PR/379)

  • ParquetStreamReader now supports item assignment (PR/383)

  • ParquetStreamReader now works with both list and tuple as input data (PR/383)

Breaking changes

  • DataBundle.stack_v and DataBundle.stack_h only support pd.DataFrame objects as input; any other input raises a ValueError (PR/360)

  • set default for extension from psv to specified data_format (PR/363):

    • cdm_mapper.read_tables
    • cdm_mapper.write_tables
  • set default for extension from csv to specified data_format in mdf_reader.write_data (PR/363)

  • mdf_reader.read_data: save dtypes in the returned DataBundle as pd.Series, not dict (PR/363)

  • remove common.pandas_TextParser_hdlr (GH/8, PR/348)

  • cdm_reader_mapper now raises errors instead of logging them (PR/348)

  • DataBundle now converts all iterables of pd.DataFrame/pd.Series to ParquetStreamReader when initialized (PR/348)

  • all main functions in common.select now return a tuple of 4 (selected values, rejected values, original indexes of selected values, original indexes of rejected values) (PR/348)

  • move ParquetStreamReader and all corresponding methods to common.iterables to handle chunking outside of mdf_reader/cdm_mapper/core/metmetpy (GH/349, PR/348)

  • cdm_mapper.read_tables: if "suffix" is None no suffix is selected instead of the wildcard "*" (PR/379)

  • ParquetStreamReader.empty is now a property, not a class method (PR/379)

  • cdm_mapper.utils.mapping_functions.string_add no longer has parameters zfill_col and zfill (PR/383)
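
The 4-tuple return shape described above for the main functions in common.select can be pictured with a plain-Python sketch. The function name split_by_condition and its signature are hypothetical and work on lists rather than DataFrames; only the order of the four returned items follows the release note.

```python
from typing import Callable, Sequence


def split_by_condition(values: Sequence, condition: Callable[[object], bool]):
    """Hypothetical helper returning (selected, rejected, selected_idx, rejected_idx)."""
    selected, rejected, selected_idx, rejected_idx = [], [], [], []
    for idx, value in enumerate(values):
        if condition(value):
            selected.append(value)
            selected_idx.append(idx)
        else:
            rejected.append(value)
            rejected_idx.append(idx)
    return selected, rejected, selected_idx, rejected_idx


# Keep non-negative values; the original positions are reported separately.
selected, rejected, selected_idx, rejected_idx = split_by_condition(
    [10, -3, 7, -1], lambda v: v >= 0
)
print(selected, rejected, selected_idx, rejected_idx)
```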

Bug fixes

  • replace "ICOADS-30-" with "ICOADS-300-" in icoads_r300 mapping tables (GH/385, PR/386)

Internal changes

  • re-work internal structure for more readability and better performance (PR/360)
  • use pre-defined Literal constants in cdm_reader_mapper.properties (PR/363)
  • mdf_reader.utils.utilities.read_csv: rename parameter columns to column_names (PR/363)
  • introduce post-processing decorator that handles both pd.DataFrame and ParquetStreamReader (PR/348)
  • cdm_mapper.mapper._map_data_model now returns a tuple of DataFrame and columns (PR/379)
  • delete unused function cdm_mapper.utils.mapping_functions.marob_location_quality (PR/383)
  • delete unreachable code snippets (PR/383)
  • mainly increase test coverage (GH/365, PR/383)

v2.2.1

23 Jan 11:37
84f39cc

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Bug fixes

  • cdm_reader_mapper.cdm_mapper: set indexes to the input data indexes when setting default values (PR/356).

v2.2.0

23 Jan 10:28
26b18e4

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

Announcements

This release adds support for Python 3.14 (PR/339).

New features and enhancements

  • new parameters in function map_model (PR/327).

    • drop_duplicates: If True remove duplicated rows (default: True).
    • drop_missing_obs: If True remove observation rows without a valid observation_value (default: True).
  • new Pub47 testdata (test_data["test_pub47"]) (PR/327)
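
The effect of the two new map_model flags can be pictured with a plain-Python sketch. The helper filter_rows is hypothetical and operates on a list of dicts instead of a DataFrame; treating "valid" as "not None" is a simplifying assumption, and the real validity rule in map_model may differ.

```python
def filter_rows(rows, drop_duplicates=True, drop_missing_obs=True):
    """Hypothetical illustration of the two flags; not the library's code."""
    result = []
    seen = set()
    for row in rows:
        # Assumption: a missing observation is represented by None.
        if drop_missing_obs and row.get("observation_value") is None:
            continue
        # A row counts as a duplicate if all of its key/value pairs were seen before.
        key = tuple(sorted(row.items()))
        if drop_duplicates and key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


rows = [
    {"report_id": "A", "observation_value": 281.5},
    {"report_id": "A", "observation_value": 281.5},
    {"report_id": "B", "observation_value": None},
    {"report_id": "C", "observation_value": 279.0},
]

both = filter_rows(rows)
keep_dups = filter_rows(rows, drop_duplicates=False)
keep_missing = filter_rows(rows, drop_missing_obs=False)
print(len(both), len(keep_dups), len(keep_missing))
```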

Breaking changes

  • cdm_reader_mapper.cdm_mapper: rename map_and_convert to helper function _map_and_convert (PR/343)
  • replace logging.error with raise error statements (PR/345)

Internal changes

  • implement map_model test for Pub47 data (GH/310, PR/327)

  • rename test data class from test_data to TestData (PR/327)

  • update .gitignore (PR/324)

  • update and add docstrings for multiple functions (PR/324)

  • cdm_reader_mapper.cdm_mapper: update mapping functions for more readability (PR/324)

  • cdm_reader_mapper.cdm_mapper: introduce some helper functions (PR/324)

  • add more unit tests (GH/311, PR/324)

  • cdm_reader_mapper.cdm_mapper: split map_and_convert into multiple helper functions (GH/333, PR/343)

  • exclude tests/*.py from pre-commit codespell hook (PR/345)

  • replace many os functions with pathlib.Path (PR/345)

  • re-work mdf_reader (GH/334, PR/345)

    • remove reader.MDFFileReader class
    • remove utils.configurator module
    • remove both utils.decoder and mdf_reader.utils.converter modules
    • introduce utils.parser module: bunch of functions to parse input data into MDF data
    • introduce utils.convert_and_decode: make converter and decoder functions more modular
    • make utils.validator module more modular
    • utils.filereader.FileReader uses utils.parser function for parsing
    • move many helper function to utils.utilities
    • serialize schemas.schemas module
  • add type hints and docstrings to mdf_reader (PR/345)

  • add unit tests for mdf_reader module to testing suite (PR/345)

Bug fixes

  • add Pub47 mapping code tables (observing_frequency and vessel_type) (GH/308, PR/327)
  • observation tables are not empty anymore after mapping Pub47 raw data to the CDM (GH/309, PR/327)

v2.1.1

21 Oct 06:06
44d2068

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer), Joseph Siddons (@jtsiddons) and Jan Marius Willruth (@JanWillruth)

New features and enhancements

  • add encoding optional argument to cdm_reader_mapper.read_mdf and cdm_reader_mapper.read_data which overrides default value set by model schema if set (GH/268, PR/273).
  • cdm_reader_mapper.mdf_reader: Added preprocessing function to convert air pressure (PPPP) in IMMT format (PR/287)
  • cdm_reader_mapper.cdm_mapper: Added mapping functions for IMMT datetime, latitude, and longitude conversions (PR/287)
  • cdm_reader_mapper.cdm_mapper: New mapping function datetime_imma_d701 for icoads_r300_d701 (GH/288, PR/295)
  • cdm_reader_mapper.cdm_mapper: New mapping function datetime_imma1_to_utc for mapping local midday to UTC (GH/288, PR/295)

License and Legal

Updated copyright statements in LICENSE (GH/271, PR/272).

Breaking changes

  • cdm_reader_mapper: Replace "gcc" with "gdac" (PR/287)
  • cdm_reader_mapper: Update gdac schemas to adhere to IMMT-5 documentation (PR/287)
  • cdm_reader_mapper: combine icoads_r300_d701_type1 and icoads_r300_d701_type2 test and result data to icoads_r300_d701 (GH/288, PR/295)
  • cdm_reader_mapper.cdm_mapper: combine icoads_r300_d701_type1 and icoads_r300_d701_type2 mapping tables to icoads_r300_d701 (GH/288, PR/295)
  • cdm_reader_mapper.read: Allow strings as input for cdm_subset (PR/281)
  • cdm_reader_mapper.cdm_mapper: Remove timestamps and/or previous history information in column history (PR/281)
  • cdm_reader_mapper.DataBundle: Set empty pd.DataFrames as defaults for both data and mask (PR/281)
  • cdm_reader_mapper.mdf_reader: read drifter numbers as strings not as integers with C-RAID (PR/281)

Internal changes

  • tests: create test data result hidden directory (PR/291)
  • cdm_reader_mapper.mdf_reader: update and tidy-up ICOADS mapping tables (PR/281)
  • timezonefinder is pinned below v7.0.0 (PR/281)

Bug fixes

  • cdm_reader_mapper.write_data: fix doubling of output file name (PR/273)
  • cdm_reader_mapper.cdm_mapper.mapping_functions: datetime conversion now ignores unformattable dates (GH/277, PR/278)
  • README: fix hyperlink (GH/279, PR/280)
  • tests: raise OSError on checksum mismatch (PR/291)

v2.1.0

08 Apr 12:23
38ef56d

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

New features and enhancements

  • implement both wrapper functions read and write that call the appropriate function based on mode argument (PR/238):

    • mode == "mdf"; calls cdm_reader_mapper.read_mdf
    • mode == "data"; calls cdm_reader_mapper.read_data or cdm_reader_mapper.write_data
    • mode == "tables"; calls cdm_reader_mapper.read_tables or cdm_reader_mapper.write_tables
  • optionally, call cdm_reader_mapper.read_tables with either source file or source directory path (PR/238).

  • apply attribute to DataBundle.data if attribute is not defined in DataBundle (PR/248).

  • apply pandas functions directly to DataBundle.data by calling DataBundle.<pandas-func> (PR/248).

  • make DataBundle support item assignment for DataBundle.data (PR/248).

  • optionally, apply selections to DataBundle.mask in DataBundle.select_* functions (PR/248).

  • cdm_reader.reader.read_tables: optionally, set null_label (PR/242)

  • new method function: DataBundle.select_where_all_false (PR/242)

  • new method functions: DataBundle.split_* which split a DataBundle into two new DataBundles containing data selected and rejected after user-defined selection criteria (PR/242)

    • DataBundle.split_by_boolean_true
    • DataBundle.split_by_boolean_false
    • DataBundle.split_by_column_entries
    • DataBundle.split_by_index
  • implement pandas indexers like iloc for non-chunked data (PR/242)
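
The mode-based dispatch of the read wrapper described above can be sketched as follows. The stub functions are placeholders that stand in for cdm_reader_mapper.read_mdf, read_data and read_tables; their bodies, the dispatch table and the error message are assumptions made for illustration, not the library's code.

```python
# Placeholder stubs standing in for the real reader functions.
def _stub_read_mdf(source, **kwargs):
    return ("mdf", source)


def _stub_read_data(source, **kwargs):
    return ("data", source)


def _stub_read_tables(source, **kwargs):
    return ("tables", source)


# Map each supported mode to the function that handles it.
_READERS = {
    "mdf": _stub_read_mdf,
    "data": _stub_read_data,
    "tables": _stub_read_tables,
}


def read(source, mode, **kwargs):
    """Call the reader registered for `mode`, or fail with a clear error."""
    try:
        reader = _READERS[mode]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(_READERS)}"
        ) from None
    return reader(source, **kwargs)


print(read("example_file", mode="tables"))
```

A lookup table keeps the wrapper short and makes the set of valid modes explicit in one place; a write wrapper could follow the same pattern.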

Internal changes

  • cdm_reader_mapper.common.select: restructure, simplify and summarize functions (PR/242)
  • split DataBundle class into main class (cdm_reader_mapper.core._utilities) and method function class (cdm_reader_mapper.core.databundle) (PR/242)

Breaking changes

  • remove property tables from DataBundle object. Instead, DataBundle.map_model overwrites DataBundle.data (PR/238).

  • set default overwrite value from True to False, consistent with the pandas inplace argument, and rename overwrite to inplace (PR/238, PR/248).

  • inplace operations return None, consistent with pandas (PR/242)

  • DataBundle method functions return a DataBundle instead of a pandas.DataFrame (PR/248).

  • DataBundle.select_* functions write only selected entries to DataBundle.data and do not take other list entries from common.select_* function returns into account (PR/248).

  • select functions do not reset indexes by default (PR/242)

  • rename DataBundle.select_* functions:

    • DataBundle.select_true -> DataBundle.select_where_all_boolean
    • DataBundle.select_from_list -> DataBundle.select_where_entry_isin
    • DataBundle.select_from_index -> DataBundle.select_where_index_isin
  • rename cdm_reader_mapper.common.select_* functions and make them return a tuple of selected and rejected data after user-defined selection criteria (PR/242):

    • select_true -> split_by_boolean_true
    • select_from_list -> split_by_column_entries
    • select_from_index -> split_by_index
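
The inplace convention adopted above (default False, return a new object; with inplace=True, modify the object and return None) can be sketched with a small stand-in class. The class Bundle and its method select_positive are hypothetical and are not part of cdm_reader_mapper.

```python
class Bundle:
    """Hypothetical stand-in for a DataBundle-like container."""

    def __init__(self, data):
        self.data = list(data)

    def select_positive(self, inplace=False):
        selected = [value for value in self.data if value > 0]
        if inplace:
            # Modify this object and return None, as pandas does.
            self.data = selected
            return None
        # Default: leave this object untouched and return a new one.
        return Bundle(selected)


bundle = Bundle([1, -2, 3])

new_bundle = bundle.select_positive()
print(new_bundle.data, bundle.data)

result = bundle.select_positive(inplace=True)
print(result, bundle.data)
```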

Bug fixes

  • cdm_reader_mapper.metmetpy: set deck keys from ??? to d??? in icoads json files which makes values accessible again (PR/238).
  • cdm_reader_mapper.metmetpy: set imma1 to icoads and immt to gcc in icoads/gcc json files which makes properties accessible again (PR/238).
  • DataBundle.copy function now makes a real deepcopy of DataBundle object (PR/248).
  • correct key index->section for self.df.attrs in open_netcdf (PR/252)
  • cdm_reader_mapper.map_model: return null_label if conversion fails (PR/242)
  • keep indexes during duplicate check (PR/242)

v2.0.1

25 Feb 09:20
78ed99e

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

Announcements

This release drops support for Python 3.9 and adds support for Python 3.13 (PR/228, PR/229)

New features and enhancements

  • add environment.yml file (PR/229)

  • cdm_reader_mapper now separates the optional dependencies into dev and docs recipes (PR/232).

    • $ python -m pip install cdm_reader_mapper # Install minimum dependency version
    • $ python -m pip install cdm_reader_mapper[dev] # Install optional development dependencies in addition
    • $ python -m pip install cdm_reader_mapper[docs] # Install optional dependencies for the documentation in addition
    • $ python -m pip install cdm_reader_mapper[all] # Install all the above for complete dependency version

Internal changes

  • GitHub workflow for testing_suite now uses uv for environment management, replacing micromamba (PR/228)
  • rename ci/requirements to CI and tidy up requirements/dependencies (PR/229)

v2.0.0

14 Feb 12:05
51bef32

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

New features and enhancements

  • New core DataBundle object including callable cdm_mapper, metmetpy and operations methods (#84, #188, #197)
  • Update readthedocs documentation (#191, #197)
  • new function: write_data to write MDF data and validation mask according to write_tables for writing CDM tables (#201)
  • new function: read_data to read MDF data and validation mask according to read_tables for reading CDM tables (#201)
  • new property: DataBundle.encoding (#222)
  • add overwrite option to some DataBundle method functions (#224)

Breaking changes

  • cdm_mapper: map_model returns pandas.DataFrame instead of CDM dictionary (#189)
  • cdm_mapper: rename function cdm_to_ascii to write_tables (#182, #185)
  • cdm_mapper: update parameter names and list of functions read_tables and write_tables (#185)
  • main cdm_mapper, mdf_reader and duplicates modules are directly callable from cdm_reader_mapper (#188)
  • new list of imported submodules: [map_model, cdm_tables, read_tables, write_tables, duplicate_check and read_mdf] (#188)
  • removed list of imported submodules: [cdm_mapper, common, mdf_reader, metmetpy, operations] (#188)
  • remove imported submodules from cdm_mapper, mdf_reader (#188)
  • read_tables: returning DataBundle object (#188)
  • read_tables: resulting dataframe always includes multi-indexed columns (#188)
  • duplicates is now a direct submodule of cdm_reader_mapper (#188)
  • import read function from mdf_reader.read as read_mdf (#188)
  • read_mdf: returning DataBundle object (#188)
  • read_mdf: remove parameter out_path to dump attribute information on disk (#201)
  • move function open_code_table from common.json_dict to cdm_mapper.codes.codes (#221)
  • move operations to common (#224)
  • cdm_mapper: rename table_writer to writer and table_reader to reader (#224)
  • mdf_reader: rename write to writer and read to reader (#224)
  • metmetpy: gather correction functions to correct module and validation functions to validate module (#224)
  • DataBundle: remove properties selected, deselected, tables_dup_flagged and tables_dups_removed (#224)

Internal changes

  • cdm_mapper: move dtype conversion from write_tables to new submodule _conversions of map_model (#189)

  • cdm_mapper: rename mappings to _mapping_functions (#189)

  • cdm_mapper: move mapping functions from mapper to new submodule _mappings (#189)

  • cdm_mapper: save utility functions from table_reader.py and table_writer.py to _utilities.py (#185)

  • reduce complexity of several functions (#25, #200):

    • mdf_reader.read.read
    • mdf_reader.validate.validate
    • mdf_reader.utils.decoders.signed_overpunch
    • cdm_mapper._mappings._mapping
    • metmetpy.station_id.validate.validate
  • split mdf_reader.utils.auxiliary into mdf_reader.utils.filereader, mdf_reader.utils.configurator and mdf_reader.utils.utilities (#25, #200)

  • simplify cdm_mapper.read_tables function (#192)

  • mdf_reader: Refactored Configurator class, Configurator.open_pandas method, to handle looping through rows (#208, #210)

  • mdf_reader: Refactored Configurator class, Configurator.open_data method, to avoid creating a pre-validation missing_value mask (#216)

  • mdf_reader: move validate to utils.validators (#216)

  • mdf_reader: no need for multi-column key codes (e.g. ("core", "VS")) (#221)

  • mdf_reader.utils.validator: simplify function code_validation (#221)

  • cdm_mapper.codes.common: convert range-key properties to list (#221)

  • testing_suite: new chunksize test with icoads_r300_d721 (#222)

  • mdf_reader, cdm_mapper: use model-dependent encoding while writing data to disk (#222)

  • code restructuring (#224)

  • remove unused functions and methods (#224)

Bug fixes

  • Solve SettingWithCopyWarning (#151, #184)
  • mdf_reader: utils.converters.decode now returns values instead of only None (#214)
  • mdf_reader: fix misreading of data containing German umlauts (#212, #214, #222)

v1.0.2

13 Nov 10:38
4722af1

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • New PyPI Classifiers:

    • Development Status :: 5 - Production/Stable
    • Intended Audience :: Science/Research
    • License :: OSI Approved :: Apache Software License
    • Operating System :: OS Independent