diff --git a/README.md b/README.md
index 54daf38f2..72ce5f06e 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,7 @@
 | [pivotp](docs/help/pivotp.md)✨
🐻‍❄️🚀🪄 | Pivot CSV data. Features "smart" aggregation auto-selection based on data type & stats. |
 | [pragmastat](docs/help/pragmastat.md)
📇🤯🎲🪄 | Compute pragmatic statistics using the [Pragmastat](https://pragmastat.dev/) library. Uses the stats cache to auto-filter non-numeric columns and support Date/DateTime columns. |
 | [pro](docs/help/pro.md) | Interact with the [qsv pro](https://qsvpro.dathere.com) API. |
-| [profile](docs/help/profile.md)✨
📇🧠🤖📚⛩️ ![CKAN](docs/images/ckan.png) | Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/), [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) and [Geoconnex](https://docs.geoconnex.us/reference/overview) profiles bundled), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale. |
+| [profile](docs/help/profile.md)✨
📇🧠🤖📚⛩️ ![CKAN](docs/images/ckan.png) | Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) and [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) bundled; [Geoconnex](https://docs.geoconnex.us/reference/overview) when built with the `geoconnex` feature), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale. |
 | [prompt](docs/help/prompt.md)✨
🐻‍❄️🖥️ | Open a file dialog to either pick a file as input or save output to a file. |
 | [pseudo](docs/help/pseudo.md)
🔣👆 | [Pseudonymise](https://en.wikipedia.org/wiki/Pseudonymization) the value of the given column by replacing them with an incremental identifier. |
 | [py](docs/help/py.md)✨
📇🔣 | Create a new computed column or filter rows by evaluating a Python expression on every row of a CSV file. Python's [f-strings](https://www.freecodecamp.org/news/python-f-strings-tutorial-how-to-use-f-strings-for-string-formatting/) is particularly useful for extended formatting, [with the ability to evaluate Python expressions as well](https://github.com/dathere/qsv/blob/4cd00dca88addf0d287247fa27d40563b6d46985/src/cmd/python.rs#L23-L31). [Requires Python 3.10 or greater](https://github.com/dathere/qsv/blob/master/docs/INTERPRETERS.md#building-qsv-with-python-feature). |
diff --git a/docs/help/TableOfContents.md b/docs/help/TableOfContents.md
index 64fbe4651..b641ebcd6 100644
--- a/docs/help/TableOfContents.md
+++ b/docs/help/TableOfContents.md
@@ -48,7 +48,7 @@
 | [pivotp](pivotp.md)
[🐻‍❄️](#legend "command powered/accelerated by vectorized query engine.")[🚀](#legend "multithreaded even without an index.")[🪄](#legend "\"automagical\" commands that uses stats and/or frequency tables to work \"smarter\" & \"faster\".") | Pivot CSV data. Features "smart" aggregation auto-selection based on data type & stats. |
 | [pragmastat](pragmastat.md)
[📇](#legend "uses an index when available.")[đŸ¤¯](#legend "loads entire CSV into memory, though `dedup`, `stats` & `transpose` have \"streaming\" modes as well.")[🎲](#legend "randomly generated or randomized output with a --seed option for reproducibility.")[đŸĒ„](#legend "\"automagical\" commands that uses stats and/or frequency tables to work \"smarter\" & \"faster\".") | Compute pragmatic statistics using the [Pragmastat](https://pragmastat.dev/) library. Uses the stats cache to auto-filter non-numeric columns and support Date/DateTime columns. | | [pro](pro.md) | Interact with the [qsv pro](https://qsvpro.dathere.com) API. | -| [profile](profile.md)
[📇](#legend "uses an index when available.")[🧠](#legend "expensive operations are memoized with available inter-session Redis/Disk caching for fetch commands.")[🤖](#legend "command uses Natural Language Processing or Generative AI.")[📚](#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[â›Šī¸](#legend "uses Mini Jinja template engine.") [![CKAN](../images/ckan.png)](#legend "has CKAN-aware integration options.") | Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/), [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) and [Geoconnex](https://docs.geoconnex.us/reference/overview) profiles bundled), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale. | +| [profile](profile.md)
[📇](#legend "uses an index when available.")[🧠](#legend "expensive operations are memoized with available inter-session Redis/Disk caching for fetch commands.")[🤖](#legend "command uses Natural Language Processing or Generative AI.")[📚](#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[â›Šī¸](#legend "uses Mini Jinja template engine.") [![CKAN](../images/ckan.png)](#legend "has CKAN-aware integration options.") | Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) and [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) bundled; [Geoconnex](https://docs.geoconnex.us/reference/overview) when built with the `geoconnex` feature), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale. | | [prompt](prompt.md)
[🐻‍❄️](#legend "command powered/accelerated by vectorized query engine.")[🖥️](#legend "part of the User Interface (UI) feature group.") | Open a file dialog to either pick a file as input or save output to a file. |
 | [pseudo](pseudo.md)
[🔣](#legend "requires UTF-8 encoded input.")[👆](#legend "has powerful column selector support. See `select` for syntax.") | [Pseudonymise](https://en.wikipedia.org/wiki/Pseudonymization) the value of the given column by replacing them with an incremental identifier. |
 | [py](py.md)
[📇](#legend "uses an index when available.")[đŸ”Ŗ](#legend "requires UTF-8 encoded input.") | Create a new computed column or filter rows by evaluating a Python expression on every row of a CSV file. Python's [f-strings](https://www.freecodecamp.org/news/python-f-strings-tutorial-how-to-use-f-strings-for-string-formatting/) is particularly useful for extended formatting, [with the ability to evaluate Python expressions as well](https://github.com/dathere/qsv/blob/4cd00dca88addf0d287247fa27d40563b6d46985/src/cmd/python.rs#L23-L31). [Requires Python 3.10 or greater](https://github.com/dathere/qsv/blob/master/docs/INTERPRETERS.md#building-qsv-with-python-feature). | diff --git a/docs/help/profile.md b/docs/help/profile.md index 41721dd5d..aee136104 100644 --- a/docs/help/profile.md +++ b/docs/help/profile.md @@ -1,6 +1,6 @@ # profile -> Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/), [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) and [Geoconnex](https://docs.geoconnex.us/reference/overview) profiles bundled), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale. 
+> Extract, derive & infer metadata from a CSV (local path or URL) - using the statistical profile of a dataset, mapped and driven by a configurable metadata [scheming](https://github.com/ckan/ckanext-scheming) YAML spec ([DCAT-US v3](https://resources.data.gov/resources/dcat-us3/), [DCAT-AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) and [Croissant 1.1](https://docs.mlcommons.org/croissant/docs/croissant-spec-1.1.html) bundled; [Geoconnex](https://docs.geoconnex.us/reference/overview) when built with the `geoconnex` feature), with optional CKAN/DCAT metadata discovery for URL inputs. This enables [FAIRification](https://www.go-fair.org/fair-principles/fairification-process/) at scale.
 **[Table of Contents](TableOfContents.md)** | **Source: [src/cmd/profile.rs](https://github.com/dathere/qsv/blob/master/src/cmd/profile.rs)** | [📇](TableOfContents.md#legend "uses an index when available.")[🧠](TableOfContents.md#legend "expensive operations are memoized with available inter-session Redis/Disk caching for fetch commands.")[🤖](TableOfContents.md#legend "command uses Natural Language Processing or Generative AI.")[📚](TableOfContents.md#legend "has lookup table support, enabling runtime \"lookups\" against local or remote reference CSVs.")[⛩️](TableOfContents.md#legend "uses Mini Jinja template engine.") [![CKAN](../images/ckan.png)](TableOfContents.md#legend "has CKAN-aware integration options.")