Problem
Storing the ARTIS database on KNB requires many .csv files to ensure the data is preserved in an accessible and usable format. For our end users, a duckdb will provide many benefits. However, we want to make it as easy as possible for them to set up a duckdb without introducing even more technologies. They will already face a learning curve for querying and running analyses with duckdb (which we are trying to minimize!). We want to make the uptake of our new duckdb distribution as smooth and reproducible as possible.
Solution
Call a single function and BAM, you have the power of the new ARTIS duckdb on your local computer!
Add an exploreARTIS function that standardizes and streamlines building the ARTIS duckdb by pulling data directly from the ARTIS KNB record. Use @Anurag19101996's script as the foundation for inserting data into duckdb in a standardized way. Pulling data directly from KNB would also register every ARTIS data download through the built-in KNB metric service!
Function arguments:
- version: "latest" or "DOI" or "1.0.0"
- model_run?: "FAO" or "SAU" (not sure how we will separate these in KNB; might not be needed if new DOIs are assigned to different data versions)
- user_orcid: "https://orcid.org/0000-0002-9370-9128" (for signing into KNB)
  - other KNB credentials needed?
- path: file path for duckdb file
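As a starting point for discussion, the argument list above could be sketched as the following signature with basic validation. The function name is just one of the candidates from this issue, the defaults are assumptions, and the body stops before any download or database work.

```r
# Hypothetical signature for the proposed function; name and defaults are assumptions.
build_artis_duckdb <- function(version = "latest",
                               model_run = c("FAO", "SAU"),
                               user_orcid = NULL,
                               path = "artis.duckdb") {
  model_run <- match.arg(model_run)
  stopifnot(is.character(version), length(version) == 1)

  # Accept "latest", a version number such as "1.0.0", or a DOI string.
  is_latest <- identical(version, "latest")
  is_semver <- grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
  is_doi <- grepl("^(doi:)?10\\.[0-9]+/.+$", version, ignore.case = TRUE)
  if (!(is_latest || is_semver || is_doi)) {
    stop("`version` must be \"latest\", a DOI, or a version like \"1.0.0\".")
  }

  # ORCID iDs look like https://orcid.org/ followed by four 4-character blocks.
  orcid_pattern <- "^https://orcid\\.org/[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"
  if (!is.null(user_orcid) && !grepl(orcid_pattern, user_orcid)) {
    stop("`user_orcid` must look like \"https://orcid.org/0000-0000-0000-0000\".")
  }

  # Return the validated request; the download and build steps would follow here.
  list(version = version, model_run = model_run,
       user_orcid = user_orcid, path = path.expand(path))
}
```

Validating up front keeps the single-call experience friendly: a typo in `version` or `model_run` fails immediately instead of partway through a long download.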
Questions
- Use DataOne API to request data OR rdataone package?
- Is rdataone just an R client for the DataONE API?
- How do we specify the KNB repository through the DataONE API and/or rdataone package?
- Is it possible to download/pull KNB data directly into duckdb without saving it locally first? Insert R function into SQL query passed to duckdb?
- Find the persistent DOI for the ARTIS data and the versioned DOIs
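To make the questions above more concrete, here is a minimal sketch of the "download locally, then insert" route. It assumes the `dataone` R package (from the rdataone repository) with `D1Client()` and `getObject()`, that "urn:node:KNB" identifies the KNB member node, and that the object is publicly readable. Both helper names are hypothetical. Whether the local save can be skipped entirely is not answered here.

```r
# Hypothetical helper: fetch one KNB object by identifier and save it as a CSV.
# Assumes the `dataone` package and the "urn:node:KNB" member node id.
download_knb_csv <- function(pid, dest_dir = tempdir()) {
  d1c <- dataone::D1Client("PROD", "urn:node:KNB")
  raw_bytes <- dataone::getObject(d1c@mn, pid)
  dest <- file.path(dest_dir, paste0(make.names(pid), ".csv"))
  writeBin(raw_bytes, dest)
  dest
}

# Hypothetical helper: load a local CSV into a table in the duckdb file at `db_path`
# and return the number of rows inserted.
load_csv_to_duckdb <- function(csv_path, table_name, db_path) {
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE), add = TRUE)

  # duckdb_read_csv() creates the table and infers column types from the file.
  duckdb::duckdb_read_csv(con, table_name, csv_path)

  quoted <- as.character(DBI::dbQuoteIdentifier(con, table_name))
  DBI::dbGetQuery(con, paste0("SELECT COUNT(*) AS n FROM ", quoted))$n
}
```

Keeping download and insert as separate steps means the insert logic can be tested on any local .csv without touching KNB.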
Relevant info and resources
The Knowledge Network for Biocomplexity (KNB) data repository is a member of the DataOne network of data repositories.
Ideas for function name
exploreARTIS::build_artis_with_ducks() 🦆
build_artis_duckdb()
make_artis_duckdb()
setup_artis_duckdb()
Running To-do List
- […] exploreARTIS/develop-build-knb-duckdb branch
- .R scripts need to live in the ./R/ directory when developing an R package
- […] roxygen2 documentation decorators to the very top of the function script. It may pass checks and build when it is separated across the script, but let's stick to expected formatting conventions.
- […] #' @export process_knb_to_duckdb in the roxygen2 header to signify the main function to export to the package NAMESPACE.
- […] roxygen2 header instead of in script
- […] process_knb_to_duckdb() to replace download_dir <- "~/Downloads/artis_downloads"
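The roxygen2 to-do items in the running list could look roughly like this in a file under ./R/. This is only a sketch: the argument names and defaults are assumptions (including the `download_dir` argument standing in for the hard-coded path), and the body is a placeholder.

```r
#' Build a local ARTIS duckdb from the KNB record
#'
#' The whole roxygen2 header sits in one block directly above the function.
#'
#' @param download_dir Directory where downloaded KNB files are stored
#'   (hypothetical argument replacing a hard-coded path).
#' @param path File path for the duckdb file (hypothetical argument).
#' @return The duckdb file path, invisibly.
#' @export
process_knb_to_duckdb <- function(download_dir = tempdir(),
                                  path = "artis.duckdb") {
  # Create the download directory on demand instead of assuming it exists.
  if (!dir.exists(download_dir)) {
    dir.create(download_dir, recursive = TRUE)
  }

  # Download and insert steps would go here.
  invisible(path)
}
```

With the `@export` tag in the header, running `devtools::document()` regenerates the NAMESPACE entry, so nothing needs to be edited by hand.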
Additional resource: rdataone package, https://github.com/DataONEorg/rdataone