
[Feature]: Document that Report object is mutable #71

@alexverse

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

The Report object is mutable, so passing it through a data workflow can silently change earlier results and undermine reproducibility.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a)

### report_a is mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 1
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Non empty table     |error   |                1|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2

Problem

The report_a object changes after it has been returned. In a functional data analysis workflow, most users would not expect this.
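
The behavior follows from R6 objects being environments, which R modifies in place rather than copying on modification. Here is a minimal base-R sketch of the difference, independent of data.validator; the names `add_entry_env`, `add_entry_list`, and `log_env` are illustrative only and do not exist in the package:

```r
# Reference semantics: an environment is modified in place, so every name
# bound to it sees the change (R6 objects are built on environments).
log_env <- new.env()
log_env$results <- character()

add_entry_env <- function(log, entry) {
  log$results <- c(log$results, entry)  # mutates the caller's environment
  invisible(log)
}

env_a <- add_entry_env(log_env, "check A")
env_b <- add_entry_env(env_a, "check B")
length(env_a$results)      # 2: env_a changed after it was returned
identical(env_a, env_b)    # TRUE: both names point to the same object

# Value semantics: a list is copied on modification, so earlier results persist.
add_entry_list <- function(log, entry) {
  log$results <- c(log$results, entry)  # modifies a local copy only
  log
}

list_a <- add_entry_list(list(results = character()), "check A")
list_b <- add_entry_list(list_a, "check B")
length(list_a$results)     # 1: list_a is unaffected
length(list_b$results)     # 2
```

The environment branch mirrors what happens to report_a in the example; the list branch is what a user accustomed to ordinary R objects would expect.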

Proposed Solution

Update the documentation to state that the Report uses reference semantics, and show that a copy can be passed to downstream functions with the R6 clone() method.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a$clone())

### report_a is not mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2

Alternatives Considered

Alternatively, refactor the Report so that it uses R's standard copy-on-modify semantics instead of reference semantics.
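
Short of a full refactor, value-like behavior could be approximated at the call site. The sketch below is a hypothetical helper, not part of data.validator; it only assumes that the report exposes the R6 clone() method, as used in the example above:

```r
# Hypothetical helper, not part of data.validator: run `step` on a clone of
# `report` and return the clone, so the caller's object is left unchanged.
# Assumes `report` exposes the R6 clone() method.
with_report_copy <- function(report, step) {
  report_copy <- report$clone()  # shallow copy, as in the clone() example
  step(report_copy)              # `step` may mutate the copy freely
  report_copy
}
```

With the validators defined above, `report_b <- with_report_copy(report_a, \(r) validator_b(iris, r))` should leave report_a with its single result. If the Report held nested R6 objects, `clone(deep = TRUE)` might be needed instead; whether that applies here is not confirmed by this issue.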
