Fix group rule evaluation order with row-level rules#278

Draft
Copilot wants to merge 3 commits into main from copilot/fix-group-rules-validation

Copilot AI commented Feb 17, 2026

Group rules were evaluated on the original dataset, before row-level filtering. As a result, when removing a row that fails a row-level rule would leave its group invalid, the group rule still passed, because it had been computed with that row included.

import dataframely as dy
import polars as pl

class DiagnosisSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)
    diagnosis = dy.String(primary_key=True, regex="^[A-Z]{3}$")
    is_main = dy.Bool(nullable=False)

    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis(cls) -> pl.Expr:
        return pl.col("is_main").sum() == 1

df = pl.DataFrame({
    "invoice_id": ["A", "A", "A"],
    "diagnosis": ["ABC", "ABD", "123"],  # "123" fails regex
    "is_main": [False, False, True],     # main diagnosis on invalid row
})

# Before: Returns 2 rows with is_main=False (invalid - no main diagnosis)
# After: Returns 0 rows (correctly fails validation)
good, _ = DiagnosisSchema.filter(df)

Changes

  • with_evaluation_rules(): Evaluates row-level rules first, then passes their expressions on to group rule evaluation.
  • _with_group_rules(): Accepts a simple_exprs parameter and keeps only rows that pass every row-level rule before computing group aggregations. The results are left-joined back onto the original dataset; rows that fail a row-level rule get null for the group rules, which is treated as valid.
  • Tests: Added test_filter_group_rule_after_row_rule_filtering, covering the interaction scenarios.

Group rules now only see data that would remain after row-level filtering.

Original prompt

This section details the original issue you should resolve

<issue_title>Validation result is sometimes incorrect when using group rules</issue_title>
<issue_description>In yesterday's PyData meetup in Zurich, one question prompted me to realize that we're incorrectly dealing with group rules and row-level rules: if a row-level rule removes a row which would make a group rule fail, we do not realize it. For example:

import dataframely as dy
import polars as pl

class DiagnosisSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)
    diagnosis = dy.String(primary_key=True, regex="^[A-Z]{3}$")
    is_main = dy.Bool(nullable=False)

    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis() -> pl.Expr:
        return pl.col("is_main").sum() == 1

df = pl.DataFrame(
    {
        "invoice_id": ["A", "A", "A"],
        "diagnosis": ["ABC", "ABD", "123"],
        "is_main": [False, False, True],
    }
)
good, _ = DiagnosisSchema.filter(df)
print(good)

results in

shape: (2, 3)
┌────────────┬───────────┬─────────┐
│ invoice_id ┆ diagnosis ┆ is_main │
│ ---        ┆ ---       ┆ ---     │
│ str        ┆ str       ┆ bool    │
╞════════════╪═══════════╪═════════╡
│ A          ┆ ABC       ┆ false   │
│ A          ┆ ABD       ┆ false   │
└────────────┴───────────┴─────────┘

which clearly violates the schema since we don't have a main diagnosis for the group.</issue_description>

Comments on the Issue (you are @copilot in this section)

@borchero We should probably fix this by sequentializing rule evaluation and executing group rules after the row-level rules.

Copilot AI and others added 2 commits February 17, 2026 14:34
- Group rules are now evaluated only on rows that pass row-level rules
- This ensures that rows failing row-level rules don't affect group validation
- Fixes issue where removing invalid rows could make groups invalid
- Added comprehensive tests for the fix

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix validation result issue with group rules" to "Fix group rule evaluation order with row-level rules" on Feb 17, 2026
Copilot AI requested a review from borchero February 17, 2026 14:48