
first_value doesn't work when applied to window function output #1300

@ntjohnson1

Description

Describe the bug
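If I generate a column with a window function and then try to select the first value (or, with a filter, the last value) of that column, the query fails with a `NotImplemented` error from the physical planner.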

To Reproduce
Steps to reproduce the behavior:

```python
import datafusion as dfn
from datafusion import lit, col, functions as F
from datafusion.expr import Window, WindowFrame


def main() -> None:
    ctx = dfn.SessionContext()
    df = ctx.from_pydict(
        {"any_row": list(range(10))},
    )
    df = df.select(
        "any_row",
        lit(1).alias("ones"),
    )
    # Running sums over ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING
    # (i.e. up to the current row), once in ascending and once in
    # descending any_row order.
    df = df.select(
        "any_row",
        F.sum(col("ones"))
        .over(
            Window(
                window_frame=WindowFrame("rows", None, 0),
                order_by=col("any_row").sort(ascending=True),
            )
        )
        .alias("forward_row_sum"),
        F.sum(col("ones"))
        .over(
            Window(
                window_frame=WindowFrame("rows", None, 0),
                order_by=col("any_row").sort(ascending=False),
            )
        )
        .alias("reverse_row_sum"),
    )
    df.collect()  # the window columns on their own collect fine

    # Fails with NotImplemented("Physical plan does not support logical
    # expression AggregateFunction(...)")
    df.select(
        F.first_value(col("forward_row_sum"), order_by=col("any_row"))
    ).collect()

    df.select(
        F.last_value(
            col("reverse_row_sum"),
            filter=col("reverse_row_sum") >= 5,
            order_by=col("any_row").sort(ascending=True),
        )
    ).collect()


if __name__ == "__main__":
    main()
```

```text
Traceback (most recent call last):
  File "/Users/nick/repos/bug.py", line 39, in <module>
    main()
    ~~~~^^
  File "/Users/nick/repos/bug.py", line 26, in main
    ).collect()
      ~~~~~~~^^
  File "/Users/nick/repos/.venv/lib/python3.13/site-packages/datafusion/dataframe.py", line 681, in collect
    return self.df.collect()
           ~~~~~~~~~~~~~~~^^
Exception: DataFusion error: NotImplemented("Physical plan does not support logical expression AggregateFunction(AggregateFunction { func: AggregateUDF { inner: FirstValue { name: \"first_value\", signature: Signature { type_signature: Any(1), volatility: Immutable }, accumulator: \"<FUNC>\" } }, params: AggregateFunctionParams { args: [Column(Column { relation: None, name: \"sum(ones) ORDER BY [c19e557aec20e49b985bb070e969ba68f.any_row ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING\" })], distinct: false, filter: None, order_by: [Sort { expr: Column(Column { relation: Some(Bare { table: \"c19e557aec20e49b985bb070e969ba68f\" }), name: \"any_row\" }), asc: true, nulls_first: true }], null_treatment: Some(RespectNulls) } })")
```

Expected behavior
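I expect each query to return a single value: the first value of `forward_row_sum` ordered by `any_row`, and the last value of `reverse_row_sum` among rows where it is at least 5, instead of a `NotImplemented` error.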
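
To make the expected results concrete, here is a pure-Python sketch (no DataFusion involved) that models the two window columns under the usual semantics of a `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame, then picks the first value and the filtered last value. The helper `running_sum` is a hypothetical name used only for this illustration.

```python
# Pure-Python model of the expected results (no DataFusion involved).
any_row = list(range(10))
ones = [1] * len(any_row)


def running_sum(values, order):
    """Cumulative sum over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    visiting rows in `order`; results are stored back in original row order."""
    out = [0] * len(values)
    total = 0
    for idx in order:
        total += values[idx]
        out[idx] = total
    return out


asc = sorted(range(len(any_row)), key=lambda i: any_row[i])  # ORDER BY any_row ASC
desc = asc[::-1]                                              # ORDER BY any_row DESC
forward_row_sum = running_sum(ones, asc)
reverse_row_sum = running_sum(ones, desc)

# first_value(forward_row_sum ORDER BY any_row)
first = forward_row_sum[asc[0]]

# last_value(reverse_row_sum) FILTER (WHERE reverse_row_sum >= 5) ORDER BY any_row ASC
kept = [i for i in asc if reverse_row_sum[i] >= 5]
last = reverse_row_sum[kept[-1]]

print(first, last)  # prints: 1 5
```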

Additional context

```python
>>> import datafusion as dfn
>>> dfn.__version__
'50.1.0'
```

Labels: bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions