Skip to content

Scalar Values + Constant Arrays#56

Draft
gatesn wants to merge 4 commits intodevelopfrom
ngates/scalar-values
Draft

Scalar Values + Constant Arrays#56
gatesn wants to merge 4 commits intodevelopfrom
ngates/scalar-values

Conversation

@gatesn
Copy link
Copy Markdown
Contributor

@gatesn gatesn commented May 4, 2026

We have been toying with the idea of adding a ScalarValue::Array(ArrayRef) variant in order to efficiently store and access list-like scalars. But this drags all the baggage of arrays being lazy plans into the scalar space and essentially makes scalars unusable. Does the array need to be canonical? What if buffers live on-device? And so on...

This proposal strengthens the position of Scalar to mean "host-resident" flat value, and adds support for list-like scalars at higher levels, e.g. within a ConstantArray and within a Literal expression.

gatesn added 2 commits May 3, 2026 22:37
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Comment on lines +92 to +98
These dtypes do not have scalar values in the long-term model:

- list
- fixed-size-list
- variant
- extension values whose storage dtype is nested
- structs containing any field that is not scalar-compatible
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we make the expression lit hold a Constant Array instead?

- `child.dtype() == constant.dtype()`
- the outer constant length is `broadcast_len`
- the outer constant has no separate validity
- if `child` is itself a `ConstantArray`, construction normalizes by unwrapping to the inner child
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you never have a ConstantArray with broadcast_len==1?

Comment on lines +151 to +157
The expected costs are:

- one length-1 child array allocation for primitive constants
- one child-slot hop in constant fast paths
- updated constant kernels that read from the child instead of an inline scalar

The expected benefits are:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might really add up. Is there a way we can test this?

Comment on lines +215 to +228
### Scalar functions and broadcasting

Scalar functions already take `ArrayRef` inputs. This RFC keeps that interface unchanged.

The calling convention remains:

- every input array has logical length `args.row_count()`
- broadcasting is represented by `ConstantArray(len = args.row_count(), child = length_1_array)`
- naked length-1 arrays are not normal scalar function inputs, except for private sub-executions
such as evaluating an all-constant expression once

Scalar functions may still use constant fast paths by checking whether an input is a `ConstantArray`.
This RFC only changes what a constant contains. It does not require changing the scalar-function
execution API, adding argument materialization helpers, or changing `Columnar`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so nothing really changes

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a function to go from len one array to scalar to help the migration

Comment on lines +256 to +257
This is the right tradeoff for expression identity. Embedded arrays can be large, compressed,
deferred, or device-resident. Expression equality should not accidentally become array equality.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a huge footgun. Making equality of literal approx, definitely equal and false mean possibly equal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we also make scalar equality ptr_eq?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing is that we cannot use eq for evaluation of the expression.

I think we don't want to use PartialEq or Eq but another trait? Overload equality is annoying here

Comment thread rfcs/0056-scalar-values.md
Comment on lines +303 to +312
pub fn lit_list(elements: ArrayRef) -> VortexResult<Expression>;

pub fn lit_fixed_size_list(elements: ArrayRef) -> VortexResult<Expression>;

pub fn lit_struct(
dtype: StructDType,
fields: impl IntoIterator<Item = (FieldName, ArrayRef)>,
) -> VortexResult<Expression>;

pub fn lit_variant(value: ArrayRef) -> VortexResult<Expression>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need these?

Comment on lines +324 to +328
impl ArrayRef {
pub fn row(&self, index: usize) -> VortexResult<ArrayRef> {
self.slice(index..index + 1)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a execution context?


pub trait ScalarProbe {
fn dtype(&self) -> &DType;
fn get(&mut self, index: usize) -> VortexResult<Scalar>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We likely want a different type than Result? That is expensive to check

Comment on lines +344 to +353
Add a fallible, stateful probe API for those callers:

```rust
pub struct ProbeOptions {
// Exact fields intentionally left unspecified by this RFC.
}

pub trait ScalarProbe {
fn dtype(&self) -> &DType;
fn get(&mut self, index: usize) -> VortexResult<Scalar>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have an immediate use case for this?

Comment on lines +590 to +591
Primitive constants become slightly heavier. They now carry a length-1 child array instead of an
inline scalar. Hot constant kernels must be audited and benchmarked before this lands.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be very pronounced

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants