Skip to content

feat(sampling): add VarOptItemsUnion, serde, and cross-language compatibility tests#130

Open
Fengzdadi wants to merge 12 commits intoapache:mainfrom
Fengzdadi:main
Open

feat(sampling): add VarOptItemsUnion, serde, and cross-language compatibility tests#130
Fengzdadi wants to merge 12 commits intoapache:mainfrom
Fengzdadi:main

Conversation

@Fengzdadi
Copy link
Contributor

There's a lot of content, but so far all tests have passed.

  • Added VarOptUnion family (ID=14)
  • Added VarOptItemsUnion core implementation (including gadget/marks resolution)
  • Added serialization and deserialization for VarOptItemsSketch / VarOptItemsUnion
  • Tightened varopt deserialization header consistency check (empty flag and preLongs must be consistent)
  • Added VarOpt unit tests and serial roundtrip tests
  • Added Java/C++ -> Go VarOpt compatibility tests
  • Added and included sample files in serialization_test_data/generated_files/varopt.sk
  • Extended Go-side generation logic to generate varopt test data for the same Java/C++ scenarios

Copy link
Member

@proost proost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add go sketches?

packBoolsInto(out[markOffset:markOffset+markBytes], s.marks[:s.h])
}

copy(out[markOffset+markBytes:], itemsBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check not empty before write "itemsBytes"? If an user has custom "ItemsSerDe", it is easy to make corrupted sketch

totalWeightR := 0.0
if r > 0 {
totalWeightR = math.Float64frombits(binary.LittleEndian.Uint64(data[24:]))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to invalidate that preLong is 4, but r == 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you add check totalWeightR is NaN?

}

if r == 0 {
out := &VarOptItemsSketch[T]{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set capacities of each slice. becasue we use cap to determine grow or not.

}

// NewVarOptItemsSketchFromSlice deserializes a sketch from bytes.
func NewVarOptItemsSketchFromSlice[T any](data []byte, serde ItemsSerDe[T]) (*VarOptItemsSketch[T], error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need more stricter validation on prelong, h, n, r, k

}

func NewVarOptItemsUnion[T any](maxK int) (*VarOptItemsUnion[T], error) {
if maxK < 1 || maxK > varOptMaxK {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is duplicated. VarOptItemsSketch constructor covers it in L49.

}

func (u *VarOptItemsUnion[T]) Reset() error {
gadget, err := newVarOptItemsSketchAsGadget[T](u.maxK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you just use "Reset" method in Sketch?

}, nil
}

func (u *VarOptItemsUnion[T]) MaxK() int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User knows what maxK is. What is purpose on this method?

assert.True(t, result.IsEmpty())
}

func TestVarOptItemsUnion_UpdateSketchExactMode(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need more test cases for update method.

  1. union sampling sketches which one of them has extreme heavy item.
  2. union identical sampling sketches.
  3. union with diffent k and weighted items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants