I'm curious what the performance would be like for a Chunk implementation that keeps 256 separate lists instead of 1, with each list holding one possible component type. I'd imagine iterating through these chunks would be a bit quicker, since the final code would only need to fetch data for the component it actually wants instead of stepping over the others stored in between. It would also eliminate the need for a custom enumerator that takes the stride into account. The cost is more memory fragmentation.
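To make the comparison concrete, here's a rough sketch of the two layouts. Everything in it is made up for illustration (`InterleavedChunk`, `SplitChunk`, `iter_component`, float-only components); it isn't the actual Chunk code, and Python won't tell you anything about the real speed difference. It only shows where the stride-aware walk goes away.

```python
from array import array

COMPONENT_TYPES = 256  # one slot per possible component type, as in the comment


class InterleavedChunk:
    """Hypothetical single-list layout: each entity takes `stride` consecutive slots."""

    def __init__(self, stride):
        self.stride = stride
        self.data = array("f")

    def add_entity(self, values):
        assert len(values) == self.stride
        self.data.extend(values)

    def iter_component(self, offset):
        # Stride-aware walk: skip over the other components of every entity.
        for i in range(offset, len(self.data), self.stride):
            yield self.data[i]


class SplitChunk:
    """Hypothetical 256-list layout: one list per component type, allocated lazily."""

    def __init__(self):
        self.columns = [None] * COMPONENT_TYPES

    def add_entity(self, values_by_type):
        for type_id, value in values_by_type.items():
            if self.columns[type_id] is None:
                self.columns[type_id] = array("f")
            self.columns[type_id].append(value)

    def iter_component(self, type_id):
        # No stride needed: the column only contains this component.
        column = self.columns[type_id]
        return iter(column) if column is not None else iter(())


interleaved = InterleavedChunk(stride=3)
split = SplitChunk()
for e in range(4):
    values = [float(e), float(e) * 10, float(e) * 100]
    interleaved.add_entity(values)
    split.add_entity({0: values[0], 1: values[1], 2: values[2]})

# Both layouts yield the same values for component 1.
assert list(interleaved.iter_component(1)) == list(split.iter_component(1))
```

In the split version each component type gets its own allocation, which is where the extra fragmentation would come from. Whether the tighter iteration makes up for that is something only a benchmark on the real implementation can answer.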