Merged
This PR contains a large architectural rewrite of chasm's runtime and compiler. It improves execution speed roughly 2x on the industry-standard CoreMark benchmark. I had long considered making chasm more performant a dead end, believing we were fundamentally limited by the type of interpretation that can be expressed in Java bytecode. Last year I implemented a fusion algorithm that bumped the speed roughly 30%, but what I didn't realise is that this algorithm was suboptimal in a bunch of ways. Long story short, we were fetching operands via a lambda to avoid an explosion of monomorphised instruction handlers. I made that decision at the time because it limited the scope, and I generally assumed not a lot of performance had been left on the table.
In the last 6 months I have been working on a new runtime, written in a low-level language, that has some very impressive performance characteristics. This work is inspired by things I have learnt there and also by another open source runtime called stitch. Check it out, it's very clever!
TLDR
We now create super instructions with the operands i and s: i meaning immediate, a constant that is folded into the instruction, and s meaning stack, but notably stack with a slot index. Chasm has always stored locals on the stack, and what this stack-with-index approach does is allow us to fuse both locals and normal stack operands using the same handler. It's a very clever trick which keeps code size and instruction cache pressure under control. At compile time we calculate the entire size needed for the call frame and the indices for every instruction, so resolving them adds no runtime overhead.
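To make the i/s idea concrete, here is a minimal sketch in Java. It is not chasm's actual code: the names `FusedAddSketch`, `Instr`, `AddSS`, `AddSI` and `run`, and the choice of a plain `int[]` as the call frame, are all assumptions made for illustration. The point it tries to show is that once locals and stack temporaries live in the same indexed frame, a single "s" operand kind covers both.

```java
// Hypothetical sketch; none of these names come from chasm itself.
class FusedAddSketch {

    /** A fused instruction whose operand locations were fixed at compile time. */
    interface Instr {
        void execute(int[] frame);
    }

    /** add with two slot operands (s, s). Each slot may hold a local or a stack temporary. */
    static final class AddSS implements Instr {
        final int lhs, rhs, dst;
        AddSS(int lhs, int rhs, int dst) { this.lhs = lhs; this.rhs = rhs; this.dst = dst; }
        public void execute(int[] frame) { frame[dst] = frame[lhs] + frame[rhs]; }
    }

    /** add with a slot operand and an immediate (s, i). The constant is stored in the instruction. */
    static final class AddSI implements Instr {
        final int lhs, imm, dst;
        AddSI(int lhs, int imm, int dst) { this.lhs = lhs; this.imm = imm; this.dst = dst; }
        public void execute(int[] frame) { frame[dst] = frame[lhs] + imm; }
    }

    /** Runs straight-line code over a frame whose size the compiler worked out ahead of time. */
    static int run(Instr[] code, int frameSize, int resultSlot, int... args) {
        int[] frame = new int[frameSize];
        // Parameters occupy the first local slots of the frame.
        System.arraycopy(args, 0, frame, 0, args.length);
        for (Instr instr : code) {
            instr.execute(frame);
        }
        return frame[resultSlot];
    }

    public static void main(String[] args) {
        // Source: local.get 0, local.get 1, i32.add, i32.const 10, i32.add
        // Assumed layout: slots 0 and 1 are locals, slot 2 is the only stack temporary.
        Instr[] code = {
            new AddSS(0, 1, 2),  // tmp = local0 + local1 (both operands happen to be locals)
            new AddSI(2, 10, 2), // tmp = tmp + 10 (a stack temporary plus an immediate)
        };
        System.out.println(run(code, 3, 2, 4, 5)); // prints 19
    }
}
```

Note that `AddSS` never needs to know whether a slot is a local or a temporary, which is why this sketch needs only two add handlers instead of one per operand-source combination.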
Further to this, we now also lower Wasm's control flow instructions from a nested syntax representation to true jumps. This is something I started last year, but it never showed the performance characteristics I needed. It transpires that, coupled with some other improvements, it was meaningful. That being said, there's still some performance on the table as we are maintaining an instruction stack for now; this can be removed in the future and replaced with a real instruction pointer (ip/pc) model.
Anyways, enjoy twice the performance. Shout out to GPT 5.4, which actually solved some very tricky compiler issues I ran into. It's remarkable what a good harness can do for these agents.
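As a rough illustration of what lowering nested control flow to jumps means, the sketch below flattens a tiny structured program into a list with resolved jump targets. Everything here is hypothetical (`ControlFlowLoweringSketch`, `Node`, `Flat`, `Label` and the `jump`/`jump_if` names are not taken from chasm), and it ignores branch operands and stack heights entirely. It also resolves targets to flat indices, which is closer to the future ip/pc model than to the instruction stack the runtime currently keeps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from chasm itself.
class ControlFlowLoweringSketch {
    /** Structured input that mirrors Wasm's nesting. */
    interface Node {}

    static final class Op implements Node {
        final String name;
        Op(String name) { this.name = name; }
    }

    static final class Block implements Node {
        final boolean isLoop;
        final List<Node> body;
        Block(boolean isLoop, Node... body) { this.isLoop = isLoop; this.body = Arrays.asList(body); }
    }

    static final class Br implements Node {
        final int depth;
        final boolean conditional;
        Br(int depth, boolean conditional) { this.depth = depth; this.conditional = conditional; }
    }

    /** Flat output: a plain op (target -1) or a jump to an absolute index. */
    static final class Flat {
        final String op;
        int target;
        Flat(String op, int target) { this.op = op; this.target = target; }
        @Override public String toString() { return target < 0 ? op : op + " " + target; }
    }

    /** An enclosing label. Loops know their target up front, blocks are patched when they end. */
    static final class Label {
        final int loopStart; // -1 for a plain block
        final List<Flat> pending = new ArrayList<>();
        Label(int loopStart) { this.loopStart = loopStart; }
    }

    static void lower(List<Node> nodes, List<Flat> out, List<Label> labels) {
        for (Node n : nodes) {
            if (n instanceof Op) {
                out.add(new Flat(((Op) n).name, -1));
            } else if (n instanceof Block) {
                Block b = (Block) n;
                Label label = new Label(b.isLoop ? out.size() : -1);
                labels.add(label);
                lower(b.body, out, labels);
                labels.remove(labels.size() - 1);
                // Forward branches out of a block land on the first instruction after it.
                for (Flat jump : label.pending) jump.target = out.size();
            } else {
                Br br = (Br) n;
                // Depth 0 is the innermost enclosing label, as in Wasm.
                Label label = labels.get(labels.size() - 1 - br.depth);
                Flat jump = new Flat(br.conditional ? "jump_if" : "jump", label.loopStart);
                if (label.loopStart < 0) label.pending.add(jump);
                out.add(jump);
            }
        }
    }

    /** block { loop { cond; br_if 1; body; br 0 } } after */
    static List<Flat> example() {
        List<Node> program = Arrays.asList(
            new Block(false,
                new Block(true,
                    new Op("cond"), new Br(1, true), new Op("body"), new Br(0, false))),
            new Op("after"));
        List<Flat> out = new ArrayList<>();
        lower(program, out, new ArrayList<>());
        return out;
    }

    public static void main(String[] args) {
        List<Flat> code = example();
        // Prints: 0: cond, 1: jump_if 4, 2: body, 3: jump 0, 4: after (one per line)
        for (int pc = 0; pc < code.size(); pc++) System.out.println(pc + ": " + code.get(pc));
    }
}
```

After lowering, the interpreter no longer has to walk nested blocks to find a branch target; every `jump` already carries the index it lands on.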