LICM: Migrate from invalidates to orderedBefore#8743
Conversation
Replace the coarse `invalidates` check and the coarse global state check in LICM with more precise `orderedBefore` checks. This allows LICM to move memory accesses past release stores, while still correctly blocking them from moving past acquire loads. Add a lit test to verify the asymmetrical reordering behavior with release/acquire atomics on shared memory/GC structs.
| // stores. | ||
| EffectAnalyzer loopGlobalEffects = loopEffects; | ||
| loopGlobalEffects.localsRead.clear(); | ||
| loopGlobalEffects.localsWritten.clear(); |
There was a problem hiding this comment.
All this can be done once outside the loop, on loopEffects itself? (perhaps also renaming it)
There was a problem hiding this comment.
The last commit includes this simplification.
| (effects.readsMutableGlobalState() && | ||
| loopEffects.writesGlobalState()); | ||
| effectsSoFar.orderedBefore(effects) || | ||
| loopGlobalEffects.orderedBefore(globalEffects); |
There was a problem hiding this comment.
loopGlobalEffects includes globalEffects, so I am not sure how to interpret this? I mean, this is asking if the effects of B can be moved before [A,B,C]
There was a problem hiding this comment.
Yes, because the loop body is [A,B,C] and if we are pulling B up out of the loop, it is being moved before [A,B,C] x N for an unknown N loop iterations.
There was a problem hiding this comment.
I guess, conceptually, I didn't realize this is meaningful. I assumed orderedBefore(x,y) assumes x is literally before y, and asks whether it must remain so.
The comments in effects.h say that: Checks if these effects must remain ordered before another set of effects ; things appear after each other. Should we update them?
There was a problem hiding this comment.
Your interpretation is correct except that the "before" and "after" are execution order, not program order. Since we're talking about loops, the entire loop body is executed both before and after any particular instruction in the loop. If the loop body must remain ordered before that particular instruction, then we cannot move execution of that instruction before the previous executions of the loop body and hoist the instruction out of the loop.
There was a problem hiding this comment.
Fair point about execution order, but - at the risk of being annoying - isn't it the same issue? The docs say "x is before y" but in this case we are using it when x is within y? It may be safe to do so (for the reasons you say here), but I'm saying the docs should also be updated?
There was a problem hiding this comment.
I've pushed some fixes to the docs on orderedBefore. They are now more careful to discuss execution order rather than program appearance order.
I don't think we need to say anything about cases where "A executed before B" is not the only relationship between A and B. It's the only relationship that matters for using this API.
There was a problem hiding this comment.
I don't think we need to say anything about cases where "A executed before B" is not the only relationship between A and B.
But if A and B overlap - if A is the parent of B - then their executions overlap? It is literally not true that A executed before B, in this case. Where am I confused?
I'll add a comment draft to my review that might explain what kind of comment I am looking for.
There was a problem hiding this comment.
It's a loop! A executes before, during, and after B. FWIW, A executes before, during, and after itself. These are not mutually exclusive in the presence of loops and recursion.
Add tests in test/lit/passes/licm.wast verifying that local variable dependency tracking (RAW/WAR conflicts across loop boundaries and logic inversion bypass) is correctly preserved and does not allow incorrect reorderings. TAG=agy CONV=2c655195-5fe6-42ef-ac14-08d071b29d85
| (br_if $loop (i32.const 0)) | ||
| ) | ||
| (local.get $x) | ||
| ) |
There was a problem hiding this comment.
What is being tested by these? (and how does it relate to this PR?)
There was a problem hiding this comment.
When I first tried to make simplifications based on your suggestion to clear the local effects outside the loop, I went overboard and produced something incorrect. But the test suite still passed. These are regression tests for that incorrect simplification.
| ;; E: memory access (shared GC read) | ||
| (drop | ||
| (struct.get $struct 0 (local.get $x)) | ||
| ) |
There was a problem hiding this comment.
In both of these tests, should we also test the the memory/GC instructions reordered? Here, with the struct.get before the i32 store?
There was a problem hiding this comment.
We already have very thorough tests of all the interactions considered by orderedBefore and orderedAfter, so here I think it's sufficient to have a single positive and single negative test showing that we got the order correct.
| // example with the br_if we can't reorder A and B if B traps, but in the | ||
| // valid examples we can reorder them even if B traps (even if A has a global | ||
| // effect like a global.set, since we assume B does not trap in | ||
| // traps-never-happen). |
There was a problem hiding this comment.
| // traps-never-happen). | |
| // traps-never-happen). | |
| // | |
| // This code also handles overlapping inputs: | |
| // | |
| // (A | |
| // (B) | |
| // ) | |
| // | |
| // Here B is nested within A. We can still ask whether B is orderedBefore or | |
| // orderedAfter A, even though their executions overlap, because [TODO] |
There was a problem hiding this comment.
This is not true in general. It's only valid when loops or recursion make it such that A executes before B, which may or may not be true completely independently of whether A and B overlap or are even different expressions.
There was a problem hiding this comment.
I see what you mean now, so they both overlap and execute before.
There was a problem hiding this comment.
I would still like to mention loops here, but up to you. I don't feel strongly.
There was a problem hiding this comment.
Ok, I added a comment about loops in the last commit.
| (effects.readsMutableGlobalState() && | ||
| loopEffects.writesGlobalState()); | ||
| effectsSoFar.orderedBefore(effects) || | ||
| loopGlobalEffects.orderedBefore(globalEffects); |
There was a problem hiding this comment.
I don't think we need to say anything about cases where "A executed before B" is not the only relationship between A and B.
But if A and B overlap - if A is the parent of B - then their executions overlap? It is literally not true that A executed before B, in this case. Where am I confused?
I'll add a comment draft to my review that might explain what kind of comment I am looking for.
| // example with the br_if we can't reorder A and B if B traps, but in the | ||
| // valid examples we can reorder them even if B traps (even if A has a global | ||
| // effect like a global.set, since we assume B does not trap in | ||
| // traps-never-happen). |
There was a problem hiding this comment.
I would still like to mention loops here, but up to you. I don't feel strongly.
| (effects.readsMutableGlobalState() && | ||
| loopEffects.writesGlobalState()); | ||
| effectsSoFar.orderedBefore(effects) || | ||
| loopEffects.orderedBefore(effects); |
There was a problem hiding this comment.
effectsSoFar.orderedBefore(effects) makes sense to me. But for loopEffects.orderedBefore(effects), as you said they execute before, during, and after - why is it enough to only check one?
There was a problem hiding this comment.
Because we're only trying to move the execution of curr (whose effects are effects) before the loop, i.e. back past all the previous executions of the loop body. If we were trying to sink curr down past the loop, we would have to check the other direction.
There was a problem hiding this comment.
I see, thanks. Yes, that's the right way to look at this.
Replace the coarse
invalidatescheck and the coarse global state checkin LICM with more precise
orderedBeforechecks. This allows LICM tomove memory accesses past release stores, while still correctly blocking
them from moving past acquire loads.
Add a lit test to verify the asymmetrical reordering behavior with
release/acquire atomics on shared memory/GC structs.