Skip to content

Update and expand the TLA+ model#216

Draft
the-mikedavis wants to merge 9 commits into
mainfrom
md/tla
Draft

Update and expand the TLA+ model#216
the-mikedavis wants to merge 9 commits into
mainfrom
md/tla

Conversation

@the-mikedavis
Copy link
Copy Markdown
Collaborator

I was taking a look through the model to try to extend it for rabbitmq-stream-s3. The diff looks pretty large since I flattened the directory but the commit history lays out each change.

I added replication of commit offset and retention to the model as well. I tested this for 20-30 min on an hpc7a.96xlarge (192 cores) and didn't find any violations.

The model previously didn't include the committed offset according to
replicas / followers. Adding this is necessary to later cover retention
or consumer behaviors.
TLA 1.8 flags a deadlock as an error. A deadlock can happen because the
model allows all replicas to stop. We could alternatively prevent this
structurally by ensuring that one replica is always online, but this
doesn't seem realistic in practice, since a cluster may realistically
go fully offline and recover gracefully.
Some variables were marked as UNCHANGED but were assigned anyways. This
logs a warning when running. It also constrains the model to not
consider states where the variable can change.
@the-mikedavis
Copy link
Copy Markdown
Collaborator Author

the-mikedavis commented May 19, 2026

Found some extra issues with TLA+ 1.8.0 around the dead-locking behavior. The message of 266cabe (#216) has some more discussion on that but I think it makes sense to turn off deadlock checking for this model.

Tested again for an hour on an hpc7a.96xlarge.

@kjnilsson
Copy link
Copy Markdown
Contributor

thanks, does this model match what we actually do with the coordinator in RabbitMQ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants