Kroxylicious 1.0 #82

Open
tombentley wants to merge 6 commits into kroxylicious:main from tombentley:kroxylicious-1.0

Conversation

@tombentley
Member

This PR proposes:

  • a release of Kroxylicious 1.0 in the near future,
  • a commitment to making patch releases

Signed-off-by: Tom Bentley <tbentley@redhat.com>
@tombentley tombentley requested a review from a team as a code owner October 21, 2025 04:36
Comment thread proposals/007-kroxylicious-1.0.md Outdated
1. Binary compatibility of the Java APIs of the following modules:
* `kroxylicious-api`
* `kroxylicious-kms`
2. The format of the proxy configuration file, including the configuration of certain plugins provided by the project. Compatibility here means that a released version `1.y` would accept any configuration file that was accepted as valid in any released `1.x` version, where `y > x`. "Valid" means the proxy starts up cleanly.
Member

certain plugins

How are you deciding which plugins are included and which are not? We have the oauthbearer, multitenancy and record-validation plugins. I would extend the guarantee to all the plugins.

Member Author

Happy to add oauth and validation.

It's a bit awkward defining what's included with a list:

  • It makes it unclear what happens to plugins we write during the 1.x timeframe.
  • I suppose we should also include the filters we're currently working on.

That makes me unsure about the current MT filter, given the namespacing work: we'll end up having two quite similar things. What do you think we should do there?

Member

I think the guarantee has to extend to all plug-ins included in the kroxylicious repo and distribution - we are aiming for clarity of message and future.

It would also imply we could only remove a plugin at a major version bump.

I think we should be making reference to our deprecation policy here as well and possibly making it explicit what that means for official plug-ins and not just APIs.

Do we want to make such a strong assertion about config loading? That any minor release will read the prior config format. I'm wondering about a weaker guarantee along the lines of:

All minor releases of a major version will be able to load the major version's config file or provide an automated migration.

I'm specifically thinking of something based around OpenRewrite's YAML recipes.

Member

Tbh, I'm not sure what to do about the namespacing ideas vs multi-tenancy at the minute.

For the sake of the 1.0 proposal, I would just say that the guarantees you describe apply to all the plugins that are currently shipped by the project.

Member

Additionally I think we should evict the current MultiTenant filter from the main repo (maybe just delete it, maybe move it to its own repo (in|out)side the org). It's served its purpose as a POC, but we have evolved our thinking away from such a monolithic design.

Member

I second the idea of removing the existing Multitenant filter. I think moving it to its own repo (which could then be archived) is a good idea, so that if anyone is actually using it (or just using it as inspiration) the code is still somewhere relatively accessible.

Member

@k-wall k-wall Oct 22, 2025

MT should be deprecated first, then removed; otherwise we are not living up to the pledges we make in our existing deprecation policy.

I think having MT, marked deprecated, in a 1.0 does no harm.

@k-wall
Member

k-wall commented Oct 21, 2025

@tombentley, thanks for writing this up.

Member

@SamBarker SamBarker left a comment

While I'm supportive of a move to 1.0, I have a few caveats about the proposed model.

Comment thread proposals/007-kroxylicious-1.0.md Outdated


# Patch releases

We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
Member

I'm wondering if the example would be better expressed as a table.


# Patch releases

We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
Member

Given what we say about API (Java/config) stability, why would we offer anything other than rolling forward to the latest patch release?

I am very wary of building up a branch of unreleased commits/fixes. While I accept releases have a cost, ours are fairly cheap and getting cheaper (we keep automating stuff), so I think we should aim to release every minor release branch we start. However, I'd also advocate for lazy branching around this: only start a 1.x-1 branch if and when required.

Member

The bleeding edge is more or less the first time our new code comes into contact with the real world. It's a likely place for performance etc. to degrade. Conservative users may be putting that version through its paces in test environments before adopting it in production, so supporting a couple of minor versions could be a good idea.

Member Author

@SamBarker the language around the number of maintained minor releases was based on what I thought we had discussed and agreed out of band, but maybe I misunderstood. I added the 6 month thing off my own bat purely as a way of making the commitment less open-ended. I'm happy to change it if that's not the consensus we reached, but we (the maintainers on the hook for actually making good on this) all need to agree what we're willing to do here.

Personally I feel that just "roll forward" is not really good enough. It means that users have to accept the additional risk of taking a bunch of unrelated features in order to get a fix. That's just not always acceptable in production, and I think it therefore stands to harm adoption in production environments.

Therefore I would advocate for:

  • Creating a 1.x branch at the time we release 1.x.
  • Cherry picking to this branch. What gets cherry-picked is partly up to the community. If someone from the community is willing to open a cherry-pick PR targeted at 1.x, that's great, and we should endeavour to get that into a release. This is why a 1.x branch needs to be created when releasing 1.x.0, so that anyone can just go ahead and open such a PR.
  • However, I don't think we should be doing a release simply because a single commit got made to the 1.x branch. The costs of a release are not just borne by the maintainers who are clicking the buttons on some automation (however fancy). There is a cost for the whole community in terms of assessing whether or not to pick up the new 1.x.y release, and in actually rolling it out. It's no good the project pushing out a new patch release every week if users cannot keep up. We also want the community to participate in testing. For example, if Billy Bob opened a cherry-pick PR that's going to be included in a release, then Billy Bob might well want to confirm that the proposed 1.x.y release fixes their problem. Bear in mind that what's in the release might be more than just Billy Bob's PR, so it's reasonable to want to be sure that none of the other fixes has regressed Billy's fix. Also it's possible that the maintainers haven't been able to reproduce Billy's problem (e.g. maybe it's hardware specific), so their feedback is invaluable.
  • For all these reasons I think it's inevitable that some number of fixes will get merged to the 1.x branch, building up, before a release of 1.x.y happens. This is OK! We can still decide to expedite a release if a particularly terrible bug/CVE is found. But otherwise batching a bunch of fixes actually makes the eventual release more consumable, and more likely to actually be used (because the risk/benefit looks more favourable if a release fixes several problems rather than a single one). And releases being used is, after all, why we do them.
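A minimal git sketch of that branch-and-cherry-pick flow, run in a throwaway repo; the "1.3" version numbers, branch/tag names, and commit messages are hypothetical and this is not the project's actual release automation:

```shell
set -e
# Illustrative identity; real commits would use the contributor's own.
git() { command git -c user.email=demo@example.com -c user.name=demo "$@"; }
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git commit -q --allow-empty -m "feature work"

# At 1.3.0 release time: cut the minor release branch and tag from it.
git branch 1.3
git tag v1.3.0 1.3

# Development continues on main; a bug fix lands there first.
git commit -q --allow-empty -m "fix: config parsing"

# A contributor backports the fix (in practice via a PR targeting 1.3).
git checkout -q 1.3
git cherry-pick -x --allow-empty main

# Fixes batch up on the branch until maintainers tag the next patch.
git tag v1.3.1
git tag --merged 1.3   # lists v1.3.0 and v1.3.1
```

Because the 1.3 branch exists from the moment 1.3.0 ships, anyone can open such a backport PR without asking a maintainer to create the branch first.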

We need to decide how many 'live' release branches we want on the go. Obviously the effort here is multiplicative, so we need to be realistic. Apache Kafka generally manages to release two patch releases for each minor (see https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan), and they're usually spaced about 3-4 months apart. Kafka also aims for 3 a.b.0 releases a year. This averages out to there being 3 'next releases' at any one time: a .0, a .1 and a .2. Personally my gut feeling is we could sustain something similar, but note the implication that we slow down the pace on the .0 releases.

So then the final loose end is when and how we stop making releases from a minor branch. Here the Apache Kafka project doesn't really have anything written down. But in practice patches do get merged to minor branches which are never publicly released. However, they are available to those motivated to do their own building and testing. That seems like a reasonable balance to me: The project is not obliged to release them, but the community has the ability (and some motivation) to contribute those patches to the project.

cc @k-wall @gracegrimwood @robobario @hrishabhg, because this discussion is kinda central to the whole thing.

Member

Only start a 1.x-1 branch if and when required

On this point I would value having the release branches created as part of the standard workflow so that we don't have diverging release flows (release latest off main, vs. other releases off release branches). Having them exist before we need them will enable community members to PR backports directly to them without having to request a branch creation. It makes things more democratic and approachable if all the processes are in place and contributors just have to review/approve things and pull the release levers.

Member

I'm also wondering if it would be manageable to have things more time based. So have a scheme like:

  1. the latest minor version is actively supported
  2. when a new minor version is created, on date X, the previous minor version is moved to a different support level
  3. the previous minor now has support dates set based on date X, like 3 months of bugfixing and 6 months of security fixing for example.

This would give users an easy-to-interpret table of dates for when a minor goes EOL, and users would know that they are getting at least that many months of support for the latest minor version. Under this scheme there would always be a transition period when we put out a new minor version where we support two versions, whereas I think with the current proposal, if you didn't release for 6 months, the prior minor version would be unsupported on day 0 of the new release.

Current actively supported version: 1.4.x

Old versions:

| version | bugfix support until | security support until (EOL) |
|---------|----------------------|------------------------------|
| 1.3.x   | 01-03-2026           | 01-06-2026                   |
| 1.2.x   | 01-02-2026           | 01-05-2026                   |
| ...     | ...                  | ...                          |

But this could put us on the hook for more than the proposed 3 minor versions of maintenance. It feels like, as Tom said, we will probably slow down the minors due to these new processes and guarantees though. We may end up following Kafka's cadence; to keep up we have to bump kafka-clients and take minor or major changes to the Filter API.

Maybe those concerns are manageable by keeping the lengths of time tight enough.

Member

This got suuuuper long, sorry 😅

Here's a sort of TL;DR of my thoughts:

  • We should be actively creating and maintaining new patch releases for the last two minor versions, as well as the current minor version (i.e. 1.x, 1.x-1, and 1.x-2). To limit the overhead, there should be no more than 3 actively maintained minor versions at any time.
  • The EOL date for the oldest active minor version 1.x-2 should be upon the release of the third subsequent minor version 1.x+1 or six months after the initial release of the expiring version 1.x-2, whichever comes first.
  • We shouldn't try to schedule the EOL or maintenance windows of releases ahead of time, so we don't overcommit ourselves. We could revisit this after we've reached 1.0.0 and tried out a less rigid maintenance schedule for a bit.
  • We should stop merging PRs to a release branch once that version is no longer receiving new patch releases; the purpose of these branches is to create patch releases to serve users of the proxy, and anything beyond that is unnecessary extra work. We could revisit this later if there is demand for it.
  • We should have release branches for minor releases in addition to the existing release tags, and these should be created during their initial (minor.0) release by our release automation. These branches can then accept PRs for backporting fixes, and new patch releases for their respective versions can be made from those branches. Patch releases only need release tags, not branches.
  • Backports should be community-driven via raising PRs for backports against active release branches. Maintainers would be responsible for approving these PRs, backporting high-severity security fixes, and for cutting patch releases.
  • The timing of patch releases should be driven by the community through a proposal process (separate from this repo), where a member of the community proposes a new patch release and others can vote on it. Urgent patch releases (such as for high-severity security issues) should bypass this. If we find we are creating patch releases too often, we could look at implementing rules to limit this.
  • Maintaining multiple minor releases might mean we're inclined to put out minor releases less often but it shouldn't mean the pace of development is meaningfully slower. Also, being more mindful about what we're releasing and how often might not be a bad thing.
  • Maintaining multiple minor releases will mean more things for us to review and more times we have to run the release workflow, but that would happen to some extent anyway as the project grows.
  • Not providing any active maintenance for prior releases and instead taking a stance of telling users to just roll forward to the latest version risks alienating a significant number of people who might otherwise use Kroxylicious, and isn't in the best interests of the project.
  • We really should not be talking about any of this as "support". It muddies the water as to what it is we're trying to accomplish as a project, so we should use the term "active maintenance" or just "maintenance" if we can.

the language around the number of maintained minor releases was based on what I thought we had discussed and agreed out of band

We discussed the idea of maintaining and issuing patches for 1.x, 1.x-1, and 1.x-2, but I don't know if we actually reached an agreement. Either way, it sounds like a good number to me; any less isn't really worth it, any more could become unsustainable.

I added the 6 month thing off my own bat purely as a way of making the commitment less open-ended.

I think 6 months is fine as a sort of hard-coded "expiry date" for minor releases. We currently release much more frequently than that, so I reckon it's unlikely any release would hit that deadline anyway. Even if we did let the oldest minor release "expire", having another minor release in there between oldest and latest means users still have a buffer zone to work in, and 6 months seems like plenty of time to arrange a proxy upgrade to a new minor version. We can revisit that number later on if we need to, but I think it's fine.

Personally I feel that just "roll forward" is not really good enough. It means that users have to accept the additional risk of taking a bunch of unrelated features in order to get a fix. That's just not always acceptable in production, and I think it therefore stands to harm adoption in production environments.

I fully agree. We can't know every user's environment and risk profile, and we risk alienating potential users if our response to requests for backported fixes is to just install the latest version. I don't think that stance is in the best interests of the project, even if it would mean less maintenance work for us. Besides, if it was feasible for everyone to just roll forward all the time, nobody would make any money selling LTS versions of things 😉

  • Creating a 1.x branch at the time we release 1.x.
  • Cherry picking to this branch. What gets cherry-picked is partly up to the community.

Agreed, starting out with a major.minor minor release branch at initial major.minor.0 release time to serve as the source of all future major.minor.x patch releases is the way to go. As @robobario suggested, let's have our release automation create it for us, perhaps when it creates the release tag? Then we let the community put up PRs against these minor release branches with backported fixes, and our job is mostly to approve or not, as we currently do with PRs to main.

Patch releases can then be tagged in git as usual but won't get their own release branches. We shouldn't put out patches of patch releases like major.minor.patch.x, as per semver, so any fixes should instead be backported to the respective minor branch and then released in the next patch release of that minor version. For example, if someone wanted to backport a fix to release major.minor.patch then instead they would raise a PR against the major.minor release branch, and that would get released as major.minor.patch+1.

I think there's still definitely a bigger role for us as maintainers in this process, like urgent backporting of high-severity exploitable CVEs, but otherwise I think this process really should be driven by the people using the proxy on the daily, as they're best placed to know which fixes they need and don't need. We can step in when something that isn't a fix (i.e. a feature) gets suggested, for anything urgent, or for anything that's too complicated to backport, but for the most part I see this as the kind of system that starts to run itself.

  • However, I don't think we should be doing a release simply because a single commit got made to the 1.x branch. The costs of a release are not just bourne by the maintainers who are clicking the buttons on some automation (however fancy). There is a cost for the whole community in terms of assessing whether or not to pick up the new 1.x.y release, and in actually rolling it out. It's no good the project pushing out a new patch release every week if users cannot keep up. [...]
  • For all these reasons it's I think it's inevitable that some number of fixes get merged to the 1.x branch, building up, before a release of 1.x.y happens. This is OK! We can still decide to expedite a release if a particularly terrible bug/CVE is found. But otherwise batching a bunch of fixes actually makes the eventual release more consumable, and more likely to actually be used

Yeah, batching fixes is the way to go. As for timing, I think we should handle this through proposals, honestly. Maybe not anything as big or elaborate as what we do here in this repo, but having a forum or something (like Slack or GitHub discussions, or just raising an issue with a special tag on it) where community members can suggest that they think it's time for a patch release of whatever active minor version, and then others can vote on that idea. Obviously for anything high-risk we could just go ahead with creating a patch release anyway, but for everything else this lets us take a more hands-off approach.

If, by some chance, that proposal process gets out of hand and folks do start requesting new patch releases every week, then we could look at instituting some kind of limit on how often we will release new patch versions, but that's the kind of bridge we should cross if/when we get to it. I think trying to come up with a schedule for this is a bit of a fool's errand because there will be some times we get to a patch release date and there's nothing worth releasing, and some times when something will show up well ahead of schedule and we'll want to put patches out ASAP. We're better off letting the people who use the proxy determine (for the most part) when new patch releases are required.

We need to decide how many 'live' release branches we want on the go. Obviously the effort here is multiplicative, so we need to be realistic.

I think if we stick to only having branches for minor releases then the number of actively maintained "live" branches comes down to how many minor releases we intend to create new patch releases for. Like I said above, I think the current and the two previous minor releases (1.x, 1.x-1, and 1.x-2) is a good number. I don't think it means we will have to slow development of new features (.0's) because we'd be leaning on the community to do most of the heavy lifting as far as maintenance effort goes, but it might change how often we cut new releases from main and make us think more carefully about what goes into them, which isn't necessarily a bad thing. And of course, all of this will mean more things for us to review and more releases, but we're going to get more things to review and more stuff to release as our contributor base grows anyway.

We'll also have to keep in mind what happens when we hit a new major version, but I don't think that's an unnavigable problem, nor a particularly pressing one at this juncture.

So then the final loose end is when and how we stop making releases from a minor branch. Here the Apache Kafka project doesn't really have anything written down. But in practice patches do get merged to minor branches which are never publicly released. However, they are available to those motivated to do their own building and testing. That seems like a reasonable balance to me: The project is not obliged to release them, but the community has the ability (and some motivation) to contribute those patches to the project.

I've sort of said this already, but if we're only promising patch releases for the previous two minor versions, then the answer to "when do we stop releasing patches for 1.x-2?" is "once we've released 1.x+1", or at that 6 month expiration date, whichever comes first. Obviously, if we're churning out a new minor release every week then we're going to need to reassess both how many minor versions we're maintaining and how long we're maintaining them for. Though, if we were in a position to be churning out minor releases that frequently I think this whole discussion would be very different.
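To make that rule concrete, here's a small illustrative sketch (my own, not project code) of how I read it: a minor version stays actively maintained while it's within the three-minor window and under the proposed six-month cap, whichever limit is hit first. Version numbers and dates are made up for the example.

```python
from datetime import date, timedelta

WINDOW = 3                      # actively maintained minors: 1.x, 1.x-1, 1.x-2
MAX_AGE = timedelta(days=183)   # the proposed ~6-month cap

def is_maintained(minor: int, released: date, today: date, latest_minor: int) -> bool:
    """minor/latest_minor are the x in 1.x; released is when 1.minor.0 shipped."""
    within_window = latest_minor - minor < WINDOW
    within_age = today - released <= MAX_AGE
    return within_window and within_age

# With 1.4 as the latest minor:
print(is_maintained(2, date(2025, 9, 1), date(2025, 11, 1), 4))  # True
print(is_maintained(1, date(2025, 6, 1), date(2025, 11, 1), 4))  # False: outside the 3-minor window
print(is_maintained(2, date(2025, 3, 1), date(2025, 11, 1), 4))  # False: older than ~6 months
```

In other words, a version can fall out of maintenance either by being superseded twice over or simply by ageing out, which matches the "whichever comes first" phrasing above.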

I think "once we've released 1.x+1" also answers the question of "when do we stop accepting PRs to the release branch for 1.x-2?", because if it's not going in a patch release then it doesn't need backporting IMO. The point of this is to make releases and upgrades of Kroxylicious safer, more stable, and more reliable for the people who use it, so merging backports that aren't going to end up in releases seems a bit like wasted effort. In saying that, if there is some reasonable demand for this in the future, and we feel it wouldn't add any unreasonable burden on us to maintain, then we could do so. If we draw a hard line now we can always bend it later if we need to, but I don't think we need to do so pre-emptively.

I'm also wondering if it would be manageable to have things more time based. So have a scheme like:

  1. the latest minor version is actively supported
  2. when a new minor version is created, on date X ,the previous minor version is moved to a different support level
  3. the previous minor now has support dates set based on date X, like 3 months of bugfixing and 6 months of security fixing for example.

This would give Users an easy to interpret table of dates for when a minor goes EOL. And users would know that they are getting at least that many months of support for the latest minor version.

This is what Node.js does and, to an extent, what Ubuntu does (setting aside Pro subscriptions and paid support tiers).

I don't dislike this model (quite the opposite actually, as a user it's great) but it feels very strict and rigid given where the project is at currently, and I just don't feel that it's the right fit for us right now. I think it also implies a level of "support" (I don't mean to nitpick on language but I really don't like using this word here, I think we should be calling this "maintenance" because that's what we're actually proposing to do) that I don't think we're trying to offer. It is not within our scope as an open source project to be offering —or appearing to offer— anything close to enterprise support, which is why I think we should be talking about "actively maintained" releases rather than "supported" releases.

If we are wanting some kind of tapered maintenance model then we could say something like "version 1.x-1 gets all the backports while version 1.x-2 only gets CVE fixes until we release 1.x+1", but trying to come up with a schedule for this at this stage isn't really feasible because we have no way of knowing how realistic these kinds of timelines are yet. We can guess, but we don't actually know. We aren't Kafka or NodeJS or Ubuntu; our development effort is not their development effort, our maintenance burden is not their maintenance burden, and we can only model ourselves off of them to a point. A system like this, with stricter timelines and more assurance for users, is the kind of thing we could revisit in the near future once we've actually reached 1.0.0 and tried out something a little more flexible, at which point we'll have a better idea of how much extra work we've actually created for ourselves and what kinds of timelines are actually practical.

Anyway, I'm off to bed. If you read to the end of this: here, have a cookie 🍪


Comment thread proposals/007-kroxylicious-1.0.md
Member

@gracegrimwood gracegrimwood left a comment

I'm really happy we've started this discussion, I think it's a great move for the project. Thanks @tombentley for putting this together! 🎉

Comment thread proposals/007-kroxylicious-1.0.md Outdated

# Patch releases

We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
However, the "up to" is limited to 6 months after the initial minor release.
Member

Is there a distinction between the current latest minor and the 2 prior ones here? I'm wondering if somewhere we should say the latest minor will be bugfix supported beyond 6 months.


# Patch releases

We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.

Should it be more like one of these: current version + 2 patches, or current version plus a (current version - 1) patch and a (current version - 2) patch?

Suggested change
We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.(x-1).(y+1)` and `1.(x-2).(y+2)`.

OR

Suggested change
We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)` and `1.x.(y+2)`.

Member Author

Holding off any change here while we work though the conversation above.

Comment thread proposals/007-kroxylicious-1.0.md Outdated
tombentley and others added 4 commits October 28, 2025 22:26
Co-authored-by: Hrishabh Gupta <hrishabhg@gmail.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Co-authored-by: Grace Grimwood <15976368+gracegrimwood@users.noreply.github.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
@tombentley
Member Author

Thanks for the initial round of reviews folks. I've made some of the more minor changes. I think we need to resolve this conversation to make further progress.

@alanrobinson-dwp

Hi all, for what it's worth from our perspective this sounds like it would work well for us.

Our security posture means we shouldn't be too far away from latest releases. If we were any further out than 3 we'd be getting asked to upgrade regardless.

If we were one or two minors behind and there was a fix release for a security vulnerability, it'd be easier to get our current version plus the security or bug fix into Prod than to jump possibly 2 versions and have to cover off testing of the extra features just to pick up that fix.

3. The plugins covered under item 2. are all those implementations of the APIs listed in 1. provided by the project, excluding those implementations in test packages and modules. This implies the continued inclusion of these plugins in the `1.x` series of releases. However, the existing multi-tenancy plugin will be marked as deprecated.
4. The Kubernetes custom resource APIs defined by the kroxylicious-operator.

If the compatibility guarantee is broken we will treat that as a bug.

This only covers compatibility between Kroxylicious versions. I couldn't see anything about compatibility with regard to Kafka. My understanding is that Kroxylicious always tries to adopt the latest Kafka dependencies. Is that true? Should that be documented? If so, it takes on the same Kafka compatibility guarantees as Kafka itself.
So, for example, if Kafka 5.0 drops support for Kafka < 3.0 (just a random example, this is not based on actual plans), Kroxylicious would also drop support for Kafka < 3.0.

Member Author

Great question.

There are two aspects to Kroxylicious's dependency on Kafka:

  1. We depend on the protocol itself, in the abstract. A new version of Kafka can add more APIs, or new API versions, which won't be understood by the proxy. Kafka's ApiVersions API mostly solves this, in the sense that we can force clients that use ApiVersions correctly to only use APIs and API versions which the proxy understands.
  2. Internally and in the current Filter API we use Kafka's own *RequestData and *ResponseData classes, as generated from the JSON IDL in the Kafka project.

Item 1 is fundamental, and unavoidable, but I think we're happy that it's not a problem in practice.

Item 2 is definitely a problem when it comes to making compatibility guarantees like we're trying to do here. The Apache Kafka project can (and has) renamed API keys and fields, which ripples out to those Java classes. Apache Kafka can do this because on the wire the API key is identified by id, not by name, and fields are defined positionally within a structure, rather than by name. When AK does this it manifests as a compile-time/linker error in any filters that consume that API key, i.e. a loss of source and binary compatibility.
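To make the wire-level point concrete, here's a minimal sketch (not Kroxylicious or Kafka code): a Kafka-style request header carries only numeric ids, so Apache Kafka can rename the generated Java classes and fields without any wire-format change. The exact field layout here is a simplification.

```java
import java.nio.ByteBuffer;

// Sketch: a Kafka-style request header carries a numeric api_key,
// api_version and correlation_id -- no names at all. Renaming the generated
// Java class for an API therefore cannot change what goes on the wire; it
// only breaks Java source/binary compatibility for filters using that class.
public class HeaderSketch {
    static ByteBuffer encodeHeader(short apiKey, short apiVersion, int correlationId) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putShort(apiKey);   // e.g. the Fetch API's numeric key, not its name
        buf.putShort(apiVersion);
        buf.putInt(correlationId);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer h = encodeHeader((short) 1, (short) 13, 42);
        System.out.println(h.getShort() + " " + h.getShort() + " " + h.getInt());
    }
}
```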

So we can't really live up to the promise we're trying to make here.

I can see two possible ways forward:

  1. Carve out a massive compatibility exception around this for Kroxylicious 1.0, and maybe try to address it during 1.x.
  2. Address it before we declare a 1.0.

The way we would solve it amounts to removing our dependency on kafka-clients.jar entirely, by:

  • Generating our own RequestData and ResponseData classes (e.g. by forking Kafka's code generator and having it generate classes in our own package namespace)
  • Vendoring, or rewriting, all the MemoryRecords code.
  • Dealing with the known problem around errorResponse, so we don't depend on Kafka's hand-written *Request classes.
  • Very likely other dependencies too.

Suffice to say this is not a small piece of work. If we wanted to solve it prior to 1.0 we would be delaying 1.0 for quite a while.


My intention was mostly to point at these issues so we're aware of them. At this point documenting them is probably enough. As you said, the effort of relying only on Kafka's public APIs is very large.

I still think it's important to consider them, for example to avoid compatibility pitfalls like we've seen in projects such as Cruise Control, which was stuck on an outdated Kafka version for a long time.

k-wall added a commit to k-wall/design that referenced this pull request Apr 9, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@robobario
Member

robobario commented Apr 15, 2026

the future of rapid experimentation

Once we release 1.0.0 some things will become much easier to discuss, for example we can talk about deprecating major features with a defined removal time of 2.0.0. But on the other hand, using a single sem-ver for all our APIs (filter configs, CRDs, proxy config, shell scripts) may be restrictive when it comes to putting out experimental features and APIs.

For example if I am designing some new feature, I may have some idea that is not fully proven but I want to get it out there for users to experiment with. I am uncertain that the API of this feature is stable, it may need to be reworked when we accumulate more feedback. Having to completely abide by the project sem-ver is uncomfortable since the feature is still mid-design, we may need to break the API quickly.

Some questions raised are:

  1. Will we support this kind of experimentation?
  2. Do we need some classification so we can say "feature X is alpha" or "feature X is beta" with different implications about the API stability and impact on semver?
  3. Would different classifications have different requirements? Like does an alpha change require a merged design proposal?

edit: I can imagine this is easyish for new Filters as we can control when they enter our defined "Public API", but what about experimenting within established modules, like an experimental extension to the Proxy runtime

@tombentley
Member Author

@robobario that is a great question.

  • Absolutely we need to allow the people developing kroxylicious a way to build out features over multiple releases. I could point to plenty of other projects which have mechanisms like this. For example, OpenJDK has developer preview features where neither API compatibility nor eventual delivery are guaranteed. Similarly Kubernetes explicitly versions each API (v1alpha1, v1beta1, v1).
  • The key thing is that end users have to explicitly opt-in to using such unstable APIs, and their unstable nature is effectively communicated.
  • SemVer does provide a way to do that, with 0.x versions.
  • However, we have lots of APIs, and they're all going to mature at different times. We cannot really use a single kroxylicious release version number to convey their status to users.
  • We could break apart our release process, so that each thing that presents an API to end users has its own semver number. However, managing all that at the Maven level looks like a lot of effort to me. I think the project benefits from having a single thing which we release on a regular cadence.
  • Another problem is that there are "APIs" where we care about compatibility but which do not fit nicely into a release artifact (that could be released with a semver number attached). For example we care about things like the proxy's CLI interface, and the env vars it supports, and likely have expectations about the file system it's looking at (Add plugins directory support to proxy classpath kroxylicious#3715)
  • Reworking proxy configuration #96 provides a concrete proposal for how we could version plugin configuration schemas using the same basic conventions as Kubernetes.
  • For Java APIs (kroxylicious-api, kroxylicious-authorizer-api, kroxylicious-kms) I think we could do something similar using the Kubenetes syntax (v1alpha1, v1beta1, v1), explicitly labelling APIs and guaranteeing backwards binary compatibility.
    • A given version of the proxy would only support one version of each of those APIs at a time.
    • We could probably arrange to log some warning (via a static init block) about APIs which are in alpha.
  • For other things, I think we're likely looking at logging and some kind of feature gating, so users would have to explicitly unlock those APIs. Perhaps using env vars is enough? Those are available to the JVM but also to things like scripts. I'm open to other ideas. But I don't want to block getting to 1.0 on having some perfect solution here.
  • If we do the above then we have the vocabulary to communicate with each other and with end users about APIs in terms of alpha, beta and GA. The question becomes how this relates back to the overall version number. I think that's pretty simple, though I don't think we necessarily want to follow exactly the semantics that Kube has; I think it's basically a good model.
    • An alpha API, matching v[0-9]+alpha[0-9]+, comes with no guarantees at all. This opens the door to experimentation, and should speed up getting experimental software into the hands of people who want to play with it, because the project is not going to be held to anything.
    • A beta API, matching v[0-9]+beta[0-9]+, comes with some stronger guarantees, but still no compatibility guarantees.
    • A GA API, matching v[0-9]+, comes with the strongest guarantee we can offer: backwards compatibility, a minimum deprecation period, and removal only in some Kroxylicious x.0 release.
    • The thing I don't want to replicate from Kubernetes is the time-bounding of things in beta (see below for one concrete reason why)

In terms of our design approval process, I don't think we need to have a completely rigid view. But I think my starting expectation would be that each API version would have at least one proposal associated with it. By doing that I think it's clearer to everyone that the bar for agreeing a proposal to add an alpha API would be much lower than for a beta, which would be lower again than for a final/GA. By definition, with an alpha API we don't fully understand the problem and/or solution, so we don't need to crystal-ball gaze so much when reviewing the API. The current-situation and motivation sections of beta and final versions become a summary of what we've learned from the previous version(s).

Finally I think this model also allows us a principled way of describing the Kafka dependency which Mickael raised. Our kroxylicious-api module is in beta, because we can't guarantee that upstream Kafka changes to the RPC IDL won't break filters. But the feature is definitely going to stick around.
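The "log some warning (via a static init block)" idea mentioned above might look something like this (the class name, message, and `apiVersion()` method are hypothetical, purely for illustration):

```java
import java.util.logging.Logger;

// Hypothetical sketch: the first time an alpha-stability API class is
// loaded, its static initializer logs that it carries no compatibility
// guarantees. None of these names exist in Kroxylicious today.
public class AlphaApiExample {
    private static final Logger LOG = Logger.getLogger(AlphaApiExample.class.getName());

    static {
        // Runs once, at class-load time -- i.e. the first time anything
        // touches this API.
        LOG.warning("Loading v1alpha1 API " + AlphaApiExample.class.getName()
                + ": this API may change incompatibly in any release");
    }

    public static String apiVersion() {
        return "v1alpha1";
    }

    public static void main(String[] args) {
        System.out.println(apiVersion());
    }
}
```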

@k-wall
Member

k-wall commented Apr 16, 2026

I think having a Kroxylicious version number and another version tagging scheme (v[0-9]+) is a recipe for confusion.
I think we should pledge semver compliance for all our public APIs (API modules, config, env vars).

If we need to weaken the guarantee for new API surface, I'd suggest we use a Java annotation like @Unstable to signify that the API may change in an incompatible way (breaking semver). We'd change the config module to detect where @Unstable features had been configured and bleat a warning (so the instability is visible to the admin configuring the system).

However, I do have YAGNI ringing in my head. I'd suggest we try to live by semver. If it gets too costly, we can think about alternatives.
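A rough sketch of how the suggested @Unstable marker and config-time detection could work (the annotation and the config classes here are hypothetical, not existing Kroxylicious code):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of an @Unstable marker: config loading could reflectively check a
// configured plugin's class and warn when an unstable implementation is used.
public class UnstableCheck {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Unstable { }

    @Unstable
    static class ExperimentalFilterConfig { }

    static class StableFilterConfig { }

    // Returns true (and warns) if the configured class is marked @Unstable.
    static boolean warnIfUnstable(Class<?> configClass) {
        boolean unstable = configClass.isAnnotationPresent(Unstable.class);
        if (unstable) {
            System.err.println("Warning: " + configClass.getSimpleName()
                    + " is @Unstable and may change incompatibly");
        }
        return unstable;
    }

    public static void main(String[] args) {
        warnIfUnstable(ExperimentalFilterConfig.class);
        warnIfUnstable(StableFilterConfig.class);
    }
}
```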

@SamBarker
Member

Stability levels: the WildFly model as a starting point

Great discussion @robobario and @tombentley. I've been looking at how other projects handle this
tension between velocity and stability guarantees, and I think WildFly's stability level model is
worth examining as a concrete reference point. It maps reasonably well onto our architecture.

The model

WildFly defines four stability tiers — experimental, preview, community, default.
The distinction between community and default is largely a downstream concern (which features Red
Hat will support commercially in JBoss EAP). For an open-source project like ours, that's not our
tier to define — a downstream vendor can draw that line themselves.

But the other three levels map well onto our needs:

| Level | Signal to users | Bar for contributors | Semver coverage |
| --- | --- | --- | --- |
| Experimental | Try it, give feedback, might vanish next release | Basic tests, no proposal needed | None |
| Preview | This is happening, but the API may change | Approved proposal with promotion plan | None |
| GA | Stable, we stand behind this | Full semver compliance | Full |

The important split is between experimental and preview. Experimental is "I don't even know if
this is the right idea" — it lowers the barrier to trying things out. Someone has an idea for a
filter, they can ship it without writing a proposal, without committing to a promotion path. If it
turns out to be useful, it gets a proposal and moves to preview. If not, it gets removed with zero
ceremony. Preview is "this is the right idea but the shape might change."

In WildFly, each feature is tagged with a stability level and users opt in at startup via a flag
(--stability=experimental). Features below the running stability level simply don't exist at
runtime — not visible in config, CLI, or admin console.
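That threshold check could be sketched like this (all names are illustrative, not actual WildFly or Kroxylicious APIs): a feature is visible only if its stability level is at or above the level the server was started with.

```java
// Sketch of a WildFly-style stability threshold (illustrative names only).
public class StabilitySketch {
    // Ordered least to most stable: experimental < preview < community < default.
    enum Stability { EXPERIMENTAL, PREVIEW, COMMUNITY, DEFAULT }

    // A server started with --stability=preview sees preview features and
    // anything more stable, but experimental features simply don't exist.
    static boolean featureVisible(Stability feature, Stability serverThreshold) {
        return feature.ordinal() >= serverThreshold.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(featureVisible(Stability.EXPERIMENTAL, Stability.PREVIEW)); // false
        System.out.println(featureVisible(Stability.COMMUNITY, Stability.PREVIEW));    // true
    }
}
```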

How this maps onto our surfaces

Filters and filter config: the cleanest case. A filter implementation declares its stability
level (annotation, plugin descriptor, etc.). The proxy checks the filter's stability against the
configured threshold when loading virtual clusters. Filter config is already per-filter and owned
by the filter implementation, so an experimental filter naturally has experimental config. No shared
schema changes needed.

Evolving existing filters (e.g. adding record key encryption to RecordEncryption, or enabling
post-quantum algorithms): this is the harder case — an experimental capability being added to a GA
filter. The filter itself stays GA, but its config schema gains new keys that are experimental. We
could handle this by tagging individual config keys with a stability level (e.g. via annotations on
the Java config model class) and having the framework enforce the threshold at config validation
time. This way one filter can have a mix of GA and experimental config surface, and the enforcement
is uniform rather than each filter inventing its own feature-flag mechanism.
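The per-key tagging described above could look roughly like this (the `RecordEncryptionConfig` class, its field names, and the annotation are all invented for illustration):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tag individual config keys with a stability level and
// enforce the proxy's threshold uniformly at config validation time.
public class ConfigKeyStability {
    enum Level { EXPERIMENTAL, PREVIEW, GA }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Stability { Level value(); }

    static class RecordEncryptionConfig {
        String kmsName;                                     // untagged => GA
        @Stability(Level.EXPERIMENTAL) Boolean encryptKeys; // new, experimental key
    }

    // Reject any *set* key whose stability is below the proxy's threshold.
    static List<String> violations(Object config, Level threshold, Set<String> setKeys) {
        List<String> bad = new ArrayList<>();
        for (Field f : config.getClass().getDeclaredFields()) {
            Stability s = f.getAnnotation(Stability.class);
            Level level = s == null ? Level.GA : s.value();
            if (setKeys.contains(f.getName()) && level.ordinal() < threshold.ordinal()) {
                bad.add(f.getName() + " is " + level + " but proxy requires " + threshold);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        var cfg = new RecordEncryptionConfig();
        System.out.println(violations(cfg, Level.GA, Set.of("kmsName", "encryptKeys")));
    }
}
```

One filter's config can then mix GA and experimental keys, with enforcement in the framework rather than each filter inventing its own feature flag.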

Proxy-level config: similar to the above — new proxy config options could be tagged at the
attribute level with a stability level. More like WildFly's attribute-level granularity.

Operator CRDs: our CR-per-filter model (KafkaProtocolFilter) helps here. The CRD is generic —
type plus a config blob. Adding an experimental filter doesn't change the CRD schema; the
stability level is a property of the filter type, not the CRD. The CRD's own API version
(v1alpha1 → v1) is an orthogonal dimension.

Java APIs (kroxylicious-api, kroxylicious-kms): @tombentley suggested using the Kubernetes
naming convention (v1alpha1, v1beta1, v1). In Java there are broadly two ways to do this:

  1. Package-level versioning — put the version in the package name
    (io.kroxylicious.proxy.filter.v1alpha1.FilterFactory). This gives hard classpath boundaries
    and makes stability dependencies explicit at compile time. For promotion, the trick is to make
    alpha extend beta: existing v1alpha1.FilterFactory implementations automatically satisfy
    v1beta1.FilterFactory since alpha is a subtype. Methods dropped during promotion stay on the
    alpha interface as throwing defaults to drive migration. Methods added in beta get defaults so
    existing alpha implementations still compile. This works cleanly when promotion narrows the API
    (stabilises a subset, drops the rest). If promotion reshapes the API fundamentally, it's a hard
    break — but that's the explicit contract of alpha.

  2. Annotation-level marking — keep one package, annotate individual types or methods with
    @Stability(EXPERIMENTAL). Less ceremony, no package gymnastics, and the config-level stability
    enforcement provides the safety net at runtime. The trade-off is you lose the compile-time
    boundary — a filter compiled against an experimental method gets a NoSuchMethodError if the
    method is removed, rather than a clean compile error.

My instinct is that annotations are less ceremony for the same outcome, especially given that the
config-level stability check already prevents misconfigured features from reaching the data path.
But for modules like kroxylicious-api where binary compatibility is a headline guarantee, the
package approach gives stronger assurance. Worth discussing — I don't think we need to settle this
before 1.0.
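As a self-contained illustration of option 1's "alpha extends beta" promotion trick (package names are flattened into class names here; the real scheme would use v1alpha1/v1beta1 packages, and none of these types exist in Kroxylicious):

```java
// Sketch of the "alpha extends beta" promotion trick (hypothetical names).
public class PromotionSketch {
    // The stabilised subset, promoted to beta.
    interface FilterFactoryV1Beta1 {
        String createFilter(String config);
    }

    // The alpha interface is a subtype: existing alpha implementations
    // automatically satisfy the beta contract. A method dropped during
    // promotion stays here as a throwing default to drive migration.
    interface FilterFactoryV1Alpha1 extends FilterFactoryV1Beta1 {
        default String experimentalHook() {
            throw new UnsupportedOperationException(
                    "experimentalHook() was dropped during promotion to v1beta1");
        }
    }

    static class MyAlphaFilterFactory implements FilterFactoryV1Alpha1 {
        @Override
        public String createFilter(String config) {
            return "filter(" + config + ")";
        }
    }

    public static void main(String[] args) {
        // An alpha implementation is usable wherever the beta API is expected.
        FilterFactoryV1Beta1 beta = new MyAlphaFilterFactory();
        System.out.println(beta.createFilter("demo"));
    }
}
```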

Opt-in mechanism

The stability threshold should live on the KafkaProxy CR — the infrastructure admin decides
the risk profile for the deployment:

```yaml
apiVersion: operator.kroxylicious.io/v1alpha1
kind: KafkaProxy
spec:
  stabilityLevel: preview  # default: ga
```
For bare proxy usage, the equivalent in proxy config or a startup flag.

What happens when config references a feature below the threshold?

WildFly hard-rejects (the server won't start). That's clean but creates crash loops in a GitOps
model — you discover the problem by noticing the proxy is down.

Silently dropping the feature and serving traffic through a reduced filter chain is worse — a user
thinks they have encryption enabled but the proxy quietly dropped it.

I think the right answer for us is: refuse the virtual cluster, not the proxy.

  • Operator path: the reconciler validates stability levels at reconciliation time. If a
    VirtualKafkaCluster references a KafkaProtocolFilter whose type is below the KafkaProxy's
    stability threshold, set a condition on the VirtualKafkaCluster and don't generate config for
    it. The proxy never sees invalid config. Feedback is in the CR status, exactly where GitOps
    tooling looks.

  • Bare proxy path: the proxy validates at startup/reload and refuses individual virtual clusters
    that reference filters below the threshold. The proxy stays up, other virtual clusters are
    unaffected, and the failing cluster gets a clear status message. This aligns with the per-VC
    failure model we already have.

In both cases: never silently degrade the data path, never crash the whole proxy over one misconfigured virtual cluster.
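The per-virtual-cluster refusal could be sketched as follows (all names invented; the real validation would involve the actual config model and CR status conditions):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "refuse the virtual cluster, not the proxy": validation accepts
// the clusters it can serve and records a clear status for the ones it
// cannot, instead of failing the whole proxy.
public class PerClusterValidation {
    enum Level { EXPERIMENTAL, PREVIEW, GA }

    record VirtualCluster(String name, Level filterStability) { }

    static Map<String, String> validate(List<VirtualCluster> clusters, Level threshold) {
        Map<String, String> status = new LinkedHashMap<>();
        for (VirtualCluster vc : clusters) {
            if (vc.filterStability().ordinal() >= threshold.ordinal()) {
                status.put(vc.name(), "Accepted");
            } else {
                // Proxy stays up; only this cluster is refused, with a reason.
                status.put(vc.name(), "Refused: filter is " + vc.filterStability()
                        + ", proxy threshold is " + threshold);
            }
        }
        return status;
    }

    public static void main(String[] args) {
        var status = validate(List.of(
                new VirtualCluster("prod", Level.GA),
                new VirtualCluster("playground", Level.EXPERIMENTAL)), Level.GA);
        status.forEach((name, s) -> System.out.println(name + ": " + s));
    }
}
```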

What this doesn't solve

  • Kafka dependency coupling: if upstream Kafka renames an IDL field, that breaks
    kroxylicious-api regardless of stability levels. That's a coupling problem, not a maturity
    problem. Stability levels don't help; decoupling from kafka-clients.jar internals does.
  • Promotion criteria: we'd need to define what it takes to move from experimental → preview →
    GA. The bar for GA is clear (full semver compliance), but the criteria for experimental →
    preview could use some definition so promotion decisions aren't ad-hoc.

Relationship to this proposal

I don't think we need to fully design this before 1.0, but I think the 1.0 proposal should at
least acknowledge the mechanism. Something like: "features within Kroxylicious may be classified at
different stability levels. Only GA features are covered by the semver compatibility guarantee. The
details of this classification are the subject of a follow-up proposal."

That way 1.0 isn't a straitjacket — we've explicitly reserved the right to ship experimental things
without breaking the compatibility promise.

@tombentley
Member Author

@k-wall

I think having a Kroxylicious version number and another version tagging scheme (v[0-9]+) is a recipe for confusion.

I don't think so. It's worked for the Kubernetes project; I don't see people there looking confused. On the contrary, it brings clarity, because they've decomposed the messy reality behind the top-level version number into identified APIs and feature flags, and version them in a consistent way.

I'd suggest we try and live by semver.

I think you need to acknowledge that we have already adopted the Kubernetes API versioning for our Kubernetes model. So the "complication" of having multiple version numbers floating around already exists in the project.

If it gets too costly, we can think alternatives.

It's already too costly. You might have a mental model in your head about what all the various APIs are, and whether or not we consider them supported, but I can almost guarantee it's incomprehensible to anyone new to the project. You have to read the docs. You can easily accidentally use APIs which we don't consider fully supported. It's a mess.

What I'm proposing is an alternative. To me the decision seems to be between:

  • Carry on as we are, with a top-level semver number, the Kube convention for CRs, and kind-of nothing at all for the other APIs. But we have pretensions to declare 1.0. That's just making the project even more incomprehensible.
  • Use SemVer for the top level, and for the other APIs, and only use Kube conventions for Kube APIs. But as I explained, it's not really clear to me how we make SemVer work for the other APIs without complicating the build process.
  • Use SemVer for the top level, and Kube-like conventions for all the other APIs. The release process is not affected. Semantics are clear.

@SamBarker

It seems to me that the WildFly model is broadly similar to the Kubernetes model, modulo syntax and proposal/approval processes. When I talked to @robobario about this earlier this week I was in the "Alpha doesn't require anything" camp too, so I could live with that. I definitely want to lower barriers to trying things out in a way that end users can easily experiment with. The reason I suggested having some process even for alpha is simply so we have a record of "In v1alpha1 we did ..., the feedback we got was ..., so for v1alpha2 we are going to change ...". Maybe the proposal process is too heavyweight for capturing that. So fine, let's say only the beta and GA APIs need to use the proposal process.

I'm guessing you've not yet read #96. It's still marked as Draft, so I'm not blaming you for not having taken a look yet, but please could you before we get into a bikeshedding discussion about how to version configurations.

Java APIs (kroxylicious-api, kroxylicious-kms): @tombentley suggested using the Kubernetes
naming convention (v1alpha1, v1beta1, v1). In Java there are broadly two ways to do this:

Technically there is at least a 3rd way -- max out on class loader isolation by linking plugins to the API version. But we're approximately 1,000,000 miles from that.

Versioning packages suffers from the terrible drawback that our packages are not versioned already, so trying to switch to that is itself a breaking change.

So practically speaking I agree with you that annotations would be good enough. That would be dead simple. That being the case, I think we should do this for 1.0.

The stability threshold should live on the KafkaProxy CR

That does not work: we support the proxy outside Kube, where there is no KafkaProxy CR.
You might then say: do it in the proxy config itself, but that doesn't work very well for anything which doesn't have a configuration readily to hand, such as the startup script.
What might work is just to use env vars, one for each feature/API we're gating access to. So suppose we wanted to add a new plugin interface API (like KMS, say): at the use site we'd check KROXYLICIOUS_UNLOCK_API_FOO=v1alpha1.

  • That's a mechanism which works in a script as well as in a JVM program.
  • It can be used to propagate opt-ins from the operator into the proxy process.
  • It's API specific, so an admin can unlock API FOO but not BAR, even though both are v1alpha1.
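The use-site check could be as simple as this (the env var name is the one proposed above, not something Kroxylicious implements today; the lookup is parameterised over a map so it's testable):

```java
import java.util.Map;

// Sketch of the env-var unlock idea. The same convention works for a JVM use
// site (as here) and for shell scripts via a plain `test` on the variable.
public class EnvVarGate {
    // e.g. KROXYLICIOUS_UNLOCK_API_FOO=v1alpha1 unlocks API FOO at v1alpha1.
    static boolean apiUnlocked(Map<String, String> env, String api, String requiredVersion) {
        return requiredVersion.equals(env.get("KROXYLICIOUS_UNLOCK_API_" + api));
    }

    public static void main(String[] args) {
        // In the proxy this would read System.getenv() instead of a literal map.
        Map<String, String> env = Map.of("KROXYLICIOUS_UNLOCK_API_FOO", "v1alpha1");
        System.out.println(apiUnlocked(env, "FOO", "v1alpha1")); // true
        System.out.println(apiUnlocked(env, "BAR", "v1alpha1")); // false
    }
}
```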

Kafka dependency coupling: if upstream Kafka renames an IDL field, that breaks
kroxylicious-api regardless of stability levels. That's a coupling problem, not a maturity
problem.

It is a coupling problem, but the only solution to that is to change the API. Thus the coupling problem is itself coupled to the versioning problem. I.e. we will need the equivalent of a "FilterFactory v2", which produces filters whose method signatures depend on our own set of types for modelling the Kafka protocol.

So when you view it like that we might as well say that our current FilterFactory is v1beta1. The alternative is to say FilterFactory is v1, but not actually be able to live up to the compatibility promise which a v1 should imply and also still need a v2 to solve the problem in the long term. Calling what we have v1beta1 is just simpler, isn't it?

@SamBarker
Member

@tombentley thanks for the detailed response, and for pushing me to read #96 — it changed how I think about some of this.

Where I think we agree

Annotations for Java API stability, not packages. Agreed, and I think doing this for 1.0 is the right call.

FilterFactory is effectively v1beta1. The Kafka coupling means we can't deliver the binary compatibility promise that v1 implies. The 1.0 proposal should be honest about this rather than making a guarantee we can't keep. Calling it beta gives us the vocabulary to describe the situation accurately: the feature is here to stay, but the API surface may break when upstream Kafka changes its IDL.

#96 handles plugin config versioning and per-plugin opt-in. After reading that proposal, I agree we don't need a separate stability mechanism for plugin configurations — the version string (v1alpha1, v1beta1, v1) already carries the stability signal, and users opt in to experimental features by writing version: v1alpha1 in their plugin config. You're also right that the KafkaProxy CR doesn't work as the sole opt-in mechanism — bare proxy deployments don't have one.

User experience and policy enforcement

I want to push back on one dimension though: who are we optimising for?

Between #82 and #96, we're introducing a lot of versioning machinery: project version (semver), proxy config version, per-plugin config versions, K8s naming conventions. That machinery is necessary — but it should be our problem, not the user's. We're paid to think so users don't have to.

Our usual design mantra applies here: easy things should be easy, other things should be possible. Deploying a stable proxy with GA features should be easy — no versioning concepts required. Experimenting with alpha features, writing third-party plugins, evolving config schemas — those should be possible, but the complexity involved is opt-in, not load-bearing for the common case. This has implications for #96 that I'll raise there separately, but for #82 it means the proposal should state this as a design principle alongside the stability levels and compatibility guarantees.

There's also a related concern on the enforcement side. Per-plugin opt-in covers the user who wants an experimental feature, but not the admin who wants to prevent them. If anyone who can write a config file can opt in to v1alpha1, the only thing stopping alpha configs in production is discipline. There could be a case for a per-proxy policy gate (along the lines of WildFly's --stability flag) — whether that's an env var, a proxy.yaml field, or something else. I don't think we need to solve this for 1.0, but the proposal should at least acknowledge the two dimensions: opt-in (per-plugin, handled by #96) and policy enforcement (per-proxy, future work).
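
One possible shape for such a gate, loosely modeled on WildFly's --stability flag (every field name below is invented; nothing like this exists in Kroxylicious today):

```yaml
# Hypothetical: a per-proxy policy constraining which config stability levels
# may be loaded, regardless of what individual plugin configs opt in to.
stabilityPolicy:
  minimumLevel: v1beta1   # proxy refuses to start if any plugin config declares v1alpha1
```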

What the 1.0 proposal should say

Pulling this together, I think the proposal needs:

  1. Stability levels defined (experimental, preview, GA) with clear guarantees for each.
  2. Honest scoping of the compatibility guarantee — acknowledge that kroxylicious-api's filter API is effectively beta due to the Kafka dependency. The other modules (kroxylicious-kms, operator CRDs) can be GA.
  3. A user experience principle: the stable path requires zero versioning knowledge. Complexity is opt-in.
  4. Per-proxy policy enforcement acknowledged as a potential future need — there's a case for admins being able to constrain stability levels on a deployment.
  5. The existing content on patch releases and maintenance, once the branching model discussion converges.

@k-wall
Member

k-wall commented Apr 21, 2026

I haven't left myself enough time to reply to this properly today. I see I'm in a minority of one with my YAGNI concern. Happy to go with annotation-based API stability markers if that's the will.

> It's already too costly. You might have a mental model in your head about what all the various APIs are, and whether or not we consider them supported, but I can almost guarantee it's incomprehensible to anyone new to the project. You have to read the docs. You can easily accidentally use APIs which we don't consider fully supported. It's a mess.

Fair point. Maybe that's a smell that we should think about adopting Jigsaw sooner, so it is crystal clear what's intended to be public API and what's not. (Full disclosure: I've never actually used Jigsaw in anything other than a Hello World, so I don't know how hard it will actually be.)

@k-wall
Member

k-wall commented Apr 23, 2026

Discussed at Community Call 23rd April. Tom is planning to rework the proposal factoring in recent ideas/discussions on API stability.

@tombentley
Member Author

FYI I've opened #103 for the Java API versioning part. I propose that we try to get that accepted, and the configuration versioning aspect accepted, then circle back to Kroxylicious 1.0 specifically.

@tombentley tombentley moved this to Want to Do in Release 0.21.0 Apr 24, 2026
@tombentley tombentley self-assigned this Apr 24, 2026
k-wall added a commit to k-wall/design that referenced this pull request Apr 24, 2026
This introduces a proposal index (README.md) for the proposals directory
to solve the problem of colliding proposal numbers.

Currently, multiple open PRs have chosen proposal numbers that conflict
with already-merged proposals. This creates confusion and makes it
difficult to maintain stable identifiers for proposals - which is
essential for productive discussion on the mailing list.

The new process:
1. Authors create proposals with temporary filenames (nnn-*)
2. A separate PR to this README allocates the next sequential number
3. Authors rename their proposal once the number is allocated
4. The index is updated again when proposals are accepted

This ensures:
- Proposal numbers are allocated chronologically
- No number collisions occur
- Each proposal gets a stable identifier before mailing list discussion
- Clear visibility of open vs. accepted proposals

Current collisions resolved by this index:
- PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted
  proposals and have been allocated new sequential numbers (016-024)

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit to k-wall/design that referenced this pull request Apr 28, 2026
This change simplifies the proposal numbering system by using PR numbers
as proposal identifiers, eliminating number collisions and removing the
need for a separate allocation process.

Changes:
- Simplified proposals/README.md to focus on author workflow
  - Removed index tables (directory listing serves as the index)
  - Streamlined instructions for creating and renaming proposals
- Updated proposal template with workflow instructions
  - Require PR number in title format: # <PR#> - <Title>
  - Moved workflow instructions into comment block
- Added GitHub workflow to automatically check proposal numbering
  - Validates both filename and title format
  - Updates PR description when proposal files don't match PR number
  - Provides exact commands to fix naming issues
  - Removes warning once corrected
  - Handles both added and renamed files
  - Runs on all PRs (ready for mandatory status check)
- Added notification script for existing open PRs
  - After merge, run notify-open-prs.sh to ask authors to rebase
  - Workflow will automatically guide them through renaming
  - Updated with all current open proposal PRs (kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#85,
    kroxylicious#88, kroxylicious#93, kroxylicious#94, kroxylicious#96, kroxylicious#98, kroxylicious#99, kroxylicious#100, kroxylicious#101, kroxylicious#103)

Proposals 001-019 retain their original numbers.

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Keith Wall <kwall@apache.org>
k-wall added a commit that referenced this pull request Apr 28, 2026
This change simplifies the proposal numbering system by using PR numbers
as proposal identifiers, eliminating number collisions and removing the
need for a separate allocation process.

@k-wall
Member

k-wall commented Apr 28, 2026

Hi! We've updated the proposal numbering system to use PR numbers as proposal identifiers.

Action required: Please rebase your PR on main.

Once you rebase, you'll need to rename your proposal file and update the title:

git mv proposals/007-kroxylicious-1.0.md proposals/082-1.0.md
# Update title: remove any old number prefix and add PR number
sed -i.bak '0,/^# /{s/^# \([0-9]\{3\}\|xxx\|nnn\|000\) - /# 82 - /; t; s/^# /# 82 - /}' proposals/082-1.0.md && rm proposals/082-1.0.md.bak
git add proposals/082-1.0.md
git commit -m "Rename proposal to use PR number"
git push

The GitHub workflow will automatically check your proposal file naming after you push and update this PR description if any corrections are still needed.

See proposals/README.md for the updated workflow.

@k-wall
Member

k-wall commented Apr 28, 2026

Correction: The previous notification had an incorrect filename. Here are the correct commands:

git mv proposals/007-kroxylicious-1.0.md proposals/082-kroxylicious-1.0.md
# Update title: remove any old number prefix and add PR number
sed -i.bak '0,/^# /{s/^# \([0-9]\{3\}\|xxx\|nnn\|000\) - /# 82 - /; t; s/^# /# 82 - /}' proposals/082-kroxylicious-1.0.md && rm proposals/082-kroxylicious-1.0.md.bak
git add proposals/082-kroxylicious-1.0.md
git commit -m "Rename proposal to use PR number"
git push

Sorry for the confusion!

@robobario
Member

On Plugin Versioning.

As an alternative I've put up #110, which is a smaller idea for achieving explicit versioning of our plugins, adopting k8s CRD versioning conventions to convey stability. This is just a tiny subset of what we'd get out of the full #96, but I think it could be sufficient for Kroxylicious 1.0.0, enabling us to put out alpha-versioned plugin configurations and making the stability guarantees visible to users.
