Kroxylicious 1.0 #82
Conversation
Signed-off-by: Tom Bentley <tbentley@redhat.com>
> 1. Binary compatibility of the Java APIs of the following modules:
>    * `kroxylicious-api`
>    * `kroxylicious-kms`
> 2. The format of the proxy configuration file, including the configuration of certain plugins provided by the project. Compatibility here means that a released version `1.y` would accept any configuration file that was accepted as valid in any released `1.x` version, where `y > x`. "Valid" means the proxy starts up cleanly.
> certain plugins

How are you deciding which plugins are included and which are not? We have the oauthbearer, multitenancy and the record-validation plugins. I would extend the guarantee to all of them.
Happy to add oauth and validation.
It's a bit awkward defining what's included with a list:
- It makes it unclear what happens to plugins we write during the `1.x` timeframe.
- I suppose we should also include the filters we're currently working on.

Which makes me unsure about the current MT, given the namespacing work. We'll end up having two quite similar things. Wdyt we should do there?
I think the guarantee has to extend to all plug-ins included in the kroxylicious repo and distribution - we are aiming for clarity of message and future.
It would also imply we could only remove a plugin at a major version bump.
I think we should also make reference to our deprecation policy here, and possibly make explicit what that means for official plug-ins and not just APIs.
Do we want to make such a strong assertion about config loading? That any minor release will read the prior config format? I'm wondering about a weaker guarantee along the lines of:
> All minor releases of a major version will be able to load the major version's config file or provide an automated migration.

I'm specifically thinking of something based around OpenRewrite's YAML recipes.
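As an illustration of what such an automated migration could look like, here is a minimal stdlib sketch. The key names (`adminHttp` renamed to `management`) are hypothetical examples, not real Kroxylicious config keys, and a real tool such as OpenRewrite would rewrite the YAML source text itself, preserving comments and formatting; this sketch just moves keys within the already-parsed structure.

```python
# Sketch of an automated config migration: each entry maps an old
# (hypothetical) key path to its replacement in the newer format.
MIGRATIONS = {
    ("proxy", "adminHttp"): ("proxy", "management"),  # hypothetical rename
}

def migrate(config: dict) -> dict:
    """Return a copy of config with old key paths moved to their new homes."""
    out = {k: dict(v) if isinstance(v, dict) else v for k, v in config.items()}
    for (old_top, old_key), (new_top, new_key) in MIGRATIONS.items():
        section = out.get(old_top)
        if isinstance(section, dict) and old_key in section:
            out.setdefault(new_top, {})[new_key] = section.pop(old_key)
    return out
```

Under this weaker guarantee, a `1.y` release would either load a `1.x` config directly or ship a migration table like the one above.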
Tbh, I'm not sure what to do about the namespacing ideas vs multi-tenancy at the minute.
For the sake of the 1.0 proposal, I would just say that the guarantees you describe apply to all the plugins that are currently shipped by the project.
Additionally I think we should evict the current MultiTenant filter from the main repo (maybe just delete it, maybe move it to its own repo (in|out)side the org). It's served its purpose as a POC, but we have evolved our thinking away from such a monolithic design.
I second the idea of removing the existing Multitenant filter, I think moving it to its own repo (which could then be archived) is a good idea, so that if anyone is actually using it (or just using it as inspiration) the code is still somewhere relatively accessible.
MT should be deprecated first, then removed, otherwise we are not living up to the pledges we make in our existing deprecation policy.
I think having MT, marked deprecated, in a 1.0 does no harm.
@tombentley, thanks for writing this up.
SamBarker
left a comment
While I'm supportive of a move to 1.0 I have a few caveats around the proposed model.
> # Patch releases
>
> We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. if the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
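To make the arithmetic in that proposal concrete, a small sketch (assuming the three-line scheme described above; the function name is just for illustration) of which release lines are eligible for a patch when `1.x.y` is current:

```python
def maintained_lines(major: int, minor: int, lines_kept: int = 3) -> list[str]:
    """Release lines eligible for a new patch release, newest first.

    With current release 1.x.y, the proposal keeps the 1.x, 1.(x-1)
    and 1.(x-2) lines open for patch releases.
    """
    oldest = max(minor - lines_kept, -1)          # don't go below minor 0
    return [f"{major}.{m}" for m in range(minor, oldest, -1)]

# With 1.4.y current, patches may be cut from the 1.4, 1.3 and 1.2 lines:
# maintained_lines(1, 4) -> ["1.4", "1.3", "1.2"]
```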
I'm wondering if the example would be better expressed as a table.
Given what we say about API (Java/config) stability, why would we offer anything other than rolling forward to the latest patch release?
I am very wary about building up a branch of unreleased commits/fixes. While I accept releases have a cost, ours are fairly cheap and getting cheaper (we keep automating stuff), so I think we should aim to release every minor release branch we start. However, I'd also advocate for lazy branching here: only start a 1.x-1 branch if and when required.
The bleeding edge is more or less the first time our new code comes into contact with the real world. It's a likely place for performance etc. to degrade. Conservative users may be putting that version through its paces in test environments before adopting it in production, so supporting a couple of minor versions could be a good idea.
@SamBarker the language around the number of maintained minor releases was based on what I thought we had discussed and agreed out of band, but maybe I misunderstood. I added the 6 month thing off my own bat purely as a way of making the commitment less open-ended. I'm happy to change it if that's not the consensus we reached, but we (the maintainers on the hook for actually making good on this) all need to agree what we're willing to do here.
Personally I feel that just "roll forward" is not really good enough. It means that users have to accept the additional risk of taking a bunch of unrelated features in order to get a fix. That's just not always acceptable in production, and I think it therefore stands to harm adoption in production environments.
Therefore I would advocate for:
- Creating a `1.x` branch at the time we release `1.x`.
- Cherry-picking to this branch. What gets cherry-picked is partly up to the community. If someone from the community is willing to open a cherry-pick PR targeted at `1.x` that's great, and we should endeavour to get that into a release. This is why a `1.x` branch needs to be created when releasing `1.x.0`, so that anyone can just go ahead and open such a PR.
- However, I don't think we should be doing a release simply because a single commit got made to the `1.x` branch. The costs of a release are not just borne by the maintainers who are clicking the buttons on some automation (however fancy). There is a cost for the whole community in terms of assessing whether or not to pick up the new `1.x.y` release, and in actually rolling it out. It's no good the project pushing out a new patch release every week if users cannot keep up. We also want the community to participate in testing. For example, if Billy Bob opened a cherry-pick PR that's going to be included in a release then Billy Bob might well want to confirm that the proposed `1.x.y` release fixes their problem. Bear in mind what's in the release might be more than just Billy Bob's PR, so it's reasonable to want to be sure that none of the other fixes has regressed Billy's fix. Also it's possible that the maintainers haven't been able to reproduce Billy's problem (e.g. maybe it's hardware specific), so their feedback is invaluable.
- For all these reasons I think it's inevitable that some number of fixes get merged to the `1.x` branch, building up, before a release of `1.x.y` happens. This is OK! We can still decide to expedite a release if a particularly terrible bug/CVE is found. But otherwise batching a bunch of fixes actually makes the eventual release more consumable, and more likely to actually be used (because the risk/benefit looks more favourable if a release fixes several problems rather than a single one). And releases being used is, after all, why we do them.
We need to decide how many 'live' release branches we want on the go. Obviously the effort here is multiplicative, so we need to be realistic. Apache Kafka generally manages to release two patch releases for each minor (see https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan), and they're usually spaced about 3-4 months apart. Kafka also aims for 3 a.b.0 releases a year. This averages out to there being 3 'next releases' at any one time: A .0, a .1 and a .2. Personally my gut feeling is we could sustain something similar, but note the implication that we slow down the pace on the .0 releases.
So then the final loose end is when and how we stop making releases from a minor branch. Here the Apache Kafka project doesn't really have anything written down. But in practice patches do get merged to minor branches which are never publicly released. However, they are available to those motivated to do their own building and testing. That seems like a reasonable balance to me: The project is not obliged to release them, but the community has the ability (and some motivation) to contribute those patches to the project.
cc @k-wall @gracegrimwood @robobario @hrishabhg, because this discussion is kinda central to the whole thing.
> Only start a 1.x-1 branch if and when required
On this point I would value having the release branches created as part of the standard workflow so that we don't have diverging release flows (release latest off main, vs other releases off release branch). Having them exist before we need them will enable community members to PR backports directly to them without having to request a branch creation. It makes things more democratic and approachable if all the processes are in place and Contributors just have to review/approve things and pull the release levers.
I'm also wondering if it would be manageable to have things more time-based. So have a scheme like:
- the latest minor version is actively supported
- when a new minor version is created, on date X, the previous minor version is moved to a different support level
- the previous minor now has support dates set based on date X, like 3 months of bugfixing and 6 months of security fixing, for example.
This would give users an easy-to-interpret table of dates for when a minor goes EOL, and users would know that they are getting at least that many months of support for the latest minor version. Under this scheme there would always be a transition period, when we put out a new minor version, where we support two versions; whereas with the current proposal, if you didn't release for 6 months, the prior minor version would be unsupported on day 0 of the new release.
Current actively supported version: 1.4.x

Old versions:

| version | bugfix support until | security support until (EOL) |
|---------|----------------------|------------------------------|
| 1.3.x   | 01-03-2026           | 01-06-2026                   |
| 1.2.x   | 01-02-2026           | 01-05-2026                   |
| ...     |                      |                              |
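A sketch of how that table could be derived mechanically from the date a line is superseded. The 3- and 6-month windows are the example numbers above, not agreed policy, and the function name is just illustrative:

```python
from datetime import date, timedelta

def maintenance_dates(superseded_on: date,
                      bugfix_months: int = 3,
                      security_months: int = 6) -> dict[str, date]:
    """Given date X on which a newer minor was released, return when bugfix
    and security maintenance for the now-previous minor would end.
    A month is approximated as 30 days to keep the sketch simple."""
    return {
        "bugfix_until": superseded_on + timedelta(days=30 * bugfix_months),
        "eol": superseded_on + timedelta(days=30 * security_months),
    }

# e.g. if 1.4.0 ships on 2025-12-01, the 1.3 line would get bug fixes for
# roughly 3 months and security fixes for roughly 6 months from that date.
```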
But this could put us on the hook for more than the proposed 3 minor versions of maintenance. It feels like, as Tom said, we will probably slow down the minors due to these new processes and guarantees, though. We may end up following Kafka's cadence: to keep up we have to bump kafka-clients and take minor or major changes to the Filter API.
Maybe those concerns are manageable by keeping the lengths of time tight enough
This got suuuuper long, sorry 😅
Here's a sort of TL;DR of my thoughts:
- We should be actively creating and maintaining new patch releases for the last two minor versions, as well as the current minor version (i.e. `1.x`, `1.x-1`, and `1.x-2`). To limit the overhead, there should be no more than 3 actively maintained minor versions at any time.
- The EOL date for the oldest active minor version `1.x-2` should be upon the release of the third subsequent minor version `1.x+1`, or six months after the initial release of the expiring version `1.x-2`, whichever comes first.
- We shouldn't try to schedule the EOL or maintenance windows of releases ahead of time, so we don't overcommit ourselves. We could revisit this after we've reached `1.0.0` and tried out a less rigid maintenance schedule for a bit.
- We should stop merging PRs to a release branch once that version is no longer receiving new patch releases; the purpose of these branches is to create patch releases to serve users of the proxy, and anything beyond that is unnecessary extra work. We could revisit this later if there is demand for it.
- We should have release branches for minor releases in addition to the existing release tags, and these should be created during their initial (`minor.0`) release by our release automation. These branches can then accept PRs for backporting fixes, and new patch releases for their respective versions can be made from those branches. Patch releases only need release tags, not branches.
- Backports should be community-driven via raising PRs for backports against active release branches. Maintainers would be responsible for approving these PRs, backporting high-severity security fixes, and for cutting patch releases.
- The timing of patch releases should be driven by the community through a proposal process (separate from this repo), where a member of the community proposes a new patch release and others can vote on it. Urgent patch releases (such as for high-severity security issues) should bypass this. If we find we are creating patch releases too often, we could look at implementing rules to limit this.
- Maintaining multiple minor releases might mean we're inclined to put out minor releases less often but it shouldn't mean the pace of development is meaningfully slower. Also, being more mindful about what we're releasing and how often might not be a bad thing.
- Maintaining multiple minor releases will mean more things for us to review and more times we have to run the release workflow, but that would happen to some extent anyway as the project grows.
- Not providing any active maintenance for prior releases and instead taking a stance of telling users to just roll forward to the latest version risks alienating a significant number of people who might otherwise use Kroxylicious, and isn't in the best interests of the project.
- We really should not be talking about any of this as "support". It muddies the water as to what it is we're trying to accomplish as a project, so we should use the term "active maintenance" or just "maintenance" if we can.
> the language around the number of maintained minor releases was based on what I thought we had discussed and agreed out of band
We discussed the idea of maintaining and issuing patches for 1.x, 1.x-1, and 1.x-2, but I don't know if we actually reached an agreement. Either way, it sounds like a good number to me; any less isn't really worth it, any more could become unsustainable.
> I added the 6 month thing off my own bat purely as a way of making the commitment less open-ended.
I think 6 months is fine as a sort of hard-coded "expiry date" for minor releases. We currently release much more frequently than that, so I reckon it's unlikely any release would hit that deadline anyway. Even if we did let the oldest minor release "expire", having another minor release in there between oldest and latest means users still have a buffer zone to work in, and 6 months seems like plenty of time to arrange a proxy upgrade to a new minor version. We can revisit that number later on if we need to, but I think it's fine.
> Personally I feel that just "roll forward" is not really good enough. It means that users have to accept the additional risk of taking a bunch of unrelated features in order to get a fix. That's just not always acceptable in production, and I think it therefore stands to harm adoption in production environments.
I fully agree. We can't know every user's environment and risk profile, and we risk alienating potential users if our response to requests for backported fixes is to just install the latest version. I don't think that stance is in the best interests of the project, even if it would mean less maintenance work for us. Besides, if it was feasible for everyone to just roll forward all the time, nobody would make any money selling LTS versions of things 😉
> - Creating a `1.x` branch at the time we release `1.x`.
> - Cherry picking to this branch. What gets cherry-picked is partly up to the community.
Agreed, starting out with a `major.minor` release branch at initial `major.minor.0` release time, to serve as the source of all future `major.minor.x` patch releases, is the way to go. As @robobario suggested, let's have our release automation create it for us, perhaps when it creates the release tag? Then we let the community put up PRs against these minor release branches with backported fixes, and our job is mostly to approve or not, as we currently do with PRs to main.
Patch releases can then be tagged in git as usual but won't get their own release branches. We shouldn't put out patches of patch releases like `major.minor.patch.x`, as per semver, so any fixes should instead be backported to the respective minor branch and then released in the next patch release of that minor version. For example, if someone wanted to backport a fix to release `major.minor.patch` then instead they would raise a PR against the `major.minor` release branch, and that would get released as `major.minor.(patch+1)`.
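A tiny sketch of the rule being described here (the function name is just for illustration): a fix against a released `major.minor.patch` targets the `major.minor` branch and ships as the next patch of that line.

```python
def backport_target(released_version: str) -> tuple[str, str]:
    """For a fix against a released major.minor.patch, return the branch the
    backport PR should target and the version the fix would ship in.
    There are no patches-of-patches (no major.minor.patch.x), as per semver."""
    major, minor, patch = (int(part) for part in released_version.split("."))
    branch = f"{major}.{minor}"                    # the minor release branch
    ships_in = f"{major}.{minor}.{patch + 1}"      # next patch of that line
    return branch, ships_in

# A fix needed on top of 1.2.3 targets branch "1.2" and ships in "1.2.4".
```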
I think there's still definitely a bigger role for us as maintainers in this process, like urgent backporting of high-severity exploitable CVEs, but otherwise I think this process really should be driven by the people using the proxy on the daily, as they're best placed to know which fixes they need and don't need. We can step in when something that isn't a fix (i.e. a feature) gets suggested, for anything urgent, or for anything that's too complicated to backport, but for the most part I see this as the kind of system that starts to run itself.
> - However, I don't think we should be doing a release simply because a single commit got made to the `1.x` branch. The costs of a release are not just borne by the maintainers who are clicking the buttons on some automation (however fancy). There is a cost for the whole community in terms of assessing whether or not to pick up the new `1.x.y` release, and in actually rolling it out. It's no good the project pushing out a new patch release every week if users cannot keep up. [...]
> - For all these reasons I think it's inevitable that some number of fixes get merged to the `1.x` branch, building up, before a release of `1.x.y` happens. This is OK! We can still decide to expedite a release if a particularly terrible bug/CVE is found. But otherwise batching a bunch of fixes actually makes the eventual release more consumable, and more likely to actually be used
Yeah, batching fixes is the way to go. As for timing, I think we should handle this through proposals, honestly. Maybe not anything as big or elaborate as what we do here in this repo, but having a forum or something (like Slack or GitHub discussions, or just raising an issue with a special tag on it) where community members can suggest that they think it's time for a patch release of whatever active minor version, and then others can vote on that idea. Obviously for anything high-risk we could just go ahead with creating a patch release anyway, but for everything else this lets us take a more hands-off approach.
If, by some chance, that proposal process gets out of hand and folks do start requesting new patch releases every week, then we could look at instituting some kind of limit on how often we will release new patch versions, but that's the kind of bridge we should cross if/when we get to it. I think trying to come up with a schedule for this is a bit of a fool's errand because there will be some times we get to a patch release date and there's nothing worth releasing, and some times when something will show up well ahead of schedule and we'll want to put patches out ASAP. We're better off letting the people who use the proxy determine (for the most part) when new patch releases are required.
> We need to decide how many 'live' release branches we want on the go. Obviously the effort here is multiplicative, so we need to be realistic.
I think if we stick to only having branches for minor releases then the number of actively maintained "live" branches comes down to how many minor releases we intend to create new patch releases for. Like I said above, I think the current and the two previous minor releases (`1.x`, `1.x-1`, and `1.x-2`) is a good number. I don't think it means we will have to slow development of new features (`.0`s) because we'd be leaning on the community to do most of the heavy lifting as far as maintenance effort goes, but it might change how often we cut new releases from main and make us think more carefully about what goes into them, which isn't necessarily a bad thing. And of course, all of this will mean more things for us to review and more releases, but we're going to get more things to review and more stuff to release as our contributor base grows anyway.
We'll also have to keep in mind what happens when we hit a new major version, but I don't think that's an unnavigable problem, nor a particularly pressing one at this juncture.
> So then the final loose end is when and how we stop making releases from a minor branch. Here the Apache Kafka project doesn't really have anything written down. But in practice patches do get merged to minor branches which are never publicly released. However, they are available to those motivated to do their own building and testing. That seems like a reasonable balance to me: The project is not obliged to release them, but the community has the ability (and some motivation) to contribute those patches to the project.
I've sort of said this already, but if we're only promising patch releases for the previous two minor versions, then the answer to "when do we stop releasing patches for 1.x-2?" is "once we've released 1.x+1", or at that 6 month expiration date, whichever comes first. Obviously, if we're churning out a new minor release every week then we're going to need to reassess both how many minor versions we're maintaining and how long we're maintaining them for. Though, if we were in a position to be churning out minor releases that frequently I think this whole discussion would be very different.
I think "once we've released `1.x+1`" also answers the question of "when do we stop accepting PRs to the release branch for `1.x-2`?", because if it's not going in a patch release then it doesn't need backporting IMO. The point of this is to make releases and upgrades of Kroxylicious safer, more stable, and more reliable for the people who use it, so merging backports that aren't going to end up in releases seems a bit like wasted effort. That said, if there is reasonable demand for this in the future, and we feel it wouldn't add any unreasonable burden on us to maintain, then we could do so. If we draw a hard line now we can always bend it later if we need to, but I don't think we need to do so pre-emptively.
> I'm also wondering if it would be manageable to have things more time-based. So have a scheme like:
> - the latest minor version is actively supported
> - when a new minor version is created, on date X, the previous minor version is moved to a different support level
> - the previous minor now has support dates set based on date X, like 3 months of bugfixing and 6 months of security fixing for example.
>
> This would give users an easy-to-interpret table of dates for when a minor goes EOL. And users would know that they are getting at least that many months of support for the latest minor version.
This is what NodeJS does and, to an extent, what Ubuntu does (something something Pro subscriptions and paid support tiers).
I don't dislike this model (quite the opposite actually, as a user it's great) but it feels very strict and rigid given where the project is at currently, and I just don't feel that it's the right fit for us right now. I think it also implies a level of "support" (I don't mean to nitpick on language but I really don't like using this word here, I think we should be calling this "maintenance" because that's what we're actually proposing to do) that I don't think we're trying to offer. It is not within our scope as an open source project to be offering —or appearing to offer— anything close to enterprise support, which is why I think we should be talking about "actively maintained" releases rather than "supported" releases.
If we are wanting some kind of tapered maintenance model then we could say something like "version 1.x-1 gets all the backports while version 1.x-2 only gets CVE fixes until we release 1.x+1", but trying to come up with a schedule for this at this stage isn't really feasible because we have no way of knowing how realistic these kinds of timelines are yet. We can guess, but we don't actually know. We aren't Kafka or NodeJS or Ubuntu; our development effort is not their development effort, our maintenance burden is not their maintenance burden, and we can only model ourselves off of them to a point. A system like this, with stricter timelines and more assurance for users, is the kind of thing we could revisit in the near future once we've actually reached 1.0.0 and tried out something a little more flexible, at which point we'll have a better idea of how much extra work we've actually created for ourselves and what kinds of timelines are actually practical.
Anyway, I'm off to bed. If you read to the end of this: here, have a cookie 🍪
gracegrimwood
left a comment
I'm really happy we've started this discussion, I think it's a great move for the project. Thanks @tombentley for putting this together! 🎉
> # Patch releases
>
> We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. if the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.
> However, the "up to" is limited to 6 months after the initial minor release.
Is there a distinction between the current latest minor and the 2 prior ones here? I'm wondering if somewhere we should say the latest minor will be bugfix supported beyond 6 months.
Should it be more like one of these? (Either current version + 2 patches, or current version plus a (current version - 1) patch and a (current version - 2) patch.)

Current:
> We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)`, `1.(x-1).z` and `1.(x-2).z`.

Suggested:
> We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.(x-1).(y+1)` and `1.(x-2).(y+2)`.

OR

Suggested:
> We propose to provide patch releases for current `1.x`, plus up to the previous two minor releases. I.e. If the current release is `1.x.y` then we are prepared to release a `1.x.(y+1)` and `1.x.(y+2)`.
Holding off any change here while we work though the conversation above.
Co-authored-by: Hrishabh Gupta <hrishabhg@gmail.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Co-authored-by: Grace Grimwood <15976368+gracegrimwood@users.noreply.github.com>
Signed-off-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Tom Bentley <tbentley@redhat.com>
Thanks for the initial round of reviews folks. I've made some of the more minor changes. I think we need to resolve this conversation to make further progress.

Hi all, for what it's worth, from our perspective this sounds like it would work well for us. Our security posture means we shouldn't be too far away from the latest releases; if we were any further out than 3 versions we'd be getting asked to upgrade regardless. If we were one or two versions out and there was a fix release for a security vulnerability, it'd be easier getting our current version plus the security or bug fix into Prod than jumping possibly 2 versions and having to cover off testing for the extra features in order to get that fix.
> 3. The plugins covered under item 2. are all those implementations of the APIs listed in 1. provided by the project, excluding those implementations in test packages and modules. This implies the continued inclusion of these plugins in the `1.x` series of releases. However, the existing multi-tenancy plugin will be marked as deprecated.
> 4. The Kubernetes custom resource APIs defined by the kroxylicious-operator.
>
> If the compatibility guarantee is broken we will treat that as a bug.
This only covers compatibility between Kroxylicious versions. I couldn't see anywhere compatibility with regards to Kafka. My understanding is that Kroxy always tries to adopt the latest Kafka dependencies. Is that true? Should that be documented? If so, it inherits the same compatibility guarantees as Kafka.
So for example if Kafka 5.0 drops support for Kafka < 3.0 (just a random example, this is not based on actual plans), Kroxylicious would also drop support for Kafka < 3.0.
There was a problem hiding this comment.
Great question.
There are two aspects to Kroxylicious's dependency on Kafka:
- We depend on the protocol itself, in the abstract. A new version of Kafka can add more APIs, or new API versions, which won't be understood by the proxy. Kafka's `ApiVersions` API mostly solves this, in the sense that we can force clients that use `ApiVersions` correctly to only use APIs and API versions which the proxy understands.
- Internally and in the current Filter API we use Kafka's own `*RequestData` and `*ResponseData` classes, as generated from the JSON IDL in the Kafka project.
Item 1 is fundamental, and unavoidable, but I think we're happy that it's not a problem in practice.
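To make the `ApiVersions` mechanism above concrete, here's a minimal sketch, with illustrative type names rather than actual Kroxylicious APIs, of how a proxy can clamp the version range a broker advertises for one API key to the range the proxy itself understands:

```java
// Minimal sketch of ApiVersions clamping. The type and method names are
// illustrative, not actual Kroxylicious APIs.
public class ApiVersionsIntersection {

    /** An inclusive [min, max] range of supported versions for one API key. */
    public record VersionRange(short min, short max) {
        public boolean isEmpty() {
            return min > max;
        }
    }

    /**
     * Intersects the range a broker advertises with the range the proxy
     * understands. If the result is empty, the proxy would omit that API key
     * entirely from the ApiVersions response it forwards to the client.
     */
    public static VersionRange intersect(VersionRange broker, VersionRange proxy) {
        short min = (short) Math.max(broker.min(), proxy.min());
        short max = (short) Math.min(broker.max(), proxy.max());
        return new VersionRange(min, max);
    }
}
```

A well-behaved client then never sends an API or API version the proxy cannot decode, which is what makes item 1 tolerable in practice.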
Item 2 is definitely a problem when it comes to making compatibility guarantees like we're trying to do here. The Apache Kafka project can (and has) renamed API keys and fields, which ripples out to those Java classes. Apache Kafka can do this because on the wire the API key is identified by id, not by name, and fields are defined positionally within a structure, rather than by name. When AK does this it manifests as a compile-time error/linker error in any filter that consumes that API key, i.e. a loss of source and binary compatibility.
So we can't really live up to the promise we're trying to make here.
I can see two possible ways forward:
- Carve out a massive compatibility exception around this for Kroxylicious 1.0, and maybe try to address it during 1.x.
- Address it before we declare a 1.0.
The way we would solve it amounts to removing our dependency on kafka-clients.jar entirely, by:
- Generating our own `RequestData` and `ResponseData` classes (e.g. by forking Kafka's code generator and having it generate classes in our own package namespace)
- Vendoring, or rewriting, all the `MemoryRecords` code.
- Dealing with the known problem around `errorResponse`, so we don't depend on Kafka's hand-written `*Request` classes.
- Very likely other dependencies too.
Suffice it to say, this is not a small piece of work. If we wanted to solve it prior to 1.0 we would be delaying 1.0 for quite a while.
There was a problem hiding this comment.
My intention was mostly to point out these issues so we're aware of them. At this point documenting them is probably enough. As you said, the effort to rely only on Kafka's public APIs is very large.
I still think it's important to consider them, for example to avoid compatibility pitfalls like we've seen in projects like Cruise Control, which was stuck on an outdated Kafka version for a long time.
This introduces a proposal index (README.md) for the proposals directory to solve the problem of colliding proposal numbers. Currently, multiple open PRs have chosen proposal numbers that conflict with already-merged proposals. This creates confusion and makes it difficult to maintain stable identifiers for proposals - which is essential for productive discussion on the mailing list. The new process: 1. Authors create proposals with temporary filenames (nnn-*) 2. A separate PR to this README allocates the next sequential number 3. Authors rename their proposal once the number is allocated 4. The index is updated again when proposals are accepted This ensures: - Proposal numbers are allocated chronologically - No number collisions occur - Each proposal gets a stable identifier before mailing list discussion - Clear visibility of open vs. accepted proposals Current collisions resolved by this index: - PR kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#88, kroxylicious#89 all have numbers conflicting with accepted proposals and have been allocated new sequential numbers (016-024) Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Keith Wall <kwall@apache.org>
|
the future of rapid experimentation

Once we release 1.0.0 some things will become much easier to discuss, for example we can talk about deprecating major features with a defined removal time of 2.0.0. But on the other hand, using a single sem-ver for all our APIs (filter configs, CRDs, proxy config, shell scripts) may be restrictive when it comes to putting out experimental features and APIs. For example if I am designing some new feature, I may have some idea that is not fully proven but I want to get it out there for users to experiment with. I am uncertain that the API of this feature is stable, it may need to be reworked when we accumulate more feedback. Having to completely abide by the project sem-ver is uncomfortable since the feature is still mid-design, we may need to break the API quickly. Some questions raised are:
edit: I can imagine this is easyish for new Filters as we can control when they enter our defined "Public API", but what about experimenting within established modules, like an experimental extension to the Proxy runtime |
|
@robobario that is a great question.
In terms of our design approval process, I don't think we need to have a completely rigid view. But I think my starting expectation would be that each API version would have at least one proposal associated with it. By doing that I think it's clearer to everyone that the bar for agreeing a proposal to add an alpha API would be much lower than for a beta, which would be lower again than for a final/GA. By definition in an alpha API we don't fully understand the problem and/or solution. So we don't need to crystal-ball gaze so much when reviewing the API. The current situation and motivation sections of beta and final versions become a summary of what we've learned from the previous version(s). Finally I think this model also allows us a principled way of describing the Kafka dependency which Mickael raised. Our |
|
I think having a Kroxylicious version number and another version tagging scheme (v[0-9]+) is a recipe for confusion. If we need to weaken the guarantee for new API surface, I'd suggest we use a Java annotation like However, I do have YAGNI ringing in my head. I'd suggest we try to live by semver. If it gets too costly, we can think about alternatives. |
|
Stability levels: the WildFly model as a starting point

Great discussion @robobario and @tombentley. I've been looking at how other projects handle this.

The model

WildFly defines four stability tiers — experimental, preview, community, default. But the other three levels map well onto our needs:
The important split is between experimental and preview. Experimental is "I don't even know if

In WildFly, each feature is tagged with a stability level and users opt in at startup via a flag

How this maps onto our surfaces

Filters and filter config: the cleanest case. A filter implementation declares its stability

Evolving existing filters (e.g. adding record key encryption to RecordEncryption, or enabling

Proxy-level config: similar to the above — new proxy config options could be tagged at the

Operator CRDs: our CR-per-filter model (

Java APIs (
My instinct is that annotations are less ceremony for the same outcome, especially given that the

Opt-in mechanism

The stability threshold should live on the

```yaml
apiVersion: operator.kroxylicious.io/v1alpha1
kind: KafkaProxy
spec:
  stabilityLevel: preview # default: ga
```

For bare proxy usage, the equivalent in proxy config or a startup flag.

What happens when config references a feature below the threshold?

WildFly hard-rejects (the server won't start). That's clean but creates crash loops in a GitOps

Silently dropping the feature and serving traffic through a reduced filter chain is worse — a user

I think the right answer for us is: refuse the virtual cluster, not the proxy.
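The refuse-the-cluster policy could be sketched as follows. Every name here is hypothetical; the point is only that admission is decided per virtual cluster, never by silently thinning the filter chain:

```java
// Sketch of the "refuse the virtual cluster, not the proxy" policy. All names
// are hypothetical: each virtual cluster is admitted or refused on its own,
// so one sub-threshold feature never silently degrades the data path or
// takes down the whole proxy.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StabilityGate {

    public enum Stability { EXPERIMENTAL, PREVIEW, GA }

    /** A virtual cluster and the lowest stability level among the features it uses. */
    public record ClusterSpec(String name, Stability lowestFeatureStability) {}

    /**
     * Returns, per cluster, whether it may be served. A cluster is admitted
     * only if every feature it uses is at or above the configured threshold
     * (enum order: EXPERIMENTAL < PREVIEW < GA).
     */
    public static Map<String, Boolean> admit(List<ClusterSpec> clusters, Stability threshold) {
        Map<String, Boolean> admitted = new LinkedHashMap<>();
        for (ClusterSpec cluster : clusters) {
            admitted.put(cluster.name(),
                    cluster.lowestFeatureStability().ordinal() >= threshold.ordinal());
        }
        return admitted;
    }
}
```

A refused cluster would surface its reason in status/logs while the proxy keeps serving the clusters that pass.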
In both cases: never silently degrade the data path, never crash the whole proxy over one

What this doesn't solve
Relationship to this proposal

I don't think we need to fully design this before 1.0, but I think the 1.0 proposal should at That way 1.0 isn't a straitjacket — we've explicitly reserved the right to ship experimental things |
I don't think so. It's worked for the Kubernetes project. I don't see people there looking confused. On the contrary, it brings clarity, because they've decomposed the messy reality behind the top-level version number into identified APIs and feature flags, and versioned them in a consistent way.
I think you need to acknowledge that we have already adopted the kubernetes api versioning for our kubernetes model. So the "complication" of having multiple version numbers floating around already exists in the project.
It's already too costly. You might have a mental model in your head about what all the various APIs are, and whether or not we consider them supported, but I can almost guarantee it's incomprehensible to anyone new to the project. You have to read the docs. You can easily accidentally use APIs which we don't consider fully supported. It's a mess. What I'm proposing is an alternative. To me the decision seems to be between:
It seems to me that the WildFly model is broadly similar to the Kubernetes model, modulo syntax and proposal/approval processes. When I talked to @robobario about this earlier this week I was in the "Alpha doesn't require anything" camp too, so I could live with that. I definitely want to lower barriers to trying things out in a way that end users can easily experiment with. The reason I suggested to have some process even for Alpha is simply so we have a record of "In I'm guessing you've not yet read #96. It's still marked as a draft, so I'm not blaming you for not having taken a look yet, but please could you before we get into some bikeshedding discussion about how to version configurations.
Technically there is at least a 3rd way -- max out on class loader isolation by linking plugins to the API version. But we're approximately 1,000,000 miles from that. Versioning packages suffers from the terrible drawback that our packages are not versioned already, so trying to switch to that is itself a breaking change. So practically speaking I agree with you that annotations would be good enough. That would be dead simple. That being the case, I think we should do this for 1.0.
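As a sketch of what the annotation approach could look like — the annotation name, levels, and their meanings are invented here to illustrate the idea, not an existing Kroxylicious API:

```java
// Hypothetical stability annotation; the name and levels are invented to
// illustrate the idea being discussed, not an existing Kroxylicious API.
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Documented
@Retention(RetentionPolicy.RUNTIME) // retained so tooling could enforce opt-in at runtime
@Target({ ElementType.TYPE, ElementType.METHOD })
public @interface ApiStability {

    Level value();

    enum Level {
        ALPHA,  // may change or disappear in any release
        BETA,   // here to stay, but the surface may still break
        STABLE  // covered by the full semver guarantee
    }
}
```

A plugin API type could then be marked e.g. `@ApiStability(ApiStability.Level.ALPHA)`, and a build-time check or runtime opt-in could act on that marker.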
That does not work: We support the proxy outside Kube, where there is no
It is a coupling problem, but the only solution to that is to change the API. Thus the coupling problem is itself coupled to the versioning problem. I.e. we will need the equivalent of a " So when you view it like that we might as well say that our current |
|
@tombentley thanks for the detailed response, and for pushing me to read #96 — it changed how I think about some of this.

Where I think we agree

Annotations for Java API stability, not packages. Agreed, and I think doing this for 1.0 is the right call.

FilterFactory is effectively v1beta1. The Kafka coupling means we can't deliver the binary compatibility promise that v1 implies. The 1.0 proposal should be honest about this rather than making a guarantee we can't keep. Calling it beta gives us the vocabulary to describe the situation accurately: the feature is here to stay, but the API surface may break when upstream Kafka changes its IDL.

#96 handles plugin config versioning and per-plugin opt-in. After reading that proposal, I agree we don't need a separate stability mechanism for plugin configurations — the version string (

User experience and policy enforcement

I want to push back on one dimension though: who are we optimising for? Between #82 and #96, we're introducing a lot of versioning machinery: project version (semver), proxy config version, per-plugin config versions, K8s naming conventions. That machinery is necessary — but it should be our problem, not the user's. We're paid to think so users don't have to. Our usual design mantra applies here: easy things should be easy, other things should be possible. Deploying a stable proxy with GA features should be easy — no versioning concepts required. Experimenting with alpha features, writing third-party plugins, evolving config schemas — those should be possible, but the complexity involved is opt-in, not load-bearing for the common case. This has implications for #96 that I'll raise there separately, but for #82 it means the proposal should state this as a design principle alongside the stability levels and compatibility guarantees.

There's also a related concern on the enforcement side. Per-plugin opt-in covers the user who wants an experimental feature, but not the admin who wants to prevent them.
If anyone who can write a config file can opt in to

What the 1.0 proposal should say

Pulling this together, I think the proposal needs:
|
|
I haven't left myself enough time to reply to this properly today. I see I'm in a minority of one with my YAGNI concern. Happy to go with annotation-based API stability markers if that's the will.
Fair point. Maybe that's a smell that we should think about adopting Jigsaw sooner, so it is crystal clear what's intended to be public API and what's not. (Full disclosure - I've never actually implemented Jigsaw in anything other than a Hello World, so I don't know how hard it will actually be.) |
|
Discussed at Community Call 23rd April. Tom is planning to rework the proposal factoring in recent ideas/discussions on API stability. |
|
FYI I've opened #103 for the Java API versioning part. I propose that we try to get that accepted, and the configuration versioning aspect accepted, then circle back to Kroxylicious 1.0 specifically. |
This change simplifies the proposal numbering system by using PR numbers as proposal identifiers, eliminating number collisions and removing the need for a separate allocation process. Changes: - Simplified proposals/README.md to focus on author workflow - Removed index tables (directory listing serves as the index) - Streamlined instructions for creating and renaming proposals - Updated proposal template with workflow instructions - Require PR number in title format: # <PR#> - <Title> - Moved workflow instructions into comment block - Added GitHub workflow to automatically check proposal numbering - Validates both filename and title format - Updates PR description when proposal files don't match PR number - Provides exact commands to fix naming issues - Removes warning once corrected - Handles both added and renamed files - Runs on all PRs (ready for mandatory status check) - Added notification script for existing open PRs - After merge, run notify-open-prs.sh to ask authors to rebase - Workflow will automatically guide them through renaming - Updated with all current open proposal PRs (kroxylicious#70, kroxylicious#82, kroxylicious#83, kroxylicious#85, kroxylicious#88, kroxylicious#93, kroxylicious#94, kroxylicious#96, kroxylicious#98, kroxylicious#99, kroxylicious#100, kroxylicious#101, kroxylicious#103) Proposals 001-019 retain their original numbers. Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Keith Wall <kwall@apache.org>
|
Hi! We've updated the proposal numbering system to use PR numbers as proposal identifiers. Action required: Please rebase your PR on

Once you rebase, you'll need to rename your proposal file and update the title:

```shell
git mv proposals/007-kroxylicious-1.0.md proposals/082-1.0.md
# Update title: remove any old number prefix and add PR number
sed -i.bak '0,/^# /{s/^# \([0-9]\{3\}\|xxx\|nnn\|000\) - /# 82 - /; t; s/^# /# 82 - /}' proposals/082-1.0.md && rm proposals/082-1.0.md.bak
git add proposals/082-1.0.md
git commit -m "Rename proposal to use PR number"
git push
```

The GitHub workflow will automatically check your proposal file naming after you push and update this PR description if any corrections are still needed. See proposals/README.md for the updated workflow. |
|
Correction: The previous notification had an incorrect filename. Here are the correct commands:

```shell
git mv proposals/007-kroxylicious-1.0.md proposals/082-kroxylicious-1.0.md
# Update title: remove any old number prefix and add PR number
sed -i.bak '0,/^# /{s/^# \([0-9]\{3\}\|xxx\|nnn\|000\) - /# 82 - /; t; s/^# /# 82 - /}' proposals/082-kroxylicious-1.0.md && rm proposals/082-kroxylicious-1.0.md.bak
git add proposals/082-kroxylicious-1.0.md
git commit -m "Rename proposal to use PR number"
git push
```

Sorry for the confusion! |
|
On Plugin Versioning. As an alternative I've put up #110, which is a smaller idea for achieving explicit versioning of our plugins, adopting k8s CRD versioning conventions to convey stability. This is just a tiny subset of what we'd get out of the full #96, but I think it could be sufficient for Kroxylicious 1.0.0, enabling us to put out alpha-versioned plugin configurations and making the stability guarantees visible to users. |
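One nice property of the k8s CRD versioning convention is that the stability level is machine-readable from the version string itself. A sketch of a hypothetical helper (not part of #110) that reads it out:

```java
// Hypothetical helper (not part of #110 itself) showing how the stability of
// a plugin configuration can be read straight out of a k8s-convention version
// string such as v1, v1alpha1 or v2beta3.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginConfigVersions {

    private static final Pattern K8S_VERSION = Pattern.compile("v(\\d+)(?:(alpha|beta)(\\d+))?");

    /** Returns "alpha", "beta" or "stable" for a k8s-style API version string. */
    public static String stabilityOf(String apiVersion) {
        Matcher m = K8S_VERSION.matcher(apiVersion);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a k8s-style version: " + apiVersion);
        }
        return m.group(2) == null ? "stable" : m.group(2);
    }
}
```

That would let the proxy log or enforce stability on plugin configurations without any side-channel metadata.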
This PR proposes: