
Conversation

@Arsnael (Contributor) commented Jan 7, 2026:

Replacing #2899

@quantranhong1999 (Member) left a comment:

LGTM

@chibenwa (Contributor) left a comment:

Great as a first step!

I propose waiting until the purge works on top of the v2 (in addition to the v1), in order not to ship broken code.

Thanks @Arsnael for proposing this change.

@Arsnael (Contributor, Author) commented Jan 7, 2026:

> I propose waiting until the purge works on top of the v2 (in addition to the v1), in order not to ship broken code.

Alright, I'll add the purge refactoring to this PR then. Thanks for the feedback.

EDIT: I think that, technically, purging old messages works, as demonstrated in the adapted test. It just doesn't do it for the new single bucket yet.

@chibenwa (Contributor) commented Jan 7, 2026:

> EDIT: I think that, technically, purging old messages works, as demonstrated in the adapted test.

Yes, I know.

But not with the new ones.

Hence my remark.

@chibenwa (Contributor) commented Jan 8, 2026:

Cf. remarks on #2894.

While there's no objection to continuing this work so that it's ready, I would actually like to also get the "S3 contained alternative" and potentially bring up a debate on the mailing list about the two implementations.

@Arsnael (Contributor, Author) commented Jan 8, 2026:

> Cf. remarks on #2894.

Yes, I've seen them. I think it's worth continuing in the meantime, as those remarks haven't been widely discussed and validated with the community yet, and might require more work.

@Arsnael (Contributor, Author) commented Jan 9, 2026:

I might have a few bits left:

  • probably adding a few integration tests to the existing ones
  • likely adding some additional info, like the number of blobs deleted for the single bucket
  • then I still have an issue with a test; I'm not sure what approach I should use

In DeletedMessageVaultIntegrationTest, the test vaultDeleteShouldDeleteAllMessagesHavingSameBlobContent.

Here it fails; it doesn't work anymore. I guess it's because messages with the same blob content used to have the same blobId, but with the different time prefix that's maybe no longer the case? What I'm wondering is whether that case is still relevant with this new design.
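As a minimal sketch of that hypothesis (hypothetical helper names, not the actual vault code): content-derived ids collapse identical payloads into one blob, while a time prefix makes every write distinct.

    import java.time.Clock;
    import java.time.Instant;
    import org.apache.commons.codec.digest.DigestUtils;

    // Hypothetical sketch, not the actual vault code: why a time prefix breaks
    // "same content => same blobId".
    class DedupSketch {
        // Content-derived id: the same payload always yields the same id,
        // so deleting by id removes every copy of that content at once.
        static String contentDerived(byte[] payload) {
            return DigestUtils.sha256Hex(payload);
        }

        // Time-prefixed id: the same payload written at different instants
        // yields different ids, so content-based deduplication no longer holds.
        static String timePrefixed(byte[] payload, Clock clock) {
            return Instant.now(clock) + "/" + DigestUtils.sha256Hex(payload);
        }
    }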

        .hasSize(0);
    }

    @Disabled("JAMES-4156")
Contributor (Author):

As said in a comment above, I don't even understand whether this case is supposed to be valid or not, even more so with the new design.

@Arsnael (Contributor, Author) commented Jan 15, 2026:

Closing the related changes to the deleted message vault for lack of consensus. Moving the work to linagora tmail.

@chibenwa (Contributor) commented:

Cf. mailing list.

@chibenwa reopened this Jan 15, 2026.
    this.removeStatement = prepareRemove(session);
    this.readStatement = prepareRead(session);
    this.blobIdFactory = blobIdFactory;
    this.blobIdTimeGenerator = blobIdTimeGenerator;
@jeantil (Contributor) commented Jan 15, 2026:

Why not have a custom BlobId.Factory implementation here, to avoid changing the API and affecting the Cassandra variant for an S3-only problem?

check https://github.com/apache/james-project/blob/master/server/mailrepository/mailrepository-blob/src/main/scala/org/apache/james/mailrepository/blob/BlobMailRepositoryFactory.scala#L29 for an example
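For concreteness, a minimal sketch of that suggestion (hypothetical class name, assuming PlainBlobId exposes a String constructor; of/parse follow the BlobId.Factory interface quoted later in this thread):

    import java.time.Clock;
    import java.time.LocalDate;
    import org.apache.james.blob.api.BlobId;
    import org.apache.james.blob.api.PlainBlobId;

    // Hypothetical sketch of the suggestion: keep the time-prefix logic inside
    // a dedicated BlobId.Factory so the surrounding APIs stay unchanged.
    public class TimePrefixedBlobIdFactory implements BlobId.Factory {
        private final Clock clock;

        public TimePrefixedBlobIdFactory(Clock clock) {
            this.clock = clock;
        }

        @Override
        public BlobId of(String id) {
            // Prepend a coarse time bucket when minting a new id
            // (assumes PlainBlobId can wrap an arbitrary string).
            return new PlainBlobId(LocalDate.now(clock) + "/" + id);
        }

        @Override
        public BlobId parse(String id) {
            // Ids read back from storage already carry their prefix.
            return new PlainBlobId(id);
        }
    }

With this shape, if the prefix format ever changed, only the factory would need updating.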

Contributor:

Hello @jeantil

I wonder if this piece of code could not work with PlainBlobId, as the Cassandra implementation is blobId-transparent. That's a technical nitpick that does not address the core of your remarks, but could be a nice simplification, I presume. Worth testing...

> a custom BlobId.Factory implementation

On the principle I mostly agree. This would mean acting at the blobStoreDAO level and not on the BlobStore, which does not seem like a major issue to me.

I'll try to push a fix in this direction.

@jeantil (Contributor) commented Jan 15, 2026:

> This would mean acting at the blobStoreDAO level and not on the BlobStore.

🤔 Not sure I follow: it does require creating a custom blobstore instance, but it has no impact on the blobstoreDAO itself. This is how we did it for the blob-backed mail repository.

It could arguably be managed at the injection layer by using a qualified BlobId.Factory instance. (This is a limitation of the BlobMailRepositoryFactory, which forces usage of a passthrough blobstore and cannot use deduplication; I was so happy to have something working that I stopped without the deduplication. Also, deduplication makes less sense for a mail repository, which I don't expect to store mails for years.)

Contributor:

OK, I see:

    val metadataBlobStore = BlobStoreFactory.builder()
      .blobStoreDAO(blobStoreDao)
      .blobIdFactory(metadataIdFactory)
      .bucket(defaultBucketName)
      .passthrough()

So re-assembling an ad-hoc blobStore, tailor-made for the feature.

Interesting.

Sorry, the "do like ..." comment was hard for me to interpret.

@jeantil (Contributor) commented Jan 15, 2026:

> Sorry, the "do like ..." comment was hard for me to interpret.

My apologies, I should have been clearer.

This information can be deduced without hacks from the storage information.
(Commits pushed: "… PlainBlobId"; "Care is taken not to concentrate all blobs in a single folder")
@Arsnael (Contributor, Author) commented Jan 16, 2026:

LGTM, thanks for the changes @chibenwa

    import com.google.common.base.Preconditions;

    public class VaultConfiguration {
        public static final String DEFAULT_SINGLE_BUCKET_NAME = "james-deleted-message-vault";
Contributor:

Will this allow nesting below the root? Imagine I can, or want to, have a single S3 bucket for all James-related stuff; my bucket is free-hosting-foo-23402. How do I configure the vault so that deleted messages are nested under free-hosting-foo-23402/james-deleted-message-vault/xxxx, so that I can have the mail queue and other S3-backed stores write in other subdirectories?

Contributor:

No, the bucket is (by default) james-deleted-message-vault; here it is not a subfolder of the main S3 bucket.

The goal of this work is not to get a single bucket, but to operate a distributed James server off 3 buckets (default, james-uploads, james-deleted-message-vault). (So as an Ops I need to create 3 buckets, and not one each month.)

Contributor:

I think using the term "bucket" (or bucket name) for the blobstore was a mistake; it overlaps in a bad way with the S3 notions.

Here you are manipulating a blobstore bucket. I think this can be stored as a subpath of a single underlying S3 bucket by using the prefix property in the S3 configuration. This is all very confusing (to me at least), as this prefix only allows shifting the whole namespace, in which case it is impossible to split the storage of some use cases among different buckets.

On a more direct note, I think the previous design influences the naming too much here; the term "single" will become confusing quite fast. I would use DEFAULT_VAULT_BUCKET instead.

Contributor:

> I think using the term "bucket" (or bucket name) for the blobstore was a mistake.

+1

But, as said, cf. the previous refactoring: that may not be something easy that we are willing to change.

> I think this can be stored as a subpath of a single underlying S3 bucket

I don't want this to mess with the mailbox GC. I'd rather have a separate blob store bucket, given the choice.

Also, a separate blob store bucket makes loads of sense, as the storage constraints are not the standard James ones (Glacier?).

> I would use DEFAULT_VAULT_BUCKET instead

Cf. comment above.

@Arsnael (Contributor, Author) commented Jan 19, 2026:

I thought we agreed that this work, for now, just allows going from multiple S3 buckets to just one for the vault only; it does not put all blobs under the root bucket.

The goal in the end would be to put all blobs under one S3 bucket, yes. But that's not what this PR intends to do; that would require more changes and discussions.

Is it still unclear on that point, @jeantil?

@jeantil (Contributor) commented Jan 19, 2026:

> Is it still unclear on that point, @jeantil?

No, that's perfectly clear.

> The goal in the end would be to put all blobs under one S3 bucket, yes.

This can be achieved today using the prefix or namespace configuration of blob.properties (sorry, I don't exactly remember which one I had to use; this is not so well documented), but that is not really my point.

I think the constant should not contain the word SINGLE, as this only makes sense in the current refactoring context. If you were building this feature from scratch today, based on the new design, you would not call it "single".

As I understand it, this is the blobstore bucket name (so a logical bucket name) under which all the blobs of the feature implemented by the plugin will be written. The actual S3 bucket name in which this logical bucket is stored can be different, based on the blob.properties configuration (prefix and namespace will affect the actual storage namespace).
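Purely as an illustration of that logical-vs-physical distinction (not the actual James resolution code; the real configuration keys may differ):

    import java.util.Optional;

    // Illustrative only: a logical blobstore bucket name maps to a physical
    // S3 name through an optional configured prefix.
    class BucketNameSketch {
        static String physicalS3Name(String logicalBucket, Optional<String> configuredPrefix) {
            return configuredPrefix
                .map(prefix -> prefix + logicalBucket) // e.g. "prod-" + "james-deleted-message-vault"
                .orElse(logicalBucket);
        }
    }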

Hence my proposal to call it DEFAULT_VAULT_BUCKET (by no means definitive).

> I don't want this to mess with the mailbox GC. I'd rather have a separate blob store bucket, given the choice.

I'm not sure how my proposed name change, or using the prefix/namespace features of blob.properties, would affect the mailbox GC... I propose we discuss this with actual examples on the ADR; this goes beyond my intention for this PR.

    import org.apache.james.blob.api.BlobId;
    import org.apache.james.blob.api.PlainBlobId;

    public class BlobIdTimeGenerator {
Contributor:

I don't understand why the whole PR works around the BlobId.Factory design 🤷

@chibenwa (Contributor) commented Jan 16, 2026:

Please re-read the interface.

BlobId.Factory is about *parsing* BlobIds.

This work is about *generating* them.

Maybe I am missing something, despite investing several hours in the topic. Maybe a hands-on example could help?

Contributor:

Maybe I completely miss the point of BlobId.Factory.

My understanding is that the Factory is responsible for creating instances of BlobId that point to a specific object (generating BlobIds), either by wrapping/decorating an id, or by parsing a storage path and extracting the logical id.

In this case you are encoding custom paths as a PlainBlobId, bypassing the factory, which is not consistent with how things are done in the rest of the codebase.

I was kind of expecting a TimePrefixedBlobIdFactory (don't focus on this specific name) which would encapsulate the path manipulation. I may misunderstand, but your path manipulation appears to be very similar to what is done in MailRepositoryBlobIdFactory, though for a different use case.

@chibenwa (Contributor) commented Jan 19, 2026:

Hello @jeantil

> My understanding is that the Factory is responsible for creating instances of BlobId that point to a specific object (generating BlobIds)

    public interface BlobId {

        interface Factory {
            BlobId of(String id);

            BlobId parse(String id);
        }

        String asString();
    }

I see no generate method.

Generation is an actual responsibility of the BlobStore, or of its caller.

Proof:
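A paraphrased sketch of the BlobStore contract, from memory (it may not match the current code exactly; the storage-policy parameter is omitted):

    import org.apache.james.blob.api.BlobId;
    import org.apache.james.blob.api.BucketName;
    import org.reactivestreams.Publisher;

    // Paraphrased from memory, possibly inexact: the store hands the generated
    // BlobId back to its caller, i.e. id generation lives in the BlobStore,
    // not in BlobId.Factory.
    interface BlobStoreContractSketch {
        Publisher<BlobId> save(BucketName bucketName, byte[] bytes);
    }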

At the minimum, a concrete clarification would be a s/of/wrap/g or s/of/decorate/g; if this sounds relevant to others, I agree with opening such a PR.

Please note that, in the example taken (BlobMailRepository):

      val blobId = blobIdFactory.of(mailKey.asString())

The decision to rely on a MailKey as the storage identifier is not encoded in the BlobId.Factory either.

> I was kind of expecting a TimePrefixedBlobIdFactory

TBH, as we are not actively using the date information encoded in the object name, there is little added value in operating a "typed" blobId.

The only use of the information encoded in the blobId by a useful process is, to my knowledge, the GC leveraging a grace period, cf.:

    .flatMap(blobId -> Mono.fromCallable(() -> blobIdFactory.parse(blobId.asString())))

Also, my concern about "typed" BlobIds is that they interoperate badly.

The beauty of the design proposed here (PlainBlobId factory) is that we naturally handle historical data without needing some complex fallback logic for information that is not actually needed. The BlobMailRepository thing was newly contributed code and did not need to care about historical data. Here, the vault needs to deal with data that does not fit the expected TimePrefixedBlobIdFactory formalism.

So how do we deal with this?

  • An optional field in the blob id, so that the temporal info can go missing or be incorrect?
  • A TimePrefixedBlobIdFactory that actually falls back to PlainBlobId when the temporal info is missing or incorrect (see the sketch after this list)?
  • Or, as proposed here: because we do not need to act on that info, we just avoid writing itchy code and proceed with PlainBlobId everywhere?
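A minimal sketch of the second option, assuming the temporal prefix is an ISO local date followed by a slash (hypothetical helper, not code from this PR):

    import java.time.LocalDate;
    import java.time.format.DateTimeParseException;
    import java.util.Optional;
    import org.apache.james.blob.api.BlobId;

    // Hypothetical sketch of option 2: read the temporal prefix when present,
    // fall back gracefully for historical ids that do not carry one.
    class TemporalPrefixSketch {
        // Returns the date encoded in the id, or empty for legacy/plain ids.
        static Optional<LocalDate> of(BlobId blobId) {
            String raw = blobId.asString();
            int slash = raw.indexOf('/');
            if (slash > 0) {
                try {
                    return Optional.of(LocalDate.parse(raw.substring(0, slash)));
                } catch (DateTimeParseException e) {
                    // Not a valid temporal prefix: treat as historical data.
                }
            }
            return Optional.empty();
        }
    }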

Cheers.
