
perf(flagd): speed up e2e test execution via container pool and parallel scenarios#1752

Open
aepfli wants to merge 5 commits into main from feat/speed-up-flagd-e2e-tests

Conversation


@aepfli aepfli commented Mar 30, 2026

Summary

Reduces flagd provider e2e test wall-clock time by ~75% through a pre-warmed container pool and Cucumber-level parallel scenario execution — no Surefire fork changes needed.

Changes

1. Container pool (ContainerPool + ContainerEntry)

The previous setup used a single shared Docker Compose stack for all scenarios within a runner. Since the flagd-testbed launchpad controls a single flagd process via /start, /stop, /restart, and /change HTTP endpoints, scenarios sharing one container would race on these operations and could not run concurrently.

The fix is a pre-warmed container pool: @BeforeAll starts N containers in parallel (~45s, once per JVM), and each Cucumber scenario borrows one ContainerEntry for its duration, giving it a fully isolated flagd process. After teardown the entry is returned to the pool.

Pool size is tunable via -Dflagd.e2e.pool.size=N (default: 2).

A reference counter ensures that when multiple suite runners share the same JVM (reuseForks=true), containers are only started on the first initialize() call and stopped on the last shutdown() call.

2. Parallel Cucumber scenarios

With cucumber.execution.parallel.enabled=true and fixed.parallelism=2 (matching the default pool size), scenarios within each runner execute concurrently.

Correctness safeguards via exclusive resource locks:

  • @env-var scenarios serialised behind ENV_VARS lock (requires companion PR flagd-testbed#359)
  • @grace scenarios (container restart + reconnection timing) serialised behind CONTAINER_RESTART lock
  • ConfigCucumberTest disables parallelism entirely (env-var mutations in <0.4s suite — no benefit)
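Assuming the standard Cucumber-JVM configuration keys, the combination described above would look roughly like the following `junit-platform.properties` sketch (illustrative only — the exact resource names `ENV_VARS` and `CONTAINER_RESTART` are the ones this PR introduces, and the tag-to-resource mapping is an assumption based on Cucumber's exclusive-resources mechanism):

```properties
# Run scenarios within a runner concurrently
cucumber.execution.parallel.enabled=true
cucumber.execution.parallel.config.strategy=fixed
cucumber.execution.parallel.config.fixed.parallelism=2

# @env-var scenarios take an exclusive read-write lock, so they never overlap
cucumber.execution.exclusive-resources.env-var.read-write=ENV_VARS

# @grace scenarios serialise behind the container-restart lock
cucumber.execution.exclusive-resources.grace.read-write=CONTAINER_RESTART
```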

3. Per-provider teardown

Replaced OpenFeatureAPI.getInstance().shutdown() (global — tears down all providers) with a per-provider NoOpProvider swap through the SDK lifecycle. This properly detaches event emitters and is safe for parallel execution.

4. Event drain fix

EventSteps now drains events up to and including the first match instead of clear()-ing the entire queue. This prevents stale events (e.g. a READY from before a disconnect) from satisfying later assertions.
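The drain behaviour can be sketched as follows (a simplified model — events are plain strings here, whereas the real EventSteps operates on the provider's Event objects; names are illustrative):

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentLinkedQueue;

public class EventDrainSketch {

    /**
     * Remove events up to and including the first one matching {@code type}.
     * Anything queued after the match stays in the queue for later assertions,
     * unlike clear(), which would also discard those newer events.
     */
    static Optional<String> drainUntilMatch(ConcurrentLinkedQueue<String> events, String type) {
        String matched = null;
        while (!events.isEmpty()) {
            String event = events.poll();
            if (type.equals(event)) {
                matched = event;
                break;
            }
        }
        return Optional.ofNullable(matched);
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> events = new ConcurrentLinkedQueue<>();
        events.add("READY"); // stale, from before the disconnect
        events.add("ERROR"); // the event the current step asserts on
        events.add("READY"); // arrived after the match; must survive for later steps
        Optional<String> match = drainUntilMatch(events, "ERROR");
        System.out.println(match.get() + " remaining=" + events.size());
    }
}
```

With a `clear()` in place of the drain, the trailing READY would be lost and a later "provider becomes ready" step could only be satisfied by a stale event or a timeout.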

Architecture

Before:
  Runner 1 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop
  Runner 2 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop
  Runner 3 (sequential) → start 1 container → scenario 1,2,...N (sequential) → stop

After (pool + parallel):
  Runner 1: start pool(2) → 2 parallel scenarios → ... → defer shutdown
  Runner 2: reuse pool    → 2 parallel scenarios → ... → defer shutdown
  Runner 3: reuse pool    → 2 parallel scenarios → ... → stop pool

Dependencies

  • flagd-testbed#359 — adds @env-var tag to config scenarios (submodule temporarily pointed at PR branch)

Draft — watching CI

Opening as draft to observe CI behaviour before merging.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables parallel end-to-end test execution for the flagd provider by implementing a ContainerPool to manage multiple Docker Compose environments. It introduces ContainerEntry and ContainerPool classes, refactors test steps to use pooled containers, and updates Maven and Cucumber configurations for parallel forking. A review comment identifies a potential resource leak in ContainerPool.initialize() and suggests using a try-finally block to ensure the ExecutorService is always shut down and to prevent container leaks if an exception occurs.

@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch from e985cfb to 8057aa4 on March 31, 2026 08:50
@aepfli aepfli changed the title perf(flagd): speed up e2e test execution via parallel runners and container pool perf(flagd): speed up e2e test execution via container pool and parallel scenarios Mar 31, 2026
@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch 3 times, most recently from 209cce0 to f9e647c on March 31, 2026 09:01
@aepfli aepfli marked this pull request as ready for review April 1, 2026 08:05
@aepfli aepfli requested a review from a team as a code owner April 1, 2026 08:05
// later assertion that expects a *new* event of the same type, while still
// preserving events that arrived *after* the match for subsequent steps.
Event matched = null;
while (!state.events.isEmpty()) {
Contributor

Are we sure that no new events can be emitted while or immediately after this runs? Otherwise this loop might not be sufficient.

Member Author

What do you mean? We specifically want to keep events that are generated while this loop runs, because a ready event can happen shortly after a disconnect, and while we wait for the disconnect (including cleanup), we might even remove the new ready.

Contributor

If we are worried about events shortly after a disconnect, we should wait for some time and check events afterwards or in the meantime. This loop might be done after 0 or 1 iterations, and might be done before we receive such an event that we would want to wait for

Member Author

This clears all the events that happened up to our matched event — i.e., it drains the list up to our event. Events that arrive in the meantime stay in the list, and we can match against all of them in the next check.

break;
}
}
state.lastEvent = java.util.Optional.ofNullable(matched);
Contributor

The naming is not ideal, this is not the last event, it's the last event that matches the eventType

Member Author

The last event is the last tracked event for the current test state. We do not care about the type; it is only used to verify some event information.

Comment on lines +59 to +60
// properly calls detachEventProvider (nulls onEmit) and shuts down the emitter
// executor — neither of which happens when calling provider.shutdown() directly.
Contributor

Is this something that should happen when we call shutdown?

Member Author

The tests are now running in parallel, so shutting down the API is a no-go, as it also interferes with other tests.

Contributor

No, but I mean in general, not specifically this test case

Member Author

maybe, but currently the goal is speeding up tests ;) - when we reset a new provider, we are actually cleaning up

}

public static void shutdown() {
int remaining = refCount.decrementAndGet();
Contributor

Could it be possible that all current users call shutdown, even though there are still outstanding users, who have not called initialize yet? Then we would shutdown the pool, even though there are still tests lined up

Member Author

This happens in the beforeAll, per test suite. Test suites are not parallel anyway; there is another improvement for that.

Contributor

If the test suites do not run in parallel, then we don't need this sync mechanism. If they do run in parallel, the scenario in my comment could (even though it is unlikely) occur.

Member Author

I briefly ran it in parallel. The next iteration will add more flexibility, where all the tests are actually parallel, but that is in the next follow-up PR. I briefly ran all 3 test suites in parallel with some hacks, but it was not worth the effort.

"flagd.e2e.pool.size", Math.min(Runtime.getRuntime().availableProcessors(), 4));

private static final BlockingQueue<ContainerEntry> pool = new LinkedBlockingQueue<>();
private static final List<ContainerEntry> all = new ArrayList<>();
Contributor

I think this needs to be a concurrent data structure too, so that we guarantee that all changes to the list are also visible to another thread calling shutdown

@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch 2 times, most recently from 6862d3a to 51918e1 on April 7, 2026 10:52
aepfli and others added 2 commits April 7, 2026 14:27
Replace the single shared Docker Compose stack with a pre-warmed
ContainerPool. Each Cucumber scenario borrows its own isolated
ContainerEntry (flagd + envoy + temp dir), eliminating the process-level
contention that prevented parallel execution.

Key changes:
- ContainerEntry: encapsulates a single Docker Compose stack + temp dir
- ContainerPool: manages a fixed-size pool with acquire/release semantics
  and reference counting so multiple suite runners sharing a JVM only
  start/stop containers once
- ProviderSteps: borrows a container per scenario, replaces global
  API.shutdown() with per-provider NoOpProvider swap through the SDK
  lifecycle (properly detaches event emitters)
- State: carries the borrowed ContainerEntry and provider domain name

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>

diff --git c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerEntry.java i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerEntry.java
new file mode 100644
index 00000000..820f808
--- /dev/null
+++ i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerEntry.java
@@ -0,0 +1,50 @@
+package dev.openfeature.contrib.providers.flagd.e2e;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.time.Duration;
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.lang3.RandomStringUtils;
+import org.testcontainers.containers.ComposeContainer;
+import org.testcontainers.containers.wait.strategy.Wait;
+
+/** A single pre-warmed Docker Compose stack (flagd + envoy) and its associated temp directory. */
+public class ContainerEntry {
+
+    public static final int FORBIDDEN_PORT = 9212;
+
+    public final ComposeContainer container;
+    public final Path tempDir;
+
+    private ContainerEntry(ComposeContainer container, Path tempDir) {
+        this.container = container;
+        this.tempDir = tempDir;
+    }
+
+    /** Start a new container entry. Blocks until all services are ready. */
+    public static ContainerEntry start() throws IOException {
+        Path tempDir = Files.createDirectories(
+                Paths.get("tmp/" + RandomStringUtils.randomAlphanumeric(8).toLowerCase() + "/"));
+
+        ComposeContainer container = new ComposeContainer(new File("test-harness/docker-compose.yaml"))
+                .withEnv("FLAGS_DIR", tempDir.toAbsolutePath().toString())
+                .withExposedService("flagd", 8013, Wait.forListeningPort())
+                .withExposedService("flagd", 8015, Wait.forListeningPort())
+                .withExposedService("flagd", 8080, Wait.forListeningPort())
+                .withExposedService("envoy", 9211, Wait.forListeningPort())
+                .withExposedService("envoy", FORBIDDEN_PORT, Wait.forListeningPort())
+                .withStartupTimeout(Duration.ofSeconds(45));
+        container.start();
+
+        return new ContainerEntry(container, tempDir);
+    }
+
+    /** Stop the container and clean up the temp directory. */
+    public void stop() throws IOException {
+        container.stop();
+        FileUtils.deleteDirectory(tempDir.toFile());
+    }
+}
diff --git c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerPool.java i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerPool.java
new file mode 100644
index 00000000..8b529d6
--- /dev/null
+++ i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/ContainerPool.java
@@ -0,0 +1,92 @@
+package dev.openfeature.contrib.providers.flagd.e2e;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.LinkedBlockingQueue;
+import lombok.extern.slf4j.Slf4j;
+
+/**
+ * A pool of pre-warmed {@link ContainerEntry} instances.
+ *
+ * <p>All containers are started in parallel during {@link #initialize()}, paying the ~45s Docker
+ * Compose startup cost only once. Scenarios borrow a container via {@link #acquire()} and return
+ * it via {@link #release(ContainerEntry)} after teardown, allowing the next scenario to reuse it
+ * immediately without any cold-start overhead.
+ *
+ * <p>Pool size is controlled by the system property {@code flagd.e2e.pool.size} (default: 2).
+ *
+ * <p>Multiple test classes may share the same JVM fork (Surefire {@code reuseForks=true}). Each
+ * class calls {@link #initialize()} and {@link #shutdown()} once. A reference counter ensures
+ * that containers are only started on the first {@code initialize()} call and only stopped when
+ * the last {@code shutdown()} call is made, preventing one class from destroying containers that
+ * are still in use by another class running concurrently in the same JVM.
+ */
+@Slf4j
+public class ContainerPool {
+
+    private static final int POOL_SIZE = Integer.getInteger("flagd.e2e.pool.size", 2);
+
+    private static final BlockingQueue<ContainerEntry> pool = new LinkedBlockingQueue<>();
+    private static final List<ContainerEntry> all = new ArrayList<>();
+    private static final java.util.concurrent.atomic.AtomicInteger refCount =
+            new java.util.concurrent.atomic.AtomicInteger(0);
+
+    public static void initialize() throws Exception {
+        if (refCount.getAndIncrement() > 0) {
+            log.info("Container pool already initialized (refCount={}), reusing existing pool.", refCount.get());
+            return;
+        }
+        log.info("Starting container pool of size {}...", POOL_SIZE);
+        ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
+        List<Future<ContainerEntry>> futures = new ArrayList<>();
+
+        for (int i = 0; i < POOL_SIZE; i++) {
+            futures.add(executor.submit(ContainerEntry::start));
+        }
+
+        for (Future<ContainerEntry> future : futures) {
+            ContainerEntry entry = future.get();
+            pool.add(entry);
+            all.add(entry);
+        }
+
+        executor.shutdown();
+        log.info("Container pool ready ({} containers).", POOL_SIZE);
+    }
+
+    public static void shutdown() {
+        int remaining = refCount.decrementAndGet();
+        if (remaining > 0) {
+            log.info("Container pool still in use by {} class(es), deferring shutdown.", remaining);
+            return;
+        }
+        log.info("Last shutdown call — stopping all containers.");
+        all.forEach(entry -> {
+            try {
+                entry.stop();
+            } catch (IOException e) {
+                log.warn("Error stopping container entry", e);
+            }
+        });
+        pool.clear();
+        all.clear();
+    }
+
+    /**
+     * Borrow a container from the pool, blocking until one becomes available.
+     * The caller MUST call {@link #release(ContainerEntry)} when done.
+     */
+    public static ContainerEntry acquire() throws InterruptedException {
+        return pool.take();
+    }
+
+    /** Return a container to the pool so the next scenario can use it. */
+    public static void release(ContainerEntry entry) {
+        pool.add(entry);
+    }
+}
diff --git c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/State.java i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/State.java
index 2d3a227..15f555e 100644
--- c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/State.java
+++ i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/State.java
@@ -16,6 +16,11 @@ public class State {
     public ProviderType providerType;
     public Client client;
     public FeatureProvider provider;
+    /** The domain name under which this scenario's provider is registered with OpenFeatureAPI. */
+    public String providerName;
+    /** The container borrowed from {@link ContainerPool} for this scenario. */
+    public ContainerEntry containerEntry;
+
     public ConcurrentLinkedQueue<Event> events = new ConcurrentLinkedQueue<>();
     public Optional<Event> lastEvent;
     public FlagSteps.Flag flag;
diff --git c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java
index 90d0822..b747672 100644
--- c/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java
+++ i/providers/flagd/src/test/java/dev/openfeature/contrib/providers/flagd/e2e/steps/ProviderSteps.java
@@ -6,9 +6,12 @@ import static org.assertj.core.api.Assertions.assertThat;
 import dev.openfeature.contrib.providers.flagd.Config;
 import dev.openfeature.contrib.providers.flagd.FlagdOptions;
 import dev.openfeature.contrib.providers.flagd.FlagdProvider;
+import dev.openfeature.contrib.providers.flagd.e2e.ContainerEntry;
+import dev.openfeature.contrib.providers.flagd.e2e.ContainerPool;
 import dev.openfeature.contrib.providers.flagd.e2e.ContainerUtil;
 import dev.openfeature.contrib.providers.flagd.e2e.State;
 import dev.openfeature.sdk.FeatureProvider;
+import dev.openfeature.sdk.NoOpProvider;
 import dev.openfeature.sdk.OpenFeatureAPI;
 import dev.openfeature.sdk.ProviderState;
 import io.cucumber.java.After;
@@ -18,66 +21,60 @@ import io.cucumber.java.en.Given;
 import io.cucumber.java.en.Then;
 import io.cucumber.java.en.When;
 import java.io.File;
-import java.io.IOException;
-import java.nio.file.Files;
-import java.nio.file.Path;
-import java.nio.file.Paths;
-import java.time.Duration;
 import lombok.extern.slf4j.Slf4j;
-import org.apache.commons.io.FileUtils;
-import org.apache.commons.lang3.RandomStringUtils;
 import org.apache.commons.lang3.StringUtils;
 import org.testcontainers.containers.ComposeContainer;
-import org.testcontainers.containers.wait.strategy.Wait;

@Slf4j
 public class ProviderSteps extends AbstractSteps {

     public static final int UNAVAILABLE_PORT = 9999;
-    public static final int FORBIDDEN_PORT = 9212;
-    static ComposeContainer container;
-
-    static Path sharedTempDir;

     public ProviderSteps(State state) {
         super(state);
     }

     @BeforeAll
-    public static void beforeAll() throws IOException {
-        sharedTempDir = Files.createDirectories(
-                Paths.get("tmp/" + RandomStringUtils.randomAlphanumeric(8).toLowerCase() + "/"));
-        container = new ComposeContainer(new File("test-harness/docker-compose.yaml"))
-                .withEnv("FLAGS_DIR", sharedTempDir.toAbsolutePath().toString())
-                .withExposedService("flagd", 8013, Wait.forListeningPort())
-                .withExposedService("flagd", 8015, Wait.forListeningPort())
-                .withExposedService("flagd", 8080, Wait.forListeningPort())
-                .withExposedService("envoy", 9211, Wait.forListeningPort())
-                .withExposedService("envoy", FORBIDDEN_PORT, Wait.forListeningPort())
-                .withStartupTimeout(Duration.ofSeconds(45));
-        container.start();
+    public static void beforeAll() throws Exception {
+        ContainerPool.initialize();
     }

    @AfterAll
-    public static void afterAll() throws IOException {
-        container.stop();
-        FileUtils.deleteDirectory(sharedTempDir.toFile());
+    public static void afterAll() {
+        ContainerPool.shutdown();
     }

    @After
     public void tearDown() {
-        if (state.client != null) {
-            when().post("http://" + ContainerUtil.getLaunchpadUrl(container) + "/stop")
-                    .then()
-                    .statusCode(200);
+        if (state.containerEntry != null) {
+            if (state.client != null) {
+                when().post("http://" + ContainerUtil.getLaunchpadUrl(state.containerEntry.container) + "/stop")
+                        .then()
+                        .statusCode(200);
+            }
+            ContainerPool.release(state.containerEntry);
+            state.containerEntry = null;
+        }
+        // Replace the domain provider with a NoOp through the SDK lifecycle so the SDK
+        // properly calls detachEventProvider (nulls onEmit) and shuts down the emitter
+        // executor — neither of which happens when calling provider.shutdown() directly.
+        if (state.providerName != null) {
+            OpenFeatureAPI.getInstance().setProvider(state.providerName, new NoOpProvider());
         }
-        OpenFeatureAPI.getInstance().shutdown();
     }

    @Given("a {} flagd provider")
     public void setupProvider(String providerType) throws InterruptedException {
+        state.containerEntry = ContainerPool.acquire();
+        ComposeContainer container = state.containerEntry.container;
+
         String flagdConfig = "default";
-        state.builder.deadline(1000).keepAlive(0).retryGracePeriod(2);
+        state.builder
+                .deadline(1000)
+                .keepAlive(0)
+                .retryGracePeriod(2)
+                .retryBackoffMs(500)
+                .retryBackoffMaxMs(2000);
         boolean wait = true;

         switch (providerType) {
@@ -85,25 +82,26 @@ public class ProviderSteps extends AbstractSteps {
                 this.state.providerType = ProviderType.SOCKET;
                 state.builder.port(UNAVAILABLE_PORT);
                 if (State.resolverType == Config.Resolver.FILE) {
-
                     state.builder.offlineFlagSourcePath("not-existing");
                 }
                 wait = false;
                 break;
             case "forbidden":
-                state.builder.port(container.getServicePort("envoy", FORBIDDEN_PORT));
+                state.builder.port(container.getServicePort("envoy", ContainerEntry.FORBIDDEN_PORT));
                 wait = false;
                 break;
             case "socket":
                 this.state.providerType = ProviderType.SOCKET;
-                String socketPath =
-                        sharedTempDir.resolve("socket.sock").toAbsolutePath().toString();
+                String socketPath = state.containerEntry
+                        .tempDir
+                        .resolve("socket.sock")
+                        .toAbsolutePath()
+                        .toString();
                 state.builder.socketPath(socketPath);
                 state.builder.port(UNAVAILABLE_PORT);
                 break;
             case "ssl":
                 String path = "test-harness/ssl/custom-root-cert.crt";
-
                 File file = new File(path);
                 String absolutePath = file.getAbsolutePath();
                 this.state.providerType = ProviderType.SSL;
@@ -115,12 +113,10 @@ public class ProviderSteps extends AbstractSteps {
                 break;
             case "metadata":
                 flagdConfig = "metadata";
-
                 if (State.resolverType == Config.Resolver.FILE) {
                     FlagdOptions build = state.builder.build();
                     String selector = build.getSelector();
                     String replace = selector.replace("rawflags/", "");
-
                     state.builder
                             .port(UNAVAILABLE_PORT)
                             .offlineFlagSourcePath(new File("test-harness/flags/" + replace).getAbsolutePath());
@@ -135,10 +131,10 @@ public class ProviderSteps extends AbstractSteps {
             case "stable":
                 this.state.providerType = ProviderType.DEFAULT;
                 if (State.resolverType == Config.Resolver.FILE) {
-
                     state.builder
                             .port(UNAVAILABLE_PORT)
-                            .offlineFlagSourcePath(sharedTempDir
+                            .offlineFlagSourcePath(state.containerEntry
+                                    .tempDir
                                     .resolve("allFlags.json")
                                     .toAbsolutePath()
                                     .toString());
@@ -174,26 +170,31 @@ public class ProviderSteps extends AbstractSteps {
         } else {
             api.setProvider(providerName, provider);
         }
+        this.state.provider = provider;
+        this.state.providerName = providerName;
         this.state.client = api.getClient(providerName);
     }

    @When("the connection is lost")
     public void the_connection_is_lost() {
-        when().post("http://" + ContainerUtil.getLaunchpadUrl(container) + "/stop")
+        when().post("http://" + ContainerUtil.getLaunchpadUrl(state.containerEntry.container) + "/stop")
                 .then()
                 .statusCode(200);
     }

    @When("the connection is lost for {int}s")
     public void the_connection_is_lost_for(int seconds) {
-        when().post("http://" + ContainerUtil.getLaunchpadUrl(container) + "/restart?seconds={seconds}", seconds)
+        when().post(
+                        "http://" + ContainerUtil.getLaunchpadUrl(state.containerEntry.container)
+                                + "/restart?seconds={seconds}",
+                        seconds)
                 .then()
                 .statusCode(200);
     }

    @When("the flag was modified")
     public void the_flag_was_modded() {
-        when().post("http://" + ContainerUtil.getLaunchpadUrl(container) + "/change")
+        when().post("http://" + ContainerUtil.getLaunchpadUrl(state.containerEntry.container) + "/change")
                 .then()
                 .statusCode(200);
     }
diff --git c/providers/flagd/test-harness i/providers/flagd/test-harness
index ff2fbe6c..f2782788 160000
--- c/providers/flagd/test-harness
+++ i/providers/flagd/test-harness
@@ -1 +1 @@
-Subproject commit ff2fbe6c6584953cb2753ae9188d1cee14f7f57f
+Subproject commit f2782788e72633e447b024548cd8a2cbf0c2a026
Enable cucumber.execution.parallel.enabled=true with fixed parallelism
matching the container pool size (2).

Correctness safeguards:
- @env-var scenarios serialised behind an ENV_VARS exclusive resource
  lock (requires @env-var tag in test-harness, see companion PR)
- @grace scenarios serialised behind a CONTAINER_RESTART lock to avoid
  reconnection timeouts under parallel container restarts
- ConfigCucumberTest disables parallelism entirely (env-var mutations
  in <0.4s suite — no benefit, avoids races)
- EventSteps: drain-based event matching replaces clear() to prevent
  stale events from satisfying later assertions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
aepfli and others added 3 commits April 7, 2026 14:27
Switch Cucumber plugin from 'pretty' (prints every step) to 'summary'
(only prints failures and a final count). Keeps CI logs readable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Switch Cucumber strategy from 'fixed' to 'dynamic' (factor=1.0, i.e.
one thread per available processor). ContainerPool default pool size
also scales with availableProcessors() so pool slots match thread count.

Both are still overridable:
  -Dflagd.e2e.pool.size=N
  -Dcucumber.execution.parallel.config.dynamic.factor=N

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
Default pool size was Runtime.availableProcessors() which on large machines
(22 CPUs) spawned too many simultaneous Docker Compose stacks and caused
ContainerLaunchException. Cap at min(availableProcessors, 4).

Cucumber threads still scale with CPUs (dynamic factor=1) — extra threads
simply block waiting for a free container, which is safe.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Simon Schrottner <simon.schrottner@dynatrace.com>
@aepfli aepfli force-pushed the feat/speed-up-flagd-e2e-tests branch from d532aaa to 5cec71f on April 7, 2026 12:28