Skip to content

feat: bid precheck#386

Open
troian wants to merge 17 commits into
mainfrom
precheck
Open

feat: bid precheck#386
troian wants to merge 17 commits into
mainfrom
precheck

Conversation

@troian
Copy link
Copy Markdown
Member

@troian troian commented Apr 2, 2026

troian and others added 2 commits March 31, 2026 15:05
Signed-off-by: Artur Troian <troian@users.noreply.github.com>
@troian troian requested a review from a team as a code owner April 2, 2026 17:24
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a BidScreening flow: eligibility checks extraction, cluster dry-run reservation plumbing, pricing calculation, provider/gateway RPC wiring, regenerated mocks, and unit/integration tests implementing and exercising the new screening endpoint.

Changes

Cohort / File(s) Summary
Plans & Docs
AGENTS.md, .agents/plans/bid-screening.md
New agent operating rules and a BidScreening plan/spec guideline.
Bid Engine — Eligibility
bidengine/order.go, bidengine/order_test.go
Introduces CheckBidEligibility that accumulates human-readable decline reasons; refactors order.shouldBid to delegate and log reasons; unit tests cover multiple failure modes.
Bid Engine — Screening Service
bidengine/service.go, bidengine/service_test.go
Adds ScreenBid API and ScreenBidResult; implements flow: eligibility → cluster dry-run → pricing; tests cover pass/fail, pricing errors, and operational cancellation paths.
Cluster Inventory & Service
cluster/inventory.go, cluster/service.go, cluster/inventory_test.go
Adds dry-run reservation channel/handler and dryRunReservation type; validates CPU/GPU/Memory, leased-IP checks, calls Adjust(..., WithDryRun()); exposes Cluster.DryRunReserve and tests cancellation/insufficiency paths.
Provider Service
service.go, service_test.go
Extends provider Client with ScreenBid and implements mapping of ScreenBidResultBidScreeningResponse (including ResourceOffers and Price when passed); tests validate mappings and error handling.
gRPC Gateway
gateway/grpc/server.go, gateway/grpc/server_test.go
Server now depends on full provider.Client; BidScreening RPC performs hostname gating then delegates to client.ScreenBid; auth token extraction index adjusted; tests added for gating and forwarding.
REST Gateway
gateway/rest/router.go
Adds authenticated POST /v1/bid-screening handler that forwards to pclient.ScreenBid and maps errors to HTTP status codes.
Mocks
mocks/client/Client_mock.go, mocks/cluster/Cluster_mock.go, mocks/cluster/Service_mock.go
Regenerated mocks adding ScreenBid and DryRunReserve with testify expecters and typed call wrappers.
Integration Tests / Suite
integration/bidscreening_test.go, integration/e2e_test.go
Adds end-to-end BidScreening tests (single and concurrent), group-spec helpers, exposes gRPC host in suite and waits for gRPC gateway.
Misc / Build
go.mod, various test files
Module dependency bumps; many tests and mocks added/updated to compile with new interfaces; make bins expected to validate build.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Gateway as "gRPC Gateway"
    participant Provider as "Provider Service"
    participant BidEngine as "Bid Engine"
    participant Cluster as "Cluster"
    participant Inventory as "Inventory"
    participant Pricing as "Pricing Engine"

    Client->>Gateway: BidScreening(BidScreeningRequest)
    Gateway->>Gateway: Check hostname gating
    alt Hostname blocked
        Gateway-->>Client: BidScreeningResponse{Passed:false, Reasons}
    else Hostname OK
        Gateway->>Provider: ScreenBid(ctx, GroupSpec)
        Provider->>BidEngine: ScreenBid(ctx, GroupSpec)
        BidEngine->>BidEngine: CheckBidEligibility(gspec,...)
        alt Eligibility Fails
            BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
        else Eligibility Passes
            BidEngine->>Cluster: DryRunReserve(ResourceGroup)
            Cluster->>Inventory: dryRunReserve(resources)
            Inventory->>Inventory: validate resources, leased-IP checks
            alt Insufficient capacity
                Inventory-->>Cluster: error
                Cluster-->>BidEngine: error
                BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
            else Success
                Inventory-->>Cluster: dryRunReservation (allocated resources)
                Cluster-->>BidEngine: ReservationGroup
                BidEngine->>Pricing: CalculatePrice(Request)
                alt Pricing Fails
                    Pricing-->>BidEngine: error
                    BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
                else Pricing Succeeds
                    Pricing-->>BidEngine: Price
                    BidEngine-->>Provider: ScreenBidResult{Passed:true, Resources, Price}
                end
            end
        end
        Provider->>Provider: Convert Resources -> ResourceOffer
        Provider-->>Gateway: BidScreeningResponse
        Gateway-->>Client: BidScreeningResponse
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through code with eager paws,
eligibility gathered all the flaws.
Dry-run danced without a bind,
priced and mocked, well defined. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: bid precheck' is clear and directly describes the main feature being added: bid pre-checking/screening functionality.
Description check ✅ Passed The description 'fixes akash-network/support#453' is related to the changeset and properly references the linked issue.
Linked Issues check ✅ Passed The PR implements the BidScreening endpoint across the provider stack (gRPC gateway, REST router, service layer, bid engine, cluster inventory) as required by issue #453, meeting the objective to expose the BidScreening API per proto specification.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the BidScreening functionality. Updates to agent rules (AGENTS.md), go.mod dependencies, mocks, and tests all support the core feature implementation without introducing unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch precheck

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (3)
service.go (1)

241-249: Consider a defensive nil-result guard before dereferencing.

This prevents a panic if a future bidengine.Service implementation returns (nil, nil).

Suggested fix
 	result, err := s.bidengine.ScreenBid(ctx, req.GroupSpec)
 	if err != nil {
 		return nil, err
 	}
+	if result == nil {
+		return nil, fmt.Errorf("screen bid returned nil result")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service.go` around lines 241 - 249, Add a defensive nil-result guard after
calling s.bidengine.ScreenBid: check whether result is nil before accessing
result.Passed or result.Reasons (the call site is s.bidengine.ScreenBid and the
variables are result and provider.BidScreeningResponse); if result is nil return
a meaningful error (or wrap/translate to the existing error type used by this
service) instead of dereferencing to avoid a panic when an implementation
returns (nil, nil).
cluster/inventory.go (1)

267-278: Consider extracting shared resource validation to one helper.

The CPU/GPU/Memory nil checks now exist in both reserve() and dryRunReserve(). A shared validator will reduce drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster/inventory.go` around lines 267 - 278, The CPU/GPU/Memory nil-checks
are duplicated between dryRunReserve and reserve; extract a helper function
(e.g., validateResourceUnit or validateResourceGroup) that accepts a
dtypes.ResourceGroup or a single resource unit and performs the nil checks and
returns a typed error (using ErrInvalidResource and index info) and then call
that helper from both dryRunReserve and reserve to remove duplicated logic and
keep validation centralized; update dryRunReserve to call the new helper for
each unit (and similarly update reserve to use the same helper).
service_test.go (1)

76-165: Add one test for the “missing group spec” path.

service.ScreenBid explicitly handles missing spec; covering that branch will prevent regressions in API behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service_test.go` around lines 76 - 165, Add a test named
Test_ServiceScreenBid_MissingGroupSpec that calls service.ScreenBid with a
provider.BidScreeningRequest whose GroupSpec is nil; construct svc :=
&service{bidengine: &fakeBidEngine{}} (or zero value) and assert that the call
returns an error and a nil response (e.g., require.Error(t, err) and
require.Nil(t, resp)) to cover the missing-group-spec branch in
service.ScreenBid and ensure the fakeBidEngine is not required to produce a
result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/plans/bid-screening.md:
- Line 7: Add a language tag to the fenced code block so markdownlint rule MD040
is satisfied: edit the block that contains "gRPC Server (gateway/grpc/server.go)
→ Provider Service (service.go) → ..." and change the opening fence from ``` to
```text (i.e., add the language identifier "text" immediately after the three
backticks) so the block is explicitly labeled for the linter.

In `@AGENTS.md`:
- Line 35: The README line claiming "make bins — the compiled binary is placed
in `.cache/bin/`" is inaccurate; either update AGENTS.md to state the real
output directory produced by the current make rules for the "bins" target (as
defined by the make targets in make/releasing.mk and make/init.mk) or modify the
make rules so the "bins" target emits artifacts into `.cache/bin/`; locate the
"make bins" target references in the docs and in make/releasing.mk and
make/init.mk, then either change the doc string to the actual output path used
by those Makefiles or change the Makefile variables/targets to create
`.cache/bin/` to match the docs.

In `@bidengine/order_test.go`:
- Around line 955-957: The loop over reasons (for _, r := range reasons) can
panic when checking r[:5] for strings shorter than 5; replace that risky slice
check with a safe prefix test such as using len(r) >= 5 before slicing or,
better, strings.HasPrefix(r, "group") to set foundVolumes — update the loop
where foundVolumes is set to use the safe check.
- Around line 846-848: The code incorrectly uses the embedded field selector
`.Resources.` (e.g., gspec.Resources[0].Resources.GPU) even though the inner
Resources struct is embedded and its fields are promoted; replace usages like
gspec.Resources[0].Resources.GPU and gspec.Resources[0].Resources.Storage with
the promoted form gspec.Resources[0].GPU and gspec.Resources[0].Storage (also
update the other occurrences noted around lines 863 and 938), preserving the
same assignments (e.g., rtypes.NewResourceValue(...) and
attrtypes.Attributes{...}) so only the redundant `.Resources.` token is removed.

In `@bidengine/service.go`:
- Around line 228-241: ScreenBid currently returns Passed=true if
s.cfg.PricingStrategy.CalculatePrice succeeds, but the live bidding path (see
bidengine/order.go price/denom checks around group.GroupSpec.Price()) would
still decline that bid; update the ScreenBid implementation to also compare the
calculated price against group.GroupSpec.Price() and the currency/denom logic
used in the live path and set Passed=false with an explanatory reason if it
would be rejected, keeping the returned ScreenBidResult shape
(AllocatedResources, ClusterParams, Price); then add regression tests in
bidengine/service_test.go covering both branches (one where CalculatePrice
succeeds but price fails the group/denom check and one where it passes) to
ensure behavior matches live bidding.

In `@cluster/inventory.go`:
- Around line 267-293: dryRunReserve currently sends req on the unbuffered
channel is.dryRunReservech and can block until run() starts consuming after the
initial snapshot, causing BidScreening to hang; modify dryRunReserve (and other
callers at the other noted sites) to avoid blocking by performing a non-blocking
send or by short-circuiting when the inventory service is not yet initialized:
check the service state or a "initialized" flag (or use a select with a default)
before sending to is.dryRunReservech and return a deterministic error (e.g.,
ErrNotRunning or a new ErrNotInitialized) if the service hasn't started, or
change is.dryRunReservech to a buffered channel created early in run
initialization so sends never block before run begins; update references to
dryRunReserve, is.dryRunReservech, and run accordingly.

---

Nitpick comments:
In `@cluster/inventory.go`:
- Around line 267-278: The CPU/GPU/Memory nil-checks are duplicated between
dryRunReserve and reserve; extract a helper function (e.g., validateResourceUnit
or validateResourceGroup) that accepts a dtypes.ResourceGroup or a single
resource unit and performs the nil checks and returns a typed error (using
ErrInvalidResource and index info) and then call that helper from both
dryRunReserve and reserve to remove duplicated logic and keep validation
centralized; update dryRunReserve to call the new helper for each unit (and
similarly update reserve to use the same helper).

In `@service_test.go`:
- Around line 76-165: Add a test named Test_ServiceScreenBid_MissingGroupSpec
that calls service.ScreenBid with a provider.BidScreeningRequest whose GroupSpec
is nil; construct svc := &service{bidengine: &fakeBidEngine{}} (or zero value)
and assert that the call returns an error and a nil response (e.g.,
require.Error(t, err) and require.Nil(t, resp)) to cover the missing-group-spec
branch in service.ScreenBid and ensure the fakeBidEngine is not required to
produce a result.

In `@service.go`:
- Around line 241-249: Add a defensive nil-result guard after calling
s.bidengine.ScreenBid: check whether result is nil before accessing
result.Passed or result.Reasons (the call site is s.bidengine.ScreenBid and the
variables are result and provider.BidScreeningResponse); if result is nil return
a meaningful error (or wrap/translate to the existing error type used by this
service) instead of dereferencing to avoid a panic when an implementation
returns (nil, nil).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3d4be66b-17ef-4d6b-994b-a5958ba9d00c

📥 Commits

Reviewing files that changed from the base of the PR and between 0094f0c and 6a618e6.

📒 Files selected for processing (15)
  • .agents/plans/bid-screening.md
  • AGENTS.md
  • bidengine/order.go
  • bidengine/order_test.go
  • bidengine/service.go
  • bidengine/service_test.go
  • cluster/inventory.go
  • cluster/service.go
  • gateway/grpc/server.go
  • gateway/grpc/server_test.go
  • mocks/client/Client_mock.go
  • mocks/cluster/Cluster_mock.go
  • mocks/cluster/Service_mock.go
  • service.go
  • service_test.go

Comment thread .agents/plans/bid-screening.md
Comment thread AGENTS.md
- **Config management**: Viper (reading, writing, env binding, flag binding, live-reload)
- **Global variables**: Default to no package-level `var` usage. If you believe a global is necessary, you MUST ask for direction first (you may offer alternatives or suggestions). Only proceed after explicit approval and, if needed, a spec update.
- **`init`** - every `init` usage must be explicitly approved
- **Build**: `make bins` — the compiled binary is placed in `.cache/bin/`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incorrect make bins output path in docs.

Line 35 says binaries are placed in .cache/bin/, but current build rules don’t define that output path (make/releasing.mk:43-52, make/init.mk:59, make/init.mk:71-73). Please update this line to match actual behavior (or update Make rules to enforce .cache/bin/).

Suggested doc fix
-- **Build**: `make bins` — the compiled binary is placed in `.cache/bin/`
+- **Build**: `make bins` — binaries are produced by the repository Make targets; verify output location from current Make rules.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **Build**: `make bins` — the compiled binary is placed in `.cache/bin/`
- **Build**: `make bins`binaries are produced by the repository Make targets; verify output location from current Make rules.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` at line 35, The README line claiming "make bins — the compiled
binary is placed in `.cache/bin/`" is inaccurate; either update AGENTS.md to
state the real output directory produced by the current make rules for the
"bins" target (as defined by the make targets in make/releasing.mk and
make/init.mk) or modify the make rules so the "bins" target emits artifacts into
`.cache/bin/`; locate the "make bins" target references in the docs and in
make/releasing.mk and make/init.mk, then either change the doc string to the
actual output path used by those Makefiles or change the Makefile
variables/targets to create `.cache/bin/` to match the docs.

Comment thread bidengine/order_test.go
Comment thread bidengine/order_test.go
Comment thread bidengine/service.go
Comment thread cluster/inventory.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
integration/bidscreening_test.go (2)

34-40: Don't skip certificate verification in the new gRPC E2E.

With InsecureSkipVerify, this suite still passes if the provider serves the wrong cert or SAN, so it doesn't really validate the TLS wiring behind the new endpoint. SetupSuite already copies the provider PEM in integration/e2e_test.go, Lines 345-359, so you can trust that cert instead of disabling verification.

💡 Suggested change
 import (
 	"context"
 	"crypto/tls"
+	"crypto/x509"
 	"fmt"
+	"os"
+	"strings"
 	"sync"
 	"time"
@@
 func (s *E2EBidScreening) dialGRPC() (*grpc.ClientConn, providerv1.ProviderRPCClient) {
+	pemPath := fmt.Sprintf("%s/%s.pem", strings.Replace(s.cctx.HomeDir, "simd", "simcli", 1), s.addrProvider.String())
+	pemBytes, err := os.ReadFile(pemPath)
+	s.Require().NoError(err)
+
+	roots := x509.NewCertPool()
+	s.Require().True(roots.AppendCertsFromPEM(pemBytes), "failed to load provider certificate")
+
 	tlsCfg := &tls.Config{
-		InsecureSkipVerify: true,
-		MinVersion:         tls.VersionTLS13,
+		RootCAs:    roots,
+		ServerName: "localhost",
+		MinVersion: tls.VersionTLS13,
 	}
 
 	conn, err := grpc.DialContext(s.ctx, s.grpcHost,
 		grpc.WithTransportCredentials(credentials.NewTLS(tlsCfg)),
 	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration/bidscreening_test.go` around lines 34 - 40, The test disables
certificate verification by setting InsecureSkipVerify: true in tlsCfg used by
grpc.DialContext, which prevents real TLS validation; instead load the
provider's PEM into a x509.CertPool (the same cert copied in SetupSuite) and set
tlsCfg.RootCAs to that pool and remove InsecureSkipVerify, then use
credentials.NewTLS(tlsCfg) when dialing so the gRPC connection validates the
provider cert/SAN correctly (update the tlsCfg construction and the
grpc.DialContext call accordingly).

191-225: Assert cleanup after the concurrent burst.

This only proves the 10 in-flight BidScreening calls can all succeed. It will miss leaked dry-run reservations that remain after the burst and poison the next screening, which is the cleanup path this test is really trying to protect.

💡 Suggested change
 	wg.Wait()
 	close(errs)
 
 	for err := range errs {
 		s.Require().NoError(err)
 	}
+
+	tctx, cancel := context.WithTimeout(s.ctx, 10*time.Second)
+	defer cancel()
+
+	resp, err := client.BidScreening(tctx, &providerv1.BidScreeningRequest{
+		GroupSpec: makeGroupSpec("post-concurrent", 100, 0, 16*mi, 128*mi, 1, 1000),
+	})
+	s.Require().NoError(err)
+	s.Require().True(resp.Passed, "screening should still pass after concurrent dry-runs complete")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration/bidscreening_test.go` around lines 191 - 225, The test currently
only verifies the concurrent burst succeeds but doesn't assert that any dry-run
reservations were cleaned up; after the wg.Wait()/close(errs) block (i.e., at
the end of TestBidScreeningConcurrent, after the existing success checks), add
an explicit cleanup assertion: perform one (or a few) additional BidScreening
calls (using client.BidScreening with a small deterministic GroupSpec created by
makeGroupSpec) and require that they pass and/or return no leaked reservations
(e.g., resp.Passed == true and resp.Reasons empty), failing the test if any
residual dry-run state remains; ensure this follow-up call uses the same context
pattern (context.WithTimeout) and reports errors via s.Require().NoError /
s.Require().True checks so the test fails on leaked reservations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@integration/bidscreening_test.go`:
- Around line 34-40: The test disables certificate verification by setting
InsecureSkipVerify: true in tlsCfg used by grpc.DialContext, which prevents real
TLS validation; instead load the provider's PEM into a x509.CertPool (the same
cert copied in SetupSuite) and set tlsCfg.RootCAs to that pool and remove
InsecureSkipVerify, then use credentials.NewTLS(tlsCfg) when dialing so the gRPC
connection validates the provider cert/SAN correctly (update the tlsCfg
construction and the grpc.DialContext call accordingly).
- Around line 191-225: The test currently only verifies the concurrent burst
succeeds but doesn't assert that any dry-run reservations were cleaned up; after
the wg.Wait()/close(errs) block (i.e., at the end of TestBidScreeningConcurrent,
after the existing success checks), add an explicit cleanup assertion: perform
one (or a few) additional BidScreening calls (using client.BidScreening with a
small deterministic GroupSpec created by makeGroupSpec) and require that they
pass and/or return no leaked reservations (e.g., resp.Passed == true and
resp.Reasons empty), failing the test if any residual dry-run state remains;
ensure this follow-up call uses the same context pattern (context.WithTimeout)
and reports errors via s.Require().NoError / s.Require().True checks so the test
fails on leaked reservations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 12626ec5-4117-43d9-a96d-c0adb1d8df3a

📥 Commits

Reviewing files that changed from the base of the PR and between 6a618e6 and 8b1f6d0.

📒 Files selected for processing (2)
  • integration/bidscreening_test.go
  • integration/e2e_test.go

@Zblocker64 Zblocker64 added this to the provider 0.12.0 milestone Apr 6, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
cluster/inventory.go (1)

286-292: ⚠️ Potential issue | 🟠 Major

Return a deterministic error before inventory is initialized.

dryRunReserve() still sends on an unbuffered is.dryRunReservech, while the run loop does not consume from dryRunCh until the first inventory snapshot arrives. An early BidScreening call can therefore hang until its context expires instead of failing fast with errInventoryNotAvailableYet.

💡 Suggested fix
-	var dryRunCh <-chan dryRunReserveRequest
+	var dryRunCh <-chan dryRunReserveRequest = is.dryRunReservech
 func (is *inventoryService) handleDryRunRequest(req dryRunReserveRequest, state *inventoryServiceState) {
+	if state.inventory == nil {
+		req.ch <- dryRunReserveResponse{err: errInventoryNotAvailableYet}
+		return
+	}
+
 	resourcesToCommit := is.resourcesToCommit(req.resources)
 	rg := &dryRunReservation{resources: resourcesToCommit}
-			// enable dry-run reserve processing now that inventory is available
-			dryRunCh = is.dryRunReservech

Also applies to: 572-573, 735-736

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster/inventory.go` around lines 286 - 292, Before attempting to send on
the unbuffered channel in dryRunReserve(), check whether the inventory has been
initialized and return errInventoryNotAvailableYet if it has not; specifically,
add a pre-check (e.g. if is.inventorySnapshot == nil or a helper
is.inventoryReady() returns false) and return errInventoryNotAvailableYet
instead of blocking on is.dryRunReservech. Apply the same pattern to the other
occurrences mentioned (the similar select sites around the other dryRun/dryRunCh
sends) so calls fail fast rather than hanging until context timeout.
bidengine/service.go (1)

228-241: ⚠️ Potential issue | 🟠 Major

Apply the group's max-price / denom gate before returning success.

This still treats any successful CalculatePrice as a pass. If the calculated coin uses a different denom or exceeds gspec.Price(), BidScreening reports Passed=true even though the normal bid path will refuse the order.

💡 Suggested fix
 	price, err := s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq)
 	if err != nil {
 		return &ScreenBidResult{
 			Passed:  false,
 			Reasons: []string{fmt.Sprintf("pricing calculation failed: %v", err)},
 		}, nil
 	}
+
+	maxPrice := gspec.Price()
+	if maxPrice.GetDenom() != price.GetDenom() {
+		return &ScreenBidResult{
+			Passed:  false,
+			Reasons: []string{fmt.Sprintf("unsupported denomination: calculated %s, max price %s", price.String(), maxPrice.String())},
+		}, nil
+	}
+
+	if maxPrice.IsLT(price) {
+		return &ScreenBidResult{
+			Passed:  false,
+			Reasons: []string{fmt.Sprintf("price too high: calculated %s, max price %s", price.String(), maxPrice.String())},
+		}, nil
+	}

 	return &ScreenBidResult{
 		Passed:             true,
 		AllocatedResources: rg.GetAllocatedResources(),
 		ClusterParams:      rg.ClusterParams(),
 		Price:              price,
 	}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bidengine/service.go` around lines 228 - 241, After a successful call to
s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) validate the returned price
against the group's max-price/denom gate before returning a passing
ScreenBidResult: ensure the price denom matches gspec.Price().Denom and the
price does not exceed gspec.Price() (use the group spec object referenced as
gspec or equivalent), and if either check fails return Passed:false with a clear
Reasons entry (e.g., denom mismatch or exceeds group max price); keep the rest
of the success response (AllocatedResources, ClusterParams, Price) only when
these checks pass.
🧹 Nitpick comments (1)
integration/e2e_test.go (1)

432-434: Cross-file constant dependency works but creates coupling.

testBlockedHostname is defined in bidscreening_test.go and used here in the shared setup. While this compiles correctly (same package, same build tag), it creates a coupling where the setup file depends on a test-specific file.

Consider moving the constant to e2e_test.go for clearer dependency direction, or adding a brief comment noting where it's defined.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration/e2e_test.go` around lines 432 - 434, The shared setup in
e2e_test.go references the constant testBlockedHostname (used in the WithFlag
call for pcmd.FlagDeploymentBlockedHostnames), creating an undesirable
cross-test-file dependency; either move the testBlockedHostname constant into
e2e_test.go (so the setup owns its data) or add a brief comment above its use
pointing to its definition in bidscreening_test.go and why it’s intentionally
shared, then update any imports/visibility if needed so the constant is
colocated or clearly documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bidengine/service.go`:
- Around line 213-219: The current error branch after calling
s.cluster.DryRunReserve collapses all errors into a screening rejection; instead
detect operational cancellation/shutdown errors and propagate them. Update the
error handling for DryRunReserve: if errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) or the cluster shutdown sentinel (e.g.
a cluster/engine-specific ErrShutdown) is true, return nil, err to surface the
request/server lifecycle failure; otherwise keep the existing
ScreenBidResult{Passed:false, Reasons:[]string{fmt.Sprintf("insufficient
capacity: %v", err)}} behavior. Use errors.Is and/or a small helper like
isShutdownError(err) to centralize the shutdown sentinel check and reference
DryRunReserve and ScreenBidResult when locating the code.

In `@cluster/inventory.go`:
- Around line 304-315: handleDryRunRequest currently skips the leased-IP
preflight checks and can approve groups that reserve() would reject; update
handleDryRunRequest to perform the same leased-IP screening as
handleRequest/reserve by checking is.clients.ip, state.ipAddrUsage and
countPendingIPs(state) for the requested resources before calling
resourcesToCommit/dryRunReservation and state.inventory.Adjust, and return the
same error values (errNoLeasedIPsAvailable or errInsufficientIPs) via req.ch
when those checks fail so dry-run responses mirror real reserve() behavior.

---

Duplicate comments:
In `@bidengine/service.go`:
- Around line 228-241: After a successful call to
s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) validate the returned price
against the group's max-price/denom gate before returning a passing
ScreenBidResult: ensure the price denom matches gspec.Price().Denom and the
price does not exceed gspec.Price() (use the group spec object referenced as
gspec or equivalent), and if either check fails return Passed:false with a clear
Reasons entry (e.g., denom mismatch or exceeds group max price); keep the rest
of the success response (AllocatedResources, ClusterParams, Price) only when
these checks pass.

In `@cluster/inventory.go`:
- Around line 286-292: Before attempting to send on the unbuffered channel in
dryRunReserve(), check whether the inventory has been initialized and return
errInventoryNotAvailableYet if it has not; specifically, add a pre-check (e.g.
if is.inventorySnapshot == nil or a helper is.inventoryReady() returns false)
and return errInventoryNotAvailableYet instead of blocking on
is.dryRunReservech. Apply the same pattern to the other occurrences mentioned
(the similar select sites around the other dryRun/dryRunCh sends) so calls fail
fast rather than hanging until context timeout.

---

Nitpick comments:
In `@integration/e2e_test.go`:
- Around line 432-434: The shared setup in e2e_test.go references the constant
testBlockedHostname (used in the WithFlag call for
pcmd.FlagDeploymentBlockedHostnames), creating an undesirable cross-test-file
dependency; either move the testBlockedHostname constant into e2e_test.go (so
the setup owns its data) or add a brief comment above its use pointing to its
definition in bidscreening_test.go and why it’s intentionally shared, then
update any imports/visibility if needed so the constant is colocated or clearly
documented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 25ca40c5-62df-4582-a271-ab679375453a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b1f6d0 and 2599a75.

📒 Files selected for processing (12)
  • bidengine/service.go
  • bidengine/service_test.go
  • cluster/inventory.go
  • cluster/inventory_test.go
  • cluster/service.go
  • gateway/grpc/server.go
  • gateway/grpc/server_test.go
  • integration/bidscreening_test.go
  • integration/e2e_test.go
  • mocks/client/Client_mock.go
  • mocks/cluster/Cluster_mock.go
  • mocks/cluster/Service_mock.go
✅ Files skipped from review due to trivial changes (1)
  • bidengine/service_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • cluster/service.go
  • mocks/cluster/Cluster_mock.go
  • gateway/grpc/server_test.go
  • mocks/client/Client_mock.go
  • integration/bidscreening_test.go

Comment thread bidengine/service.go
Comment thread cluster/inventory.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
cluster/inventory.go (1)

287-293: ⚠️ Potential issue | 🟠 Major

Dry-run precheck can still block before inventory initialization.

dryRunReserve sends on is.dryRunReservech (Line 288), but run() doesn’t consume until dryRunCh is enabled after inventory snapshot (Lines 749-750). Before that point, requests can hang until caller context expires instead of returning a deterministic initialization error.

💡 Suggested fix
-	var dryRunCh <-chan dryRunReserveRequest
+	var dryRunCh <-chan dryRunReserveRequest = is.dryRunReservech
 func (is *inventoryService) handleDryRunRequest(req dryRunReserveRequest, state *inventoryServiceState) {
+	if state.inventory == nil {
+		req.ch <- dryRunReserveResponse{err: errInventoryNotAvailableYet}
+		return
+	}
+
 	resourcesToCommit := is.resourcesToCommit(req.resources)
-			// enable dry-run reserve processing now that inventory is available
-			dryRunCh = is.dryRunReservech

Also applies to: 586-587, 749-750

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cluster/inventory.go` around lines 287 - 293, The select that sends to
is.dryRunReservech can block because run() only starts consuming from dryRunCh
after the inventory snapshot; before initialization we should fail fast instead
of hanging. Modify the caller code that does the send to first check the
inventory initialization state (or a dedicated flag) and return
ErrNotRunning/ErrNotInitialized if not ready, or change the select to a
non-blocking attempt that falls through to return a deterministic error;
reference is.dryRunReservech, dryRunCh, run(), and is.lc.ShuttingDown() and
ensure the same fix is applied at the other occurrences (the blocks around lines
noted).
bidengine/service.go (1)

231-244: ⚠️ Potential issue | 🟠 Major

ScreenBid can still pass bids the live bidder would reject.

After pricing succeeds (Lines 231-237), the code returns Passed: true directly (Lines 239-244) without validating denom/max price against gspec.Price(). This is still inconsistent with the live bidding checks and can produce false-positive precheck results.

💡 Suggested fix
 	price, err := s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq)
 	if err != nil {
 		return &ScreenBidResult{
 			Passed:  false,
 			Reasons: []string{fmt.Sprintf("pricing calculation failed: %v", err)},
 		}, nil
 	}
+
+	maxPrice := gspec.Price()
+	if maxPrice.GetDenom() != price.GetDenom() {
+		return &ScreenBidResult{
+			Passed:  false,
+			Reasons: []string{fmt.Sprintf("unsupported denomination: calculated %s, max price %s", price.String(), maxPrice.String())},
+		}, nil
+	}
+	if maxPrice.IsLT(price) {
+		return &ScreenBidResult{
+			Passed:  false,
+			Reasons: []string{fmt.Sprintf("price too high: calculated %s, max price %s", price.String(), maxPrice.String())},
+		}, nil
+	}

 	return &ScreenBidResult{
 		Passed:             true,
 		AllocatedResources: rg.GetAllocatedResources(),
 		ClusterParams:      rg.ClusterParams(),
 		Price:              price,
 	}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bidengine/service.go` around lines 231 - 244, After calling
s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) and getting a price,
validate that the returned price's denomination and value fall within the
allowed/given gspec.Price() constraints (same checks the live bidder uses)
before constructing the ScreenBidResult; if the denom or max price are out of
bounds, return Passed: false with an explanatory reason string (instead of
returning Passed: true). Update the code in the block after
PricingStrategy.CalculatePrice (where price, err is handled) to perform these
checks against gspec.Price() and only set Passed: true and include
AllocatedResources/ClusterParams/Price when the price passes those validations.
Ensure you reference s.cfg.PricingStrategy.CalculatePrice, priceReq, the
returned price, ScreenBidResult, rg.GetAllocatedResources, rg.ClusterParams, and
gspec.Price() when implementing the fix.
🧹 Nitpick comments (1)
service_test.go (1)

76-214: Add a direct test for nil GroupSpec contract.

Current suite covers mapping well, but it doesn’t assert the explicit validation path in service.go (Line 234) that returns Passed=false and "missing group spec" without error. A small test here would prevent regressions in that API contract.

🧪 Proposed test addition
+func Test_ServiceScreenBid_MissingGroupSpec(t *testing.T) {
+	svc := &service{bidengine: &fakeBidEngine{}}
+
+	resp, err := svc.ScreenBid(context.Background(), &provider.BidScreeningRequest{})
+	require.NoError(t, err)
+	require.NotNil(t, resp)
+	require.False(t, resp.Passed)
+	require.Equal(t, []string{"missing group spec"}, resp.Reasons)
+	require.Nil(t, resp.ResourceOffer)
+	require.Nil(t, resp.Price)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@service_test.go` around lines 76 - 214, Add a test that calls
service.ScreenBid with a nil GroupSpec in provider.BidScreeningRequest and
asserts the contract: no returned error, non-nil response, resp.Passed is false,
resp.Reasons contains the exact "missing group spec" message, and resp.Price and
resp.ResourceOffer are nil; place the test alongside the other
Test_ServiceScreenBid_* cases and reference the service struct and
service.ScreenBid to locate where the behavior is implemented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@bidengine/service.go`:
- Around line 231-244: After calling s.cfg.PricingStrategy.CalculatePrice(ctx,
priceReq) and getting a price, validate that the returned price's denomination
and value fall within the allowed/given gspec.Price() constraints (same checks
the live bidder uses) before constructing the ScreenBidResult; if the denom or
max price are out of bounds, return Passed: false with an explanatory reason
string (instead of returning Passed: true). Update the code in the block after
PricingStrategy.CalculatePrice (where price, err is handled) to perform these
checks against gspec.Price() and only set Passed: true and include
AllocatedResources/ClusterParams/Price when the price passes those validations.
Ensure you reference s.cfg.PricingStrategy.CalculatePrice, priceReq, the
returned price, ScreenBidResult, rg.GetAllocatedResources, rg.ClusterParams, and
gspec.Price() when implementing the fix.

In `@cluster/inventory.go`:
- Around line 287-293: The select that sends to is.dryRunReservech can block
because run() only starts consuming from dryRunCh after the inventory snapshot;
before initialization we should fail fast instead of hanging. Modify the caller
code that does the send to first check the inventory initialization state (or a
dedicated flag) and return ErrNotRunning/ErrNotInitialized if not ready, or
change the select to a non-blocking attempt that falls through to return a
deterministic error; reference is.dryRunReservech, dryRunCh, run(), and
is.lc.ShuttingDown() and ensure the same fix is applied at the other occurrences
(the blocks around lines noted).

---

Nitpick comments:
In `@service_test.go`:
- Around line 76-214: Add a test that calls service.ScreenBid with a nil
GroupSpec in provider.BidScreeningRequest and asserts the contract: no returned
error, non-nil response, resp.Passed is false, resp.Reasons contains the exact
"missing group spec" message, and resp.Price and resp.ResourceOffer are nil;
place the test alongside the other Test_ServiceScreenBid_* cases and reference
the service struct and service.ScreenBid to locate where the behavior is
implemented.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7760187-1680-4c82-a5d0-c1660a99b8b3

📥 Commits

Reviewing files that changed from the base of the PR and between 2599a75 and 2c31226.

📒 Files selected for processing (6)
  • bidengine/service.go
  • bidengine/service_test.go
  • cluster/inventory.go
  • cluster/inventory_test.go
  • service.go
  • service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cluster/inventory_test.go

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
gateway/rest/router.go (1)

444-458: Missing req.Body.Close() in deferred cleanup.

Other handlers in this file (e.g., createManifestHandler at line 483-485) explicitly close the request body. While Go's HTTP server typically handles this, explicit cleanup is a good defensive practice for consistency.

♻️ Suggested fix
 func createBidScreeningHandler(log log.Logger, pclient provider.Client) http.HandlerFunc {
 	return func(w http.ResponseWriter, req *http.Request) {
+		defer func() {
+			_ = req.Body.Close()
+		}()
 		var screeningReq providerv1.BidScreeningRequest
 		if err := json.NewDecoder(req.Body).Decode(&screeningReq); err != nil {
 			http.Error(w, err.Error(), http.StatusBadRequest)
 			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gateway/rest/router.go` around lines 444 - 458, The createBidScreeningHandler
is missing explicit request body cleanup; add a deferred close (defer
req.Body.Close()) at the start of the returned handler function (immediately
after entering the handler and before json.NewDecoder(...).Decode) so the body
is always closed on all return paths (matching the pattern used by
createManifestHandler) to avoid resource leaks when using pclient.ScreenBid and
writeJSON.
integration/bidscreening_test.go (1)

260-271: Use strings.Contains from standard library instead of custom implementation.

The containsSubstring and findSubstring functions reimplement strings.Contains. Using the standard library is simpler and less error-prone.

♻️ Proposed fix
+import "strings"
+
-func containsSubstring(s, substr string) bool {
-	return len(s) >= len(substr) && (s == substr || len(s) > 0 && findSubstring(s, substr))
-}
-
-func findSubstring(s, substr string) bool {
-	for i := 0; i <= len(s)-len(substr); i++ {
-		if s[i:i+len(substr)] == substr {
-			return true
-		}
-	}
-	return false
-}

Then update line 199:

-					if containsSubstring(r, tt.reason) {
+					if strings.Contains(r, tt.reason) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration/bidscreening_test.go` around lines 260 - 271, Replace the custom
containsSubstring and findSubstring implementations with the standard library
strings.Contains: remove both functions (containsSubstring, findSubstring),
import "strings" and update any callers (e.g., where containsSubstring(...) is
used) to call strings.Contains(s, substr) instead so you rely on the built-in,
tested implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gateway/rest/router.go`:
- Around line 444-458: The REST handler createBidScreeningHandler is missing the
hostname availability validation performed in the gRPC path; before calling
pclient.ScreenBid, call the provider hostname availability check using the
client's Hostname().CanReserveHostnames with the decoded screeningReq.Hostnames
(propagate req.Context()), handle any error the same way as other errors (use
httperror.StatusCodeFrom to set the response code and return), and only then
call pclient.ScreenBid and writeJSON the response.

---

Nitpick comments:
In `@gateway/rest/router.go`:
- Around line 444-458: The createBidScreeningHandler is missing explicit request
body cleanup; add a deferred close (defer req.Body.Close()) at the start of the
returned handler function (immediately after entering the handler and before
json.NewDecoder(...).Decode) so the body is always closed on all return paths
(matching the pattern used by createManifestHandler) to avoid resource leaks
when using pclient.ScreenBid and writeJSON.

In `@integration/bidscreening_test.go`:
- Around line 260-271: Replace the custom containsSubstring and findSubstring
implementations with the standard library strings.Contains: remove both
functions (containsSubstring, findSubstring), import "strings" and update any
callers (e.g., where containsSubstring(...) is used) to call strings.Contains(s,
substr) instead so you rely on the built-in, tested implementation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8960a559-4bc9-4490-837f-821621395c3b

📥 Commits

Reviewing files that changed from the base of the PR and between 2c31226 and 32b95ee.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • gateway/grpc/server_test.go
  • gateway/rest/router.go
  • go.mod
  • integration/bidscreening_test.go
  • service.go
  • service_test.go
✅ Files skipped from review due to trivial changes (2)
  • go.mod
  • service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • service.go

Comment thread gateway/rest/router.go
@cloud-j-luna cloud-j-luna removed this from the provider 0.12.0 milestone Apr 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement BidScreening

4 participants