Conversation
Signed-off-by: Artur Troian <troian@users.noreply.github.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a BidScreening flow: eligibility checks extraction, cluster dry-run reservation plumbing, pricing calculation, provider/gateway RPC wiring, regenerated mocks, and unit/integration tests implementing and exercising the new screening endpoint. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Gateway as "gRPC Gateway"
participant Provider as "Provider Service"
participant BidEngine as "Bid Engine"
participant Cluster as "Cluster"
participant Inventory as "Inventory"
participant Pricing as "Pricing Engine"
Client->>Gateway: BidScreening(BidScreeningRequest)
Gateway->>Gateway: Check hostname gating
alt Hostname blocked
Gateway-->>Client: BidScreeningResponse{Passed:false, Reasons}
else Hostname OK
Gateway->>Provider: ScreenBid(ctx, GroupSpec)
Provider->>BidEngine: ScreenBid(ctx, GroupSpec)
BidEngine->>BidEngine: CheckBidEligibility(gspec,...)
alt Eligibility Fails
BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
else Eligibility Passes
BidEngine->>Cluster: DryRunReserve(ResourceGroup)
Cluster->>Inventory: dryRunReserve(resources)
Inventory->>Inventory: validate resources, leased-IP checks
alt Insufficient capacity
Inventory-->>Cluster: error
Cluster-->>BidEngine: error
BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
else Success
Inventory-->>Cluster: dryRunReservation (allocated resources)
Cluster-->>BidEngine: ReservationGroup
BidEngine->>Pricing: CalculatePrice(Request)
alt Pricing Fails
Pricing-->>BidEngine: error
BidEngine-->>Provider: ScreenBidResult{Passed:false, Reasons}
else Pricing Succeeds
Pricing-->>BidEngine: Price
BidEngine-->>Provider: ScreenBidResult{Passed:true, Resources, Price}
end
end
end
Provider->>Provider: Convert Resources -> ResourceOffer
Provider-->>Gateway: BidScreeningResponse
Gateway-->>Client: BidScreeningResponse
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (3)
service.go (1)
241-249: Consider a defensive nil-result guard before dereferencing.This prevents a panic if a future
bidengine.Serviceimplementation returns(nil, nil).Suggested fix
result, err := s.bidengine.ScreenBid(ctx, req.GroupSpec) if err != nil { return nil, err } + if result == nil { + return nil, fmt.Errorf("screen bid returned nil result") + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service.go` around lines 241 - 249, Add a defensive nil-result guard after calling s.bidengine.ScreenBid: check whether result is nil before accessing result.Passed or result.Reasons (the call site is s.bidengine.ScreenBid and the variables are result and provider.BidScreeningResponse); if result is nil return a meaningful error (or wrap/translate to the existing error type used by this service) instead of dereferencing to avoid a panic when an implementation returns (nil, nil).cluster/inventory.go (1)
267-278: Consider extracting shared resource validation to one helper.The CPU/GPU/Memory nil checks now exist in both
reserve()anddryRunReserve(). A shared validator will reduce drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster/inventory.go` around lines 267 - 278, The CPU/GPU/Memory nil-checks are duplicated between dryRunReserve and reserve; extract a helper function (e.g., validateResourceUnit or validateResourceGroup) that accepts a dtypes.ResourceGroup or a single resource unit and performs the nil checks and returns a typed error (using ErrInvalidResource and index info) and then call that helper from both dryRunReserve and reserve to remove duplicated logic and keep validation centralized; update dryRunReserve to call the new helper for each unit (and similarly update reserve to use the same helper).service_test.go (1)
76-165: Add one test for the “missing group spec” path.
service.ScreenBidexplicitly handles missing spec; covering that branch will prevent regressions in API behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service_test.go` around lines 76 - 165, Add a test named Test_ServiceScreenBid_MissingGroupSpec that calls service.ScreenBid with a provider.BidScreeningRequest whose GroupSpec is nil; construct svc := &service{bidengine: &fakeBidEngine{}} (or zero value) and assert that the call returns an error and a nil response (e.g., require.Error(t, err) and require.Nil(t, resp)) to cover the missing-group-spec branch in service.ScreenBid and ensure the fakeBidEngine is not required to produce a result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/plans/bid-screening.md:
- Line 7: Add a language tag to the fenced code block so markdownlint rule MD040
is satisfied: edit the block that contains "gRPC Server (gateway/grpc/server.go)
→ Provider Service (service.go) → ..." and change the opening fence from ``` to
```text (i.e., add the language identifier "text" immediately after the three
backticks) so the block is explicitly labeled for the linter.
In `@AGENTS.md`:
- Line 35: The README line claiming "make bins — the compiled binary is placed
in `.cache/bin/`" is inaccurate; either update AGENTS.md to state the real
output directory produced by the current make rules for the "bins" target (as
defined by the make targets in make/releasing.mk and make/init.mk) or modify the
make rules so the "bins" target emits artifacts into `.cache/bin/`; locate the
"make bins" target references in the docs and in make/releasing.mk and
make/init.mk, then either change the doc string to the actual output path used
by those Makefiles or change the Makefile variables/targets to create
`.cache/bin/` to match the docs.
In `@bidengine/order_test.go`:
- Around line 955-957: The loop over reasons (for _, r := range reasons) can
panic when checking r[:5] for strings shorter than 5; replace that risky slice
check with a safe prefix test such as using len(r) >= 5 before slicing or,
better, strings.HasPrefix(r, "group") to set foundVolumes — update the loop
where foundVolumes is set to use the safe check.
- Around line 846-848: The code incorrectly uses the embedded field selector
`.Resources.` (e.g., gspec.Resources[0].Resources.GPU) even though the inner
Resources struct is embedded and its fields are promoted; replace usages like
gspec.Resources[0].Resources.GPU and gspec.Resources[0].Resources.Storage with
the promoted form gspec.Resources[0].GPU and gspec.Resources[0].Storage (also
update the other occurrences noted around lines 863 and 938), preserving the
same assignments (e.g., rtypes.NewResourceValue(...) and
attrtypes.Attributes{...}) so only the redundant `.Resources.` token is removed.
In `@bidengine/service.go`:
- Around line 228-241: ScreenBid currently returns Passed=true if
s.cfg.PricingStrategy.CalculatePrice succeeds, but the live bidding path (see
bidengine/order.go price/denom checks around group.GroupSpec.Price()) would
still decline that bid; update the ScreenBid implementation to also compare the
calculated price against group.GroupSpec.Price() and the currency/denom logic
used in the live path and set Passed=false with an explanatory reason if it
would be rejected, keeping the returned ScreenBidResult shape
(AllocatedResources, ClusterParams, Price); then add regression tests in
bidengine/service_test.go covering both branches (one where CalculatePrice
succeeds but price fails the group/denom check and one where it passes) to
ensure behavior matches live bidding.
In `@cluster/inventory.go`:
- Around line 267-293: dryRunReserve currently sends req on the unbuffered
channel is.dryRunReservech and can block until run() starts consuming after the
initial snapshot, causing BidScreening to hang; modify dryRunReserve (and other
callers at the other noted sites) to avoid blocking by performing a non-blocking
send or by short-circuiting when the inventory service is not yet initialized:
check the service state or a "initialized" flag (or use a select with a default)
before sending to is.dryRunReservech and return a deterministic error (e.g.,
ErrNotRunning or a new ErrNotInitialized) if the service hasn't started, or
change is.dryRunReservech to a buffered channel created early in run
initialization so sends never block before run begins; update references to
dryRunReserve, is.dryRunReservech, and run accordingly.
---
Nitpick comments:
In `@cluster/inventory.go`:
- Around line 267-278: The CPU/GPU/Memory nil-checks are duplicated between
dryRunReserve and reserve; extract a helper function (e.g., validateResourceUnit
or validateResourceGroup) that accepts a dtypes.ResourceGroup or a single
resource unit and performs the nil checks and returns a typed error (using
ErrInvalidResource and index info) and then call that helper from both
dryRunReserve and reserve to remove duplicated logic and keep validation
centralized; update dryRunReserve to call the new helper for each unit (and
similarly update reserve to use the same helper).
In `@service_test.go`:
- Around line 76-165: Add a test named Test_ServiceScreenBid_MissingGroupSpec
that calls service.ScreenBid with a provider.BidScreeningRequest whose GroupSpec
is nil; construct svc := &service{bidengine: &fakeBidEngine{}} (or zero value)
and assert that the call returns an error and a nil response (e.g.,
require.Error(t, err) and require.Nil(t, resp)) to cover the missing-group-spec
branch in service.ScreenBid and ensure the fakeBidEngine is not required to
produce a result.
In `@service.go`:
- Around line 241-249: Add a defensive nil-result guard after calling
s.bidengine.ScreenBid: check whether result is nil before accessing
result.Passed or result.Reasons (the call site is s.bidengine.ScreenBid and the
variables are result and provider.BidScreeningResponse); if result is nil return
a meaningful error (or wrap/translate to the existing error type used by this
service) instead of dereferencing to avoid a panic when an implementation
returns (nil, nil).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3d4be66b-17ef-4d6b-994b-a5958ba9d00c
📒 Files selected for processing (15)
.agents/plans/bid-screening.mdAGENTS.mdbidengine/order.gobidengine/order_test.gobidengine/service.gobidengine/service_test.gocluster/inventory.gocluster/service.gogateway/grpc/server.gogateway/grpc/server_test.gomocks/client/Client_mock.gomocks/cluster/Cluster_mock.gomocks/cluster/Service_mock.goservice.goservice_test.go
| - **Config management**: Viper (reading, writing, env binding, flag binding, live-reload) | ||
| - **Global variables**: Default to no package-level `var` usage. If you believe a global is necessary, you MUST ask for direction first (you may offer alternatives or suggestions). Only proceed after explicit approval and, if needed, a spec update. | ||
| - **`init`** - every `init` usage must be explicitly approved | ||
| - **Build**: `make bins` — the compiled binary is placed in `.cache/bin/` |
There was a problem hiding this comment.
Incorrect make bins output path in docs.
Line 35 says binaries are placed in .cache/bin/, but current build rules don’t define that output path (make/releasing.mk:43-52, make/init.mk:59, make/init.mk:71-73). Please update this line to match actual behavior (or update Make rules to enforce .cache/bin/).
Suggested doc fix
-- **Build**: `make bins` — the compiled binary is placed in `.cache/bin/`
+- **Build**: `make bins` — binaries are produced by the repository Make targets; verify output location from current Make rules.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Build**: `make bins` — the compiled binary is placed in `.cache/bin/` | |
| - **Build**: `make bins` — binaries are produced by the repository Make targets; verify output location from current Make rules. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` at line 35, The README line claiming "make bins — the compiled
binary is placed in `.cache/bin/`" is inaccurate; either update AGENTS.md to
state the real output directory produced by the current make rules for the
"bins" target (as defined by the make targets in make/releasing.mk and
make/init.mk) or modify the make rules so the "bins" target emits artifacts into
`.cache/bin/`; locate the "make bins" target references in the docs and in
make/releasing.mk and make/init.mk, then either change the doc string to the
actual output path used by those Makefiles or change the Makefile
variables/targets to create `.cache/bin/` to match the docs.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
integration/bidscreening_test.go (2)
34-40: Don't skip certificate verification in the new gRPC E2E.With
InsecureSkipVerify, this suite still passes if the provider serves the wrong cert or SAN, so it doesn't really validate the TLS wiring behind the new endpoint.SetupSuitealready copies the provider PEM inintegration/e2e_test.go, Lines 345-359, so you can trust that cert instead of disabling verification.💡 Suggested change
import ( "context" "crypto/tls" + "crypto/x509" "fmt" + "os" + "strings" "sync" "time" @@ func (s *E2EBidScreening) dialGRPC() (*grpc.ClientConn, providerv1.ProviderRPCClient) { + pemPath := fmt.Sprintf("%s/%s.pem", strings.Replace(s.cctx.HomeDir, "simd", "simcli", 1), s.addrProvider.String()) + pemBytes, err := os.ReadFile(pemPath) + s.Require().NoError(err) + + roots := x509.NewCertPool() + s.Require().True(roots.AppendCertsFromPEM(pemBytes), "failed to load provider certificate") + tlsCfg := &tls.Config{ - InsecureSkipVerify: true, - MinVersion: tls.VersionTLS13, + RootCAs: roots, + ServerName: "localhost", + MinVersion: tls.VersionTLS13, } conn, err := grpc.DialContext(s.ctx, s.grpcHost, grpc.WithTransportCredentials(credentials.NewTLS(tlsCfg)), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration/bidscreening_test.go` around lines 34 - 40, The test disables certificate verification by setting InsecureSkipVerify: true in tlsCfg used by grpc.DialContext, which prevents real TLS validation; instead load the provider's PEM into a x509.CertPool (the same cert copied in SetupSuite) and set tlsCfg.RootCAs to that pool and remove InsecureSkipVerify, then use credentials.NewTLS(tlsCfg) when dialing so the gRPC connection validates the provider cert/SAN correctly (update the tlsCfg construction and the grpc.DialContext call accordingly).
191-225: Assert cleanup after the concurrent burst.This only proves the 10 in-flight
BidScreeningcalls can all succeed. It will miss leaked dry-run reservations that remain after the burst and poison the next screening, which is the cleanup path this test is really trying to protect.💡 Suggested change
wg.Wait() close(errs) for err := range errs { s.Require().NoError(err) } + + tctx, cancel := context.WithTimeout(s.ctx, 10*time.Second) + defer cancel() + + resp, err := client.BidScreening(tctx, &providerv1.BidScreeningRequest{ + GroupSpec: makeGroupSpec("post-concurrent", 100, 0, 16*mi, 128*mi, 1, 1000), + }) + s.Require().NoError(err) + s.Require().True(resp.Passed, "screening should still pass after concurrent dry-runs complete") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration/bidscreening_test.go` around lines 191 - 225, The test currently only verifies the concurrent burst succeeds but doesn't assert that any dry-run reservations were cleaned up; after the wg.Wait()/close(errs) block (i.e., at the end of TestBidScreeningConcurrent, after the existing success checks), add an explicit cleanup assertion: perform one (or a few) additional BidScreening calls (using client.BidScreening with a small deterministic GroupSpec created by makeGroupSpec) and require that they pass and/or return no leaked reservations (e.g., resp.Passed == true and resp.Reasons empty), failing the test if any residual dry-run state remains; ensure this follow-up call uses the same context pattern (context.WithTimeout) and reports errors via s.Require().NoError / s.Require().True checks so the test fails on leaked reservations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@integration/bidscreening_test.go`:
- Around line 34-40: The test disables certificate verification by setting
InsecureSkipVerify: true in tlsCfg used by grpc.DialContext, which prevents real
TLS validation; instead load the provider's PEM into a x509.CertPool (the same
cert copied in SetupSuite) and set tlsCfg.RootCAs to that pool and remove
InsecureSkipVerify, then use credentials.NewTLS(tlsCfg) when dialing so the gRPC
connection validates the provider cert/SAN correctly (update the tlsCfg
construction and the grpc.DialContext call accordingly).
- Around line 191-225: The test currently only verifies the concurrent burst
succeeds but doesn't assert that any dry-run reservations were cleaned up; after
the wg.Wait()/close(errs) block (i.e., at the end of TestBidScreeningConcurrent,
after the existing success checks), add an explicit cleanup assertion: perform
one (or a few) additional BidScreening calls (using client.BidScreening with a
small deterministic GroupSpec created by makeGroupSpec) and require that they
pass and/or return no leaked reservations (e.g., resp.Passed == true and
resp.Reasons empty), failing the test if any residual dry-run state remains;
ensure this follow-up call uses the same context pattern (context.WithTimeout)
and reports errors via s.Require().NoError / s.Require().True checks so the test
fails on leaked reservations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 12626ec5-4117-43d9-a96d-c0adb1d8df3a
📒 Files selected for processing (2)
integration/bidscreening_test.gointegration/e2e_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
cluster/inventory.go (1)
286-292:⚠️ Potential issue | 🟠 MajorReturn a deterministic error before inventory is initialized.
dryRunReserve()still sends on an unbufferedis.dryRunReservech, while the run loop does not consume fromdryRunChuntil the first inventory snapshot arrives. An earlyBidScreeningcall can therefore hang until its context expires instead of failing fast witherrInventoryNotAvailableYet.💡 Suggested fix
- var dryRunCh <-chan dryRunReserveRequest + var dryRunCh <-chan dryRunReserveRequest = is.dryRunReservechfunc (is *inventoryService) handleDryRunRequest(req dryRunReserveRequest, state *inventoryServiceState) { + if state.inventory == nil { + req.ch <- dryRunReserveResponse{err: errInventoryNotAvailableYet} + return + } + resourcesToCommit := is.resourcesToCommit(req.resources) rg := &dryRunReservation{resources: resourcesToCommit}- // enable dry-run reserve processing now that inventory is available - dryRunCh = is.dryRunReservechAlso applies to: 572-573, 735-736
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster/inventory.go` around lines 286 - 292, Before attempting to send on the unbuffered channel in dryRunReserve(), check whether the inventory has been initialized and return errInventoryNotAvailableYet if it has not; specifically, add a pre-check (e.g. if is.inventorySnapshot == nil or a helper is.inventoryReady() returns false) and return errInventoryNotAvailableYet instead of blocking on is.dryRunReservech. Apply the same pattern to the other occurrences mentioned (the similar select sites around the other dryRun/dryRunCh sends) so calls fail fast rather than hanging until context timeout.bidengine/service.go (1)
228-241:⚠️ Potential issue | 🟠 MajorApply the group's max-price / denom gate before returning success.
This still treats any successful
CalculatePriceas a pass. If the calculated coin uses a different denom or exceedsgspec.Price(),BidScreeningreportsPassed=trueeven though the normal bid path will refuse the order.💡 Suggested fix
price, err := s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) if err != nil { return &ScreenBidResult{ Passed: false, Reasons: []string{fmt.Sprintf("pricing calculation failed: %v", err)}, }, nil } + + maxPrice := gspec.Price() + if maxPrice.GetDenom() != price.GetDenom() { + return &ScreenBidResult{ + Passed: false, + Reasons: []string{fmt.Sprintf("unsupported denomination: calculated %s, max price %s", price.String(), maxPrice.String())}, + }, nil + } + + if maxPrice.IsLT(price) { + return &ScreenBidResult{ + Passed: false, + Reasons: []string{fmt.Sprintf("price too high: calculated %s, max price %s", price.String(), maxPrice.String())}, + }, nil + } return &ScreenBidResult{ Passed: true, AllocatedResources: rg.GetAllocatedResources(), ClusterParams: rg.ClusterParams(), Price: price, }, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bidengine/service.go` around lines 228 - 241, After a successful call to s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) validate the returned price against the group's max-price/denom gate before returning a passing ScreenBidResult: ensure the price denom matches gspec.Price().Denom and the price does not exceed gspec.Price() (use the group spec object referenced as gspec or equivalent), and if either check fails return Passed:false with a clear Reasons entry (e.g., denom mismatch or exceeds group max price); keep the rest of the success response (AllocatedResources, ClusterParams, Price) only when these checks pass.
🧹 Nitpick comments (1)
integration/e2e_test.go (1)
432-434: Cross-file constant dependency works but creates coupling.
testBlockedHostnameis defined inbidscreening_test.goand used here in the shared setup. While this compiles correctly (same package, same build tag), it creates a coupling where the setup file depends on a test-specific file.Consider moving the constant to
e2e_test.gofor clearer dependency direction, or adding a brief comment noting where it's defined.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration/e2e_test.go` around lines 432 - 434, The shared setup in e2e_test.go references the constant testBlockedHostname (used in the WithFlag call for pcmd.FlagDeploymentBlockedHostnames), creating an undesirable cross-test-file dependency; either move the testBlockedHostname constant into e2e_test.go (so the setup owns its data) or add a brief comment above its use pointing to its definition in bidscreening_test.go and why it’s intentionally shared, then update any imports/visibility if needed so the constant is colocated or clearly documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bidengine/service.go`:
- Around line 213-219: The current error branch after calling
s.cluster.DryRunReserve collapses all errors into a screening rejection; instead
detect operational cancellation/shutdown errors and propagate them. Update the
error handling for DryRunReserve: if errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) or the cluster shutdown sentinel (e.g.
a cluster/engine-specific ErrShutdown) is true, return nil, err to surface the
request/server lifecycle failure; otherwise keep the existing
ScreenBidResult{Passed:false, Reasons:[]string{fmt.Sprintf("insufficient
capacity: %v", err)}} behavior. Use errors.Is and/or a small helper like
isShutdownError(err) to centralize the shutdown sentinel check and reference
DryRunReserve and ScreenBidResult when locating the code.
In `@cluster/inventory.go`:
- Around line 304-315: handleDryRunRequest currently skips the leased-IP
preflight checks and can approve groups that reserve() would reject; update
handleDryRunRequest to perform the same leased-IP screening as
handleRequest/reserve by checking is.clients.ip, state.ipAddrUsage and
countPendingIPs(state) for the requested resources before calling
resourcesToCommit/dryRunReservation and state.inventory.Adjust, and return the
same error values (errNoLeasedIPsAvailable or errInsufficientIPs) via req.ch
when those checks fail so dry-run responses mirror real reserve() behavior.
---
Duplicate comments:
In `@bidengine/service.go`:
- Around line 228-241: After a successful call to
s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) validate the returned price
against the group's max-price/denom gate before returning a passing
ScreenBidResult: ensure the price denom matches gspec.Price().Denom and the
price does not exceed gspec.Price() (use the group spec object referenced as
gspec or equivalent), and if either check fails return Passed:false with a clear
Reasons entry (e.g., denom mismatch or exceeds group max price); keep the rest
of the success response (AllocatedResources, ClusterParams, Price) only when
these checks pass.
In `@cluster/inventory.go`:
- Around line 286-292: Before attempting to send on the unbuffered channel in
dryRunReserve(), check whether the inventory has been initialized and return
errInventoryNotAvailableYet if it has not; specifically, add a pre-check (e.g.
if is.inventorySnapshot == nil or a helper is.inventoryReady() returns false)
and return errInventoryNotAvailableYet instead of blocking on
is.dryRunReservech. Apply the same pattern to the other occurrences mentioned
(the similar select sites around the other dryRun/dryRunCh sends) so calls fail
fast rather than hanging until context timeout.
---
Nitpick comments:
In `@integration/e2e_test.go`:
- Around line 432-434: The shared setup in e2e_test.go references the constant
testBlockedHostname (used in the WithFlag call for
pcmd.FlagDeploymentBlockedHostnames), creating an undesirable cross-test-file
dependency; either move the testBlockedHostname constant into e2e_test.go (so
the setup owns its data) or add a brief comment above its use pointing to its
definition in bidscreening_test.go and why it’s intentionally shared, then
update any imports/visibility if needed so the constant is colocated or clearly
documented.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 25ca40c5-62df-4582-a271-ab679375453a
📒 Files selected for processing (12)
bidengine/service.gobidengine/service_test.gocluster/inventory.gocluster/inventory_test.gocluster/service.gogateway/grpc/server.gogateway/grpc/server_test.gointegration/bidscreening_test.gointegration/e2e_test.gomocks/client/Client_mock.gomocks/cluster/Cluster_mock.gomocks/cluster/Service_mock.go
✅ Files skipped from review due to trivial changes (1)
- bidengine/service_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- cluster/service.go
- mocks/cluster/Cluster_mock.go
- gateway/grpc/server_test.go
- mocks/client/Client_mock.go
- integration/bidscreening_test.go
There was a problem hiding this comment.
♻️ Duplicate comments (2)
cluster/inventory.go (1)
287-293:⚠️ Potential issue | 🟠 MajorDry-run precheck can still block before inventory initialization.
dryRunReservesends onis.dryRunReservech(Line 288), butrun()doesn’t consume untildryRunChis enabled after inventory snapshot (Lines 749-750). Before that point, requests can hang until caller context expires instead of returning a deterministic initialization error.💡 Suggested fix
- var dryRunCh <-chan dryRunReserveRequest + var dryRunCh <-chan dryRunReserveRequest = is.dryRunReservechfunc (is *inventoryService) handleDryRunRequest(req dryRunReserveRequest, state *inventoryServiceState) { + if state.inventory == nil { + req.ch <- dryRunReserveResponse{err: errInventoryNotAvailableYet} + return + } + resourcesToCommit := is.resourcesToCommit(req.resources)- // enable dry-run reserve processing now that inventory is available - dryRunCh = is.dryRunReservechAlso applies to: 586-587, 749-750
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cluster/inventory.go` around lines 287 - 293, The select that sends to is.dryRunReservech can block because run() only starts consuming from dryRunCh after the inventory snapshot; before initialization we should fail fast instead of hanging. Modify the caller code that does the send to first check the inventory initialization state (or a dedicated flag) and return ErrNotRunning/ErrNotInitialized if not ready, or change the select to a non-blocking attempt that falls through to return a deterministic error; reference is.dryRunReservech, dryRunCh, run(), and is.lc.ShuttingDown() and ensure the same fix is applied at the other occurrences (the blocks around lines noted).bidengine/service.go (1)
231-244:⚠️ Potential issue | 🟠 MajorScreenBid can still pass bids the live bidder would reject.
After pricing succeeds (Lines 231-237), the code returns
Passed: truedirectly (Lines 239-244) without validating denom/max price againstgspec.Price(). This is still inconsistent with the live bidding checks and can produce false-positive precheck results.💡 Suggested fix
price, err := s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) if err != nil { return &ScreenBidResult{ Passed: false, Reasons: []string{fmt.Sprintf("pricing calculation failed: %v", err)}, }, nil } + + maxPrice := gspec.Price() + if maxPrice.GetDenom() != price.GetDenom() { + return &ScreenBidResult{ + Passed: false, + Reasons: []string{fmt.Sprintf("unsupported denomination: calculated %s, max price %s", price.String(), maxPrice.String())}, + }, nil + } + if maxPrice.IsLT(price) { + return &ScreenBidResult{ + Passed: false, + Reasons: []string{fmt.Sprintf("price too high: calculated %s, max price %s", price.String(), maxPrice.String())}, + }, nil + } return &ScreenBidResult{ Passed: true, AllocatedResources: rg.GetAllocatedResources(), ClusterParams: rg.ClusterParams(), Price: price, }, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bidengine/service.go` around lines 231 - 244, After calling s.cfg.PricingStrategy.CalculatePrice(ctx, priceReq) and getting a price, validate that the returned price's denomination and value fall within the allowed/given gspec.Price() constraints (same checks the live bidder uses) before constructing the ScreenBidResult; if the denom or max price are out of bounds, return Passed: false with an explanatory reason string (instead of returning Passed: true). Update the code in the block after PricingStrategy.CalculatePrice (where price, err is handled) to perform these checks against gspec.Price() and only set Passed: true and include AllocatedResources/ClusterParams/Price when the price passes those validations. Ensure you reference s.cfg.PricingStrategy.CalculatePrice, priceReq, the returned price, ScreenBidResult, rg.GetAllocatedResources, rg.ClusterParams, and gspec.Price() when implementing the fix.
🧹 Nitpick comments (1)
service_test.go (1)
76-214: Add a direct test for nilGroupSpeccontract.Current suite covers mapping well, but it doesn’t assert the explicit validation path in
service.go(Line 234) that returnsPassed=falseand"missing group spec"without error. A small test here would prevent regressions in that API contract.🧪 Proposed test addition
+func Test_ServiceScreenBid_MissingGroupSpec(t *testing.T) { + svc := &service{bidengine: &fakeBidEngine{}} + + resp, err := svc.ScreenBid(context.Background(), &provider.BidScreeningRequest{}) + require.NoError(t, err) + require.NotNil(t, resp) + require.False(t, resp.Passed) + require.Equal(t, []string{"missing group spec"}, resp.Reasons) + require.Nil(t, resp.ResourceOffer) + require.Nil(t, resp.Price) +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service_test.go` around lines 76 - 214, Add a test that calls service.ScreenBid with a nil GroupSpec in provider.BidScreeningRequest and asserts the contract: no returned error, non-nil response, resp.Passed is false, resp.Reasons contains the exact "missing group spec" message, and resp.Price and resp.ResourceOffer are nil; place the test alongside the other Test_ServiceScreenBid_* cases and reference the service struct and service.ScreenBid to locate where the behavior is implemented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@bidengine/service.go`:
- Around line 231-244: After calling s.cfg.PricingStrategy.CalculatePrice(ctx,
priceReq) and getting a price, validate that the returned price's denomination
and value fall within the allowed/given gspec.Price() constraints (same checks
the live bidder uses) before constructing the ScreenBidResult; if the denom or
max price are out of bounds, return Passed: false with an explanatory reason
string (instead of returning Passed: true). Update the code in the block after
PricingStrategy.CalculatePrice (where price, err is handled) to perform these
checks against gspec.Price() and only set Passed: true and include
AllocatedResources/ClusterParams/Price when the price passes those validations.
Ensure you reference s.cfg.PricingStrategy.CalculatePrice, priceReq, the
returned price, ScreenBidResult, rg.GetAllocatedResources, rg.ClusterParams, and
gspec.Price() when implementing the fix.
In `@cluster/inventory.go`:
- Around line 287-293: The select that sends to is.dryRunReservech can block
because run() only starts consuming from dryRunCh after the inventory snapshot;
before initialization we should fail fast instead of hanging. Modify the caller
code that does the send to first check the inventory initialization state (or a
dedicated flag) and return ErrNotRunning/ErrNotInitialized if not ready, or
change the select to a non-blocking attempt that falls through to return a
deterministic error; reference is.dryRunReservech, dryRunCh, run(), and
is.lc.ShuttingDown() and ensure the same fix is applied at the other occurrences
(the blocks around lines noted).
---
Nitpick comments:
In `@service_test.go`:
- Around line 76-214: Add a test that calls service.ScreenBid with a nil
GroupSpec in provider.BidScreeningRequest and asserts the contract: no returned
error, non-nil response, resp.Passed is false, resp.Reasons contains the exact
"missing group spec" message, and resp.Price and resp.ResourceOffer are nil;
place the test alongside the other Test_ServiceScreenBid_* cases and reference
the service struct and service.ScreenBid to locate where the behavior is
implemented.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e7760187-1680-4c82-a5d0-c1660a99b8b3
📒 Files selected for processing (6)
bidengine/service.gobidengine/service_test.gocluster/inventory.gocluster/inventory_test.goservice.goservice_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cluster/inventory_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
gateway/rest/router.go (1)
444-458: Missingreq.Body.Close()in deferred cleanup.Other handlers in this file (e.g.,
createManifestHandlerat line 483-485) explicitly close the request body. While Go's HTTP server typically handles this, explicit cleanup is a good defensive practice for consistency.♻️ Suggested fix
func createBidScreeningHandler(log log.Logger, pclient provider.Client) http.HandlerFunc { return func(w http.ResponseWriter, req *http.Request) { + defer func() { + _ = req.Body.Close() + }() var screeningReq providerv1.BidScreeningRequest if err := json.NewDecoder(req.Body).Decode(&screeningReq); err != nil { http.Error(w, err.Error(), http.StatusBadRequest) return }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/rest/router.go` around lines 444 - 458, The createBidScreeningHandler is missing explicit request body cleanup; add a deferred close (defer req.Body.Close()) at the start of the returned handler function (immediately after entering the handler and before json.NewDecoder(...).Decode) so the body is always closed on all return paths (matching the pattern used by createManifestHandler) to avoid resource leaks when using pclient.ScreenBid and writeJSON.integration/bidscreening_test.go (1)
260-271: Usestrings.Containsfrom standard library instead of custom implementation.The
containsSubstringandfindSubstringfunctions reimplementstrings.Contains. Using the standard library is simpler and less error-prone.♻️ Proposed fix
+import "strings" + -func containsSubstring(s, substr string) bool { - return len(s) >= len(substr) && (s == substr || len(s) > 0 && findSubstring(s, substr)) -} - -func findSubstring(s, substr string) bool { - for i := 0; i <= len(s)-len(substr); i++ { - if s[i:i+len(substr)] == substr { - return true - } - } - return false -}Then update line 199:
- if containsSubstring(r, tt.reason) { + if strings.Contains(r, tt.reason) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration/bidscreening_test.go` around lines 260 - 271, Replace the custom containsSubstring and findSubstring implementations with the standard library strings.Contains: remove both functions (containsSubstring, findSubstring), import "strings" and update any callers (e.g., where containsSubstring(...) is used) to call strings.Contains(s, substr) instead so you rely on the built-in, tested implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/rest/router.go`:
- Around line 444-458: The REST handler createBidScreeningHandler is missing the
hostname availability validation performed in the gRPC path; before calling
pclient.ScreenBid, call the provider hostname availability check using the
client's Hostname().CanReserveHostnames with the decoded screeningReq.Hostnames
(propagate req.Context()), handle any error the same way as other errors (use
httperror.StatusCodeFrom to set the response code and return), and only then
call pclient.ScreenBid and writeJSON the response.
---
Nitpick comments:
In `@gateway/rest/router.go`:
- Around line 444-458: The createBidScreeningHandler is missing explicit request
body cleanup; add a deferred close (defer req.Body.Close()) at the start of the
returned handler function (immediately after entering the handler and before
json.NewDecoder(...).Decode) so the body is always closed on all return paths
(matching the pattern used by createManifestHandler) to avoid resource leaks
when using pclient.ScreenBid and writeJSON.
In `@integration/bidscreening_test.go`:
- Around line 260-271: Replace the custom containsSubstring and findSubstring
implementations with the standard library strings.Contains: remove both
functions (containsSubstring, findSubstring), import "strings" and update any
callers (e.g., where containsSubstring(...) is used) to call strings.Contains(s,
substr) instead so you rely on the built-in, tested implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8960a559-4bc9-4490-837f-821621395c3b
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (6)
gateway/grpc/server_test.gogateway/rest/router.gogo.modintegration/bidscreening_test.goservice.goservice_test.go
✅ Files skipped from review due to trivial changes (2)
- go.mod
- service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- service.go
fixes akash-network/support#453