fix: use WriteBatch in BadgerDB to prevent "Txn is too big" errors #2228

Open
nimrod-teich wants to merge 2 commits into main from fix/reward-db-txn-too-big

Conversation

@nimrod-teich
Contributor

Replace single-transaction db.Update() with BadgerDB's WriteBatch in saveAll(), which automatically splits large writes across multiple transactions. This fixes providers seeing repeated "Txn is too big to fit into one request" errors when serving high relay volumes.

Also fix a bug in RewardDB.BatchSave where successful chunked saves still returned the original error due to a missing else branch.

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • read the contribution guide
  • included the correct type prefix in the PR title (you can find examples of the prefixes below)
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the main branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

@codecov

codecov bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
protocol/rpcprovider/rewardserver/badger_db.go 57.14% 2 Missing and 1 partial ⚠️
protocol/rpcprovider/rewardserver/reward_db.go 0.00% 3 Missing ⚠️
Flag Coverage Δ
consensus 8.71% <ø> (ø)
protocol 34.63% <40.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
protocol/rpcprovider/rewardserver/badger_db.go 50.83% <57.14%> (+2.07%) ⬆️
protocol/rpcprovider/rewardserver/reward_db.go 63.88% <0.00%> (+0.44%) ⬆️

... and 2 files with indirect coverage changes


@github-actions

github-actions bot commented Feb 22, 2026

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 9bbf789. ± Comparison against base commit 9861746.

♻️ This comment has been updated with latest results.

@qodo-code-review

Review Summary by Qodo

Fix BadgerDB transaction size limits with WriteBatch and improve error handling

🐞 Bug fix 🧪 Tests


Walkthroughs

Description
• Replace single-transaction db.Update() with BadgerDB's WriteBatch in saveAll() to handle large writes
• Fix missing else branch in RewardDB.BatchSave that returned errors after successful chunked saves
• Add comprehensive tests for large transaction handling exceeding BadgerDB limits
• Simplify batch chunking logic using min() function
Diagram
flowchart LR
  A["Single Transaction<br/>db.Update()"] -->|"Replace with"| B["WriteBatch<br/>Auto-splits large writes"]
  C["Missing else branch<br/>returns error"] -->|"Fix"| D["Proper error handling<br/>after chunked saves"]
  B -->|"Tested by"| E["Large payload tests<br/>2000 entries x 10KB"]
  D -->|"Verified by"| E


File Changes

1. protocol/rpcprovider/rewardserver/badger_db.go 🐞 Bug fix +8/-12

Use WriteBatch to handle large transactions

• Replace db.Update() transaction with WriteBatch that automatically splits large writes across multiple transactions
• Simplify error handling by calling wb.Cancel() on error and wb.Flush() to commit
• Remove nested transaction callback structure for cleaner code

protocol/rpcprovider/rewardserver/badger_db.go


2. protocol/rpcprovider/rewardserver/badger_db_test.go 🧪 Tests +55/-0

Add test for large transaction handling

• Add new test TestBadgerDB_BatchSaveLargeTransaction to verify handling of 2000 entries (~20MB total)
• Test exceeds BadgerDB's default ~9.6MB transaction limit to ensure WriteBatch splits correctly
• Verify all entries are persisted and retrievable after large batch save

protocol/rpcprovider/rewardserver/badger_db_test.go


3. protocol/rpcprovider/rewardserver/reward_db.go 🐞 Bug fix +3/-5

Fix error handling in chunked batch saves

• Fix bug where successful chunked saves still returned the original error due to missing else branch
• Simplify batch end calculation using min() function instead of conditional logic
• Ensure error is only returned when chunked save actually fails, not after successful retry

protocol/rpcprovider/rewardserver/reward_db.go


4. protocol/rpcprovider/rewardserver/reward_db_test.go 🧪 Tests +37/-0

Add test for large reward batch persistence

• Add new test TestSaveBatchLargePayload with 2000 entries using 10KB content hashes
• Test total batch size (~20MB) exceeds BadgerDB's transaction limit to verify chunking works
• Verify all entries are persisted correctly in a single epoch

protocol/rpcprovider/rewardserver/reward_db_test.go



@qodo-code-review
Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (4) 📎 Requirement gaps (0)



Remediation recommended

1. saveAll() lacks wb.Cancel() 📘 Rule violation ⛯ Reliability
Description
saveAll() creates a Badger WriteBatch but doesn’t guarantee wb.Cancel() is called on all paths
(e.g., if wb.Flush() returns an error), which can leave resources/buffers unreleased and
complicate recovery. This misses robust failure-path handling for a critical write operation.
Code

protocol/rpcprovider/rewardserver/badger_db.go[R51-59]

+	wb := mdb.db.NewWriteBatch()
+	for key, data := range mdb.rewards {
+		e := badger.NewEntry([]byte(key), data.data).WithTTL(data.remainingTtl())
+		if err := wb.SetEntry(e); err != nil {
+			wb.Cancel()
+			return err
		}
-
-		return nil
-	})
-
-	return err
+	}
+	return wb.Flush()
Evidence
Compliance ID 3 requires handling failure points and edge cases; here the batch is only canceled on
SetEntry error, but not if Flush() fails, leaving an unhandled failure/cleanup path in newly
changed code.

Rule 3: Generic: Robust Error Handling and Edge Case Management
protocol/rpcprovider/rewardserver/badger_db.go[51-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`saveAll()` does not guarantee `wb.Cancel()` is executed when `wb.Flush()` fails, leaving incomplete cleanup on an error path.

## Issue Context
The PR replaced `db.Update()` with `NewWriteBatch()` to avoid "Txn is too big" errors. WriteBatch usage should include consistent cleanup to handle failure paths robustly.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db.go[51-59]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. saveAll() returns bare errors 📘 Rule violation ✧ Quality
Description
The new WriteBatch code returns underlying Badger errors directly without adding operation/key
context, reducing debuggability in production. This makes it harder to identify which key/operation
failed during large batch writes.
Code

protocol/rpcprovider/rewardserver/badger_db.go[R53-59]

+		e := badger.NewEntry([]byte(key), data.data).WithTTL(data.remainingTtl())
+		if err := wb.SetEntry(e); err != nil {
+			wb.Cancel()
+			return err
		}
-
-		return nil
-	})
-
-	return err
+	}
+	return wb.Flush()
Evidence
Compliance ID 3 requires error messages with actionable context; the changed code returns raw errors
from wb.SetEntry/wb.Flush without indicating the failing key or that the error occurred during
batch flush.

Rule 3: Generic: Robust Error Handling and Edge Case Management
protocol/rpcprovider/rewardserver/badger_db.go[53-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Errors from batch writes are returned without context, which makes diagnosing large-write failures difficult.

## Issue Context
During high relay volumes, batch size and write failures are likely; returning contextual errors helps identify failing operations/keys.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db.go[53-59]




Advisory comments

3. cpes variable name unclear 📘 Rule violation ✓ Correctness
Description
The new test uses the abbreviation cpes, which is not self-explanatory and reduces
readability/maintainability. This violates the requirement for meaningful naming and
self-documenting code.
Code

protocol/rpcprovider/rewardserver/reward_db_test.go[R203-216]

+	cpes := make([]*rewardserver.RewardEntity, numEntries)
+	for i := range numEntries {
+		proof := common.BuildRelayRequestWithSession(ctx, "providerAddr", largeContentHash, uint64(i+1), uint64(0), "specId", nil)
+		cpes[i] = &rewardserver.RewardEntity{
+			Epoch:        uint64(proof.Epoch),
+			ConsumerAddr: fmt.Sprintf("consumerAddr%d", i),
+			ConsumerKey:  fmt.Sprintf("consumerKey%d", i),
+			SessionId:    proof.SessionId,
+			Proof:        proof,
+		}
+	}
+
+	err = rs.BatchSave(cpes)
+	require.NoError(t, err)
Evidence
Compliance ID 2 requires identifiers to clearly express intent; cpes is an unclear abbreviation in
newly added code and makes the test harder to understand without extra mental mapping.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
protocol/rpcprovider/rewardserver/reward_db_test.go[203-216]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A non-obvious abbreviation (`cpes`) is used for a slice of `*RewardEntity` in a new test, reducing clarity.

## Issue Context
Tests are read frequently during maintenance; clearer naming reduces mistakes and improves comprehension.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/reward_db_test.go[203-216]



4. Test uses generic data 📘 Rule violation ✓ Correctness
Description
The new BadgerDB test uses the generic name data for entry payload bytes, which is discouraged by
the naming compliance rule. A more specific name would make the test intent clearer.
Code

protocol/rpcprovider/rewardserver/badger_db_test.go[R28-38]

+	for i := range numEntries {
+		data := make([]byte, dataSize)
+		// Fill with non-zero data to ensure realistic size
+		for j := range data {
+			data[j] = byte(i % 256)
+		}
+		entries[i] = &DBEntry{
+			Key:  fmt.Sprintf("key-%d", i),
+			Data: data,
+			Ttl:  24 * time.Hour,
+		}
Evidence
Compliance ID 2 calls out generic names like data; the newly added test uses data for the entry
payload, which is not self-documenting.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
protocol/rpcprovider/rewardserver/badger_db_test.go[28-38]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The test variable name `data` is overly generic and not self-documenting.

## Issue Context
The compliance naming rule discourages generic names like `data` when a more descriptive option is available.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db_test.go[28-38]




nimrod-teich and others added 2 commits March 3, 2026 13:37
Replace single-transaction db.Update() with BadgerDB's WriteBatch in
saveAll(), which automatically splits large writes across multiple
transactions. This fixes providers seeing repeated "Txn is too big to
fit into one request" errors when serving high relay volumes.

Also fix a bug in RewardDB.BatchSave where successful chunked saves
still returned the original error due to a missing else branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nimrod-teich force-pushed the fix/reward-db-txn-too-big branch from aae86eb to 9bbf789 on March 3, 2026 at 11:37