fix: use WriteBatch in BadgerDB to prevent "Txn is too big" errors #2228

Open
nimrod-teich wants to merge 2 commits into main from fix/reward-db-txn-too-big

Conversation

@nimrod-teich
Contributor

Replace single-transaction db.Update() with BadgerDB's WriteBatch in saveAll(), which automatically splits large writes across multiple transactions. This fixes providers seeing repeated "Txn is too big to fit into one request" errors when serving high relay volumes.

Also fix a bug in RewardDB.BatchSave where successful chunked saves still returned the original error due to a missing else branch.

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • read the contribution guide
  • included the correct type prefix in the PR title (you can find examples of the prefixes below)
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the main branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

@codecov

codecov bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
protocol/rpcprovider/rewardserver/badger_db.go 57.14% 2 Missing and 1 partial ⚠️
protocol/rpcprovider/rewardserver/reward_db.go 0.00% 3 Missing ⚠️
Flag Coverage Δ
consensus 8.71% <ø> (ø)
protocol 34.63% <40.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
protocol/rpcprovider/rewardserver/badger_db.go 50.83% <57.14%> (+2.07%) ⬆️
protocol/rpcprovider/rewardserver/reward_db.go 63.88% <0.00%> (+0.44%) ⬆️

... and 2 files with indirect coverage changes


@github-actions

github-actions bot commented Feb 22, 2026

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 9bbf789. ± Comparison against base commit 9861746.

♻️ This comment has been updated with latest results.

@qodo-code-review

Review Summary by Qodo

Fix BadgerDB transaction size limits with WriteBatch and improve error handling

🐞 Bug fix 🧪 Tests


Walkthroughs

Description
• Replace single-transaction db.Update() with BadgerDB's WriteBatch in saveAll() to handle large writes
• Fix missing else branch in RewardDB.BatchSave that returned errors after successful chunked saves
• Add comprehensive tests for large transaction handling exceeding BadgerDB limits
• Simplify batch chunking logic using min() function
Diagram
flowchart LR
  A["Single Transaction<br/>db.Update()"] -->|"Replace with"| B["WriteBatch<br/>Auto-splits large writes"]
  C["Missing else branch<br/>returns error"] -->|"Fix"| D["Proper error handling<br/>after chunked saves"]
  B -->|"Tested by"| E["Large payload tests<br/>2000 entries x 10KB"]
  D -->|"Verified by"| E


File Changes

1. protocol/rpcprovider/rewardserver/badger_db.go 🐞 Bug fix +8/-12

Use WriteBatch to handle large transactions

• Replace db.Update() transaction with WriteBatch that automatically splits large writes across multiple transactions
• Simplify error handling by calling wb.Cancel() on error and wb.Flush() to commit
• Remove nested transaction callback structure for cleaner code

protocol/rpcprovider/rewardserver/badger_db.go


2. protocol/rpcprovider/rewardserver/badger_db_test.go 🧪 Tests +55/-0

Add test for large transaction handling

• Add new test TestBadgerDB_BatchSaveLargeTransaction to verify handling of 2000 entries (~20MB total)
• Test exceeds BadgerDB's default ~9.6MB transaction limit to ensure WriteBatch splits correctly
• Verify all entries are persisted and retrievable after large batch save

protocol/rpcprovider/rewardserver/badger_db_test.go


3. protocol/rpcprovider/rewardserver/reward_db.go 🐞 Bug fix +3/-5

Fix error handling in chunked batch saves

• Fix bug where successful chunked saves still returned the original error due to missing else branch
• Simplify batch end calculation using min() function instead of conditional logic
• Ensure error is only returned when chunked save actually fails, not after successful retry

protocol/rpcprovider/rewardserver/reward_db.go


4. protocol/rpcprovider/rewardserver/reward_db_test.go 🧪 Tests +37/-0

Add test for large reward batch persistence

• Add new test TestSaveBatchLargePayload with 2000 entries using 10KB content hashes
• Test total batch size (~20MB) exceeds BadgerDB's transaction limit to verify chunking works
• Verify all entries are persisted correctly in a single epoch

protocol/rpcprovider/rewardserver/reward_db_test.go



@qodo-code-review
Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (4) 📎 Requirement gaps (0)



Remediation recommended

1. saveAll() lacks wb.Cancel() 📘 Rule violation ⛯ Reliability
Description
saveAll() creates a Badger WriteBatch but doesn’t guarantee wb.Cancel() is called on all paths
(e.g., if wb.Flush() returns an error), which can leave resources/buffers unreleased and
complicate recovery. This misses robust failure-path handling for a critical write operation.
Code

protocol/rpcprovider/rewardserver/badger_db.go[R51-59]

+	wb := mdb.db.NewWriteBatch()
+	for key, data := range mdb.rewards {
+		e := badger.NewEntry([]byte(key), data.data).WithTTL(data.remainingTtl())
+		if err := wb.SetEntry(e); err != nil {
+			wb.Cancel()
+			return err
		}
-
-		return nil
-	})
-
-	return err
+	}
+	return wb.Flush()
Evidence
Compliance ID 3 requires handling failure points and edge cases; here the batch is only canceled on
SetEntry error, but not if Flush() fails, leaving an unhandled failure/cleanup path in newly
changed code.

Rule 3: Generic: Robust Error Handling and Edge Case Management
protocol/rpcprovider/rewardserver/badger_db.go[51-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`saveAll()` does not guarantee `wb.Cancel()` is executed when `wb.Flush()` fails, leaving incomplete cleanup on an error path.

## Issue Context
The PR replaced `db.Update()` with `NewWriteBatch()` to avoid "Txn is too big" errors. WriteBatch usage should include consistent cleanup to handle failure paths robustly.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db.go[51-59]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. saveAll() returns bare errors 📘 Rule violation ✧ Quality
Description
The new WriteBatch code returns underlying Badger errors directly without adding operation/key
context, reducing debuggability in production. This makes it harder to identify which key/operation
failed during large batch writes.
Code

protocol/rpcprovider/rewardserver/badger_db.go[R53-59]

+		e := badger.NewEntry([]byte(key), data.data).WithTTL(data.remainingTtl())
+		if err := wb.SetEntry(e); err != nil {
+			wb.Cancel()
+			return err
		}
-
-		return nil
-	})
-
-	return err
+	}
+	return wb.Flush()
Evidence
Compliance ID 3 requires error messages with actionable context; the changed code returns raw errors
from wb.SetEntry/wb.Flush without indicating the failing key or that the error occurred during
batch flush.

Rule 3: Generic: Robust Error Handling and Edge Case Management
protocol/rpcprovider/rewardserver/badger_db.go[53-59]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Errors from batch writes are returned without context, which makes diagnosing large-write failures difficult.

## Issue Context
During high relay volumes, batch size and write failures are likely; returning contextual errors helps identify failing operations/keys.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db.go[53-59]




Advisory comments

3. cpes variable name unclear 📘 Rule violation ✓ Correctness
Description
The new test uses the abbreviation cpes, which is not self-explanatory and reduces
readability/maintainability. This violates the requirement for meaningful naming and
self-documenting code.
Code

protocol/rpcprovider/rewardserver/reward_db_test.go[R203-216]

+	cpes := make([]*rewardserver.RewardEntity, numEntries)
+	for i := range numEntries {
+		proof := common.BuildRelayRequestWithSession(ctx, "providerAddr", largeContentHash, uint64(i+1), uint64(0), "specId", nil)
+		cpes[i] = &rewardserver.RewardEntity{
+			Epoch:        uint64(proof.Epoch),
+			ConsumerAddr: fmt.Sprintf("consumerAddr%d", i),
+			ConsumerKey:  fmt.Sprintf("consumerKey%d", i),
+			SessionId:    proof.SessionId,
+			Proof:        proof,
+		}
+	}
+
+	err = rs.BatchSave(cpes)
+	require.NoError(t, err)
Evidence
Compliance ID 2 requires identifiers to clearly express intent; cpes is an unclear abbreviation in
newly added code and makes the test harder to understand without extra mental mapping.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
protocol/rpcprovider/rewardserver/reward_db_test.go[203-216]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A non-obvious abbreviation (`cpes`) is used for a slice of `*RewardEntity` in a new test, reducing clarity.

## Issue Context
Tests are read frequently during maintenance; clearer naming reduces mistakes and improves comprehension.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/reward_db_test.go[203-216]



4. Test uses generic data 📘 Rule violation ✓ Correctness
Description
The new BadgerDB test uses the generic name data for entry payload bytes, which is discouraged by
the naming compliance rule. A more specific name would make the test intent clearer.
Code

protocol/rpcprovider/rewardserver/badger_db_test.go[R28-38]

+	for i := range numEntries {
+		data := make([]byte, dataSize)
+		// Fill with non-zero data to ensure realistic size
+		for j := range data {
+			data[j] = byte(i % 256)
+		}
+		entries[i] = &DBEntry{
+			Key:  fmt.Sprintf("key-%d", i),
+			Data: data,
+			Ttl:  24 * time.Hour,
+		}
Evidence
Compliance ID 2 calls out generic names like data; the newly added test uses data for the entry
payload, which is not self-documenting.

Rule 2: Generic: Meaningful Naming and Self-Documenting Code
protocol/rpcprovider/rewardserver/badger_db_test.go[28-38]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The test variable name `data` is overly generic and not self-documenting.

## Issue Context
The compliance naming rule discourages generic names like `data` when a more descriptive option is available.

## Fix Focus Areas
- protocol/rpcprovider/rewardserver/badger_db_test.go[28-38]




nimrod-teich and others added 2 commits March 3, 2026 13:37
Replace single-transaction db.Update() with BadgerDB's WriteBatch in
saveAll(), which automatically splits large writes across multiple
transactions. This fixes providers seeing repeated "Txn is too big to
fit into one request" errors when serving high relay volumes.

Also fix a bug in RewardDB.BatchSave where successful chunked saves
still returned the original error due to a missing else branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nimrod-teich force-pushed the fix/reward-db-txn-too-big branch from aae86eb to 9bbf789 on March 3, 2026 at 11:37