[harness] nemo_retriever: Add dataset recall adapters for earnings and FinanceBench #1489

Merged
jioffe502 merged 4 commits into NVIDIA:main from jioffe502:retriever-harness-recall-adapters
Mar 5, 2026

@jioffe502
Collaborator

Description

Adds dataset-specific recall input normalization and matching modes so that the FinanceBench and earnings datasets can run end-to-end in the new retriever harness with correct recall semantics. Also simplifies the internal recall logic by centralizing hit checks in the recall core.

- Adds recall_adapter + recall_match_mode wiring in harness config and run flow
- Implements page_plus_one and financebench_json adapters for query normalization
- Supports document-level (pdf_only) and page-level (pdf_page) recall with tests
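
For readers unfamiliar with the two match modes, the difference can be sketched roughly as follows. This is not the harness code: the function names, the `<pdf>_<page>` key format, and the reading of page_plus_one as "shift a zero-based page index to one-based" are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence


def page_plus_one_key(pdf: str, page: int) -> str:
    """Hypothetical key builder: assumes the CSV page index is zero-based
    and that recall keys look like '<pdf>_<one-based page>'."""
    return f"{pdf}_{page + 1}"


def doc_of(key: str) -> str:
    """Drop a trailing '_<digits>' page suffix to get the document name.
    Caveat: a document name that itself ends in '_<digits>' would be misparsed."""
    stem, sep, page = key.rpartition("_")
    return stem if sep and page.isdigit() else key


def is_hit(expected: str, retrieved: Iterable[str], k: int,
           match_mode: str = "pdf_page") -> bool:
    """Single hit check used by every caller (the 'centralized' idea)."""
    top_k: List[str] = list(retrieved)[:k]
    if match_mode == "pdf_page":
        # Page-level: the exact document/page key must be in the top k.
        return expected in top_k
    if match_mode == "pdf_only":
        # Document-level: any page of the expected document counts.
        target = doc_of(expected)
        return any(doc_of(r) == target for r in top_k)
    raise ValueError(f"unknown match mode: {match_mode!r}")


def recall_at_k(expected_keys: Sequence[str],
                retrieved_lists: Sequence[Sequence[str]],
                k: int, match_mode: str) -> float:
    """Fraction of queries whose expected key is hit within the top k."""
    if not expected_keys:
        return 0.0
    hits = sum(is_hit(e, r, k, match_mode)
               for e, r in zip(expected_keys, retrieved_lists))
    return hits / len(expected_keys)
```

Under these assumptions, a query expecting `report_1` that retrieves `report_2` is a miss in pdf_page mode and a hit in pdf_only mode, which is why the match mode has to be chosen per dataset.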

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

- normalize page-based CSVs into pdf_page recall keys
- add financebench JSON adapter with pdf_only matching mode
- recurse input globs so nested dataset directories ingest cleanly
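
A rough illustration of recursive glob expansion, using only the Python standard library. The helper name `expand_input_glob` is hypothetical and the harness may expand its inputs differently.

```python
import glob
import os
from typing import List


def expand_input_glob(pattern: str) -> List[str]:
    """Return sorted file paths matching the pattern.

    With recursive=True, a '**' path segment matches zero or more
    directories, so files in nested dataset folders are included.
    """
    matches = glob.glob(pattern, recursive=True)
    # Keep regular files only; '**' patterns can also match directories.
    return sorted(p for p in matches if os.path.isfile(p))
```

For example, a pattern such as `<root>/**/*.pdf` would pick up both `<root>/a.pdf` and `<root>/sub/b.pdf`.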

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Made-with: Cursor
- reuse recall-core hit checks from batch pipeline
- use adapter registry for cleaner recall adapter routing
- parameterize recall-mode tests to reduce duplication
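
As a sketch of what registry-based adapter routing can look like (not the code in this PR; the decorator, the row shape, and the adapter bodies below are assumptions):

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Adapter = Callable[[List[Row]], List[Row]]

# Hypothetical registry: adapter name -> normalization function.
_ADAPTERS: Dict[str, Adapter] = {}


def register_adapter(name: str) -> Callable[[Adapter], Adapter]:
    """Decorator that records an adapter under the given name."""
    def decorator(fn: Adapter) -> Adapter:
        _ADAPTERS[name] = fn
        return fn
    return decorator


@register_adapter("none")
def _identity(rows: List[Row]) -> List[Row]:
    # Pass rows through unchanged (copy the list to avoid aliasing).
    return list(rows)


@register_adapter("page_plus_one")
def _page_plus_one(rows: List[Row]) -> List[Row]:
    # Assumed behavior: shift a zero-based 'page' column to one-based.
    return [{**row, "page": str(int(row["page"]) + 1)} for row in rows]


def apply_adapter(name: str, rows: List[Row]) -> List[Row]:
    """Look up the adapter by name instead of branching with if/elif."""
    try:
        adapter = _ADAPTERS[name]
    except KeyError:
        known = ", ".join(sorted(_ADAPTERS))
        raise ValueError(f"unknown recall adapter {name!r}; known: {known}") from None
    return adapter(rows)
```

With this shape, adding a new dataset adapter means registering one more function, and the run flow does not need another branch.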

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Made-with: Cursor
@jioffe502 jioffe502 requested a review from a team as a code owner March 4, 2026 21:06
@jioffe502 jioffe502 requested a review from edknv March 4, 2026 21:06
jioffe502 and others added 2 commits March 4, 2026 16:08
- move nv_ingest_api import into NIM embed helper
- avoid module import failure when API package is absent
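
The deferred-import pattern described here can be sketched generically as below. The helper name `load_optional` is hypothetical, and no nv_ingest_api functions are shown because the commit message does not name any.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str) -> ModuleType:
    """Import a module at call time rather than at module import time.

    Because the import happens inside the function, the enclosing module
    still imports cleanly when the optional package is not installed; the
    error only surfaces if the helper is actually called.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"optional dependency {module_name!r} is required for this code path"
        ) from exc
```

A helper that needs the optional package would call `load_optional("nv_ingest_api")` as its first step, so code paths that never embed via NIM are unaffected by the missing package.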

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
Made-with: Cursor
@jioffe502 jioffe502 merged commit 704219d into NVIDIA:main Mar 5, 2026
9 checks passed
oliverholworthy pushed a commit to oliverholworthy/NeMo-Retriever that referenced this pull request Mar 6, 2026
…d FinanceBench (NVIDIA#1489)

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>

2 participants