Skip to content

Conversation

@emsonder
Copy link
Contributor

Fixes:

  • Usage of pre-computed insertion profiles
  • Normalization of NA-containing columns (previously all would be "normalized" to NA)

Changes:

  • Ensuring order of features explicitly during predictions
  • Removing cellular contexts labelled testing/validation from binding pattern computation

@emsonder emsonder requested review from Copilot and plger October 27, 2025 11:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues with pre-computed insertion profiles and NA-containing column normalization. The changes ensure that feature order is explicitly maintained during predictions and exclude testing/validation datasets from binding pattern computations.

  • Fixed usage of pre-computed insertion profiles by properly handling calcProfile parameter
  • Added NA-awareness to normalization functions to prevent all-NA results
  • Ensured feature ordering matches model expectations during predictions

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/testthat/test-tfFeatures.R Added test verifying testing datasets are excluded from binding pattern features
tests/testthat/test-predictTfBinding.R Added test for feature order matching between model and feature matrix
tests/testthat/test-insertionProfiles.R Updated message expectations and added test for mismatched pre-computed profiles
tests/testthat/test-getFeatureMatrix.R Added tests verifying NA preservation in normalization functions
tests/testthat/test-contextTfFeatures.R Added comprehensive tests for pre-computed profile handling
R/tfFeatures.R Added informational messages and excluded testing/validation samples from binding patterns
R/predict.R Replaced column filtering with explicit feature ordering from model
R/helpers.R Added naRm parameter to margin calculation functions
R/getInsertionProfiles.R Added validation for pre-computed profiles and improved messaging
R/getFeatureMatrix.R Added naRm parameter to normalization functions
R/contextTfFeatures.R Added pre-computed profile handling and motif_id assignment

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 58 to 60
modText <- model$save_model_to_string()
featOrder <- unlist(tstrsplit(modText, split="\n", keep=8))
featOrder <- unlist(tstrsplit(gsub("feature_names=","", featOrder), split=" "))
Copy link

Copilot AI Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature order extraction relies on hard-coded line number (keep=8) from the model string representation, which is fragile and may break if the model's string format changes. Consider adding error handling to verify that feature names were successfully extracted, or use a more robust method to access feature names if available in the lightgbm API.

Suggested change
modText <- model$save_model_to_string()
featOrder <- unlist(tstrsplit(modText, split="\n", keep=8))
featOrder <- unlist(tstrsplit(gsub("feature_names=","", featOrder), split=" "))
featOrder <- model$feature_name()

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model$feature_name() does not seem to exist in the R-API of lightgbm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one could still check that the "feature_names=" is present in that row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, see commit 2ef3949](2ef3949)

@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
R/getFeatureMatrix.R 85.71% 1 Missing ⚠️
R/predict.R 87.50% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@emsonder emsonder merged commit 6944a1d into main Oct 28, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants