fix(warehouses): Catch nan strings that look like floats to Python.#338
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR updates the conditional logic in the ChangesAdditionalComment markdown conversion
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
An exception was beong thrown saying float did not have a replace method. The value of the cell was nan. Just leave this as it is.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
warehouses/facility_ops_landing/ingest/accelerator/opralogweb/opralogweb.py (1)
108-121:⚠️ Potential issue | 🟠 MajorAdd a test that covers the pandas NaN/"nan" edge case for
AdditionalCommentmarkdown conversion.
The existingopralogwebintegration test (warehouses/facility_ops_landing/ingest/accelerator/opralogweb/tests/test_extract_and_load.py) only generatesAdditionalCommentas normal strings via the fake source (AdditionalComment=f"<p>{record['AdditionalComment']}</p>"), so it never exercisesNone,float('nan'), or the"nan"string case pandas may coerce.
Add a focused unit test foradditional_comment_to_markdowncovering:None,float('nan')/pandas-NaN,"nan"as a string, and a valid HTML string; assert the expected output for each.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@warehouses/facility_ops_landing/ingest/accelerator/opralogweb/opralogweb.py` around lines 108 - 121, Add a focused unit test for the function additional_comment_to_markdown that builds a pyarrow.Table with AdditionalComment values [None, float('nan'), "nan", "<p>bold</p>"], calls additional_comment_to_markdown, and asserts the results: the first two entries are pandas.isna (to cover None and pandas/NumPy NaN), the third equals the literal "nan", and the fourth equals html2text("<p>bold</p>") (use html2text to compute the expected markdown). Place this new test alongside the existing tests (e.g., in a new test_additional_comment_to_markdown test module) so it runs with the existing test suite and references additional_comment_to_markdown and html2text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@warehouses/facility_ops_landing/ingest/accelerator/opralogweb/opralogweb.py`:
- Line 117: The lambda's condition (x is None and isinstance(x, str)) is
impossible and causes html2text to be called on non-strings (e.g., NaN floats);
change the expression to only call html2text when x is actually a string—replace
the lambda with something like: use html2text(x) if isinstance(x, str) else x
(or, if you need to treat NaN specially, check math.isnan(x) for floats and
return x unchanged). Update the lambda where html2text is used so it guards with
isinstance(x, str) (and import math/numpy.isnan if you choose the NaN check).
---
Outside diff comments:
In `@warehouses/facility_ops_landing/ingest/accelerator/opralogweb/opralogweb.py`:
- Around line 108-121: Add a focused unit test for the function
additional_comment_to_markdown that builds a pyarrow.Table with
AdditionalComment values [None, float('nan'), "nan", "<p>bold</p>"], calls
additional_comment_to_markdown, and asserts the results: the first two entries
are pandas.isna (to cover None and pandas/NumPy NaN), the third equals the
literal "nan", and the fourth equals html2text("<p>bold</p>") (use html2text to
compute the expected markdown). Place this new test alongside the existing tests
(e.g., in a new test_additional_comment_to_markdown test module) so it runs with
the existing test suite and references additional_comment_to_markdown and
html2text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c119d22b-67af-46f9-b8bd-6d987b7ba549
📒 Files selected for processing (1)
warehouses/facility_ops_landing/ingest/accelerator/opralogweb/opralogweb.py
275c503 to
1c11568
Compare
|
Test failure is an error pulling container images. Fix needed urgently. |
Summary
An error:
appeared in the Opralogweb extract logs. It came from a text containing 'nan' that had been converted to a Python float and html2text failed on it. This adds protection to only process strings.
Note that this was spotted by a user as we don't have the required visibility of the ELT pipelines.
Summary by CodeRabbit
Release Notes