Update README.md by rlreamy · Pull Request #163 · KPMP/libra-data

rlreamy · 2026-03-12T15:24:02Z

Summary by CodeRabbit

Documentation
- Added comprehensive documentation describing DLU Watcher functionality, including slide renaming workflows, file processing procedures, DataLake file management for Whole Slide Images and non-WSI data, and a note regarding a known issue on macOS systems.

coderabbitai · 2026-03-12T15:26:45Z

Walkthrough

Documentation expanded in data_management/README.md to describe DLU Watcher functionality, including slide renaming workflows, processing steps for manifest imports and curation, file movement operations to DataLake, and a known Docker macOS issue. No functional code changes.

Changes

Cohort / File(s)	Summary
Documentation `data_management/README.md`	Added sections documenting DLU Watcher overview, Slide Renaming workflow, slide_manifest_import and slide_scan_curation processing steps, file movement to DataLake with WSI/Non-WSI subpath handling, and a Docker on macOS known issue note.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dlu-watcher-doco

📝 Coding Plan for PR comments

Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

data_management/README.md (2)

122-126: Consider simplifying wordy phrases for better readability.

Several phrases could be made more concise:

"new records" → "records" (appears multiple times)

"all of the" → "all the"

"in order to" → "to"

These changes would improve readability without altering meaning.

139-139: Consider improving technical terminology and phrasing.

A few optional improvements:

"md5checksums" → "MD5 checksums" (standard capitalization and spacing)

"extremely large files" → "very large files" or specify a size threshold

These changes would enhance technical clarity.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a970690-cdb1-40ba-9e6f-d7bdcdc962fe

📥 Commits

Reviewing files that changed from the base of the PR and between 6d84115 and 732e810.

📒 Files selected for processing (1)

data_management/README.md

coderabbitai · 2026-03-12T15:26:49Z

data_management/README.md

+
+The majority of the functionality for processing slide renaming lives in slide_management.py in services.
+
+The first step is to process the new rows in slide_manifest_import and verify that we have all of the neccesary information in order to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues.


⚠️ Potential issue | 🟡 Minor

Fix spelling error: "neccesary" → "necessary".

The word "neccesary" is misspelled and should be corrected to "necessary".

📝 Proposed fix

-The first step is to process the new rows in slide_manifest_import and verify that we have all of the neccesary information in order to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues. +The first step is to process the new rows in slide_manifest_import and verify that we have all the necessary information to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

The first step is to process the new rows in slide_manifest_import and verify that we have all of the neccesary information in order to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues.

The first step is to process the new rows in slide_manifest_import and verify that we have all the necessary information to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues.

🧰 Tools

🪛 LanguageTool

[style] ~126-~126: Consider removing “of” to be more concise
Context: ...manifest_import and verify that we have all of the neccesary information in order to do an...

(ALL_OF_THE)

[grammar] ~126-~126: Ensure spelling is correct
Context: ...port and verify that we have all of the neccesary information in order to do an import. I...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~126-~126: Consider a more concise word here.
Context: ...e have all of the neccesary information in order to do an import. If the process is not pic...

(IN_ORDER_TO_PREMIUM)

[style] ~126-~126: ‘new records’ might be wordy. Consider a shorter alternative.
Context: ...mport. If the process is not picking up new records in the slide_manifest_import, check the...

(EN_WORDINESS_PREMIUM_NEW_RECORDS)

coderabbitai · 2026-03-12T15:26:49Z

data_management/README.md

+
+#### Non-WSI 
+
+This will do a number of checks to make sure everything is copasetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files.


⚠️ Potential issue | 🟡 Minor

Fix spelling error: "copasetic" → "copacetic".

The word "copasetic" is misspelled and should be corrected to "copacetic".

📝 Proposed fix

-This will do a number of checks to make sure everything is copasetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files. +This will do a number of checks to make sure everything is copacetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

This will do a number of checks to make sure everything is copasetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files.

This will do a number of checks to make sure everything is copacetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files.

🧰 Tools

🪛 LanguageTool

[grammar] ~139-~139: Ensure spelling is correct
Context: ...er of checks to make sure everything is copasetic and then call out to the appropriate lo...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~139-~139: ‘new records’ might be wordy. Consider a shorter alternative.
Context: ... to the appropriate locations to create new records in the databases, create directories in...

(EN_WORDINESS_PREMIUM_NEW_RECORDS)

[grammar] ~139-~139: Ensure spelling is correct
Context: ...lake. This also includes generating the md5checksums to store in the databases and can somet...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~139-~139: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...can sometimes cause issues when we have extremely large files. ## Known Bug There is a known [...

(EN_WEAK_ADJECTIVE)

Update README.md

732e810

coderabbitai bot reviewed Mar 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update README.md#163

Update README.md#163
rlreamy wants to merge 1 commit intodevelopfrom
dlu-watcher-doco

rlreamy commented Mar 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 12, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 12, 2026

Uh oh!

coderabbitai bot Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		The majority of the functionality for processing slide renaming lives in slide_management.py in services.

		The first step is to process the new rows in slide_manifest_import and verify that we have all of the neccesary information in order to do an import. If the process is not picking up new records in the slide_manifest_import, check the log files in dlu_watcher for potential errors as we have not created a record in the slide_scan_curation table at this point, and do not want to since there are issues.


		#### Non-WSI

		This will do a number of checks to make sure everything is copasetic and then call out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. This also includes generating the md5checksums to store in the databases and can sometimes cause issues when we have extremely large files.

Conversation

rlreamy commented Mar 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 12, 2026

Walkthrough

Changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rlreamy commented Mar 12, 2026 •

edited by coderabbitai bot

Loading