24 changes: 24 additions & 0 deletions data_management/README.md
@@ -114,5 +114,29 @@ This endpoint retrieves the packages that are ready to be moved from Globus. The
### rebuild library modules on change
`$ python3 setup.py install --user`

## DLU Watcher

The dlu_watcher is controlled via the DluWatcher docker file and runs watch_files.py. It uses many of the same services as the data-manager-service. The dlu_watcher does much of the background data processing, which is often triggered when the data managers update records in the DMD tables.

### Slide Renaming
One major new function of the dlu_watcher is the slide renaming process. We periodically check the slide_manifest_import table for records that have not yet been added to the slide_scan_curation table (this is a very simple check of whether the two tables have an equal number of rows). If there are new rows in the slide_manifest_import table, we continue processing those rows.
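As a rough illustration of the row-count check described above, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for the real database, and the function name `has_new_manifest_rows` and the single-column table layouts are assumptions for the example, not the actual schema or the code in slide_management.py.

```python
import sqlite3


def has_new_manifest_rows(conn: sqlite3.Connection) -> bool:
    """Return True when the two tables do not hold the same number of rows."""
    manifest_count = conn.execute(
        "SELECT COUNT(*) FROM slide_manifest_import"
    ).fetchone()[0]
    curation_count = conn.execute(
        "SELECT COUNT(*) FROM slide_scan_curation"
    ).fetchone()[0]
    # Mirrors the "equal number of rows" check: any mismatch means there is work to do.
    return manifest_count != curation_count


# Stand-in tables; the real column layout is not shown in this README.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slide_manifest_import (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE slide_scan_curation (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO slide_manifest_import DEFAULT VALUES")

print(has_new_manifest_rows(conn))
```

A count comparison is cheap, but it cannot tell which rows are new, so the follow-up processing still has to identify them.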

The majority of the functionality for processing slide renaming lives in slide_management.py in services.

The first step is to process the new rows in slide_manifest_import and verify that we have all the necessary information to do an import. If the process is not picking up new records in slide_manifest_import, check the dlu_watcher log files for potential errors. At this point we have not created a record in the slide_scan_curation table, and we do not want to while there are issues.


Records that pass the preliminary checks are added to the slide_scan_curation table. From that point on, any error is recorded on the appropriate records in slide_scan_curation.
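The two phases described above (log and skip before a curation record exists, record the error on the curation record afterwards) could be sketched as follows. The field names in `REQUIRED_FIELDS`, the `CurationRecord` class, and the helper names are hypothetical and only illustrate the flow; they are not taken from slide_management.py.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("dlu_watcher")

# Hypothetical required fields; the real manifest columns may differ.
REQUIRED_FIELDS = ("slide_id", "file_name")


@dataclass
class CurationRecord:
    """In-memory stand-in for a row in slide_scan_curation."""
    manifest_row: dict
    errors: list = field(default_factory=list)


def run_preliminary_checks(rows):
    """Return curation records for rows that have every required field."""
    accepted = []
    for row in rows:
        missing = [name for name in REQUIRED_FIELDS if not row.get(name)]
        if missing:
            # No curation record exists yet, so the log is the only trace.
            logger.error("Skipping manifest row %r: missing %s", row, missing)
            continue
        accepted.append(CurationRecord(manifest_row=row))
    return accepted


def record_error(record: CurationRecord, message: str) -> None:
    """After the preliminary checks, errors are stored on the record itself."""
    record.errors.append(message)
```

This is why a row that never appears in slide_scan_curation has to be investigated through the log files rather than through the table.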

### Moving files to the data lake
The other function of the dlu_watcher is to move files from Globus into the data lake. There are two main paths here.

#### Whole Slide Images

For whole slide images (WSI) that are not part of a recalled package, we need to go down a different path to move the files into the data lake. We rename the files, then move them into place and update the databases. This path runs parallel to the non-WSI path and calls many of the same functions, but it ended up as a partial duplicate of the non-WSI process because we were unable to rename the files on copy.
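A minimal sketch of the rename-then-move order described above, using only the standard library. The function name `rename_then_move` and the directory layout are assumptions for illustration; the database update step is omitted.

```python
import shutil
from pathlib import Path


def rename_then_move(source: Path, new_name: str, destination_dir: Path) -> Path:
    """Rename a file where it sits, then move the renamed file into place."""
    # Step 1: rename inside the source directory, since renaming
    # during the copy itself was not possible.
    renamed = source.with_name(new_name)
    source.rename(renamed)

    # Step 2: move the already-renamed file to its destination directory.
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / new_name
    shutil.move(str(renamed), str(target))
    return target
```

Keeping the rename and the move as separate steps means a failure can be attributed to one of them, which helps when recording an error against the affected record.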

#### Non-WSI

This path runs a number of checks to make sure everything is copacetic, then calls out to the appropriate locations to create new records in the databases, create directories in the data lake, and copy the data into the data lake. It also generates the MD5 checksums to store in the databases, which can sometimes cause issues when we have extremely large files.
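If the trouble with large files is memory use, one common mitigation is to hash the file in fixed-size chunks instead of reading it all at once. The sketch below shows that technique with `hashlib`; the function name `md5_checksum` and the 8 MiB chunk size are assumptions, and this is not necessarily how the existing services compute the checksum.

```python
import hashlib


def md5_checksum(path, chunk_size=8 * 1024 * 1024):
    """Compute a file's MD5 hex digest while holding one chunk in memory at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunked hashing bounds memory but not time: a very large file still has to be read in full, so long-running checksums remain possible.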


## Known Bug
There is a known [bug with Docker on macOS](https://github.com/docker/for-mac/issues/2670) in which the container is unable to talk to the host network. This problem may occur when attempting to connect to a tunnel created on the host machine. To work around this issue, you can either run this on a Linux or Windows machine, or bypass Docker completely and run the script directly on your local machine.