
DD-2248: Name conflict between directory and file #242

Draft
jo-pol wants to merge 39 commits into DANS-KNAW:v6.9-DANS-DataStation from DANS-KNAW-jp:DD-2248-file-dir-name-conflict


jo-pol commented Apr 9, 2026

What this PR does / why we need it:

  • When a dataset contains a file whose path equals a directory path (either the full directory path of another file or one of its ancestors), the downloaded zip cannot be unzipped.
  • The error message for duplicate paths showed only the first duplicate found.
[Screenshot: dataset files with file/directory name conflicts]
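The conflict described above can be illustrated with a minimal Java sketch. It is not the code changed in this PR: the class `PathConflicts` and the method `findConflicts` are hypothetical, and the sketch assumes each file is given as one normalized path (directory label and file name joined with `/`, no leading or trailing slash). It collects every directory implied by the file paths, including ancestors, and reports all files whose path is also a directory path, instead of stopping at the first one.

```java
import java.util.*;

/**
 * Hypothetical sketch: find file paths that clash with directory paths.
 * A file "foo" clashes with any file stored under "foo/...", because a zip
 * extracted to a file system cannot hold both a file and a directory "foo".
 */
public class PathConflicts {

    /** Returns every conflicting file path (sorted), not just the first one. */
    public static SortedSet<String> findConflicts(Collection<String> filePaths) {
        // Collect every directory implied by the file paths, including ancestors:
        // "foo/bar/beer" implies the directories "foo" and "foo/bar".
        Set<String> directories = new HashSet<>();
        for (String path : filePaths) {
            for (int slash = path.indexOf('/'); slash >= 0; slash = path.indexOf('/', slash + 1)) {
                directories.add(path.substring(0, slash));
            }
        }
        // A file conflicts when its full path is also a directory path.
        SortedSet<String> conflicts = new TreeSet<>();
        for (String path : filePaths) {
            if (directories.contains(path)) {
                conflicts.add(path);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("bar", "foo", "bar/foo", "foo/bar/beer", "baz.txt");
        // Report all conflicts in one message instead of only the first.
        System.out.println("Duplicate path and/or filenames: "
                + String.join(", ", findConflicts(paths)));
    }
}
```

For these example paths the sketch prints `Duplicate path and/or filenames: bar, foo`. The message Dataverse actually produces may list the paths differently (for example, also naming the files stored under the conflicting directory).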

Which issue(s) this PR closes:

Special notes for your reviewer:

  • Existing unit tests for duplicate FileMetadata objects use the same instances both in the dataset and in the list of files to be added. I created new-style tests that use deep clones to cover the changed code.

Suggestions on how to test this:

See the new unit tests.
Manual tests, more extensive than those described in the DANS Jira issue:

  • Before deploying: create a dataset with conflicts like those shown in the screenshot above.

    • Download all its files; the zip cannot be unzipped.

    • The existing conflicts should be reported when running the following as the postgres user (see the script's help):

      python3 /vagrant/external/dataverse/scripts/issues/dirs-duplicating-files/find_duplicates.py
      
  • Web UI, after deploying:

    • Show that existing conflicts must be fixed:
      • Add a file that does not have a conflict.
      • Saving fails with the error message: The files could not be updated. – Duplicate path and/or filenames: bar, foo/bar, foo, bar/foo
    • Given a dataset with files bar/foo and foo/bar/beer, upload files foo and bar without a path: the files are renamed with a -1 suffix.
    • Upload the zip with file/directory conflicts to another dataset that has different files or no files: saving fails with the error message: The files could not be updated. – Duplicate path and/or filenames: ...
  • API

  • Test tabular files; see also https://drivenbydata.atlassian.net/browse/DD-2270?focusedCommentId=68001
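The renaming step in the manual tests (uploading foo and bar without a path to a dataset with bar/foo and foo/bar/beer yields names with a -1 suffix) can be sketched as follows. This is a hypothetical illustration, not the Dataverse implementation: `ConflictRenamer`, `directoriesOf`, `withSuffix` and `uniquePath` are made-up names, and the assumed scheme (insert `-n` before the extension, try n = 1, 2, ... until the path is free) may differ from what Dataverse does.

```java
import java.util.*;

/**
 * Hypothetical sketch (not the Dataverse implementation): give an uploaded file
 * a path that clashes neither with an existing file nor with an existing directory.
 */
public class ConflictRenamer {

    /** All directory paths implied by the file paths, including ancestors. */
    static Set<String> directoriesOf(Collection<String> filePaths) {
        Set<String> directories = new HashSet<>();
        for (String path : filePaths) {
            for (int slash = path.indexOf('/'); slash >= 0; slash = path.indexOf('/', slash + 1)) {
                directories.add(path.substring(0, slash));
            }
        }
        return directories;
    }

    /** Inserts "-n" before the extension of the last path segment, if it has one. */
    static String withSuffix(String path, int n) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash + 1) { // a real extension, not the leading dot of a hidden file
            return path.substring(0, dot) + "-" + n + path.substring(dot);
        }
        return path + "-" + n;
    }

    /** Tries path, path-1, path-2, ... until neither a file nor a directory uses it. */
    static String uniquePath(String newPath, Set<String> files, Set<String> directories) {
        String candidate = newPath;
        for (int n = 1; files.contains(candidate) || directories.contains(candidate); n++) {
            candidate = withSuffix(newPath, n);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(List.of("bar/foo", "foo/bar/beer"));
        Set<String> directories = directoriesOf(files);
        for (String upload : List.of("foo", "bar")) {
            String path = uniquePath(upload, files, directories);
            files.add(path); // later uploads must not reuse this path either
            System.out.println(upload + " -> " + path);
        }
    }
}
```

Running `main` prints `foo -> foo-1` and `bar -> bar-1`. The sketch only covers uploads without a path; an upload whose own directories clash with existing files (the zip case above) is rejected in the PR rather than renamed, and is not handled here.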

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

See the changed dataset-management.rst file; the other changed .rst file only fixes a typo.

Is there a release notes update needed for this change?:

Existing datasets with the new type of duplicates (a file path equal to a directory path) should be detected and fixed.

Additional documentation:

jo-pol marked this pull request as draft April 9, 2026 12:37
Comment thread src/test/java/edu/harvard/iq/dataverse/ingest/IngestUtilTest.java