MID-10990, MID-11046, MID-11047: Fix CSV export performance and stability issues #482
Open
wadahiro wants to merge 10 commits into Evolveum:support-4.10 from
Summary
This PR fixes OutOfMemoryError (OOME) during CSV export of large datasets and significantly improves export performance.
Background
Before this fix, CSV export of large datasets had several critical issues:
1. OutOfMemoryError
The export loaded all data into memory at once, which caused an OOME on large datasets.
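The usual remedy for this kind of OOME is to stream the export: fetch one bounded page of objects, write its rows to the output immediately, and drop the page before fetching the next. The Java sketch below illustrates only that pattern; `PageSource`, `streamCsv` and the offset-based paging are hypothetical and are not the API of `IterativeExportSupport` or `StreamingCsvDataExporter`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class StreamingCsvSketch {

    /** Hypothetical paged source: returns up to {@code limit} rows starting at {@code offset}. */
    public interface PageSource {
        List<List<String>> fetch(int offset, int limit);
    }

    /** Quotes a field (RFC 4180 style) when it contains a comma, quote, CR or LF. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? '"' + field.replace("\"", "\"\"") + '"' : field;
    }

    /**
     * Writes the header, then the rows page by page. Only one page is referenced
     * at a time, so heap usage grows with pageSize, not with the total row count.
     * Returns the number of data rows written.
     */
    static long streamCsv(PageSource source, List<String> header, int pageSize, Writer out) {
        try {
            writeRow(header, out);
            long written = 0;
            for (int offset = 0; ; offset += pageSize) {
                List<List<String>> page = source.fetch(offset, pageSize);
                for (List<String> row : page) {
                    writeRow(row, out);
                    written++;
                }
                if (page.size() < pageSize) {
                    break; // a short (or empty) page means there is nothing left
                }
            }
            out.flush();
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeRow(List<String> row, Writer out) throws IOException {
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(row.get(i)));
        }
        out.write("\r\n");
    }

    public static void main(String[] args) {
        // Rows are generated on demand, standing in for one repository search page.
        PageSource source = (offset, limit) -> {
            List<List<String>> page = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + limit, 5); i++) {
                page.add(List.of("user" + i, "User, No. " + i));
            }
            return page;
        };
        StringWriter out = new StringWriter();
        long rows = streamCsv(source, List.of("name", "fullName"), 2, out);
        System.out.print(out);
        System.out.println(rows + " rows");
    }
}
```

Because each page becomes unreachable once its rows are written, peak heap use depends on the page size rather than on the size of the dataset.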
2. PostgreSQL IN clause parameter limit
Exporting more than 65,535 records was impossible because of PostgreSQL's limit on the number of parameters in a prepared statement.
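A common way to stay under such a limit is to split the list of IDs into fixed-size chunks and issue one `IN (...)` statement per chunk. In the sketch below, the chunk size of 100 mirrors the batch size this PR uses in `beforeTransformation`; `partition`, `inClause` and the `oid` column name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InClauseBatching {

    /** Per-statement bind-parameter ceiling for PostgreSQL, as stated in this PR. */
    static final int PG_MAX_BIND_PARAMS = 65_535;

    /** Splits ids into consecutive chunks of at most batchSize elements (views, not copies). */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize < 1 || batchSize > PG_MAX_BIND_PARAMS) {
            throw new IllegalArgumentException("batchSize must be in 1.." + PG_MAX_BIND_PARAMS);
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            chunks.add(ids.subList(from, Math.min(from + batchSize, ids.size())));
        }
        return chunks;
    }

    /** Builds "column IN (?, ?, ...)" with one placeholder per value in the chunk. */
    static String inClause(String column, int count) {
        return column + " IN (" + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> oids = new ArrayList<>();
        for (int i = 0; i < 70_000; i++) {
            oids.add("oid-" + i);
        }
        List<List<String>> batches = partition(oids, 100);
        // Each batch is bound to its own statement, e.g. "SELECT ... WHERE " + inClause("oid", batch.size()).
        System.out.println(batches.size() + " statements");
        System.out.println(inClause("oid", 3));
    }
}
```

With 70,000 IDs and a chunk size of 100, this yields 700 statements of at most 100 parameters each instead of one statement with 70,000 parameters.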
3. AccessCertificationWorkItem export performance issues
Even after the OOME and IN clause issues were resolved, AccessCertificationWorkItem export had severe performance problems. Exporting 5,000 WorkItems took over 8 minutes due to multiple N+1 query problems:
Data structure:
N+1 queries per WorkItem:
loadObject() to resolve display names
Solution
This fix implements:
- beforeTransformation: Loads Campaign, Case, and references in batches of 100 items using IN clauses
- ReferenceNameResolver: Caches name/displayName across batches to avoid redundant queries
- ref.getObject() is used instead of calling loadObject() for each row
Changes
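The combination of per-batch prefetching and a cross-batch name cache described in the Solution list can be sketched in Java as follows. This is an illustration under stated assumptions, not the PR's code: `NameCache`, `resolveAll` and the bulk-loader function (standing in for one `IN (...)` query per call) are hypothetical and do not reproduce `ReferenceNameResolver`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class BatchedNameResolution {

    /**
     * Hypothetical OID -> display-name cache shared across batches. Each bulkLoader call
     * stands in for one "WHERE oid IN (...)" query.
     */
    static final class NameCache {
        private final Map<String, String> names = new HashMap<>();
        private final Function<Set<String>, Map<String, String>> bulkLoader;
        int loaderCalls = 0;

        NameCache(Function<Set<String>, Map<String, String>> bulkLoader) {
            this.bulkLoader = bulkLoader;
        }

        /** Loads, in one call, every OID of the batch that is not cached yet. */
        void prefetch(Collection<String> oids) {
            Set<String> missing = new LinkedHashSet<>();
            for (String oid : oids) {
                if (!names.containsKey(oid)) {
                    missing.add(oid);
                }
            }
            if (missing.isEmpty()) {
                return; // everything already known from earlier batches
            }
            loaderCalls++;
            Map<String, String> loaded = bulkLoader.apply(missing);
            for (String oid : missing) {
                // Unresolvable OIDs are cached too (falling back to the OID itself),
                // so they are not queried again in later batches.
                names.put(oid, loaded.getOrDefault(oid, oid));
            }
        }

        String nameOf(String oid) {
            return names.getOrDefault(oid, oid);
        }
    }

    /** Resolves one referenced OID per row, batch by batch, without a per-row load. */
    static List<String> resolveAll(List<String> referencedOids, int batchSize, NameCache cache) {
        List<String> result = new ArrayList<>(referencedOids.size());
        for (int from = 0; from < referencedOids.size(); from += batchSize) {
            List<String> batch = referencedOids.subList(from, Math.min(from + batchSize, referencedOids.size()));
            cache.prefetch(batch);             // at most one bulk query per batch
            for (String oid : batch) {
                result.add(cache.nameOf(oid)); // served from memory
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NameCache cache = new NameCache(missing -> {
            Map<String, String> found = new HashMap<>();
            for (String oid : missing) {
                found.put(oid, "Reviewer " + oid);
            }
            return found;
        });
        // 5,000 rows that reference only 10 distinct reviewers.
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 5_000; i++) {
            rows.add("r" + (i % 10));
        }
        List<String> names = resolveAll(rows, 100, cache);
        System.out.println(names.size() + " rows resolved with " + cache.loaderCalls + " bulk load(s)");
    }
}
```

With 5,000 rows that reference 10 distinct reviewers, the loader runs once instead of once per row. Caching unresolvable OIDs as well keeps a missing object from being queried again in every later batch.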
MID-10990: OutOfMemoryError during CSV Export of Large Datasets
- IterativeExportSupport and StreamingCsvDataExporter
- iterationPageSize=-1
- beforeTransformation to eliminate N+1 queries
MID-11047: AccessCertificationWorkItem list unstable display order
- (ownerOid, accessCertCaseCid, cid) for stable display order
MID-11046: CSV export missing .csv extension
- .csv extension is appended regardless of user input
Performance Results
AccessCertificationWorkItem Export
User Export (Large Dataset)