We've run into performance issues when running ddf_utils.package.create_datapackage(). Some of our entity files contain hundreds of thousands of entities, and in those cases the function takes a very long time to finish.
After some profiling, it turns out that the culprit is EntityDomain.add_entity() in ddf_utils.model.ddf, which, as I understand it, loops through all rows in the entity files one at a time and runs some identity checks on each. Would it be possible to vectorize that loop?
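To illustrate what I mean by vectorizing, here is a rough pandas sketch. It is not based on the actual add_entity() code: I am assuming the identity checks amount to spotting entity IDs that appear more than once, and the function names, the column names and the sample data are all made up for the example.

```python
import pandas as pd


def find_duplicate_ids(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Return every row whose ID occurs more than once (hypothetical helper)."""
    # duplicated(keep=False) flags all occurrences of a repeated ID in one pass,
    # instead of comparing each row against the entities seen so far.
    return df[df[id_col].duplicated(keep=False)]


def find_conflicting_ids(df: pd.DataFrame, id_col: str) -> list:
    """Return IDs that map to more than one distinct set of properties."""
    # Identical rows collapse into one; any ID that still occurs more than
    # once afterwards has conflicting property values.
    distinct = df.drop_duplicates()
    counts = distinct.groupby(id_col).size()
    return counts[counts > 1].index.tolist()


# Made-up sample data: "a" is repeated identically, "c" is repeated with a conflict.
entities = pd.DataFrame(
    {
        "geo": ["a", "b", "a", "c", "c"],
        "name": ["A", "B", "A", "C", "C2"],
    }
)

duplicates = find_duplicate_ids(entities, "geo")
conflicts = find_conflicting_ids(entities, "geo")
```

Both checks run as column-wise operations on the whole frame, so the cost should grow much more gently with the number of rows than a Python-level loop. Whether this covers everything add_entity() currently checks is something I can't tell from the outside.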