Add possibility to overwrite data in datapackage by OlivierCoen · Pull Request #73 · dataforgoodfr/Coordonnees

OlivierCoen · 2026-04-13T06:18:38Z

Closes #57

This PR adds a new parameter in coordo to choose which strategy to use whenever a resource is added while a resource with the same name already exists:

raise error: no overwrite or merge is done; an error is simply raised
overwrite: the previous resource is deleted and all foreign keys pointing to it are removed; the new file replaces it, but foreign keys must be added again manually for now
append: append to existing data without checking the schema
append_strict: append to existing data, but first ensure that both schemas match exactly

NOTE: for both append and append_strict, the actual merging of dataframes IS NOT IMPLEMENTED YET. This will be performed by @Cabanon in another US
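As a rough illustration of the four strategies listed above, here is a minimal sketch of how such a parameter could be modelled. Only `ResourceExistsStrategy.overwrite` appears in the diff; the other member names and the `parse_strategy` helper are assumptions made for this example, not the PR's actual code.

```python
from enum import Enum


class ResourceExistsStrategy(str, Enum):
    """What to do when a resource with the same name already exists."""

    raise_error = "raise_error"  # hypothetical member name
    overwrite = "overwrite"  # the only member visible in the diff
    append = "append"  # hypothetical member name
    append_strict = "append_strict"  # hypothetical member name


def parse_strategy(value: str) -> ResourceExistsStrategy:
    """Turn a user-supplied string into a strategy (hypothetical helper)."""
    try:
        return ResourceExistsStrategy(value)
    except ValueError:
        allowed = ", ".join(s.value for s in ResourceExistsStrategy)
        raise ValueError(
            f"Unknown strategy {value!r}; expected one of: {allowed}"
        ) from None
```

Parsing the string once at the boundary means the rest of the code can compare against enum members instead of raw strings.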

It is a big PR and I took some freedom to refactor and factorise code. Feel free to comment / refuse whenever you feel some portions of code were changed unnecessarily

Cabanon · 2026-04-16T10:20:13Z

        for res in self.resources:
            if res.name == name:
                continue
-            sm = safe(res, "schema")


I think this should stay. I'm not really comfortable with the fact that removing a resource automatically deletes the fks; I'd prefer that we raise, tell the user to delete the fks and try again. It's much more explicit and less error-prone

Hum ok 👍 But then, when adding data in "overwrite" strategy, what should we do?
Should we just check that both schemas match, overwrite data if they do, and raise an error if they do not (with a message saying that we should force removing foreign keys beforehand if we really want to overwrite)?

IMO coordo should try to avoid side effects: if there are foreign keys, then it's up to the user (the user being either a person or another project using coordo, data4trees for example) to decide how to manage them
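The "raise and let the user decide" behaviour discussed in this thread could look roughly like the sketch below. The `ForeignKey` and `Resource` dataclasses, the error class and both function names are simplified stand-ins invented for this example; they are not coordo's real models.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ForeignKey:  # simplified stand-in, not coordo's model
    fields: list[str]
    foreign_resource: str
    foreign_fields: list[str]


@dataclass
class Resource:  # simplified stand-in, not coordo's model
    name: str
    foreign_keys: list[ForeignKey] = field(default_factory=list)


class ForeignKeyExistsError(Exception):
    """Raised when a resource is still referenced by foreign keys."""


def referencing_foreign_keys(resources, name):
    """Return (owner name, foreign key) pairs that point at `name`."""
    return [
        (res.name, fk)
        for res in resources
        if res.name != name  # skip the resource itself, as in the diff
        for fk in res.foreign_keys
        if fk.foreign_resource == name
    ]


def remove_resource(resources, name):
    """Return the resources without `name`, refusing if it is referenced."""
    refs = referencing_foreign_keys(resources, name)
    if refs:
        lines = []
        for owner, fk in refs:
            src = ",".join(fk.fields)
            dst = ",".join(fk.foreign_fields)
            lines.append(f"{owner}.{src} -> {name}.{dst}")
        raise ForeignKeyExistsError(
            f"Cannot remove {name!r}; delete these foreign keys first: "
            + "; ".join(lines)
        )
    return [res for res in resources if res.name != name]
```

Nothing is modified when the check fails, so the caller (a person or another project) stays in control of how the foreign keys are handled.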

Cabanon · 2026-04-16T10:32:40Z

+        if self.resource_exists(resource.name):
+            # resource already exists
+
+            if strategy == ResourceExistsStrategy.overwrite:


Why remove and recreate the resource? What happens if there is a foreign key? This code will remove the fks and recreate the resource without the user knowing

This was the intended purpose indeed ;) In my opinion, there could be two types of overwrite: "overwrite_strict", which overwrites data but checks that the previous and current schemas match (and raises an error if they do not), and "overwrite_flexible", which is the current behavior of overwrite. Do you think we should just raise an error whenever the schemas do not match and ask the user to remove foreign keys manually beforehand?

Let's see this from a user standpoint:

On the website, we want a user to be able to import new data in the datapackage.
For external files and form data, we want to be able to overwrite data, but we already said that form metadata won't be updatable.
So IMO the overwrite function we want here is just to erase the data and import the new data, without changing the datapackage.json. So the load command should not always expect the datapackage definition, but at least a resource name, and should overwrite the data only if it conforms to the already existing schema in datapackage.json. If not, raise an error.

Yeah I agree with arnaud, there are 2 possibilities: either replace the whole table with the uploaded data or append the data (with a merging strategy which ensures primary keys are not duplicated), but in either case the schema should not be touched at all

if you wish to modify the schema, delete the resource and recreate it, but handling it in coordo will be error prone and difficult to maintain, let's stay as simple as possible
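The two data-only operations described in this thread (replace the whole table, or append without duplicating primary keys, never touching the schema) can be sketched on plain lists of dicts. All function names are hypothetical, and the conformance check here only compares column names, which is a simplification of real schema validation.

```python
def check_columns(rows, schema_fields):
    """Raise if any row's columns differ from the existing schema's fields."""
    expected = set(schema_fields)
    for i, row in enumerate(rows):
        if set(row) != expected:
            raise ValueError(
                f"Row {i} has columns {sorted(row)}, expected {sorted(expected)}"
            )


def replace_rows(incoming, schema_fields):
    """Replace the whole table; the schema itself is left untouched."""
    check_columns(incoming, schema_fields)
    return list(incoming)


def append_rows(existing, incoming, schema_fields, primary_key):
    """Append rows, refusing any primary key that is already present."""
    check_columns(incoming, schema_fields)
    seen = {row[primary_key] for row in existing}
    for row in incoming:
        if row[primary_key] in seen:
            raise ValueError(f"Duplicate primary key {row[primary_key]!r}")
        seen.add(row[primary_key])
    return existing + list(incoming)
```

Both functions return a new list and raise before changing anything, so a failed import leaves the existing data as it was.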

I agree with you

Cabanon

Really good job on the loader side, the implementation looks clean!

On the overwrite strategy, I feel like this is overly complicated: there are many ways of doing the same thing, so I fear it will be difficult to maintain.

IMO right now the add resource function is trying to do too many things, and we should only have 3 specific actions:

Adding a resource : raise an error if name already exists and tell the user to either delete the existing resource or update the resource if he wants to insert data.

Updating a resource : by default this should show how many lines will be updated and created (you don't need to implement it in that PR because it will be done with the ducklake PR), and prompt the user if he wants to continue (or a flag in the func for integration with the data4trees web app)

Removing a resource: raise an error if there are foreign keys pointing to it and tell the user to delete the fks
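A minimal sketch of the add and update actions proposed above, using a plain dict of row lists as a stand-in for the datapackage. The function names, the `confirm` flag and the dict layout are assumptions for illustration; the preview also assumes primary keys are unique within the incoming rows.

```python
class ResourceExistsError(Exception):
    """Raised when adding a resource whose name is already taken."""


def add_resource(package, name, rows):
    """Add a new resource; never overwrite or merge silently."""
    if name in package:
        raise ResourceExistsError(
            f"Resource {name!r} already exists: delete it first, "
            "or update it if you want to insert data"
        )
    package[name] = list(rows)


def preview_update(existing, incoming, primary_key):
    """Return (rows to update, rows to create) without changing anything."""
    known = {row[primary_key] for row in existing}
    updated = sum(1 for row in incoming if row[primary_key] in known)
    return updated, len(incoming) - updated


def update_resource(package, name, rows, primary_key, confirm=False):
    """Dry run by default; apply the update only when confirm is True."""
    counts = preview_update(package[name], rows, primary_key)
    if confirm:
        merged = {row[primary_key]: row for row in package[name]}
        merged.update({row[primary_key]: row for row in rows})
        package[name] = list(merged.values())
    return counts
```

A CLI could print the returned counts and prompt before calling again with `confirm=True`, while a web backend could pass the flag directly.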

Cabanon · 2026-04-18T12:22:48Z

+
        self.schema.foreignKeys.append(fk)

+    def remove_foreignkey(self, fk: ForeignKey) -> None:


Why don't add_foreignkey and remove_foreignkey have the same arguments? It's not very consistent

I did not change remove_foreignkey because it is not used anywhere, but actually I'm going to write a cli for it

Cabanon · 2026-04-18T12:27:46Z


-    def add_foreignkey(self, fk: ForeignKey) -> None:
+    def add_foreignkey(self, fields: list[str], foreign_fields: list[str], foreign_resource: str) -> None:
+        fk = ForeignKey(


Composite foreign keys are not yet supported by coordo, so we should limit this to a single field pointing to a single other field. Also, we can't have multiple keys pointing to the same external resource (because the parser wouldn't know which one to use when auto-joining)

I understand that the cli does not support that yet, but perhaps we could already implement something more general? It does not hurt anyway

Yeah I understand that you want to support something more general but IMO it's better to block it and then add it later (and also update the parser at the same time) than to let users do something that is not supported by other parts of coordo

Even if this is not exposed from the cli, we could for instance use this method from the D4T backend, so it is exposed as a lib component. Therefore I rather agree with Mathias, let's not expose something we don't really support yet.
Maybe we can keep this current signature, but make an assert on the lists so that if their size is > 1, we raise "Composite foreign keys are not supported yet". WDYT?

I made the modification :)
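The guard agreed on in this thread might look like the sketch below: the list-based signature from the diff is kept, but anything other than a one-to-one key is rejected. The `Schema` and `ForeignKey` dataclasses are simplified stand-ins, the function is written standalone rather than as a method, and the second check (one key per foreign resource) is added here as an assumption based on the parser limitation mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ForeignKey:  # simplified stand-in, not coordo's model
    fields: list[str]
    foreign_resource: str
    foreign_fields: list[str]


@dataclass
class Schema:  # simplified stand-in, not coordo's model
    foreign_keys: list[ForeignKey] = field(default_factory=list)


def add_foreignkey(schema, fields, foreign_fields, foreign_resource):
    """Add a single-field foreign key, rejecting unsupported shapes."""
    # Exactly one field on each side; this also rejects empty lists.
    if len(fields) != 1 or len(foreign_fields) != 1:
        raise ValueError("Composite foreign keys are not supported yet")
    # Assumed rule: at most one key per foreign resource, so that
    # auto-joining stays unambiguous.
    if any(fk.foreign_resource == foreign_resource for fk in schema.foreign_keys):
        raise ValueError(
            f"A foreign key pointing to {foreign_resource!r} already exists"
        )
    fk = ForeignKey(list(fields), foreign_resource, list(foreign_fields))
    schema.foreign_keys.append(fk)
    return fk
```

Keeping lists in the signature means callers would not need to change if composite keys are supported later; only the guard would be relaxed.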

OlivierCoen requested a review from Cabanon April 13, 2026 06:18

OlivierCoen force-pushed the feat/add-additional-data branch from 5dc6087 to 67cc18d Compare April 13, 2026 06:23

OlivierCoen marked this pull request as draft April 13, 2026 06:39

OlivierCoen marked this pull request as ready for review April 14, 2026 22:01

OlivierCoen force-pushed the feat/add-additional-data branch 2 times, most recently from 32032c8 to fd1316d Compare April 14, 2026 22:06

Cabanon reviewed Apr 16, 2026

Comment thread coordo-py/coordo/datapackage/package.py Outdated

Cabanon reviewed Apr 16, 2026

Comment thread coordo-py/coordo/datapackage/package.py

Cabanon reviewed Apr 16, 2026

OlivierCoen force-pushed the feat/add-additional-data branch from 5758534 to d2effc6 Compare April 16, 2026 14:25

OlivierCoen marked this pull request as draft April 16, 2026 20:05

OlivierCoen marked this pull request as ready for review April 16, 2026 21:20

OlivierCoen force-pushed the feat/add-additional-data branch 2 times, most recently from c8da8f0 to 342fc75 Compare April 16, 2026 21:21

OlivierCoen requested a review from Cabanon April 16, 2026 21:21

OlivierCoen added 11 commits April 18, 2026 08:08

create __init__.py and add filter for warning 'UserWarning: Field nam…

8fc19e5

…e "schema" in "Resource" shadows an attribute in parent "BaseModel"'

implement functionnality to overwrite existing file

f004f92

add parameter strategy to choose overwrite behavior + implement overw…

25f5b33

…rite

Update README.md

93f3e2b

add copyright

87ebb76

apply ruff

e4c70ad

factorise loaders

4c52975

pass linters

964bb15

address first comments for review

f7698f4

simplify resouce handling by datapackage

c1d8911

add list of individual foreign keys to remove in error message

46ae7bc

OlivierCoen force-pushed the feat/add-additional-data branch from 697d7e3 to 46ae7bc Compare April 18, 2026 06:08

update addition of foreign key

5649e71

add error when foreign key exists

e7cee9b

OlivierCoen force-pushed the feat/add-additional-data branch from fe4f58d to e7cee9b Compare April 18, 2026 07:05

Cabanon reviewed Apr 18, 2026

OlivierCoen and others added 11 commits April 18, 2026 15:14

add cli for removing foreign key

5c37c8f

Merge branch 'main' into feat/add-additional-data

a0cfd11

Update resource.py

c954c33

Merge branch 'main' into feat/add-additional-data

eb5d839

add anonymised test data

cf6370d

change location of files

cca0854

add test function

ff1d50b

factorise code in test_cli.py

bbd0a76

update uv environment

ad0967b

add pytest step in CI

7a912d9

add copyright

5faedca

arnaudfnr approved these changes Apr 22, 2026

Merge pull request #79 from dataforgoodfr/feat/add-test-data

2ce1407

Add tests for datapackage

OlivierCoen merged commit 9341668 into main Apr 22, 2026
3 checks passed



3 participants
