Description
caufieldjh
left a comment
Many of the scripts added here are scripts for agents to use. Why not just include them in the main module and have the agent follow the skills for directions?
The stand-alone scripts don't appear to have corresponding tests (though I could be missing something, since their CLI functions appear to do some test-like checks).
Also for the stand-alone scripts: most appear to have their own CLI functions. A centralized CLI would be easier to test and maintain, especially if it used click instead of hard-coded string output.
There are numerous places where importing tooling would be justified and would make the whole process much easier to understand. Examples:
mapping functions handled by linkml-map (I know I keep talking about this, but it works well)
SSSOM reading/writing by sssom-py
JSON-LD transforms by linkml loaders/generators
I don't understand why the agent scripts define hard-coded versions of mappings.
I'd support merging this PR so we have something working and can then immediately help with refactoring.
.claude/agents/scripts/d4d_builder.py
@@ -0,0 +1,319 @@
#!/usr/bin/env python3
Member
@caufieldjh
caufieldjh
4 hours ago
I do think that much of this functionality can be handled by linkml-map, particularly for things like type conversions, datetime strings, etc. If this works for now then this is fine as a description of what needs to happen in order to get from RO-Crate metadata to D4D.
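To make the kind of conversion in question concrete, here is a minimal standard-library sketch of one such step: normalizing a datetime string to an ISO 8601 date. The function name `to_iso_date` and the sample values are hypothetical and are not taken from d4d_builder.py; linkml-map would express this kind of rule declaratively rather than in hand-written code.

```python
from datetime import datetime


def to_iso_date(value: str) -> str:
    """Return the calendar date (YYYY-MM-DD) of an ISO 8601 date or datetime string.

    Hypothetical helper for illustration only; not part of the PR.
    """
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    normalized = value.strip().replace("Z", "+00:00")
    return datetime.fromisoformat(normalized).date().isoformat()
```

Each hand-written helper like this is a rule that has to be tested and maintained separately, which is the argument for moving such conversions into a mapping specification.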
.claude/agents/scripts/field_prioritizer.py
AGGREGATE = "aggregate" # Aggregate statistics (prefer primary for totals)
class FieldPrioritizer:
Member
@caufieldjh
caufieldjh
4 hours ago
It would be helpful to have these field names live in a single constants file so they're easier to reference and edit if needed (especially if the RO-Crate schema changes)
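A minimal sketch of what such a constants file might look like, assuming an enum of RO-Crate property names shared by all agent scripts. The class name `RoCrateField`, the helper `get_field`, and the specific members are illustrative assumptions, not the list used in field_prioritizer.py.

```python
"""Hypothetical constants module imported by every agent script."""
from enum import Enum


class RoCrateField(str, Enum):
    """RO-Crate property names (illustrative subset, not the PR's actual list)."""

    NAME = "name"
    DESCRIPTION = "description"
    DATE_PUBLISHED = "datePublished"
    LICENSE = "license"


def get_field(entity: dict, field: RoCrateField, default=None):
    """Look up a property on an RO-Crate entity via the shared constant."""
    return entity.get(field.value, default)
```

With this layout, a renamed property in the RO-Crate schema would need a change in one place only, and a typo in a field name becomes an `AttributeError` at import time instead of a silent miss.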
.claude/agents/scripts/field_prioritizer.py
return "General"
if __name__ == "__main__":
Member
@caufieldjh
caufieldjh
4 hours ago
Could tests like these move to the common tests directory?
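As a sketch of what that move could look like, the checks under the `__main__` guard might become plain pytest-style functions. The file path in the comment, the stand-in function `categorize_field`, and its categories are hypothetical; in the repository the function would be imported from the module under test rather than defined in the test file.

```python
# tests/test_field_prioritizer.py (hypothetical path and contents)


def categorize_field(field_name: str) -> str:
    """Stand-in for the function under test, defined here only so the
    sketch is self-contained."""
    known = {"license": "Licensing", "author": "Provenance"}
    return known.get(field_name, "General")


def test_known_field_gets_its_category():
    assert categorize_field("license") == "Licensing"


def test_unknown_field_falls_back_to_general():
    assert categorize_field("not_a_real_field") == "General"
```

Functions named `test_*` in a `tests` directory are collected by pytest automatically, so they run with the rest of the suite instead of only when the script is executed by hand.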
.claude/agents/scripts/generate_enhanced_tsv.py
from pathlib import Path
from typing import Dict, List, Tuple
# SKOS mapping type rules based on the alignment
Member
@caufieldjh
caufieldjh
4 hours ago
I don't understand why these mapping rules are stored here in this format. I also don't understand why they contain hard-coded values, especially in the exactMatch cases where the values are identical.
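One possible alternative, sketched under the assumption that the rules reduce to (source, predicate, target) triples: derive the exactMatch rows from identical field names and store only the non-trivial mappings explicitly. The function `build_mappings`, its parameters, and the sample field names are hypothetical and not taken from generate_enhanced_tsv.py.

```python
from typing import Dict, Iterable, List, Tuple

EXACT_MATCH = "skos:exactMatch"


def build_mappings(
    source_fields: Iterable[str],
    target_fields: Iterable[str],
    overrides: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (source, predicate, target) triples.

    Fields with identical names on both sides get exactMatch automatically;
    only mappings that differ need an entry in `overrides`.
    Source fields with neither an override nor a same-named target are skipped.
    """
    targets = set(target_fields)
    rows = []
    for field in sorted(source_fields):
        if field in overrides:
            predicate, target = overrides[field]
            rows.append((field, predicate, target))
        elif field in targets:
            rows.append((field, EXACT_MATCH, field))
    return rows
```

For example, `build_mappings(["name", "author"], ["name", "creators"], {"author": ("skos:closeMatch", "creators")})` yields one explicit closeMatch row and one derived exactMatch row, with no identical values written out by hand.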
.claude/agents/scripts/generate_interface_mapping.py
18. Format (5 fields)
19. Unmapped (14 fields)
Output format inspired by SSSOM (Simple Standard for Sharing Ontological Mappings)
Member
@caufieldjh
caufieldjh
4 hours ago
Does that mean it isn't actually SSSOM? Will it still validate if I parse it with the SSSOM python tools?
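A quick way to get a first answer, sketched with the standard library only: check whether the output's header row contains the columns SSSOM requires on every mapping. The required set below reflects my understanding of the SSSOM specification and should be verified against the version in use; the function name `missing_sssom_columns` is hypothetical, and this check is not a substitute for parsing and validating the file with sssom-py.

```python
import csv
import io

# Slots the SSSOM specification marks as required on each mapping
# (an assumption to verify against the spec version in use).
REQUIRED_SSSOM_COLUMNS = {
    "subject_id",
    "predicate_id",
    "object_id",
    "mapping_justification",
}


def missing_sssom_columns(tsv_text: str) -> set:
    """Return the required SSSOM columns absent from the TSV header row."""
    # SSSOM/TSV files may start with a commented YAML metadata block,
    # so skip blank lines and lines beginning with "#".
    data_lines = [
        line
        for line in tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    if not data_lines:
        return set(REQUIRED_SSSOM_COLUMNS)
    header = next(csv.reader(io.StringIO(data_lines[0]), delimiter="\t"))
    return REQUIRED_SSSOM_COLUMNS - set(header)
```

An empty result only means the required columns are present; value-level checks such as valid CURIEs and prefix declarations still need the SSSOM tooling.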
.claude/agents/scripts/informativeness_scorer.py
if __name__ == "__main__":
# Test the informativeness scorer
Member
@caufieldjh
caufieldjh
3 hours ago
Should these tests go with the others in tests?
.claude/agents/scripts/mapping_loader.py
from typing import Dict, List, Optional, Set
class MappingLoader:
Member
@caufieldjh
caufieldjh
3 hours ago
If the mappings are in SSSOM format, we can use the SSSOM loader for this; there is no need for a bespoke loader (unless they are not actually SSSOM).
.claude/agents/scripts/mapping_loader.py
if __name__ == "__main__":
# Test the mapping loader
Member
@caufieldjh
caufieldjh
3 hours ago
These look less like tests and more like the actual loader functions
.claude/agents/scripts/rocrate_merger.py
if __name__ == "__main__":
# Test the RO-Crate merger
Member
@caufieldjh
caufieldjh
3 hours ago
Are these tests or CLI?
I suppose this project doesn't have a common CLI. Such a thing could be very useful for centralizing interface details like these.
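A rough sketch of the shape such a common CLI could take, with one subcommand per script. It uses the standard library's argparse so that it runs without extra dependencies; a click version would be structured the same way, with a command group in place of subparsers. The program name `d4d-tools`, the subcommands, and their options are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build one parser with a subcommand per former stand-alone script."""
    parser = argparse.ArgumentParser(prog="d4d-tools")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)

    parse_cmd = sub.add_parser("parse", help="Parse an RO-Crate metadata file")
    parse_cmd.add_argument("path")

    merge_cmd = sub.add_parser("merge", help="Merge several RO-Crate files")
    merge_cmd.add_argument("paths", nargs="+")
    merge_cmd.add_argument("--output", default="merged.json")
    return parser


def main(argv=None) -> int:
    """Dispatch to the selected subcommand; the bodies are placeholders."""
    args = build_parser().parse_args(argv)
    if args.command == "parse":
        print(f"would parse {args.path}")
    elif args.command == "merge":
        print(f"would merge {len(args.paths)} file(s) into {args.output}")
    return 0
```

Because `main` accepts an argument list, tests can call it directly, for example `main(["merge", "a.json", "b.json"])`, without spawning a subprocess or relying on each script's `__main__` block.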
.claude/agents/scripts/rocrate_parser.py
@@ -0,0 +1,288 @@
#!/usr/bin/env python3
Member
@caufieldjh
caufieldjh
3 hours ago
Much of this can probably be done with linkml tooling, unless there are specific ways in which RO-Crate JSON-LD diverges from the usual JSON-LD (or there is something else I'm not aware of).
Potential next steps:
Implement d4d_to_rocrate() transformation (reverse direction)
Complete round-trip preservation tests
SHACL shape validation for RO-Crate profile
Performance optimization for large files