Migrate legacy constants from JSON-as-data to spec + code generation #181

@rtibbles

Overview

This tracking issue coordinates the migration of 5 legacy Python constant modules from the old JSON-as-data approach to our modern spec + code generation system.

Background

Current (legacy) approach:

  • Constants defined in JSON files under le_utils/resources/
  • Python modules load JSON at runtime using pkgutil.get_data()
  • Manual Python constants must be kept in sync with JSON files
  • JavaScript code cannot use these constants (no JS export)
  • Tests exist to verify that the Python constants and the JSON files stay in sync, and maintaining that sync is the main pain point
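
The sync problem described in this list can be illustrated with a small sketch. The JSON payload below is an inline stand-in for what pkgutil.get_data() would return; its contents and the helper find_sync_errors are assumptions for illustration, not code taken from le_utils.

```python
import json
from collections import namedtuple

# Stand-in for the bytes pkgutil.get_data() would load from le_utils/resources/.
# The real file name and contents are not shown in this issue.
LEGACY_JSON = b'{"mp4": {"mimetype": "video/mp4"}, "webm": {"mimetype": "video/webm"}}'

Format = namedtuple("Format", ["id", "mimetype"])

# Hand-written constants that a contributor must keep in sync with the JSON.
MP4 = "mp4"
WEBM = "webm"
choices = ((MP4, "Mp4"), (WEBM, "Webm"))

# Runtime-built list, derived from the JSON rather than from the constants.
FORMATLIST = [
    Format(id=key, **value)
    for key, value in json.loads(LEGACY_JSON.decode("utf-8")).items()
]


def find_sync_errors(format_list, choices):
    """Return ids present in only one of the JSON data and the Python choices."""
    json_ids = {entry.id for entry in format_list}
    python_ids = {value for value, _label in choices}
    return sorted(json_ids ^ python_ids)
```

A test in the legacy setup would assert that find_sync_errors(FORMATLIST, choices) is empty; forgetting to add a constant after editing the JSON is exactly the failure it guards against.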

Target (modern) approach:

  • Constants defined once in JSON spec files under spec/
  • Code generation script (generate_from_specs.py) creates both Python and JavaScript files
  • Single source of truth eliminates sync issues
  • Automatic JavaScript export enables frontend use
  • Already used successfully for 8 modules (modalities, labels, schemas, etc.)

Modules to Migrate

  1. file_formats.py (FOUNDATION - must be done first)
  2. licenses.py (blocked by file_formats)
  3. content_kinds.py (blocked by file_formats; extends the generation script to support metadata/mappings)
  4. format_presets.py (blocked by file_formats and content_kinds)
  5. languages.py (blocked by file_formats)
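
The blocking relationships in this list form a small dependency graph. As a sketch, a valid migration order can be derived with the standard library's graphlib (Python 3.9+); the names BLOCKED_BY and migration_order are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each module maps to the set of modules that block it, as listed above.
BLOCKED_BY = {
    "file_formats": set(),
    "licenses": {"file_formats"},
    "content_kinds": {"file_formats"},
    "format_presets": {"file_formats", "content_kinds"},
    "languages": {"file_formats"},
}


def migration_order(blocked_by):
    """Return one valid order in which the modules can be migrated."""
    return list(TopologicalSorter(blocked_by).static_order())


order = migration_order(BLOCKED_BY)
```

Any order this returns starts with file_formats and places content_kinds before format_presets; the relative order of the remaining modules is not significant.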

Migration Strategy

file_formats is the FOUNDATION issue that:

  • Enhances generate_from_specs.py to support namedtuple-based constants
  • Establishes the spec format pattern for all other issues
  • Includes helper function generation (getformat())
  • Must be completed before the rest can proceed
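
As a rough sketch of what a generated helper such as getformat() might look like, assuming it is a plain lookup by id: the signature with a default argument and the _FORMATLOOKUP name are assumptions, since this issue does not spell them out.

```python
from collections import namedtuple


class Format(namedtuple("Format", ["id", "mimetype"])):
    pass


FORMATLIST = [
    Format(id="mp4", mimetype="video/mp4"),
    Format(id="pdf", mimetype="application/pdf"),
]

# Lookup table keyed by id, built once at import time (hypothetical name).
_FORMATLOOKUP = {entry.id: entry for entry in FORMATLIST}


def getformat(id, default=None):
    """Return the Format for `id`, or `default` when the id is unknown."""
    return _FORMATLOOKUP.get(id, default)
```

With this shape, getformat("mp4").mimetype gives "video/mp4", and an unknown id falls back to the default instead of raising.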

content_kinds further enhances generation:

  • Adds support for metadata-driven code generation (MAPPING dict)
  • Must be completed before format_presets

licenses and languages (can be done in parallel after file_formats completes):

  • Follow the pattern established in file_formats
  • Create spec file using the namedtuple format
  • Run generation to create Python/JS files
  • Update tests to verify against spec
  • Delete old JSON resource file
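
One possible shape for the "verify against spec" tests is sketched below. It compares a generated module with the spec it came from; the SimpleNamespace object stands in for an imported generated module, and spec_mismatches is a hypothetical helper that assumes constant names are the upper-cased ids.

```python
from collections import namedtuple
from types import SimpleNamespace

SPEC = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {
        "mp4": {"mimetype": "video/mp4"},
        "webm": {"mimetype": "video/webm"},
    },
}

Format = namedtuple("Format", ["id", "mimetype"])

# Stand-in for the generated Python module.
generated = SimpleNamespace(
    MP4="mp4",
    WEBM="webm",
    FORMATLIST=[Format("mp4", "video/mp4"), Format("webm", "video/webm")],
)


def spec_mismatches(spec, module, list_name):
    """Return a list of differences between the spec and the generated module."""
    problems = []
    constants = spec["constants"]
    entries = {entry.id: entry for entry in getattr(module, list_name)}
    for key, fields in constants.items():
        # Assumption: the constant name is the upper-cased id.
        if getattr(module, key.upper(), None) != key:
            problems.append("missing constant " + key.upper())
        entry = entries.get(key)
        if entry is None:
            problems.append("missing list entry " + key)
            continue
        for field, expected in fields.items():
            if getattr(entry, field) != expected:
                problems.append("{}.{} differs from spec".format(key, field))
    for key in sorted(set(entries) - set(constants)):
        problems.append("list entry not in spec: " + key)
    return problems
```

A test would then assert that spec_mismatches(SPEC, generated, "FORMATLIST") returns an empty list.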

format_presets must wait for both file_formats and content_kinds to complete.

All 5 modules share a common structure (namedtuples, {MODULE}LIST, choices), so the generation script can be enhanced progressively as each module is migrated.

Spec File Format

All migrated modules will use this consistent JSON structure in their spec files:

{
  "namedtuple": {
    "name": "Format",
    "fields": ["id", "mimetype"]
  },
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}

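Each key under "constants" supplies the id of its entry, and the remaining fields come from the value object. The generation script will use this to create: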

  • Python namedtuple class: class Format(namedtuple("Format", ["id", "mimetype"])): pass
  • Python LIST variable: FORMATLIST = [Format(id="mp4", mimetype="video/mp4"), ...]
  • Python constants: MP4 = "mp4", WEBM = "webm", etc.
  • Python choices tuple: choices = ((MP4, "Mp4"), (WEBM, "Webm"), ...)
  • JavaScript exports: export default { MP4: "mp4", WEBM: "webm", ... }
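
The outputs listed above could be produced by something like the following sketch. It is not the actual generate_from_specs.py: render_python and render_javascript are hypothetical names, and it assumes that the first namedtuple field is filled from the constant's key and that choice labels are the capitalized ids.

```python
import json

SPEC = json.loads("""
{
  "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}
""")


def render_python(spec):
    """Render Python source for one spec (assumes the first field is the key)."""
    name = spec["namedtuple"]["name"]
    fields = spec["namedtuple"]["fields"]
    constants = spec["constants"]
    lines = [
        "from collections import namedtuple",
        "",
        "",
        "class {0}(namedtuple({0!r}, {1!r})):".format(name, fields),
        "    pass",
        "",
        "",
    ]
    # One string constant per id, e.g. MP4 = 'mp4'.
    for key in constants:
        lines.append("{} = {!r}".format(key.upper(), key))
    lines.append("")
    # choices pairs each constant with a label (assumed: capitalized id).
    lines.append("choices = (")
    for key in constants:
        lines.append("    ({}, {!r}),".format(key.upper(), key.capitalize()))
    lines.append(")")
    lines.append("")
    # {NAME}LIST holds one namedtuple instance per constant.
    lines.append("{}LIST = [".format(name.upper()))
    for key, values in constants.items():
        row = {fields[0]: key}
        row.update(values)
        args = ", ".join("{}={!r}".format(f, row[f]) for f in fields)
        lines.append("    {}({}),".format(name, args))
    lines.append("]")
    return "\n".join(lines) + "\n"


def render_javascript(spec):
    """Render a default export mapping constant names to ids."""
    lines = ["export default {"]
    for key in spec["constants"]:
        lines.append("  {}: {},".format(key.upper(), json.dumps(key)))
    lines.append("};")
    return "\n".join(lines) + "\n"
```

Both functions only build strings; writing them to the Python and JavaScript output files is left out because those paths are not given here.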

Each module will have different namedtuple fields appropriate to its data:

  • file_formats: ["id", "mimetype"]
  • licenses: ["id", "name", "exists", "url", "description", "custom", "copyright_holder_required"]
  • content_kinds: ["id", "name"] (plus metadata for MAPPING generation)
  • format_presets: ["id", "kind_id", "allowed_formats", ...] (10+ fields)
  • languages: ["lang_code", "lang_subcode", "readable_name", ...] (complex structure)
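
Because each module declares its own fields, a spec can be checked for internal consistency before generation. The sketch below assumes that every constant must provide exactly the fields after the first one; real specs may allow optional fields, and validate_spec is a hypothetical helper.

```python
def validate_spec(spec):
    """Return a list of problems; an empty list means the spec is consistent."""
    fields = spec["namedtuple"]["fields"]
    # Assumption: the first field is supplied by the constant's key.
    value_fields = set(fields[1:])
    problems = []
    for key, values in spec["constants"].items():
        missing = value_fields - set(values)
        unknown = set(values) - value_fields
        if missing:
            problems.append("{}: missing {}".format(key, sorted(missing)))
        if unknown:
            problems.append("{}: unknown {}".format(key, sorted(unknown)))
    return problems


good_spec = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {"mp4": {"mimetype": "video/mp4"}},
}
bad_spec = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {"mp4": {}},
}
```

Here validate_spec(good_spec) reports no problems, while validate_spec(bad_spec) flags the missing mimetype on mp4.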

Post-Migration Cleanup

After all 5 modules are migrated:

  • Remove package_data={"le_utils": ["resources/*.json"]} from setup.py
  • Delete le_utils/resources/ directory
  • Update README.md to remove manual sync warnings
  • Update CHANGELOG.md with migration notes

Benefits

✅ Single source of truth (spec files)
✅ JavaScript export for all constants
✅ Eliminates manual sync requirement
✅ Consistent with modern modules
✅ Better developer experience for contributors

Disclosure

🤖 This issue was written by Claude Code, under supervision, review and final edits by @rtibbles 🤖
