Feature request: ObjectScript (InterSystems IRIS) language support #462

@isc-tdyar

What problem does this solve?

InterSystems IRIS and Caché (InterSystems IRIS Documentation) are used in healthcare IT, financial services, and large enterprise systems. Codebases built on them — including some of the largest healthcare interoperability platforms — are written in ObjectScript, a language with no existing support in CBM or most other code graph tools.

ObjectScript files (.cls, .mac, .int) are common in the organizations that need code graph analysis most: large legacy systems where understanding call chains and dependency structure is critical but still done by hand.

What we built

We implemented full ObjectScript support covering:

Two file formats

  • UDL (.cls) — the primary class definition format: Class, Method, ClassMethod, Property, Parameter, Index, Trigger (with body text), XData, Storage, Query members all extracted as nodes.
  • MAC/INT routines (.mac/.int) — tag-based subroutine format.

Both use vendored tree-sitter-objectscript grammars compiled into the C pipeline.

Four call dispatch patterns resolved

ObjectScript has dispatch patterns that are structurally invisible to text search — all four are resolved:

  1. ##class(Pkg.Class).Method() — explicit cross-class call (standard, resolved from AST)
  2. ..Method() — relative-dot self-call. This is how ~80% of intra-class calls are written in ObjectScript. Without it, CALLS analysis is structurally incomplete. Impact on a large (~1,200-class) corpus: CALLS edges increased ~3.5× from a 2-line change.
  3. $$$MacroName — macro expansion. .inc include files define macros that expand to class names and method calls; resolved at index time from a CBMMacroTable built from project .inc files.
  4. Type inference from %New/%OpenId and return type declarations — Set obj = ##class(MyApp.Patient).%New() followed by obj.Save() resolves to MyApp.Patient.Save.
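To make patterns 1, 2, and 4 concrete, here is a minimal sketch in Python. It is an illustration only: the actual extraction works on the tree-sitter AST inside the C pipeline, and the regexes and the name resolve_calls are hypothetical. Return-type declarations (the second half of pattern 4) are not covered here.

```python
import re

# Hypothetical regexes for illustration; the real extractor walks the AST.
CLASS_CALL = re.compile(r"##class\(([\w.%]+)\)\.(%?\w+)\(")
SELF_CALL = re.compile(r"(?<![\w.)])\.\.(%?\w+)\(")
NEW_ASSIGN = re.compile(
    r"[Ss]et\s+(\w+)\s*=\s*##class\(([\w.%]+)\)\.%(?:New|OpenId)\("
)
VAR_CALL = re.compile(r"(?<![\w.)])(\w+)\.(%?\w+)\(")


def resolve_calls(enclosing_class, body):
    """Return the sorted, qualified call targets found in one method body."""
    targets = set()
    var_types = {}  # local variable name -> class inferred from %New/%OpenId
    for line in body.splitlines():
        # Pattern 1: explicit cross-class call.
        for cls, meth in CLASS_CALL.findall(line):
            targets.add(f"{cls}.{meth}")
        # Pattern 2: relative-dot self-call resolves to the enclosing class.
        for meth in SELF_CALL.findall(line):
            targets.add(f"{enclosing_class}.{meth}")
        # Pattern 4: instance call on a variable whose type was inferred earlier.
        for var, meth in VAR_CALL.findall(line):
            if var in var_types:
                targets.add(f"{var_types[var]}.{meth}")
        # Record the inferred type after scanning the line for calls.
        m = NEW_ASSIGN.search(line)
        if m:
            var_types[m.group(1)] = m.group(2)
    return sorted(targets)


body = "\n".join([
    "Set obj = ##class(MyApp.Patient).%New()",
    "Do obj.Save()",
    "Do ..Validate()",
    'Do ##class(MyApp.Audit).Log("x")',
])
result = resolve_calls("MyApp.Visit", body)
```

Here obj.Save() resolves to MyApp.Patient.Save only because the earlier Set line recorded the type of obj, and ..Validate() resolves against the enclosing class passed in by the caller.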

Ensemble production topology

InterSystems Ensemble (the IRIS interoperability framework) routes messages between components via string-keyed dispatch in XML configuration — invisible to normal call analysis. We added:

  • EnsembleItem nodes — one per production component
  • ROUTES_TO edges — SendRequestSync("Target", msg) resolved to the target class's message handler

Parsed from ProductionDefinition XData blocks at index time. No live IRIS instance required.
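The idea can be sketched in Python as follows. This is not the C implementation: it assumes the ProductionDefinition XML lists components as Item elements with Name and ClassName attributes, and the function names parse_production and routes_to are hypothetical. It resolves only as far as the target class; picking the specific message handler is left out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: first string argument of a SendRequestSync call.
SEND = re.compile(r'SendRequestSync\(\s*"([^"]+)"')


def parse_production(xdata_xml):
    """Map each production item name to its implementing class."""
    root = ET.fromstring(xdata_xml)
    return {item.get("Name"): item.get("ClassName") for item in root.iter("Item")}


def routes_to(source_item, method_body, items):
    """Return (source item, target item, target class) for each resolvable send."""
    edges = []
    for target in SEND.findall(method_body):
        if target in items:  # skip targets not declared in the production
            edges.append((source_item, target, items[target]))
    return edges


production_xml = (
    '<Production Name="Demo.Production">'
    '<Item Name="Router" ClassName="Demo.Router"/>'
    '<Item Name="FileOut" ClassName="Demo.FileOperation"/>'
    "</Production>"
)
items = parse_production(production_xml)
edges = routes_to("Router", 'Set sc = ..SendRequestSync("FileOut", msg)', items)
```

Because everything is read from the XData text, the lookup needs no running IRIS instance, which matches the index-time approach described above.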

WorkMgr parallel queue dispatch

.Queue("##class(X).method") passes its target as a string literal in source, so ordinary call extraction does not see it. We emit CALLS edges from these sites to the target method.
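A rough Python sketch of how such a site could be turned into a call target is shown below. The regex and the name queue_targets are hypothetical, and the real extraction happens in the C pipeline rather than on raw text.

```python
import re

# Hypothetical pattern: a .Queue( call whose first argument is a
# "##class(Pkg.Class).Method" string literal.
QUEUE = re.compile(r'\.Queue\(\s*"##class\(([\w.%]+)\)\.(%?\w+)"')


def queue_targets(body):
    """Return qualified method names referenced by .Queue string literals."""
    return [f"{cls}.{meth}" for cls, meth in QUEUE.findall(body)]


targets = queue_targets('Set sc = queue.Queue("##class(MyApp.Worker).Process", id)')
```

Each returned name would become the destination of a CALLS edge from the method that contains the .Queue call.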

Validation

Tested against a large real-world ObjectScript codebase (~1,200 classes, ~4,150 methods) for scalability and correctness:

Node / edge type    Approx. count
Class               ~1,200
Method              ~4,150
XData               ~850
Storage             ~320
EnsembleItem        ~275
Index               ~100
CALLS edges         ~3,350
ROUTES_TO edges     ~290

All existing CBM tests pass (zero regressions). New tests cover UDL class/method extraction, all four call dispatch patterns, Ensemble topology parsing, and macro expansion.

Implementation scope

Following the infra-pass pattern in CLAUDE.md:

  • internal/cbm/grammar_objectscript_udl.c / grammar_objectscript_routine.c — grammar shims
  • internal/cbm/vendored/grammars/objectscript_udl/ and objectscript_routine/ — vendored tree-sitter grammars (MIT licensed, from intersystems/tree-sitter-objectscript)
  • internal/cbm/lang_specs.c — CBM_LANG_OBJECTSCRIPT_UDL and CBM_LANG_OBJECTSCRIPT_ROUTINE entries
  • internal/cbm/cbm.h — enum additions
  • internal/cbm/extract_defs.c, extract_calls.c, extract_imports.c — ObjectScript extraction logic added alongside existing language handlers
  • tests/test_extraction.c — ~30 new test cases

The oref self-call resolution (..Method()) and the macro expansion pass are implemented as targeted additions to handle_calls() — no structural changes to the pipeline.
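As an illustration of the macro expansion pass, here is a minimal Python sketch. A plain dict stands in for the CBMMacroTable, only argument-less #define macros are handled, and the names build_macro_table and expand_macros are hypothetical rather than taken from the C code.

```python
import re

# Argument-less macro definitions of the form: #define Name replacement
DEFINE = re.compile(r"^\s*#define\s+(\w+)\s+(.+?)\s*$")
# Macro use sites of the form: $$$Name
MACRO_USE = re.compile(r"\${3}(\w+)")


def build_macro_table(inc_sources):
    """Collect macro name -> replacement text from .inc file contents."""
    table = {}
    for text in inc_sources:
        for line in text.splitlines():
            m = DEFINE.match(line)
            if m:
                table[m.group(1)] = m.group(2)
    return table


def expand_macros(line, table):
    """Replace known $$$Name uses; leave unknown macros untouched."""
    return MACRO_USE.sub(lambda m: table.get(m.group(1), m.group(0)), line)


inc = "#define PatientClass ##class(MyApp.Patient)\n#define Add(%a,%b) (%a+%b)\n"
table = build_macro_table([inc])
expanded = expand_macros("Set p = $$$PatientClass.%New()", table)
```

After expansion the line contains an ordinary ##class(...) call, so the existing explicit-call resolution can handle it without a separate code path.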

Questions for maintainers

  1. Grammar vendoring: the objectscript_udl and objectscript_routine grammars are ~2.5 MB of generated C each (comparable to existing vendored grammars). They come from intersystems/tree-sitter-objectscript (MIT licensed). Is there a preferred way to vendor these, or should they follow the existing pattern in internal/cbm/vendored/grammars/?
  2. EnsembleItem node label: this is domain-specific (IRIS Interoperability). Would you prefer a more generic label like ServiceComponent or WorkflowNode, with ensemble_item as a property?
  3. PR structure: given the size, would you prefer this as one large PR or split — (a) grammar + basic extraction, (b) CALLS resolution, (c) Ensemble topology?

Happy to open the PR when you give the green light, or to share any specific files/diffs for early review.

Proposed solution

The proposed solution, full ObjectScript language support, is implemented in a public fork of the CBM repo.

Confirmations

  • I searched existing issues and this is not a duplicate.
