
Project-wide symbol table for cross-language cross-file policy binding #82

@boorad

Background

Issue #66 split cross-file policy-binding detection into three tiers and recommended landing them in order. The PR on branch feat/66-cross-file-policy-binding ships tiers 1 and 2:

  1. Path-based bypass for in-package policy implementation files — 647c207.
  2. Go package-scoped propagation — files sharing a package directory are treated as one propagation domain. Took the OCP corpus from 65% → 100% externalized.
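To make the tier 2 primitive concrete, here is a minimal sketch of grouping files into propagation domains by directory. It is an illustration only, not the code from the PR: the function name `group_by_package_dir` and the sample paths are hypothetical, and it ignores details such as external `_test` packages that share a directory.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_package_dir(paths):
    """Group Go source files by parent directory.

    Each directory is treated as one propagation domain, mirroring
    Go's directory-equals-package convention.
    """
    domains = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix == ".go":  # skip non-Go files
            domains[str(path.parent)].append(path.name)
    return {directory: sorted(names) for directory, names in domains.items()}


# Hypothetical file listing, for illustration only.
files = [
    "pkg/authz/policy.go",
    "pkg/authz/handler.go",
    "pkg/api/server.go",
    "README.md",
]
domains = group_by_package_dir(files)
```

In this sketch a binding found in `policy.go` would be visible to `handler.go` because both land in the `pkg/authz` domain, while `server.go` sits in a separate domain.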

This issue tracks tier 3.

What's missing

Go's directory-equals-package convention made tier 2 a natural primitive. The same shape doesn't generalize:

  • Java: DI containers (Spring, Guice, Dagger) wire a policy bean in a @Configuration class and inject it across the entire module. Per-package propagation isn't enough — the binding crosses package boundaries.
  • TypeScript / JavaScript: a policy primitive exported from lib/authz.ts flows through any file that imports it. Following local imports would handle the common case, but it still requires resolving import paths against the project.
  • Python: similar to TS. One file does from authz import check, the name is re-exported through __init__.py, and it is used elsewhere.
  • C#: namespace-based DI is closer to Java than to Go.
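As a rough illustration of what "resolving import paths against the project" could involve for the TypeScript case, the sketch below resolves a relative import specifier against a known set of project files. Everything here is an assumption for illustration: the function name, the suffix list, and the sample paths are hypothetical, and bare specifiers (npm packages, path aliases) are deliberately left unresolved.

```python
import posixpath

# Suffixes tried in order; a real resolver would follow the project's config.
CANDIDATE_SUFFIXES = ("", ".ts", ".tsx", ".js", "/index.ts", "/index.js")


def resolve_relative_import(importer, specifier, project_files):
    """Best-effort resolution of a relative import to a project file.

    Returns the matching path from project_files, or None when the
    specifier is not relative or no candidate exists.
    """
    if not specifier.startswith("."):
        return None  # bare specifiers are out of scope for this sketch
    base = posixpath.normpath(
        posixpath.join(posixpath.dirname(importer), specifier)
    )
    for suffix in CANDIDATE_SUFFIXES:
        candidate = base + suffix
        if candidate in project_files:
            return candidate
    return None


# Hypothetical project layout.
project_files = {"src/lib/authz.ts", "src/lib/index.ts", "src/routes/users.ts"}
direct = resolve_relative_import("src/routes/users.ts", "../lib/authz", project_files)
via_index = resolve_relative_import("src/routes/users.ts", "../lib", project_files)
external = resolve_relative_import("src/routes/users.ts", "express", project_files)
```

Returning None on failure keeps the resolution best-effort: an unresolved import simply contributes no cross-file bindings.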

In each case the missing primitive is the same: a coarse cross-file symbol map — exported name → defining file → bindings flowing into it — that the propagation pass can consult.
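One possible shape for that map, sketched as a small in-memory table. The class names, the `file#symbol` key format, and the binding labels are hypothetical choices for illustration, not an existing design.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    """One exported symbol: where it is defined and what flows into it."""
    qualified_name: str          # assumed format: "<file>#<exported name>"
    defining_file: str
    bindings: set = field(default_factory=set)


class SymbolTable:
    """Coarse project-wide map from qualified symbol to its record."""

    def __init__(self):
        self._records = {}

    def define(self, qualified_name, defining_file, bindings=()):
        # Create the record on first sight, then merge any new bindings.
        record = self._records.setdefault(
            qualified_name, SymbolRecord(qualified_name, defining_file)
        )
        record.bindings.update(bindings)
        return record

    def lookup(self, qualified_name):
        return self._records.get(qualified_name)


table = SymbolTable()
table.define("lib/authz.ts#check", "lib/authz.ts", {"policy:authz"})
table.define("lib/authz.ts#check", "lib/authz.ts", {"policy:audit"})
found = table.lookup("lib/authz.ts#check")
```

Keying by qualified name rather than bare name is what lets two files export a symbol called `check` without colliding.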

Proposed approach

Build a project-wide symbol table during a pre-pass:

  1. Walk every source file, parse, collect:
    • Top-level exports / public symbols and the propagation edges that feed them.
    • Imports and what they refer to (best-effort path resolution per language).
  2. Run propagation across the union — same fixed-point loop, just keyed by qualified symbol instead of bare name.
  3. Each consumer file looks up its imported symbols in the table to seed its local binding set before running the existing per-file propagation.
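Steps 2 and 3 can be sketched as follows, assuming the pre-pass has already produced seed bindings per qualified symbol and a map of propagation edges. The function names, the `file#symbol` key format, and the sample data are hypothetical; the existing fixed-point loop may be structured differently.

```python
def propagate(seed_bindings, edges):
    """Fixed-point propagation keyed by qualified symbol.

    seed_bindings: {qualified symbol: set of bindings found locally}
    edges: {target symbol: [source symbols whose bindings flow into it]}
    """
    bindings = {sym: set(found) for sym, found in seed_bindings.items()}
    changed = True
    while changed:
        changed = False
        for target, sources in edges.items():
            target_set = bindings.setdefault(target, set())
            for source in sources:
                missing = bindings.get(source, set()) - target_set
                if missing:
                    target_set |= missing
                    changed = True
    return bindings


def seed_consumer(imported_symbols, bindings):
    """Union of bindings for everything a consumer file imports."""
    seeded = set()
    for sym in imported_symbols:
        seeded |= bindings.get(sym, set())
    return seeded


# Hypothetical Python-style re-export chain: core -> __init__ -> views.
seed_bindings = {"authz/core.py#check": {"policy:authz"}}
edges = {
    "app/views.py#guard": ["authz/__init__.py#check"],
    "authz/__init__.py#check": ["authz/core.py#check"],
}
resolved = propagate(seed_bindings, edges)
local_seed = seed_consumer(["authz/__init__.py#check"], resolved)
```

The loop terminates because binding sets only grow and are bounded by the finite set of seed bindings. The edge order above is chosen so that more than one pass is needed, which is the case the fixed-point loop exists to handle.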

Cost

Substantially more code than tiers 1 and 2 combined. Memory grows with project size (one parsed tree-snapshot or symbol record per file). Likely needs a cache so re-scans don't reparse everything.
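A minimal sketch of such a cache, assuming entries are keyed by file path and invalidated by a content hash. The `ParseCache` class and the toy parser are hypothetical stand-ins; a real cache would likely persist to disk and store symbol records rather than a list of names.

```python
import hashlib


class ParseCache:
    """Reuse per-file symbol records when file content is unchanged."""

    def __init__(self, parse):
        self._parse = parse      # callable: source text -> symbol record
        self._entries = {}       # path -> (content digest, record)
        self.misses = 0          # number of times the parser actually ran

    def get(self, path, source):
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]
        self.misses += 1
        record = self._parse(source)
        self._entries[path] = (digest, record)
        return record


def toy_parse(source):
    """Stand-in parser: collects names from lines shaped like 'export NAME'."""
    names = []
    for line in source.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "export":
            names.append(parts[1])
    return names


cache = ParseCache(toy_parse)
first = cache.get("lib/authz.ts", "export check\nexport audit\n")
second = cache.get("lib/authz.ts", "export check\nexport audit\n")  # cache hit
third = cache.get("lib/authz.ts", "export check\n")                 # content changed
```

Hashing content rather than trusting modification times keeps the cache correct across checkouts, at the cost of reading every file on each scan.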

Probably overkill for the externalization metric alone: the OCP corpus was already at 100% after Go package propagation. The trigger for picking this up is the first corpus where a DI-heavy Java or TS codebase shows a significant gap.

Suggested trigger

Defer until we have a second corpus (Java DI app, or a TS service with lib/authz re-exports) that shows a measurable shortfall the cheaper tiers can't close. The downstream features the symbol table would enable — find-all-callers, cross-file finding context — would help justify the cost when there's a use case.

Repro target

A future Java/TS corpus where the per-package fallback is visibly insufficient.

Labels: enhancement
