- Layer: L5
- Implementation directory: oqlpack/
- Status: Provisional. Java is sketched (~13 files); other languages are unstarted.
- Version: 0.1
L5 is the QL-level standard library that turns the relational tables
of L4 into the object-oriented predicate vocabulary that CodeQL queries
expect (classes like Method, MethodAccess, Type, DataFlow::Node,
etc.).
L5 has two goals:
- CodeQL substitution. A user's existing CodeQL query, after L0
  syntax transformation, should compile and run against oqlpack/.
- Neurosymbolic extension surface. The library is the natural place
  to expose neural augmentations as ordinary QL predicates (e.g.
  LLMNamedSink, LearnedTaintStep).
    oqlpack/
      <lang>/
        qlpack.yml                  // Provisional: currently absent
        lib/
          <lang>.qll                // top-level reexport
          semmle/
            code/
              FileSystem.qll
              Location.qll
              Unit.qll
              <lang>/
                Element.qll
                Type.qll
                ...
        config/
          semmlecode.dbscheme       // mirrors L4 schema for that language
Today only oqlpack/java/ exists, with the lib/semmle/code/{,java/}
hierarchy and a config/semmlecode.dbscheme.
L5 mirrors the file structure of upstream codeql/<lang>/ql/lib/ so
that a user can move a query between systems with minimal churn.
Differences from upstream are scoped to:
- The L0 syntax compromises (turbofish, etc.), applied to the entire
  oqlpack/ tree.
- L1 features still missing (parameterized modules, signatures). When they land, L5 modules are migrated back toward upstream shape.
- Engine features still missing (flow extension). Until §5 of L3 ships, the dataflow-related modules in L5 are stubs that use the plain Datalog engine and document the precision deficit.
Each language directory holds a qlpack.yml (Provisional: the file does not
yet exist):

    name: open-codeql/java
    version: 0.1.0
    library: true
    extractor: java
    upgrades: upgrades/ # Provisional
    default-suite-file: codeql-suites/code-scanning.qls
    dependencies: {}

The manifest is parsed by an L5 loader (TBD) that resolves imports, picks an extractor, and ties together the L4 schema with the QL library.
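Since the loader is still TBD, the following is only a minimal sketch of its first step: reading the flat scalar keys of the manifest shown above. The names Manifest and parse_manifest are hypothetical, and the function deliberately handles only `key: value` lines. A real loader would presumably use a full YAML parser.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of a qlpack.yml; fields mirror the manifest keys.
#[derive(Debug, PartialEq)]
pub struct Manifest {
    pub name: String,
    pub version: String,
    pub library: bool,
    pub extractor: Option<String>,
}

/// Parse only flat `key: value` lines. This is NOT a YAML parser: nested
/// mappings, lists, quoting, and `#` inside values are out of scope here.
pub fn parse_manifest(text: &str) -> Result<Manifest, String> {
    let mut fields: BTreeMap<&str, &str> = BTreeMap::new();
    for line in text.lines() {
        // Drop a trailing `# ...` comment, then surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (key, value) = line
            .split_once(':')
            .ok_or_else(|| format!("expected `key: value`, got `{line}`"))?;
        fields.insert(key.trim(), value.trim());
    }
    // Look up a key that must be present (assumption: name and version are required).
    let required = |key: &str| {
        fields
            .get(key)
            .map(|v| v.to_string())
            .ok_or_else(|| format!("missing required key `{key}`"))
    };
    Ok(Manifest {
        name: required("name")?,
        version: required("version")?,
        library: fields.get("library").map_or(false, |v| *v == "true"),
        extractor: fields.get("extractor").map(|v| v.to_string()),
    })
}
```

Calling parse_manifest on the manifest text above would yield a Manifest whose extractor is Some("java"). Import resolution and the tie to the L4 schema are left out because their design is not yet specified.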
Every language has an Element class that sits at the root of its AST
class hierarchy. Member predicates of Element should at minimum
include:
- string toString()
- Location getLocation()
- File getFile()
Each language has three trees:
- A type hierarchy (Type → PrimitiveType, RefType, …).
- An expression hierarchy (Expr → Literal, BinaryExpr, …).
- A statement hierarchy (Stmt → IfStmt, LoopStmt, …).
Each leaf class is defined by a characteristic predicate over a database
tag (e.g. exprs(_, kind, _) with kind = 84 selects VarAccess). The mapping
from tag to class is the concern of the language's extractor (L4-X).
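The tag-to-class relationship can be pictured as a filter over the table. The sketch below models it outside QL. The ExprRow layout (id, kind, type_id) and the helper class_extent are assumptions for illustration. Only the tag value 84 for VarAccess comes from the example above.

```rust
/// One row of a hypothetical three-column `exprs(id, kind, type_id)` table.
/// The column meanings other than `kind` are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExprRow {
    pub id: u32,
    pub kind: u32,
    pub type_id: u32,
}

/// Kind tag for `VarAccess`, taken from the example in the text.
pub const VAR_ACCESS_KIND: u32 = 84;

/// The extent of a leaf class is every entity whose row carries the class's
/// tag: the analogue of a characteristic predicate `exprs(this, 84, _)`.
pub fn class_extent(rows: &[ExprRow], kind: u32) -> Vec<u32> {
    rows.iter()
        .filter(|row| row.kind == kind)
        .map(|row| row.id)
        .collect()
}
```

Because membership depends only on the tag column, the extent of each leaf class is a plain selection on one table, with no recursion involved.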
Cross-cutting decorations live in their own modules and are never embedded in the Element subtree.
Until L3 §5 lands, DataFlow.qll and TaintTracking.qll are minimal
modules that:
- Define Node, Configuration, and flow(Node, Node) predicates that evaluate to plain Datalog reachability.
- Carry a documented precision warning at the top of the file.
- Expose a stable signature so that user queries written today continue to work when the flow engine ships.
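To make "plain Datalog reachability" concrete, the sketch below computes flow as the closure of a step relation with a breadth-first search. It is a model of the semantics only, not the stub's actual code: NodeId, reachable, and flow are hypothetical names, and treating a node as reaching itself in zero steps is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical node identifier; in L5 the real `Node` is a QL class.
pub type NodeId = u32;

/// All nodes reachable from `source` by zero or more `step` edges.
pub fn reachable(steps: &[(NodeId, NodeId)], source: NodeId) -> BTreeSet<NodeId> {
    // Index the step relation by its first column.
    let mut successors: BTreeMap<NodeId, Vec<NodeId>> = BTreeMap::new();
    for &(from, to) in steps {
        successors.entry(from).or_default().push(to);
    }
    let mut seen = BTreeSet::from([source]);
    let mut queue = VecDeque::from([source]);
    while let Some(node) = queue.pop_front() {
        if let Some(nexts) = successors.get(&node) {
            for &next in nexts {
                // `insert` returns true only for nodes not visited before.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}

/// `flow(source, sink)` holds when `sink` is reachable from `source`.
pub fn flow(steps: &[(NodeId, NodeId)], source: NodeId, sink: NodeId) -> bool {
    reachable(steps, source).contains(&sink)
}
```

A closure like this treats every step edge alike. It cannot, for example, match a call edge with its corresponding return edge, which is one kind of imprecision a warning at the top of the file might describe.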
L5 uses CodeQL's naming conventions verbatim (PascalCase classes,
camelCase predicates, Module for module names) so that user-written
queries port cleanly. The L0 turbofish compromise is the only deviation
inside .qll files.
- L5 smoke tests (crates/ocql-e2e-tests/tests/oqlpack_java.rs): every .qll analyzes without errors; every documented class has at least one populated extent on a fixture database.
- L5 parity tests (crates/ocql-e2e-tests/tests/java_parity.rs): for a fixed input fixture, classes like Method, Field, Type match the upstream CodeQL extraction in count and (for entity classes) in toString() shape.
When vendor/codeql/ is checked in, parity tests should be elaborated
into a codeql query run harness that runs the same query under both
systems and diffs the results.
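The diffing half of such a harness could look roughly like the sketch below. It assumes both systems' results have already been collected as rows of strings. Running the queries themselves is out of scope, and Row, ParityDiff, and diff_results are hypothetical names rather than existing test helpers.

```rust
use std::collections::BTreeSet;

/// A result row rendered as strings (for example, one `toString()` per column).
pub type Row = Vec<String>;

/// Rows that appear in only one of the two result sets.
#[derive(Debug, Default, PartialEq)]
pub struct ParityDiff {
    pub only_upstream: Vec<Row>,
    pub only_oqlpack: Vec<Row>,
}

impl ParityDiff {
    /// True when both systems produced the same set of rows.
    pub fn is_match(&self) -> bool {
        self.only_upstream.is_empty() && self.only_oqlpack.is_empty()
    }
}

/// Set-based comparison: row order and duplicate rows are ignored.
pub fn diff_results(upstream: &[Row], oqlpack: &[Row]) -> ParityDiff {
    let up: BTreeSet<&Row> = upstream.iter().collect();
    let ours: BTreeSet<&Row> = oqlpack.iter().collect();
    ParityDiff {
        only_upstream: up.difference(&ours).map(|row| (**row).clone()).collect(),
        only_oqlpack: ours.difference(&up).map(|row| (**row).clone()).collect(),
    }
}
```

Comparing as sets keeps the check independent of evaluation order. A count-sensitive variant would need a multiset comparison instead.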
- The qlpack version (qlpack.yml) is bumped per release of oqlpack.
- A bump in the L4 schema for a language forces a bump in that language's qlpack version.
- A new class predicate should derive its class membership from
  characteristic predicates, not from an instanceof cast inside its body. (This makes L2 lowering predictable and L3 stratification cleaner.)
- A predicate that may grow large (e.g. transitive closure of a step
  relation) should be marked cached if it is reused.
- A predicate intended as a hot extension point for downstream queries should use a module-signature surface, even though L1 doesn't yet enforce it. This documents intent and unblocks signature-aware consumers later.
- Multi-language scope: which languages get oqlpack content next (C/C++ first, given the test corpus, vs. JavaScript first, given user demand)?
- Compatibility shim layer: do we ship a shim that
  rewrites legacy CodeQL syntax to L0 at import time, or do we only
  accept pre-transformed .qll files?
- Versioning: semantic versioning per qlpack, or a single open-codeql version pinned across all qlpacks?