From 090442306bbd207fce1c9cd7ac09e21c8b4a1c3f Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Wed, 13 May 2026 10:49:40 -0700 Subject: [PATCH 1/8] fix: soundness recovery for wrapped libs, ppx-runtime, virtual-impl deps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Restores correctness for three cases the bare BFS filter mishandles: - Deps that implement a virtual library: dep-graph through them is computed elsewhere ([Dep_rules.imported_vlib_deps]); the per-module filter can miss cmi changes. Gate: fall through to glob whenever the cctx has [has_virtual_impl]. - Wrapped local libs the consumer references through the wrapper name: the ocamldep walk can't see the alias chain into the lib's [wrapped_compat] / inner modules. Reach: glob the wrapped lib's [Lib.closure]. - [ppx_runtime_libraries] introduced by [pps] in the consumer's preprocessor: their modules appear in the post-pp source which ocamldep can't see. Reach: glob their [Lib.closure]. [Module_compilation.lib_deps_for_module]: - After [can_filter], read [Compilation_context.has_virtual_impl]; if true, fall back to glob. - Read [Compilation_context.pps_runtime_libs] and include them in [direct_libs] so [Lib.closure] sees them. - Compute [wrapped_libs_referenced] from the consumer's [referenced_modules] (BFS-initial frontier — pre-cross-lib-walk). Take the [Lib.closure] of that set union [pps_runtime_libs] to get [must_glob_libs]; the classification fold sends every member to the glob path. [Modules]: - [Wrapped.entry_modules]: new function. Returns the wrapper ([lib_interface]) plus every [wrapped_compat] shim. Mirrors what [(wrapped (transition ...))] libraries expose to consumers. - [entry_modules]'s wrapped case switches to use it. Net effect: in transition wrapped libs, consumers can resolve any of the bare module names the lib exposes; this lifts a false-negative in the index that previously hid the consumer's reference to a [wrapped_compat] shim from the per-module filter. Tests (cherry-picked from #14492): - New soundness fixtures land here: [cross-lib-instrumentation-barrier.t], [cross-lib-preprocess-barrier.t], [cross-lib-pps-runtime-no-ocamldep-barrier.t], [wrapped-from-vlib-soundness.t], [wrapped-transition-soundness.t], [mixed-per-module-preprocess.t], [mixed-per-module-preprocess-precision.t], [cmx-native-tight-deps.t]. - The five pre-existing tests broken by L4 ([auto-wrapped-child-reexport.t], [ppx-runtime-libraries.t], [virtual-library.t], [wrapped-closure-precision.t], [wrapped-reexport-via-open-flag.t]) pass again — soundness recovery restores their original behavior; no test file change in #14492's diff for them. Changelog: [doc/changes/added/14492.md] lands now. Signed-off-by: Robin Bate Boerop --- doc/changes/added/14492.md | 7 + src/dune_rules/module_compilation.ml | 234 +++++++++++------- src/dune_rules/modules.ml | 8 +- .../cmx-native-tight-deps.t | 50 ++++ .../ppx/dune | 13 + .../ppx/dune-project | 3 + .../ppx/hello.ml | 1 + .../ppx/hello_ppx.ml | 48 ++++ .../cross-lib-instrumentation-barrier.t/run.t | 67 +++++ .../run.t | 80 ++++++ .../cross-lib-preprocess-barrier.t | 70 ++++++ .../run.t | 72 ++++++ .../mixed-per-module-preprocess.t/run.t | 91 +++++++ .../wrapped-from-vlib-soundness.t | 152 ++++++++++++ .../wrapped-transition-soundness.t | 105 ++++++++ 15 files changed, 906 insertions(+), 95 deletions(-) create mode 100644 doc/changes/added/14492.md create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cmx-native-tight-deps.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune-project create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello.ml create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello_ppx.ml create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/run.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-pps-runtime-no-ocamldep-barrier.t/run.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-preprocess-barrier.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess-precision.t/run.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess.t/run.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-from-vlib-soundness.t create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-transition-soundness.t diff --git a/doc/changes/added/14492.md b/doc/changes/added/14492.md new file mode 100644 index 00000000000..58fa028907e --- /dev/null +++ b/doc/changes/added/14492.md @@ -0,0 +1,7 @@ +- Filter inter-library dependencies per-module using `ocamldep` output, and + match the `-I`/`-H` include flags to that filter. When a dependency + library's cmi changes, only consumer modules that reference it get rebuilt + — eliminating spurious recompilations of unrelated sibling modules. + Applies to unwrapped dependency libraries; wrapped libraries continue to + use a glob over the whole library's objdir. (#14492, fixes #4572, + @robinbb) diff --git a/src/dune_rules/module_compilation.ml b/src/dune_rules/module_compilation.ml index ba8fa04158a..501cd31d955 100644 --- a/src/dune_rules/module_compilation.ml +++ b/src/dune_rules/module_compilation.ml @@ -85,103 +85,149 @@ let lib_deps_for_module ~cctx ~obj_dir ~for_ ~dep_graph ~opaque ~cm_kind ~ml_kin Action_builder.return (cctx_includes_for_cm_kind (), Lib_file_deps.deps_of_entries ~opaque ~cm_kind libs) else - let* lib_index = Resolve.Memo.read (Compilation_context.lib_index cctx) in - let sandbox = Compilation_context.sandbox cctx in - let sctx = Compilation_context.super_context cctx in - let* trans_deps = Dep_graph.deps_of dep_graph m in - (* Read [dep_m]'s [.ml]-side ocamldep only when its references can - propagate to the consumer: - - | [dep_m] is | [cm_kind] | [opaque] | read [.ml]? | - | ----------------------- | ----------- | -------- | ------------ | - | consumer ([m] itself) | any | any | iff [Impl] | - | trans_dep, no [.mli] | any | any | yes | - | trans_dep, has [.mli] | [Cmx] | false | yes (inline) | - | trans_dep, has [.mli] | [Cmx] | true | no | - | trans_dep, has [.mli] | [Cmi]/[Cmo] | any | no | *) - let need_impl_deps_of dep_m ~is_consumer = - if is_consumer - then ( - match ml_kind with - | Ml_kind.Impl -> true - | Intf -> false) - else - (not (Module.has dep_m ~ml_kind:Intf)) - || - match cm_kind with - | Ocaml Cmx -> not opaque - | Ocaml (Cmi | Cmo) | Melange _ -> false + let* has_virtual_impl = + Resolve.Memo.read (Compilation_context.has_virtual_impl cctx) in - let read_dep_m_raw dep_m ~is_consumer = - let* impl_deps = - if need_impl_deps_of dep_m ~is_consumer - then - Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Impl dep_m - else Action_builder.return Module_name.Set.empty + if has_virtual_impl + then + Action_builder.return + (cctx_includes_for_cm_kind (), Lib_file_deps.deps_of_entries ~opaque ~cm_kind libs) + else + let* lib_index = Resolve.Memo.read (Compilation_context.lib_index cctx) in + let sandbox = Compilation_context.sandbox cctx in + let sctx = Compilation_context.super_context cctx in + let* trans_deps = Dep_graph.deps_of dep_graph m in + (* Read [dep_m]'s [.ml]-side ocamldep only when its references can + propagate to the consumer: + + | [dep_m] is | [cm_kind] | [opaque] | read [.ml]? | + | ----------------------- | ----------- | -------- | ------------ | + | consumer ([m] itself) | any | any | iff [Impl] | + | trans_dep, no [.mli] | any | any | yes | + | trans_dep, has [.mli] | [Cmx] | false | yes (inline) | + | trans_dep, has [.mli] | [Cmx] | true | no | + | trans_dep, has [.mli] | [Cmi]/[Cmo] | any | no | *) + let need_impl_deps_of dep_m ~is_consumer = + if is_consumer + then ( + match ml_kind with + | Ml_kind.Impl -> true + | Intf -> false) + else + (not (Module.has dep_m ~ml_kind:Intf)) + || + match cm_kind with + | Ocaml Cmx -> not opaque + | Ocaml (Cmi | Cmo) | Melange _ -> false in - let+ intf_deps = - Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Intf dep_m + let read_dep_m_raw dep_m ~is_consumer = + let* impl_deps = + if need_impl_deps_of dep_m ~is_consumer + then + Ocamldep.read_immediate_deps_raw_of + ~sandbox + ~sctx + ~obj_dir + ~ml_kind:Impl + dep_m + else Action_builder.return Module_name.Set.empty + in + let+ intf_deps = + Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Intf dep_m + in + Module_name.Set.union impl_deps intf_deps in - Module_name.Set.union impl_deps intf_deps - in - let* m_raw = read_dep_m_raw m ~is_consumer:true in - let* trans_raw = - union_module_name_sets_mapped trans_deps ~f:(read_dep_m_raw ~is_consumer:false) - in - let all_raw = Module_name.Set.union m_raw trans_raw in - let* flags = Ocaml_flags.get (Compilation_context.flags cctx) mode in - let open_modules = Ocaml_flags.extract_open_module_names flags in - let referenced = Module_name.Set.union all_raw open_modules in - let { Lib_file_deps.Lib_index.tight; non_tight } = - Lib_file_deps.Lib_index.filter_libs_with_modules - lib_index - ~referenced_modules:referenced - in - let direct_libs = - List.sort_uniq ~compare:Lib.compare (Lib.Map.keys tight @ Lib.Set.to_list non_tight) - in - (* Close transitively over transparent aliases that ocamldep doesn't - report. *) - let* all_libs = Resolve.Memo.read (Lib.closure direct_libs ~linking:false ~for_) in - let* tight_set = - cross_lib_tight_set ~sandbox ~sctx ~lib_index ~initial_refs:referenced - in - (* Classify each lib in [all_libs]: - lib has [None]-entry referenced - (mixed-entry or wrapped) → glob (covers the None entries' [.cmi]s); - - lib has only [Some] entries referenced → per-module deps; - lib - unreached but tight-eligible → drop (link rule pulls it in via - [requires_link]); - lib unreached and not tight-eligible → glob. - [kept_libs] gets every lib that contributes a tight or glob dep — used - by [Compilation_context.filtered_include_flags] to scope the consumer's - [-I]/[-H] flags. *) - let { Lib_file_deps.Lib_index.tight = tight_modules; non_tight = non_tight_set } = - Lib_file_deps.Lib_index.filter_libs_with_modules - lib_index - ~referenced_modules:tight_set - in - let tight_deps, glob_libs, _kept_libs = - List.fold_left - all_libs - ~init:(Dep.Set.empty, [], Lib.Set.empty) - ~f:(fun (td, gl, kl) lib -> - if Lib.Set.mem non_tight_set lib - then td, lib :: gl, Lib.Set.add kl lib - else ( - match Lib.Map.find tight_modules lib with - | Some modules -> - ( Dep.Set.union - td - (Lib_file_deps.deps_of_entry_modules ~opaque ~cm_kind lib modules) - , gl - , Lib.Set.add kl lib ) - | None -> - if Lib_file_deps.Lib_index.is_tight_eligible lib_index lib - then td, gl, kl - else td, lib :: gl, Lib.Set.add kl lib)) - in - let glob_deps = Lib_file_deps.deps_of_entries ~opaque ~cm_kind glob_libs in - Action_builder.return - (cctx_includes_for_cm_kind (), Dep.Set.union tight_deps glob_deps) + let* m_raw = read_dep_m_raw m ~is_consumer:true in + let* trans_raw = + union_module_name_sets_mapped trans_deps ~f:(read_dep_m_raw ~is_consumer:false) + in + let all_raw = Module_name.Set.union m_raw trans_raw in + let* flags = Ocaml_flags.get (Compilation_context.flags cctx) mode in + let open_modules = Ocaml_flags.extract_open_module_names flags in + let referenced = Module_name.Set.union all_raw open_modules in + let { Lib_file_deps.Lib_index.tight; non_tight } = + Lib_file_deps.Lib_index.filter_libs_with_modules + lib_index + ~referenced_modules:referenced + in + (* [ppx_runtime_libraries] introduce module references through post-pp + source that ocamldep cannot see; carry them through to [all_libs] so + the classification fold sees them, and force them onto the glob path + via [must_glob_set]. *) + let* pps_runtime_libs = + Resolve.Memo.read (Compilation_context.pps_runtime_libs cctx) + in + (* [Lib.closure]'s memo key is order- and multiplicity-sensitive on the + input list. [pps_runtime_libs] can both contain duplicates (multiple + pps sharing a runtime dep) and overlap with [tight]/[non_tight] (a lib + referenced both syntactically and via [add_pp_runtime_deps]). + [sort_uniq] keeps the input canonical for memoization. *) + let direct_libs = + List.sort_uniq + ~compare:Lib.compare + (Lib.Map.keys tight @ Lib.Set.to_list non_tight @ pps_runtime_libs) + in + (* Close transitively over transparent aliases that ocamldep doesn't + report. *) + let* all_libs = Resolve.Memo.read (Lib.closure direct_libs ~linking:false ~for_) in + (* Wrapped-lib soundness recovery: when [referenced] names a wrapped local + lib's wrapper, the consumer may reach the lib's transitive closure + through aliases the cross-library walk cannot see; glob that closure + unconditionally. *) + let wrapped_referenced = + Lib_file_deps.Lib_index.wrapped_libs_referenced + lib_index + ~referenced_modules:referenced + in + let* must_glob_libs = + Resolve.Memo.read + (Lib.closure + (List.sort_uniq + ~compare:Lib.compare + (Lib.Set.to_list wrapped_referenced @ pps_runtime_libs)) + ~linking:false + ~for_) + in + let must_glob_set = Lib.Set.of_list must_glob_libs in + let* tight_set = + cross_lib_tight_set ~sandbox ~sctx ~lib_index ~initial_refs:referenced + in + (* Classify each lib in [all_libs]: - lib has [None]-entry referenced + (mixed-entry or wrapped) → glob (covers the None entries' [.cmi]s); - + lib has only [Some] entries referenced → per-module deps; - lib + unreached but tight-eligible → drop (link rule pulls it in via + [requires_link]); - lib unreached and not tight-eligible → glob. + [kept_libs] gets every lib that contributes a tight or glob dep — used + by [Compilation_context.filtered_include_flags] to scope the consumer's + [-I]/[-H] flags. *) + let { Lib_file_deps.Lib_index.tight = tight_modules; non_tight = non_tight_set } = + Lib_file_deps.Lib_index.filter_libs_with_modules + lib_index + ~referenced_modules:tight_set + in + let tight_deps, glob_libs, _kept_libs = + List.fold_left + all_libs + ~init:(Dep.Set.empty, [], Lib.Set.empty) + ~f:(fun (td, gl, kl) lib -> + if Lib.Set.mem must_glob_set lib || Lib.Set.mem non_tight_set lib + then td, lib :: gl, Lib.Set.add kl lib + else ( + match Lib.Map.find tight_modules lib with + | Some modules -> + ( Dep.Set.union + td + (Lib_file_deps.deps_of_entry_modules ~opaque ~cm_kind lib modules) + , gl + , Lib.Set.add kl lib ) + | None -> + if Lib_file_deps.Lib_index.is_tight_eligible lib_index lib + then td, gl, kl + else td, lib :: gl, Lib.Set.add kl lib)) + in + let glob_deps = Lib_file_deps.deps_of_entries ~opaque ~cm_kind glob_libs in + Action_builder.return + (cctx_includes_for_cm_kind (), Dep.Set.union tight_deps glob_deps) ;; let lib_cm_deps ~cctx ~cm_kind ~ml_kind ~mode m = diff --git a/src/dune_rules/modules.ml b/src/dune_rules/modules.ml index eed14fdbd33..8b5a6e07914 100644 --- a/src/dune_rules/modules.ml +++ b/src/dune_rules/modules.ml @@ -670,6 +670,12 @@ module Wrapped = struct let lib_interface t = Group.lib_interface t.group + (* Entry modules visible to consumers of a wrapped library: the wrapper + itself, plus any [wrapped_compat] shims (present for + [(wrapped (transition ...))] libraries, which expose bare module names in + addition to qualified ones). *) + let entry_modules t = lib_interface t :: Module_name.Map.values t.wrapped_compat + let fold_user_available { group; toplevel_module; _ } ~init ~f = let init = match toplevel_module with @@ -1011,7 +1017,7 @@ let entry_modules t = | Unwrapped m -> Unwrapped.entry_modules m | Wrapped m -> (* we assume this is never called for implementations *) - [ Wrapped.lib_interface m ]) + Wrapped.entry_modules m) ;; module With_vlib = struct diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cmx-native-tight-deps.t b/test/blackbox-tests/test-cases/per-module-lib-deps/cmx-native-tight-deps.t new file mode 100644 index 00000000000..d8813ca1c1b --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cmx-native-tight-deps.t @@ -0,0 +1,50 @@ +A consumer of a tight-eligible local library, compiled in native +mode under [--profile release], exercises the [want_cmx = true] +branch of [Lib_file_deps.deps_of_entry_modules] — emitting +per-module [.cmx] deps in addition to per-module [.cmi] deps. + +The default [dev] profile sets [opaque = true], which makes +[want_cmx = false] for local libs and the cmx branch is skipped +entirely. Switching to [release] flips [opaque] off, so the +consumer's native compile takes the per-module path with +[cm_kind = Ocaml Cmx] and the function records cmx-file deps. +Building [consumer.cmxa] explicitly forces the native compile, +unlike [@check] which only materialises [.cmi]/[.cmt] artifacts. + + $ cat > dune-project < (lang dune 3.23) + > EOF + +[dep_lib]: an unwrapped tight-eligible library with two modules, +one of them with both [.ml] and [.mli]: + + $ mkdir dep_lib + $ cat > dep_lib/dune < (library (name dep_lib) (wrapped false)) + > EOF + $ cat > dep_lib/m.ml < let v = 1 + > EOF + $ cat > dep_lib/m.mli < val v : int + > EOF + $ cat > dep_lib/n.ml < let _ = () + > EOF + +[consumer]: a library that references [M] from [dep_lib]: + + $ mkdir consumer + $ cat > consumer/dune < (library + > (name consumer) + > (libraries dep_lib)) + > EOF + $ cat > consumer/c.ml < let _ = M.v + > EOF + +Build the consumer's [.cmxa] under release profile, forcing +native compile of every module: + + $ dune build --profile release consumer/consumer.cmxa diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune new file mode 100644 index 00000000000..495828633bd --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune @@ -0,0 +1,13 @@ +(library + (name hello_ppx) + (public_name hello.ppx) + (kind ppx_rewriter) + (ppx_runtime_libraries hello) + (libraries ppxlib) + (modules hello_ppx)) + +(library + (public_name hello) + (modules hello) + (instrumentation.backend + (ppx hello.ppx))) diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune-project b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune-project new file mode 100644 index 00000000000..568df953e22 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/dune-project @@ -0,0 +1,3 @@ +(lang dune 2.7) + +(package (name hello)) diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello.ml b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello.ml new file mode 100644 index 00000000000..7f63d09714f --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello.ml @@ -0,0 +1 @@ +let hello s = print_endline (Printf.sprintf "Hello from %s!" s) diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello_ppx.ml b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello_ppx.ml new file mode 100644 index 00000000000..ffcd0b355a0 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/ppx/hello_ppx.ml @@ -0,0 +1,48 @@ +open Ast_helper + +let place = ref None +let file = ref None + +let read_file () = + match !file with + | None -> "" + | Some s -> + let ic = open_in s in + (match input_line ic with + | exception End_of_file -> + close_in ic; + "" + | s -> + close_in ic; + s) +;; + +let impl str = + let arg = + match !place with + | None -> Exp.ident (Location.mknoloc (Longident.Lident "__MODULE__")) + | Some s -> Exp.constant (Const.string (Printf.sprintf "%s (%s)" s (read_file ()))) + in + Str.eval + (Exp.apply + (Exp.ident + (Location.mknoloc + (Longident.Ldot + ( { txt = Longident.Lident "Hello"; loc = Location.none } + , { txt = "hello"; loc = Location.none } )))) + [ Nolabel, arg ]) + :: str +;; + +let () = + Ppxlib.Driver.add_arg + "-place" + (Arg.String (fun s -> place := Some s)) + ~doc:"PLACE where to say hello from"; + Ppxlib.Driver.add_arg + "-file" + (Arg.String (fun s -> file := Some s)) + ~doc:"Add info from file" +;; + +let () = Ppxlib.Driver.register_transformation_using_ocaml_current_ast ~impl "hello" diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/run.t b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/run.t new file mode 100644 index 00000000000..d7f4d5cf906 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-instrumentation-barrier.t/run.t @@ -0,0 +1,67 @@ +Reproducer for the bug introduced by storing post-pp [Module.t] in +[Lib_index]: a library declares [(instrumentation (backend X))] +without any [(preprocess ...)]. Instrumentation is *disabled* at +build time (no [--instrument-with] flag), so dune does not produce +[.pp.ml] files. But [Lib_info.preprocess] still returns +[Pps { pps = [Instrumentation_backend X]; ... }] (the static +config), and [build_lib_index] currently maps that to +[Module.pped (Module.ml_source m)] — i.e. paths like [foo.pp.ml] — +which dune then can't find a rule for when the cross-library +walker reads ocamldep on those modules. + + $ cat >dune-project < (lang dune 3.0) + > EOF + +[middle] declares an instrumentation backend but no +[(preprocess ...)]. With instrumentation disabled at build time, +no [.pp.ml] files are produced. + + $ mkdir leaf + $ cat > leaf/dune < (library (name leaf) (wrapped false)) + > EOF + $ cat > leaf/leaf.ml < type t = int + > let zero : t = 0 + > EOF + + $ mkdir middle + $ cat > middle/dune < (library + > (name middle) + > (wrapped false) + > (libraries leaf) + > (instrumentation (backend hello))) + > EOF + $ cat > middle/middle.mli < val identity : Leaf.t -> Leaf.t + > EOF + $ cat > middle/middle.ml < let identity x = x + > EOF + + $ mkdir consumer + $ cat > consumer/dune < (executable (name consumer) (libraries middle)) + > EOF + $ cat > consumer/consumer.ml < let _ = Middle.identity 0 + > EOF + +Build without [--instrument-with]: the cross-library walker +should NOT demand [middle.pp.ml] / [middle.pp.mli]. With the +buggy mapping that demands them anyway, dune reports +"No rule found for middle.pp.ml". + + $ dune build consumer/consumer.exe + +The cross-library walker should produce per-module narrow deps +on the consumer's compile rule, not a wide glob over [middle]'s +objdir. Assert no glob entry over [middle]'s objdir. + + $ dune rules --root . --format=json --deps _build/default/consumer/.consumer.eobjs/byte/dune__exe__Consumer.cmo > deps.json + + $ jq -r 'include "dune"; .[] | depsGlobs + > | select(.dir | endswith("middle/.middle.objs/byte")) + > | .dir + " " + .predicate' < deps.json diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-pps-runtime-no-ocamldep-barrier.t/run.t b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-pps-runtime-no-ocamldep-barrier.t/run.t new file mode 100644 index 00000000000..6d70add4c63 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-pps-runtime-no-ocamldep-barrier.t/run.t @@ -0,0 +1,80 @@ +Regression guard for [build_lib_index]'s [no_ocamldep_lib] +classification: it must mirror [Dep_rules.skip_ocamldep]'s +[has_library_deps] gating, which uses the *resolved* requires +(including pps runtime libs added via [add_pp_runtime_deps]), +not the static [(libraries ...)] field. A single-module local +lib with no [(libraries ...)] but with [(preprocess (pps X))] +has non-empty resolved requires (X's runtime libs), so its +ocamldep is run and classifying it as [no_ocamldep] would cause +the cross-library walk to skip its post-pp ocamldep output — +dropping transitive [.cmi] deps from the consumer's compile rule +and breaking sandboxed builds. + + $ cat > dune-project < (lang dune 3.0) + > EOF + +[hello] is the ppx runtime lib (single-module unwrapped, no library +deps of its own — correctly classified as no-ocamldep). It exposes +[Hello.t]: + + $ mkdir hello + $ cat > hello/dune < (library (name hello) (wrapped false)) + > EOF + $ cat > hello/hello.ml < type t = int + > let zero : t = 0 + > EOF + +[hello_ppx] is a no-op ppx_rewriter declaring [hello] as its +[ppx_runtime_libraries]: + + $ mkdir hello_ppx + $ cat > hello_ppx/dune < (library + > (name hello_ppx) + > (kind ppx_rewriter) + > (ppx_runtime_libraries hello) + > (libraries ppxlib)) + > EOF + $ cat > hello_ppx/hello_ppx.ml < let () = + > Ppxlib.Driver.register_transformation_using_ocaml_current_ast + > ~impl:(fun s -> s) "noop" + > EOF + +[middle] is single-module, has no [(libraries ...)], and uses +[(preprocess (pps hello_ppx))]. Its interface mentions [Hello.t]: + + $ mkdir middle + $ cat > middle/dune < (library + > (name middle) + > (wrapped false) + > (preprocess (pps hello_ppx))) + > EOF + $ cat > middle/middle.mli < val helper : Hello.t -> Hello.t + > EOF + $ cat > middle/middle.ml < let helper x = x + > EOF + +[consumer] depends on [middle]; references [Middle.helper] but +never names [Hello] in source: + + $ mkdir consumer + $ cat > consumer/dune < (executable (name consumer) (libraries middle)) + > EOF + $ cat > consumer/consumer.ml < let _ = Middle.helper 0 + > EOF + +Sandbox-forced build of the consumer succeeds: the walker visits +[middle]'s post-pp ocamldep, surfaces [Hello] in the tight set, +and the classification fold keeps [hello]'s [.cmi] as a compile- +rule dep so it is pulled into the sandbox. + + $ dune build --sandbox=copy consumer/consumer.exe diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-preprocess-barrier.t b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-preprocess-barrier.t new file mode 100644 index 00000000000..7b8c4586c49 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-preprocess-barrier.t @@ -0,0 +1,70 @@ +Regression guard for cross-library transitive [.cmi] reads +through a preprocessed intermediate. A consumer references a +module from a preprocessed library; that preprocessed module's +interface mentions a type from a leaf library that the consumer +never names syntactically. The consumer's compile must depend on +the leaf's [.cmi] so the OCaml compiler can resolve the type +reference imported via the intermediate's [.cmi]. + +[build_lib_index] stores each entry's *post-pp* [Module.t] +(mirroring [Pp_spec.pped_modules_map]), so [cross_lib_tight_set] +reads ocamldep on the dep lib's [.pp.ml] artifact. The walker +therefore reaches the leaf's name through the intermediate's +interface, the classification fold places the leaf in the +consumer's tight per-module deps, and forced sandboxing succeeds +because [leaf]'s [.cmi] is in the sandbox. + + $ cat > dune-project < (lang dune 3.23) + > EOF + +[leaf] exposes [Leaf.t]: + + $ mkdir leaf + $ cat > leaf/dune < (library (name leaf) (wrapped false)) + > EOF + $ cat > leaf/leaf.ml < type t = int + > let zero : t = 0 + > EOF + +[middle] depends on [leaf]; its single module is preprocessed via +[(preprocess (action ...))], and its interface mentions [Leaf.t]: + + $ mkdir middle + $ cat > middle/dune < (library + > (name middle) + > (wrapped false) + > (libraries leaf) + > (preprocess (action (run cat %{input-file})))) + > EOF + $ cat > middle/middle.mli < val identity : Leaf.t -> Leaf.t + > EOF + $ cat > middle/middle.ml < let identity x = x + > EOF + +[consumer] depends on [middle]; references [Middle.identity] but +never names [Leaf] in source: + + $ mkdir consumer + $ cat > consumer/dune < (executable (name consumer) (libraries middle)) + > EOF +Applying [Middle.identity] to a literal [0] forces the compiler +to unify [int] with [Leaf.t], which requires loading [Leaf.cmi]. +ocamldep on this source still reports only [Middle] as referenced, +since [Leaf] is not named. + + $ cat > consumer/consumer.ml < let _ = Middle.identity 0 + > EOF + +Sandbox-forced build of the consumer succeeds: [leaf]'s [.cmi] +is declared as a compile-rule dep through the per-module +filter's cross-lib walk and is therefore present in the sandbox. + + $ dune build --sandbox=copy consumer/consumer.exe diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess-precision.t/run.t b/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess-precision.t/run.t new file mode 100644 index 00000000000..f7e2e590a8a --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess-precision.t/run.t @@ -0,0 +1,72 @@ +Precision regression guard for the per-pair tight-eligibility +in [Lib_index]: when a consumer references only the [Some]- +entry modules of a mixed-pp lib, the [None]-entry modules must +NOT be pulled into the consumer's compile-rule deps. + +We assert precision by giving [b] (the [None]-entry module) an +unresolvable identifier and leaving [a]'s source independent of +[b]. If the consumer's compile rule listed [b.cmi] as a dep, +dune would try to compile [b.ml] and fail. Building the +consumer's [.cmo] (compile only, no link) succeeds → [b] is +correctly excluded. + + $ cat > dune-project < (lang dune 3.0) + > EOF + +A no-op staged ppx (identical to the soundness reproducer). + + $ mkdir ppx + $ cat > ppx/dune < (library + > (name ppx_noop) + > (kind ppx_rewriter) + > (ppx.driver (main Ppx_noop.main))) + > EOF + $ cat > ppx/ppx_noop.ml < let main () = + > let n = Array.length Sys.argv in + > if n < 2 then assert false; + > let input = Sys.argv.(n - 2) in + > let output = Sys.argv.(n - 1) in + > Filename.quote_command "cp" [input; output] + > |> Sys.command + > |> exit + > EOF + +[mylib]: [a] uses default (Some entry), [b] uses staged_pps +(None entry). [a]'s source is independent of [b]; [b.ml] +contains an unresolvable identifier so any attempt to compile +it will fail. + + $ mkdir mylib + $ cat > mylib/dune < (library + > (name mylib) + > (wrapped false) + > (preprocess (per_module ((staged_pps ppx_noop) b)))) + > EOF + $ cat > mylib/a.ml < let answer = 42 + > EOF + $ cat > mylib/b.ml < let bar = no_such_thing + > EOF + +[consumer] references only [A]: + + $ mkdir consumer + $ cat > consumer/dune < (executable (name consumer) (libraries mylib)) + > EOF + $ cat > consumer/consumer.ml < let () = print_int A.answer + > EOF + +Build only the consumer's [.cmo] (compile rule, not link). If +per-pair tight-eligibility holds, the consumer depends on +[a.cmi] alone and the build succeeds. If the fix regressed and +[mylib] got globbed for the consumer, dune would attempt to +compile [b.ml] and fail. + + $ dune build consumer/.consumer.eobjs/byte/dune__exe__Consumer.cmo diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess.t/run.t b/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess.t/run.t new file mode 100644 index 00000000000..b0d598d129c --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/mixed-per-module-preprocess.t/run.t @@ -0,0 +1,91 @@ +Reproducer for the soundness bug Copilot flagged in +[Lib_index.create]: an unwrapped lib's [tight_eligible] +membership is set on "any entry has [Some _]", which under +per-module preprocessing can be true while *some* of the lib's +entries have [m_opt = None]. The classification fold then +silently drops the [None] entries' modules from the consumer's +compile-rule deps when those modules are reached via +[cross_lib_tight_set]'s BFS expansion. + +Setup. [mylib] is unwrapped with two modules. [a] uses default +(no preprocessing) — entry [(a, mylib, Some _)]. [b] uses +[(staged_pps ...)] — entry [(b, mylib, None)], since no +[.pp.ml] is statically known for staged pps. +[a]'s source references [B], so [cross_lib_tight_set]'s BFS +walks from [a]'s ocamldep into [B]. The consumer references +only [A], so [wrapped_libs_referenced] (which is keyed off the +consumer's direct ocamldep, [referenced]) does NOT add [mylib] +to [must_glob_set]. The mixed-entry bug therefore reaches the +classification fold via [tight_set], and the fold drops [B] +because [tight_modules_per_lib] skips its [None] entry. + + $ cat > dune-project < (lang dune 3.0) + > EOF + +A no-op staged ppx, modeled on +test/blackbox-tests/test-cases/staged-pps-relative-directory-gh8158.t. +The driver copies its input verbatim, so the staged stage's +[.pp.ml] equals the input source. + + $ mkdir ppx + $ cat > ppx/dune < (library + > (name ppx_noop) + > (kind ppx_rewriter) + > (ppx.driver (main Ppx_noop.main))) + > EOF + $ cat > ppx/ppx_noop.ml < let main () = + > let n = Array.length Sys.argv in + > if n < 2 then assert false; + > let input = Sys.argv.(n - 2) in + > let output = Sys.argv.(n - 1) in + > Filename.quote_command "cp" [input; output] + > |> Sys.command + > |> exit + > EOF + +[mylib] is unwrapped with two modules; only [b] is staged. +[a]'s interface mentions [B.t], so [cross_lib_tight_set]'s BFS +walks from [a]'s ocamldep into [b]'s name AND the consumer's +own compile genuinely loads [b.cmi] (otherwise the bug wouldn't +manifest at the consumer's compile site). + + $ mkdir mylib + $ cat > mylib/dune < (library + > (name mylib) + > (wrapped false) + > (preprocess (per_module ((staged_pps ppx_noop) b)))) + > EOF + $ cat > mylib/a.mli < val identity : B.t -> B.t + > EOF + $ cat > mylib/a.ml < let identity (x : B.t) = x + > EOF + $ cat > mylib/b.ml < type t = int + > let zero : t = 0 + > EOF + +[consumer] references only [A] in source — never names [B] — +but its call to [A.identity] forces ocamlc to load [B.cmi] to +resolve [B.t]. + + $ mkdir consumer + $ cat > consumer/dune < (executable (name consumer) (libraries mylib)) + > EOF + $ cat > consumer/consumer.ml < let _ = A.identity 0 + > EOF + +Sandboxed build forces the missing-dep bug to surface +deterministically: the sandbox is populated only from the +compile-rule's declared deps, so a missing dep on +[mylib/.mylib.objs/byte/b.cmi] fails the build with a +"no such file" error rather than a silent race. + + $ dune build --sandbox=copy consumer/consumer.exe diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-from-vlib-soundness.t b/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-from-vlib-soundness.t new file mode 100644 index 00000000000..a2f0a56edb6 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-from-vlib-soundness.t @@ -0,0 +1,152 @@ +Regression guards for soundness, with a forward-looking pin on +current behaviour, all when a consumer depends on an implementation +that inherits its `(wrapped ...)` setting from the virtual library +(the implementation does not redeclare `wrapped`). + +`vlib` declares `(wrapped true)` with a virtual module `virt_iface` +and concrete siblings `helper` and `unused`. `impl` implements +`vlib` without redeclaring `wrapped`. The executable `main` depends +on `impl` and reaches `virt_iface` and `helper` via the vlib +wrapper: `Vlib.Virt_iface.x` and `Vlib.Helper.h`. `main` does not +reference `unused`. + +The implementation's closure includes `virt_iface`'s impl and +`vlib`'s concrete modules. `main`'s compile rule must therefore +cover `vlib__Virt_iface.cmi` and `vlib__Helper.cmi`. Any future +per-module narrowing that treats inherited-wrapped libraries as +ordinary local libraries must still keep that coverage; otherwise +a change to either module's interface fails to invalidate `main`. + + $ make_dune_project 3.24 + + $ mkdir vlib impl consumer + + $ cat > vlib/dune < (library + > (name vlib) + > (wrapped true) + > (virtual_modules virt_iface)) + > EOF + $ cat > vlib/virt_iface.mli < val x : string + > EOF + $ cat > vlib/helper.ml < let h = "h" + > let z = 42 + > EOF + $ cat > vlib/helper.mli < val h : string + > EOF + $ cat > vlib/unused.ml < let u = "u" + > let w = "w" + > EOF + $ cat > vlib/unused.mli < val u : string + > EOF + + $ cat > impl/dune < (library + > (name impl) + > (implements vlib)) + > EOF + $ cat > impl/virt_iface.ml < let x = "impl" + > let z = 42 + > EOF + + $ cat > consumer/dune < (executable + > (name main) + > (libraries impl)) + > EOF + $ cat > consumer/main.ml < let () = print_string Vlib.Virt_iface.x; print_string Vlib.Helper.h + > EOF + + $ dune build @check + +Glob coverage on `main.cmi`'s compile rule: + + $ dune rules --root . --format=json --deps '%{cmi:consumer/main}' | + > jq -r 'include "dune"; .[] | depsGlobs | "\(.dir_kind) \(.dir) \(.predicate)"' + In_build_dir _build/default/impl/.impl.objs/byte *.cmi + +Case 1 (soundness): edit `helper`'s interface to expose `z`. `main` +reaches `helper` through the vlib wrapper; the compile-rule deps +must cover `vlib__Helper.cmi`, so `main` rebuilds: + + $ cat > vlib/helper.mli < val h : string + > val z : int + > EOF + $ dune build @check + $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("consumer/\\.main\\.eobjs/byte/"))]' + [ + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmi", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmti" + ] + }, + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmo", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmt" + ] + } + ] + +Case 2 (soundness): edit `virt_iface`'s interface to expose `z`. +`main` reaches `virt_iface` through the vlib wrapper; the compile- +rule deps must cover `vlib__Virt_iface.cmi`, so `main` rebuilds: + + $ cat > vlib/virt_iface.mli < val x : string + > val z : int + > EOF + $ dune build @check + $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("consumer/\\.main\\.eobjs/byte/"))]' + [ + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmi", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmti" + ] + }, + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmo", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmt" + ] + } + ] + +Case 3 (forward-looking pin on current behaviour): edit `unused`'s +interface to expose `w`. `main` does not reference `unused`, so +under a future per-module narrowing this edit would not rebuild +`main`. Today, the per-library filter rebuilds `main` anyway +because `impl`'s `.cmi` glob covers every module of the +implementation's closure. Promote when per-module narrowing within +a library lands. + + $ cat > vlib/unused.mli < val u : string + > val w : string + > EOF + $ dune build @check + $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("consumer/\\.main\\.eobjs/byte/"))]' + [ + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmi", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmti" + ] + }, + { + "target_files": [ + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmo", + "_build/default/consumer/.main.eobjs/byte/dune__exe__Main.cmt" + ] + } + ] diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-transition-soundness.t b/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-transition-soundness.t new file mode 100644 index 00000000000..08b4118ba01 --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/wrapped-transition-soundness.t @@ -0,0 +1,105 @@ +Regression guard for wrapped-library soundness, with a forward- +looking pin on current behaviour, both under +`(wrapped (transition ...))`. + +`wrapped_lib` uses `(wrapped (transition ...))` with inner modules +`inner_a` and `inner_b` plus a hand-written wrapper module that +aliases both. The library `consumer` depends on `wrapped_lib` and +writes `Wrapped_lib.Inner_a.x` — naming the wrapper and one inner +module but not the other. + +The wrapper's `.cmi` only carries an alias name; the type lives in +the inner module's mangled artifact `wrapped_lib__Inner_a.cmi` (not +the `inner_a.cmi` transition shim). So `consumer`'s compile rule +must cover `wrapped_lib__Inner_a.cmi` alongside `wrapped_lib.cmi`. +Any future per-module narrowing of compile-rule deps must keep that +coverage; otherwise a change to `inner_a`'s interface fails to +invalidate `consumer`. + + $ make_dune_project 3.24 + + $ cat > dune < (library + > (name wrapped_lib) + > (wrapped (transition "use Wrapped_lib.X instead of X")) + > (modules wrapped_lib inner_a inner_b)) + > (library + > (name consumer) + > (modules consumer) + > (libraries wrapped_lib)) + > EOF + + $ cat > wrapped_lib.ml < module Inner_a = Inner_a + > module Inner_b = Inner_b + > EOF + $ cat > inner_a.ml < let x = "a" + > let z = 42 + > EOF + $ cat > inner_a.mli < val x : string + > EOF + $ cat > inner_b.ml < let y = "b" + > let w = "w" + > EOF + $ cat > inner_b.mli < val y : string + > EOF + + $ cat > consumer.ml < let _ = Wrapped_lib.Inner_a.x + > EOF + + $ dune build @check + +Glob coverage on `consumer.cmi`'s compile rule: + + $ dune rules --root . --format=json --deps '%{cmi:consumer}' | + > jq -r 'include "dune"; .[] | depsGlobs | "\(.dir_kind) \(.dir) \(.predicate)"' + In_build_dir _build/default/.wrapped_lib.objs/byte *.cmi + +Case 1 (soundness): edit `inner_a`'s interface to expose `z`. +`consumer` reaches `inner_a` through the wrapper `Wrapped_lib`; +the compile-rule deps must cover `wrapped_lib__Inner_a.cmi`, so +`consumer` rebuilds: + + $ cat > inner_a.mli < val x : string + > val z : int + > EOF + $ dune build @check + $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("\\.consumer\\.objs/byte/consumer\\.cm"))]' + [ + { + "target_files": [ + "_build/default/.consumer.objs/byte/consumer.cmi", + "_build/default/.consumer.objs/byte/consumer.cmo", + "_build/default/.consumer.objs/byte/consumer.cmt" + ] + } + ] + +Case 2 (forward-looking pin on current behaviour): edit `inner_b`'s +interface to expose `w`. `consumer` does not reference `inner_b`, +so under a future per-module narrowing this edit would not rebuild +`consumer`. Today, the per-library filter rebuilds `consumer` +anyway because `wrapped_lib`'s `.cmi` glob covers every module. +Promote when per-module narrowing within a library lands. + + $ cat > inner_b.mli < val y : string + > val w : string + > EOF + $ dune build @check + $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("\\.consumer\\.objs/byte/consumer\\.cm"))]' + [ + { + "target_files": [ + "_build/default/.consumer.objs/byte/consumer.cmi", + "_build/default/.consumer.objs/byte/consumer.cmo", + "_build/default/.consumer.objs/byte/consumer.cmt" + ] + } + ] From a75f81a037fa2d34ccbbf6a2725a56e99297a790 Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Sun, 17 May 2026 16:39:09 -0700 Subject: [PATCH 2/8] fix: BFS-expand through dep libs' stanza -open modules Closes a soundness gap in the per-module narrowing pipeline missed by the wrapped / ppx-runtime / virtual-impl recoveries: when a dependency library's stanza injects [-open M] via [(flags (...))], its source can reference [M]'s identifiers without naming [M], and ocamldep emits no token to drive the cross-library walker. Reported by RyanJamesStewart on #14517 with this fixture: an unwrapped [middle] depending on unwrapped [prelude] with [(flags (:standard -open Prelude))], exposing [val pick : unit -> color] (where [color] resolves through the open to [Prelude.color]). The consumer pattern-matches the result against bare constructors. The compile genuinely needs [prelude.cmi] to resolve the constructors; the BFS over [ocamldep -modules] cannot reach [prelude] (no syntactic [Prelude] token on either side); the three existing recoveries do not catch it ([prelude] is not wrapped, not a ppx-runtime lib, not a virtual-impl). [Module_compilation.cross_lib_tight_set]: - Add [~mode] (the consumer's compile mode) so we can expand a dep lib's stanza flags via [Ocaml_flags.get]. - Extend [read_entry_deps] to compute [read_stanza_opens] for the visited lib and union the result into the BFS frontier. Localised in the BFS rather than the initial-frontier computation so the reachability rule reads as: a module is reachable iff the consumer references it, or some reached module's ocamldep names it, or some reached module's owning lib stanza-opens it. Returns empty for [Spec.standard] (short-circuits external libs). [Lib_info]: - Add [stanza_flags : Dune_lang.Ocaml_flags.Spec.t] field plus accessor. Local libs carry their stanza's [conf.buildable.flags]; external libs ([findlib], [dune_package]) carry [Spec.standard]. Regression test: [cross-lib-open-flag-barrier.t]. Fails on L4 and L5 head before this patch (Unbound constructor Green under [--sandbox=copy]); passes after. --- src/dune_rules/dune_package.ml | 1 + src/dune_rules/findlib.ml | 1 + src/dune_rules/lib_info.ml | 5 ++ src/dune_rules/lib_info.mli | 7 +++ src/dune_rules/module_compilation.ml | 30 +++++++-- src/dune_rules/stanzas/library.ml | 1 + .../cross-lib-open-flag-barrier.t | 61 +++++++++++++++++++ 7 files changed, 101 insertions(+), 5 deletions(-) create mode 100644 test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-open-flag-barrier.t diff --git a/src/dune_rules/dune_package.ml b/src/dune_rules/dune_package.ml index 9e0cf07c7de..d2e0de5a9ac 100644 --- a/src/dune_rules/dune_package.ml +++ b/src/dune_rules/dune_package.ml @@ -336,6 +336,7 @@ module Lib = struct ~jsoo_runtime ~wasmoo_runtime ~preprocess + ~stanza_flags:Dune_lang.Ocaml_flags.Spec.standard ~enabled ~virtual_deps ~dune_version diff --git a/src/dune_rules/findlib.ml b/src/dune_rules/findlib.ml index f6be5b83aa8..f760c3caa82 100644 --- a/src/dune_rules/findlib.ml +++ b/src/dune_rules/findlib.ml @@ -268,6 +268,7 @@ let to_dune_library (t : Findlib.Package.t) ~dir_contents ~ext_lib ~external_loc ~jsoo_runtime ~wasmoo_runtime ~preprocess + ~stanza_flags:Dune_lang.Ocaml_flags.Spec.standard ~enabled ~virtual_deps ~dune_version diff --git a/src/dune_rules/lib_info.ml b/src/dune_rules/lib_info.ml index 9b6ecd8dbf9..947872b7c99 100644 --- a/src/dune_rules/lib_info.ml +++ b/src/dune_rules/lib_info.ml @@ -323,6 +323,7 @@ type 'path t = ; allow_unused_libraries : (Loc.t * Lib_name.t) list ; preprocess : Preprocess.With_instrumentation.t Preprocess.Per_module.t Compilation_mode.By_mode.t + ; stanza_flags : Dune_lang.Ocaml_flags.Spec.t ; enabled : Enabled_status.t Memo.t ; virtual_deps : (Loc.t * Lib_name.t) list ; dune_version : Dune_lang.Syntax.Version.t option @@ -354,6 +355,7 @@ let parameters t = t.parameters let requires t ~for_ = Compilation_mode.By_mode.get t.requires ~for_ let requires_by_mode t = t.requires let preprocess t ~for_ = Compilation_mode.By_mode.get t.preprocess ~for_ +let stanza_flags t = t.stanza_flags let ppx_runtime_deps t = t.ppx_runtime_deps let allow_unused_libraries t = t.allow_unused_libraries let sub_systems t = t.sub_systems @@ -435,6 +437,7 @@ let create ~jsoo_runtime ~wasmoo_runtime ~preprocess + ~stanza_flags ~enabled ~virtual_deps ~dune_version @@ -476,6 +479,7 @@ let create ; jsoo_runtime ; wasmoo_runtime ; preprocess + ; stanza_flags ; enabled ; virtual_deps ; dune_version @@ -575,6 +579,7 @@ let to_dyn ; jsoo_runtime ; wasmoo_runtime ; preprocess = _ + ; stanza_flags = _ ; enabled = _ ; virtual_deps ; dune_version diff --git a/src/dune_rules/lib_info.mli b/src/dune_rules/lib_info.mli index eb5113b9945..f3972d2613d 100644 --- a/src/dune_rules/lib_info.mli +++ b/src/dune_rules/lib_info.mli @@ -170,6 +170,12 @@ val preprocess -> for_:Compilation_mode.t -> Preprocess.With_instrumentation.t Preprocess.Per_module.t +(** Unexpanded [(flags ...)] from the library's stanza. [Spec.standard] for + libraries assembled from META / dune-package files. The per-module narrowing + pipeline uses this to recover [-open]-induced cross-library [.cmi] reads + that ocamldep cannot see. *) +val stanza_flags : _ t -> Dune_lang.Ocaml_flags.Spec.t + val sub_systems : _ t -> Sub_system_info.t Sub_system_name.Map.t val enabled : _ t -> Enabled_status.t Memo.t val orig_src_dir : 'path t -> 'path option @@ -239,6 +245,7 @@ val create -> preprocess: Preprocess.With_instrumentation.t Preprocess.Per_module.t Compilation_mode.By_mode.t + -> stanza_flags:Dune_lang.Ocaml_flags.Spec.t -> enabled:Enabled_status.t Memo.t -> virtual_deps:(Loc.t * Lib_name.t) list -> dune_version:Dune_lang.Syntax.Version.t option diff --git a/src/dune_rules/module_compilation.ml b/src/dune_rules/module_compilation.ml index 501cd31d955..f3a0e8a0399 100644 --- a/src/dune_rules/module_compilation.ml +++ b/src/dune_rules/module_compilation.ml @@ -26,18 +26,38 @@ let module_kind_is_filterable m = [lookup_tight_entries], terminating chains through them. The [Module.t] supplied here is the post-pp form (constructed in [build_lib_index]), so ocamldep runs on the dep lib's [.pp.ml] (action / non-staged Pps) or directly - on the source (no preprocessing / future-syntax). *) -let cross_lib_tight_set ~sandbox ~sctx ~lib_index ~initial_refs = + on the source (no preprocessing / future-syntax). + + Each visited lib's stanza [(flags (... -open Foo))] also extends the + frontier: a dep lib whose stanza opens [Foo] can use [Foo]'s identifiers + without naming [Foo] in its source, so ocamldep on the dep lib's source + produces no token to walk through. The stanza-open names are the missing + edges. *) +let cross_lib_tight_set ~sandbox ~sctx ~lib_index ~mode ~initial_refs = let open Action_builder.O in + let read_stanza_opens lib = + let info = Lib.info lib in + let spec = Lib_info.stanza_flags info in + if Dune_lang.Ocaml_flags.Spec.equal spec Dune_lang.Ocaml_flags.Spec.standard + then Action_builder.return Module_name.Set.empty + else ( + let dir = Lib_info.src_dir info |> Path.as_in_build_dir_exn in + let* ocaml_flags = + Action_builder.of_memo (Ocaml_flags_db.ocaml_flags sctx ~dir spec) + in + let+ flag_strings = Ocaml_flags.get ocaml_flags mode in + Ocaml_flags.extract_open_module_names flag_strings) + in let read_entry_deps (lib, m) = let obj_dir = Lib.info lib |> Lib_info.obj_dir |> Obj_dir.as_local_exn in let* impl_deps = Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Impl m in - let+ intf_deps = + let* intf_deps = Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Intf m in - Module_name.Set.union impl_deps intf_deps + let+ stanza_opens = read_stanza_opens lib in + Module_name.Set.union impl_deps (Module_name.Set.union intf_deps stanza_opens) in let rec loop ~seen ~frontier = if Module_name.Set.is_empty frontier @@ -190,7 +210,7 @@ let lib_deps_for_module ~cctx ~obj_dir ~for_ ~dep_graph ~opaque ~cm_kind ~ml_kin in let must_glob_set = Lib.Set.of_list must_glob_libs in let* tight_set = - cross_lib_tight_set ~sandbox ~sctx ~lib_index ~initial_refs:referenced + cross_lib_tight_set ~sandbox ~sctx ~lib_index ~mode ~initial_refs:referenced in (* Classify each lib in [all_libs]: - lib has [None]-entry referenced (mixed-entry or wrapped) → glob (covers the None entries' [.cmi]s); - diff --git a/src/dune_rules/stanzas/library.ml b/src/dune_rules/stanzas/library.ml index 48404d80d5c..060556d9072 100644 --- a/src/dune_rules/stanzas/library.ml +++ b/src/dune_rules/stanzas/library.ml @@ -642,6 +642,7 @@ let to_lib_info ~jsoo_runtime ~wasmoo_runtime ~preprocess + ~stanza_flags:conf.buildable.flags ~enabled ~virtual_deps ~dune_version diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-open-flag-barrier.t b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-open-flag-barrier.t new file mode 100644 index 00000000000..dc34fcad80b --- /dev/null +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/cross-lib-open-flag-barrier.t @@ -0,0 +1,61 @@ +Regression guard for cross-library transitive [.cmi] reads through +an intermediate library whose stanza injects [-open] on the leaf. +The intermediate's interface mentions a type from a leaf library +that the consumer never names syntactically; the intermediate's +own source also never names the leaf, because [-open] hides it. + +[prelude] defines [color = Red | Green | Blue]. [middle] depends +on [prelude] and is compiled with [(flags (:standard -open +Prelude))], exposing [val pick : unit -> color] (where [color] +resolves through the open to [Prelude.color]). [consumer] depends +on both libraries and pattern-matches on the result of +[Middle.pick] against the bare constructors [Green], [Red], [Blue]. + +The consumer's compile genuinely needs [prelude.cmi] so the OCaml +compiler can resolve the constructors back to [Prelude.color]. The +per-module BFS over [ocamldep -modules] cannot find this dep: +ocamldep on [middle.{ml,mli}] reports no token [Prelude] (the +source uses the open), and ocamldep on [consumer.ml] reports only +[Middle]. None of the three [must_glob_libs] recovery branches +catch [prelude]: it is not wrapped, not a ppx-runtime lib, not a +virtual-impl. + + $ make_dune_project 3.23 + + $ mkdir prelude middle consumer + + $ cat > prelude/dune < (library (name prelude) (wrapped false)) + > EOF + $ cat > prelude/prelude.ml < type color = Red | Green | Blue + > EOF + + $ cat > middle/dune < (library + > (name middle) + > (wrapped false) + > (libraries prelude) + > (flags (:standard -open Prelude))) + > EOF + $ cat > middle/middle.mli < val pick : unit -> color + > EOF + $ cat > middle/middle.ml < let pick () = Green + > EOF + + $ cat > consumer/dune < (executable (name consumer) (libraries middle prelude)) + > EOF + $ cat > consumer/consumer.ml < let () = match Middle.pick () with + > | Green -> print_endline "g" + > | Red | Blue -> print_endline "nb" + > EOF + +Sandbox-forced build of the consumer succeeds: [prelude]'s [.cmi] +is declared as a compile-rule dep through the per-module filter's +cross-lib walk and is therefore present in the sandbox. + + $ dune build --sandbox=copy consumer/consumer.exe From 8caa412acc4323105e6de1f1ececc8e6d10fb9ac Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Wed, 13 May 2026 10:22:53 -0700 Subject: [PATCH 3/8] feat: per-module [-I]/[-H] include flags via [kept_libs] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [Compilation_context.filtered_include_flags]: new function returning the [-I]/[-H] flags restricted to [kept_libs]. The cctx's [requires_compile] and [requires_hidden] are each filtered by [Lib.Set.mem kept_libs]; the result is built as a single [Command.Args.t] under [Action_builder]. No caching yet — each call recomputes; a follow-up adds the cache. [Module_compilation.lib_deps_for_module]: the tight branch was already threading [kept_libs] through the classification fold (it had been unused at L4-L5). Now wired into [filtered_include_flags]; the returned pair is [(filtered_include_flags, tight_deps + glob_deps)] instead of [(cctx_includes_for_cm_kind (), …)]. Behavioural effect: a consumer module's compile command sees [-I] / [-H] only for libraries its ocamldep reference set actually reaches. Adding an unreferenced sibling to the cctx's [(libraries ...)] no longer changes the consumer module's compile command, so the rule does not re-execute. Tests: - [per-module-include-flags.t]: promoted — [-I] for the unreferenced [unrelated_lib] no longer appears in the consumer's compile rule. - [add-unreferenced-sibling-lib.t]: promoted — adding an unreferenced sibling lib produces no rebuild for consumer modules. Signed-off-by: Robin Bate Boerop --- src/dune_rules/compilation_context.ml | 18 +++++++++ src/dune_rules/compilation_context.mli | 11 ++++++ src/dune_rules/module_compilation.ml | 8 ++-- .../add-unreferenced-sibling-lib.t | 39 +++++-------------- .../per-module-include-flags.t | 21 ++++------ 5 files changed, 52 insertions(+), 45 deletions(-) diff --git a/src/dune_rules/compilation_context.ml b/src/dune_rules/compilation_context.ml index 6de2be794e4..defe044df9d 100644 --- a/src/dune_rules/compilation_context.ml +++ b/src/dune_rules/compilation_context.ml @@ -340,6 +340,24 @@ let create let for_ t = t.for_ +let filtered_include_flags t ~cm_kind ~kept_libs = + let lib_mode = Lib_mode.of_cm_kind cm_kind in + let open Action_builder.O in + let* direct_requires = Resolve.Memo.read t.requires_compile in + let+ hidden_requires = Resolve.Memo.read t.requires_hidden in + let direct_filtered = List.filter direct_requires ~f:(Lib.Set.mem kept_libs) in + let hidden_filtered = List.filter hidden_requires ~f:(Lib.Set.mem kept_libs) in + let project = Scope.project t.scope in + let lib_config = t.ocaml.lib_config in + Lib_flags.L.include_flags + ~project + ~direct_libs:direct_filtered + ~hidden_libs:hidden_filtered + lib_mode + lib_config + |> Command.Args.memo +;; + let alias_and_root_module_flags = let extra = [ "-w"; "-49" ] in fun base -> Ocaml_flags.append_common base extra |> Ocaml_flags.append_nostdlib diff --git a/src/dune_rules/compilation_context.mli b/src/dune_rules/compilation_context.mli index c917486e0cf..1075762651d 100644 --- a/src/dune_rules/compilation_context.mli +++ b/src/dune_rules/compilation_context.mli @@ -63,6 +63,17 @@ val requires_hidden : t -> Lib.t list Resolve.Memo.t val requires_compile : t -> Lib.t list Resolve.Memo.t val parameters : t -> Module_name.t list Resolve.Memo.t val includes : t -> Command.Args.without_targets Command.Args.t Lib_mode.Cm_kind.Map.t + +(** Include flags ([-I]/[-H]) for compiling a module against [kept_libs]. The + cctx's [requires_compile] and [requires_hidden] are each restricted to + libraries in [kept_libs]; the kept direct entries become [-I], the kept + hidden entries become [-H]. *) +val filtered_include_flags + : t + -> cm_kind:Lib_mode.Cm_kind.t + -> kept_libs:Lib.Set.t + -> Command.Args.without_targets Command.Args.t Action_builder.t + val lib_index : t -> Lib_file_deps.Lib_index.t Resolve.Memo.t (** [true] iff any library in the compilation context's direct or hidden diff --git a/src/dune_rules/module_compilation.ml b/src/dune_rules/module_compilation.ml index f3a0e8a0399..329545444da 100644 --- a/src/dune_rules/module_compilation.ml +++ b/src/dune_rules/module_compilation.ml @@ -225,7 +225,7 @@ let lib_deps_for_module ~cctx ~obj_dir ~for_ ~dep_graph ~opaque ~cm_kind ~ml_kin lib_index ~referenced_modules:tight_set in - let tight_deps, glob_libs, _kept_libs = + let tight_deps, glob_libs, kept_libs = List.fold_left all_libs ~init:(Dep.Set.empty, [], Lib.Set.empty) @@ -246,8 +246,10 @@ let lib_deps_for_module ~cctx ~obj_dir ~for_ ~dep_graph ~opaque ~cm_kind ~ml_kin else td, lib :: gl, Lib.Set.add kl lib)) in let glob_deps = Lib_file_deps.deps_of_entries ~opaque ~cm_kind glob_libs in - Action_builder.return - (cctx_includes_for_cm_kind (), Dep.Set.union tight_deps glob_deps) + let+ include_flags = + Compilation_context.filtered_include_flags cctx ~cm_kind ~kept_libs + in + include_flags, Dep.Set.union tight_deps glob_deps ;; let lib_cm_deps ~cctx ~cm_kind ~ml_kind ~mode m = diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/add-unreferenced-sibling-lib.t b/test/blackbox-tests/test-cases/per-module-lib-deps/add-unreferenced-sibling-lib.t index 7ad07e032cb..d0bcc28a18a 100644 --- a/test/blackbox-tests/test-cases/per-module-lib-deps/add-unreferenced-sibling-lib.t +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/add-unreferenced-sibling-lib.t @@ -1,20 +1,17 @@ -Observational baseline: adding a library to a stanza's -[(libraries ...)] list currently rebuilds every module in the -stanza, including modules that reference nothing from the new -library. The [-I]/[-H] flags on each module's compiler invocation -are derived from the cctx-wide library list rather than from the -module's own ocamldep reference set, so adding an unreferenced -library still changes every consumer's compile-action hash and -invalidates the rule cache. +Adding a library to a stanza's [(libraries ...)] list does not +rebuild stanza modules that reference nothing from the new +library. The per-module [-I]/[-H] include filter narrows each +module's compile-action hash to the libraries its ocamldep +reference set actually reaches, so an unreferenced added lib +leaves every other consumer's cache key unchanged. [consumer_lib] starts with [(libraries existing_dep)] and two modules: [consumes_dep] references [Existing_dep_module]; [unrelated_module] references nothing from [existing_dep] or any other library. After the initial build, [added_lib] is added to the stanza's [(libraries ...)] list. Neither [consumes_dep] nor -[unrelated_module] uses anything from [added_lib], yet both -rebuild. Records the rebuild counts on each so a future change -can flip them to 0. +[unrelated_module] uses anything from [added_lib], so neither +rebuilds — the trace shows empty target lists for both. $ cat > dune-project < (lang dune 3.23) @@ -72,22 +69,6 @@ Add [added_lib] to [consumer_lib]'s [(libraries ...)]. Neither $ dune build @check $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("consumes_dep"))]' - [ - { - "target_files": [ - "_build/default/.consumer_lib.objs/byte/consumes_dep.cmi", - "_build/default/.consumer_lib.objs/byte/consumes_dep.cmo", - "_build/default/.consumer_lib.objs/byte/consumes_dep.cmt" - ] - } - ] + [] $ dune trace cat | jq -s 'include "dune"; [.[] | targetsMatchingFilter(test("unrelated_module"))]' - [ - { - "target_files": [ - "_build/default/.consumer_lib.objs/byte/unrelated_module.cmi", - "_build/default/.consumer_lib.objs/byte/unrelated_module.cmo", - "_build/default/.consumer_lib.objs/byte/unrelated_module.cmt" - ] - } - ] + [] diff --git a/test/blackbox-tests/test-cases/per-module-lib-deps/per-module-include-flags.t b/test/blackbox-tests/test-cases/per-module-lib-deps/per-module-include-flags.t index bb474822490..51eea2c23d1 100644 --- a/test/blackbox-tests/test-cases/per-module-lib-deps/per-module-include-flags.t +++ b/test/blackbox-tests/test-cases/per-module-lib-deps/per-module-include-flags.t @@ -1,15 +1,12 @@ -Observational baseline: a consumer module's compile command -currently carries [-I] flags for every library in the cctx-wide -[(libraries ...)] closure, regardless of which libraries the -module's source actually references. +A consumer module's compile command carries [-I] flags only for +the libraries its ocamldep reference set actually reaches, not +for every library in the cctx-wide [(libraries ...)] closure. [consumer_lib] declares two library dependencies, [dep_lib] and [unrelated_lib], but its only module references just [Dep_module]. -A future per-module filter could observe via ocamldep that the -module references nothing from [unrelated_lib] and drop that -library's objdir from the [-I] path. This test records today's -[-I] contents so that a future filter improvement can flip the -asserted array to a single entry. +The per-module filter observes via ocamldep that the module +references nothing from [unrelated_lib] and drops that library's +objdir from the [-I] path; only [dep_lib]'s objdir survives. $ cat > dune-project < (lang dune 3.0) @@ -40,12 +37,10 @@ asserted array to a single entry. Inspect the [-I] flags on [consumer_module]'s [.cmo] compile rule. Filter to objdir-shaped paths so we don't print the consumer's own [.consumer_lib.objs/byte] entry — that's always present and -not what this test is about. Today both [dep_lib]'s objdir and -[unrelated_lib]'s objdir appear: +not what this test is about. Only [dep_lib]'s objdir appears: $ dune rules --root . --format=json _build/default/.consumer_lib.objs/byte/consumer_module.cmo \ > | jq 'include "dune"; .[] | [ruleActionFlagValues("-I") | select(test("\\.dep_lib\\.objs|\\.unrelated_lib\\.objs"))]' [ - ".dep_lib.objs/byte", - ".unrelated_lib.objs/byte" + ".dep_lib.objs/byte" ] From aa8b99f532eb93e5472b317efdd47f27daac9ad6 Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Wed, 13 May 2026 10:25:19 -0700 Subject: [PATCH 4/8] perf: cache filtered include flags per [(lib_mode, kept_libs)] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [Compilation_context.Filtered_includes] caches the [Action_builder.t] returned by [filtered_include_flags] keyed on [(lib_mode, kept_libs)]. Two modules in the same cctx that reach the same set of kept libs share one builder; [Action_builder.memoize] dedupes its evaluation. Cache key omits the cctx's [requires_compile] / [requires_hidden] — they're immutable on the cctx from [create]. The [for_module_generated_at_link_time] exception, where derived cctxs could in principle alter the closure, takes [can_filter = false] in [lib_deps_for_module] and so never reaches this function. [Filtered_includes.Key]: [lib_mode] + [kept_libs : Lib.t list] (the caller passes a sorted list via [Lib.Set.to_list], canonicalising for the cache). [equal] and [hash] derived from the same; [Repr]-derived [to_dyn] for diagnostics. [Lib_mode.hash]: new — used by [Filtered_includes.Key.hash]. Three constants for the three variants ([Ocaml Byte], [Ocaml Native], [Melange]). Signed-off-by: Robin Bate Boerop --- src/dune_lang/lib_mode.ml | 6 ++ src/dune_lang/lib_mode.mli | 1 + src/dune_rules/compilation_context.ml | 82 ++++++++++++++++++++++----- 3 files changed, 75 insertions(+), 14 deletions(-) diff --git a/src/dune_lang/lib_mode.ml b/src/dune_lang/lib_mode.ml index 09229cd2fb5..4b3c5c110f3 100644 --- a/src/dune_lang/lib_mode.ml +++ b/src/dune_lang/lib_mode.ml @@ -11,6 +11,12 @@ let equal x y = | Melange, Melange -> true ;; +let hash = function + | Ocaml Byte -> 0 + | Ocaml Native -> 1 + | Melange -> 2 +;; + let decode = let open Decoder in enum [ "byte", Ocaml Byte; "native", Ocaml Native; "melange", Melange ] diff --git a/src/dune_lang/lib_mode.mli b/src/dune_lang/lib_mode.mli index 7975ae286fa..7f69dfa793d 100644 --- a/src/dune_lang/lib_mode.mli +++ b/src/dune_lang/lib_mode.mli @@ -6,6 +6,7 @@ type t = val decode : t Decoder.t val equal : t -> t -> bool +val hash : t -> int val to_dyn : t -> Dyn.t module Cm_kind : sig diff --git a/src/dune_rules/compilation_context.ml b/src/dune_rules/compilation_context.ml index defe044df9d..7197f307ed1 100644 --- a/src/dune_rules/compilation_context.ml +++ b/src/dune_rules/compilation_context.ml @@ -33,6 +33,44 @@ module Includes = struct let empty = Lib_mode.Cm_kind.Map.make_all Command.Args.empty end +(* Key omits [t.requires_compile] / [t.requires_hidden] because they're + immutable on the cctx from [create]. The exception — + [for_module_generated_at_link_time]'s derived cctxs — takes the + [can_filter = false] arm in [lib_deps_for_module] and so never reaches this + cache. *) +module Filtered_includes = struct + module Key = struct + type t = + { lib_mode : Lib_mode.t + ; kept_libs : Lib.Set.t + } + + let equal a b = + Lib_mode.equal a.lib_mode b.lib_mode && Lib.Set.equal a.kept_libs b.kept_libs + ;; + + let hash { lib_mode; kept_libs } = + Lib.Set.fold kept_libs ~init:(Lib_mode.hash lib_mode) ~f:(fun lib acc -> + acc lxor Lib.hash lib) + ;; + + let repr = + Repr.record + "Filtered_includes.Key" + [ Repr.field "lib_mode" (Repr.abstract Lib_mode.to_dyn) ~get:(fun t -> t.lib_mode) + ; Repr.field "kept_libs" (Repr.abstract Lib.Set.to_dyn) ~get:(fun t -> + t.kept_libs) + ] + ;; + + let to_dyn = Repr.to_dyn repr + end + + type t = (Key.t, Command.Args.without_targets Command.Args.t Action_builder.t) Table.t + + let create () : t = Table.create (module Key) 16 +end + type opaque = | Explicit of bool | Inherit_from_settings @@ -81,6 +119,7 @@ type t = ; loc : Loc.t option ; ocaml : Ocaml_toolchain.t ; for_ : Compilation_mode.t + ; filtered_includes : Filtered_includes.t } let loc t = t.loc @@ -335,6 +374,7 @@ let create ; ocaml ; instances ; for_ + ; filtered_includes = Filtered_includes.create () } ;; @@ -342,20 +382,34 @@ let for_ t = t.for_ let filtered_include_flags t ~cm_kind ~kept_libs = let lib_mode = Lib_mode.of_cm_kind cm_kind in - let open Action_builder.O in - let* direct_requires = Resolve.Memo.read t.requires_compile in - let+ hidden_requires = Resolve.Memo.read t.requires_hidden in - let direct_filtered = List.filter direct_requires ~f:(Lib.Set.mem kept_libs) in - let hidden_filtered = List.filter hidden_requires ~f:(Lib.Set.mem kept_libs) in - let project = Scope.project t.scope in - let lib_config = t.ocaml.lib_config in - Lib_flags.L.include_flags - ~project - ~direct_libs:direct_filtered - ~hidden_libs:hidden_filtered - lib_mode - lib_config - |> Command.Args.memo + let cache_key = { Filtered_includes.Key.lib_mode; kept_libs } in + match Table.find t.filtered_includes cache_key with + | Some builder -> builder + | None -> + (* Cache the [Action_builder.t] (not the resolved args) up-front at + rule-construction time so all compile rules sharing this + [(lib_mode, kept_libs)] share one builder; [Action_builder.memoize] then + dedupes its evaluation. Mirrors the cache pattern in + [Ocamldep.read_immediate_deps_words]. *) + let builder = + let open Action_builder.O in + let* direct_requires = Resolve.Memo.read t.requires_compile in + let+ hidden_requires = Resolve.Memo.read t.requires_hidden in + let direct_filtered = List.filter direct_requires ~f:(Lib.Set.mem kept_libs) in + let hidden_filtered = List.filter hidden_requires ~f:(Lib.Set.mem kept_libs) in + let project = Scope.project t.scope in + let lib_config = t.ocaml.lib_config in + Lib_flags.L.include_flags + ~project + ~direct_libs:direct_filtered + ~hidden_libs:hidden_filtered + lib_mode + lib_config + |> Command.Args.memo + in + let builder = Action_builder.memoize "filtered_include_flags" builder in + Table.set t.filtered_includes cache_key builder; + builder ;; let alias_and_root_module_flags = From 1653697fb031084d439abafe6fbb44b7d6a77c74 Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Wed, 13 May 2026 10:28:23 -0700 Subject: [PATCH 5/8] perf: cache per-cctx raw ocamldep references by (obj_name, kind) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [Compilation_context.Raw_refs] caches the [Action_builder.t] computed for each ocamldep raw-deps read inside a cctx. Two consumer modules that share trans_deps (or a consumer and one of its trans deps that share an [obj_name + ml_kind]) get the same builder. The cache short-circuits before constructing the builder; on hit, no allocation. [Raw_refs.Key] distinguishes the two read patterns the per-module filter uses: [Consumer] (the cctx-driving module's own deps, keyed by [ml_kind]) and [Transitive] (a dep module's deps, keyed by [cm_kind] because the impl/intf gating in [need_impl_deps_of] varies by cm_kind on the [Cmx]/opaque path). Conservatively-distinct keying — never collapse two semantically-different reads under one cache cell. [Compilation_context.cached_raw_refs t ~key ~compute] is the thin public surface: lookup, compute on miss, store, return the builder. [Module_compilation.lib_deps_for_module]: wraps the inline [read_dep_m_raw] body that the BFS uses for both the consumer's own and each trans dep's raw refs. No semantic change — the cache only deduplicates builder construction across calls within the same cctx. Signed-off-by: Robin Bate Boerop --- src/dune_rules/compilation_context.ml | 76 ++++++++++++++++++++++++++ src/dune_rules/compilation_context.mli | 25 +++++++++ src/dune_rules/module_compilation.ml | 32 +++++++---- 3 files changed, 123 insertions(+), 10 deletions(-) diff --git a/src/dune_rules/compilation_context.ml b/src/dune_rules/compilation_context.ml index 7197f307ed1..fa52d8edac9 100644 --- a/src/dune_rules/compilation_context.ml +++ b/src/dune_rules/compilation_context.ml @@ -33,6 +33,71 @@ module Includes = struct let empty = Lib_mode.Cm_kind.Map.make_all Command.Args.empty end +module Raw_refs = struct + module Key = struct + type t = + | Consumer of + { obj_name : Module_name.Unique.t + ; ml_kind : Ml_kind.t + } + | Transitive of + { obj_name : Module_name.Unique.t + ; cm_kind : Lib_mode.Cm_kind.t + } + + let cm_kind_tag : Lib_mode.Cm_kind.t -> int = function + | Ocaml Cmi -> 0 + | Ocaml Cmo -> 1 + | Ocaml Cmx -> 2 + | Melange Cmi -> 3 + | Melange Cmj -> 4 + ;; + + let ml_kind_tag : Ml_kind.t -> int = function + | Intf -> 0 + | Impl -> 1 + ;; + + let equal a b = + match a, b with + | Consumer a, Consumer b -> + Module_name.Unique.equal a.obj_name b.obj_name + && ml_kind_tag a.ml_kind = ml_kind_tag b.ml_kind + | Transitive a, Transitive b -> + Module_name.Unique.equal a.obj_name b.obj_name + && cm_kind_tag a.cm_kind = cm_kind_tag b.cm_kind + | Consumer _, Transitive _ | Transitive _, Consumer _ -> false + ;; + + let hash = function + | Consumer { obj_name; ml_kind } -> Poly.hash (0, obj_name, ml_kind_tag ml_kind) + | Transitive { obj_name; cm_kind } -> Poly.hash (1, obj_name, cm_kind_tag cm_kind) + ;; + + let repr = + let open Repr in + let obj_name_repr = view string ~to_:Module_name.Unique.to_string in + let ml_kind_repr = view string ~to_:Ml_kind.to_string in + let cm_kind_repr = abstract Lib_mode.Cm_kind.to_dyn in + variant + "Raw_refs.Key" + [ case "Consumer" (pair obj_name_repr ml_kind_repr) ~proj:(function + | Consumer { obj_name; ml_kind } -> Some (obj_name, ml_kind) + | Transitive _ -> None) + ; case "Transitive" (pair obj_name_repr cm_kind_repr) ~proj:(function + | Transitive { obj_name; cm_kind } -> Some (obj_name, cm_kind) + | Consumer _ -> None) + ] + ;; + + let to_dyn = Repr.to_dyn repr + end + + type t = (Key.t, Module_name.Set.t Action_builder.t) Table.t + + let create () : t = Table.create (module Key) 64 +end + (* Key omits [t.requires_compile] / [t.requires_hidden] because they're immutable on the cctx from [create]. The exception — [for_module_generated_at_link_time]'s derived cctxs — takes the @@ -120,6 +185,7 @@ type t = ; ocaml : Ocaml_toolchain.t ; for_ : Compilation_mode.t ; filtered_includes : Filtered_includes.t + ; raw_refs : Raw_refs.t } let loc t = t.loc @@ -375,11 +441,21 @@ let create ; instances ; for_ ; filtered_includes = Filtered_includes.create () + ; raw_refs = Raw_refs.create () } ;; let for_ t = t.for_ +let cached_raw_refs t ~key ~compute = + match Table.find t.raw_refs key with + | Some builder -> builder + | None -> + let builder = compute () in + Table.set t.raw_refs key builder; + builder +;; + let filtered_include_flags t ~cm_kind ~kept_libs = let lib_mode = Lib_mode.of_cm_kind cm_kind in let cache_key = { Filtered_includes.Key.lib_mode; kept_libs } in diff --git a/src/dune_rules/compilation_context.mli b/src/dune_rules/compilation_context.mli index 1075762651d..c640df9f16c 100644 --- a/src/dune_rules/compilation_context.mli +++ b/src/dune_rules/compilation_context.mli @@ -64,6 +64,31 @@ val requires_compile : t -> Lib.t list Resolve.Memo.t val parameters : t -> Module_name.t list Resolve.Memo.t val includes : t -> Command.Args.without_targets Command.Args.t Lib_mode.Cm_kind.Map.t +module Raw_refs : sig + module Key : sig + type t = + | Consumer of + { obj_name : Module_name.Unique.t + ; ml_kind : Ml_kind.t + } + | Transitive of + { obj_name : Module_name.Unique.t + ; cm_kind : Lib_mode.Cm_kind.t + } + end +end + +(** Memoise the raw-refs [Action_builder.t] computed for each [Raw_refs.Key.t] + within this cctx. [compute ()] is invoked only on cache miss; subsequent + callers with the same key get the cached builder back. The cache + short-circuits before allocating, so siblings sharing [trans_deps] don't + redo construction. *) +val cached_raw_refs + : t + -> key:Raw_refs.Key.t + -> compute:(unit -> Module_name.Set.t Action_builder.t) + -> Module_name.Set.t Action_builder.t + (** Include flags ([-I]/[-H]) for compiling a module against [kept_libs]. The cctx's [requires_compile] and [requires_hidden] are each restricted to libraries in [kept_libs]; the kept direct entries become [-I], the kept diff --git a/src/dune_rules/module_compilation.ml b/src/dune_rules/module_compilation.ml index 329545444da..136ba4699b3 100644 --- a/src/dune_rules/module_compilation.ml +++ b/src/dune_rules/module_compilation.ml @@ -141,21 +141,33 @@ let lib_deps_for_module ~cctx ~obj_dir ~for_ ~dep_graph ~opaque ~cm_kind ~ml_kin | Ocaml (Cmi | Cmo) | Melange _ -> false in let read_dep_m_raw dep_m ~is_consumer = - let* impl_deps = - if need_impl_deps_of dep_m ~is_consumer - then + let key : Compilation_context.Raw_refs.Key.t = + let obj_name = Module.obj_name dep_m in + if is_consumer + then Consumer { obj_name; ml_kind } + else Transitive { obj_name; cm_kind } + in + Compilation_context.cached_raw_refs cctx ~key ~compute:(fun () -> + let* impl_deps = + if need_impl_deps_of dep_m ~is_consumer + then + Ocamldep.read_immediate_deps_raw_of + ~sandbox + ~sctx + ~obj_dir + ~ml_kind:Impl + dep_m + else Action_builder.return Module_name.Set.empty + in + let+ intf_deps = Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir - ~ml_kind:Impl + ~ml_kind:Intf dep_m - else Action_builder.return Module_name.Set.empty - in - let+ intf_deps = - Ocamldep.read_immediate_deps_raw_of ~sandbox ~sctx ~obj_dir ~ml_kind:Intf dep_m - in - Module_name.Set.union impl_deps intf_deps + in + Module_name.Set.union impl_deps intf_deps) in let* m_raw = read_dep_m_raw m ~is_consumer:true in let* trans_raw = From f5e94263e48593456c740f86276e7402d75b4a24 Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Wed, 13 May 2026 10:51:54 -0700 Subject: [PATCH 6/8] perf: memoise [Lib.closure] keyed on (linking, for_, libs) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The per-module filter calls [Lib.closure] twice per consumer module (once for [direct_libs], once for [must_glob_libs]) on each compile rule. Across a cctx, many modules pass overlapping inputs to these closures; without memoisation every call re-traverses the dependency graph. [Lib.closure] is now defined as [Memo.exec] over a [Memo.create] keyed on [(bool * Compilation_mode.t * t list)]. The list-of-libs key is order- and multiplicity-sensitive, so callers that share inputs need to canonicalise (sort by [Lib.compare]) for maximum cache reuse — [lib_deps_for_module] already does this at both call sites. A docstring on [val closure] notes the requirement. Signed-off-by: Robin Bate Boerop --- src/dune_rules/lib.ml | 50 +++++++++++++++++++++++++++++++++++++----- src/dune_rules/lib.mli | 4 ++++ 2 files changed, 49 insertions(+), 5 deletions(-) diff --git a/src/dune_rules/lib.ml b/src/dune_rules/lib.ml index a8630f413db..4a0a76faa7e 100644 --- a/src/dune_rules/lib.ml +++ b/src/dune_rules/lib.ml @@ -2198,11 +2198,51 @@ end = struct ;; end -let closure l ~linking = - let forbidden_libraries = Map.empty in - if linking - then Resolve_names.linking_closure_with_overlap_checks None l ~forbidden_libraries - else Resolve_names.compile_closure_with_overlap_checks None l ~forbidden_libraries +let closure = + let memo = + let module Input = struct + type nonrec t = bool * Compilation_mode.t * t list + + let equal (l, m, libs) (l', m', libs') = + Bool.equal l l' && Compilation_mode.equal m m' && List.equal equal libs libs' + ;; + + let hash_for_ = function + | Compilation_mode.Ocaml -> 0 + | Melange -> 1 + ;; + + let hash (linking, for_, libs) = + Tuple.T3.hash + Bool.hash + hash_for_ + (List.hash (fun lib -> Id.hash lib.unique_id)) + (linking, for_, libs) + ;; + + let to_dyn = Dyn.opaque + end + in + Memo.create + "lib-closure" + ~input:(module Input) + (fun (linking, for_, l) -> + let forbidden_libraries = Map.empty in + if linking + then + Resolve_names.linking_closure_with_overlap_checks + None + l + ~forbidden_libraries + ~for_ + else + Resolve_names.compile_closure_with_overlap_checks + None + l + ~forbidden_libraries + ~for_) + in + fun l ~linking ~for_ -> Memo.exec memo (linking, for_, l) ;; let descriptive_closure (l : lib list) ~with_pps ~for_ : lib list Memo.t = diff --git a/src/dune_rules/lib.mli b/src/dune_rules/lib.mli index 1ab86f84d21..ec7bc3dbe5e 100644 --- a/src/dune_rules/lib.mli +++ b/src/dune_rules/lib.mli @@ -217,6 +217,10 @@ end (** {1 Transitive closure} *) +(** Memoized. The memo key is order-sensitive on the input list, so for callers + that share inputs across invocations (e.g. the hot path in + [Module_compilation.lib_deps_for_module]), a canonical sort (by + [Lib.compare]) maximises cache reuse. *) val closure : t list -> linking:bool -> for_:Compilation_mode.t -> t list Resolve.Memo.t (** [descriptive_closure ~with_pps libs] computes the smallest set of libraries From f5435749d1038d694708e4f9cb3a37799b2d214c Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Fri, 15 May 2026 20:58:22 -0700 Subject: [PATCH 7/8] doc: per-module narrowing algorithm reference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add [doc/dev/per-module-narrowing.md] describing the per-module library file dependency narrowing introduced in #14492 (split into PRs #14513..#14521 as layers L1..L9): - The motivation and soundness model. - The [can_filter] precondition and [has_virtual_impl] early-out. - The narrowing pipeline: read ocamldep raw refs → [referenced] → [Lib.closure] → cross-library BFS → classification → emit per-lib deps and filtered include flags. - The data structures used ([Lib_index], the per-cctx [cached_raw_refs] / [Filtered_includes] / [Lib.closure] memos). - Soundness fallbacks (wrapped libs, virtual impls, ppx runtime). - A source map locating each concern in [src/dune_rules/]. - A layer-by-layer summary of #14513..#14521. Signed-off-by: Robin Bate Boerop --- doc/dev/per-module-narrowing.md | 583 ++++++++++++++++++++++++++++++++ 1 file changed, 583 insertions(+) create mode 100644 doc/dev/per-module-narrowing.md diff --git a/doc/dev/per-module-narrowing.md b/doc/dev/per-module-narrowing.md new file mode 100644 index 00000000000..7ac3b2afc18 --- /dev/null +++ b/doc/dev/per-module-narrowing.md @@ -0,0 +1,583 @@ +# Per-module library dependency narrowing + +This document describes the algorithm Dune uses to narrow the +library-file dependency set of each compile rule on a per-module basis. +The narrowing was introduced in #14492 (split into PRs #14513–#14521, +referred to internally as layers L1..L9). It supersedes the prior +behaviour in which every compile rule depended on the full glob of +every interface file of every library in the compilation context's +dependency closure. + +## Motivation + +In the pre-narrowing world, a single compile rule for module `M` in +some stanza depended on the glob of every `.cmi` (and, conditionally, +`.cmx`) of every library reachable through `requires`. The glob is +correct — recompiling `M` must see fresh interfaces — but it is +maximally broad. A change to any interface in any reachable library +invalidated `M`'s compile rule, even when `M` does not mention the module +whose interface changed. + +For incremental builds in code bases with many libraries and broad +`requires` graphs, this over-invalidation produced large, avoidable +recompilation cascades. The narrowing computes, for each compile rule, +a subset of those files that the consumer module can actually be +affected by — by inspecting what its `ocamldep` says it references and +walking the cross-library reference graph until convergence. Files +outside that subset are dropped from the rule's dependency set; their +unrelated edits no longer trigger a recompile. + +The narrowing must be *sound*: every file the compiler can read while +processing `M` must remain in the dependency set, or builds will be +non-deterministic and incremental builds will silently use stale +artefacts. The algorithm therefore conservatively widens to the +original glob in a small set of edge cases described under "Soundness +recovery" below. + +## The algorithm at a glance + +For each compile rule for a module `M` in compilation context `cctx`, +parameterised by the compile artefact kind (`Lib_mode.Cm_kind.t` — one +of `Ocaml Cmi`, `Ocaml Cmo`, `Ocaml Cmx`, `Melange Cmi`, `Melange Cmj`), +the source-language kind (`Ml_kind.t` — `Impl` or `Intf`), and the mode +(`Lib_mode.t`): + +1. If a small set of preconditions fails (`can_filter = false`), fall + back to the cctx-wide glob — the same dep set every compile rule + would have had prior to this work. This avoids needing soundness + arguments for module kinds that the narrowing was not designed for, + and lets `Melange` paths pass through unchanged. + +2. Otherwise, if the consumer cctx already carries a virtual library + in `requires` (`has_virtual_impl = true`), fall back to the + cctx-wide glob. Virtual-impl deps require analysis the BFS does not + currently do. + +3. Otherwise, run the narrowing: + 1. Read the consumer module's own `ocamldep` raw references (impl + and intf), and those of every module in `M`'s within-cctx + transitive `dep_graph`. + 2. Union them with the `-open` flag modules. The result is + `referenced : Module_name.Set.t`. + 3. Use the cctx's `Lib_index` (built once per cctx) to partition + the cctx's library closure into libraries whose entry modules + appear in `referenced` (`tight`) and libraries whose `None`- + entries appear in `referenced` (`non_tight`). + 4. Compute the `Lib.closure` of `tight ∪ non_tight ∪ + pps_runtime_libs` to expand the **lib set** over the library + `requires` DAG, picking up libs reached only via transparent + aliases that ocamldep cannot see. + 5. Run a BFS over the **module-reference graph**: starting from + `referenced`, look up `(lib, entry)` pairs in the lib index + and read each entry's `ocamldep` raw refs, extending the + frontier until it converges. The fixpoint `tight_set` is the + set of *module names* (not libs) transitively reachable. + 6. Re-run `Lib_index.filter_libs_with_modules` with `tight_set` in + place of `referenced`, producing a fresh `(tight_modules, + non_tight_set)` partition over the BFS-expanded name set. + 7. Compute `must_glob_set` for the libraries that must glob for + soundness (wrapped-library wrappers reached transparently, and + `ppx_runtime_libraries` whose modules are post-pp invisible). + 8. Fold over the closed lib set, classifying each lib into one of + four buckets and emitting either specific-file deps (for tight + libs) or a full glob (for non-tight libs). + 9. Compute the include flags scoped to the `kept_libs` set — + libraries that contributed at least one dep — so that `-I`/`-H` + reflect only what the consumer can actually reach. + +The output is `(include_args, dep_set)`, which the compile rule +consumes via `Action_builder.dyn_deps`. + +## The `can_filter` precondition + +The entry point for the narrowing is +`Module_compilation.lib_deps_for_module` +(`src/dune_rules/module_compilation.ml`). Before any narrowing runs, +the function computes a `can_filter` boolean conjunction: + +- The compile is `Ocaml _` (Melange has its own cm-kind story and is + not narrowed). +- The supplied `dep_graph` is the real one for this cctx (its `dir` + equals the cctx's `obj_dir`), not a `Dep_graph.dummy` — synthesised + / link-time-generated / alias / root modules have no usable + trans-deps and short-circuit. +- The `dep_graph` contains `M`. Modules synthesised outside the stanza + (e.g. those handed to `ocamlc_i` for `-i` extraction in unrelated + flows) are also rejected here. +- `M`'s `Module.kind` is filterable: not `Root`, `Wrapped_compat`, + `Impl_vmodule`, `Virtual`, or `Parameter`. These kinds have bespoke + dep stories handled elsewhere (`Dep_rules`, `Virtual_rules`). +- `M` has a source file of the requested `ml_kind` (`Impl` or `Intf`). +- The consumer cctx is itself not a `Virtual_rules.is_virtual_or_ + parameter` cctx. Consumer-side virtual-impl behaviour is owned by + `Dep_rules`. + +If any condition fails, the function returns + +```ocaml +(cctx_includes_for_cm_kind (), + Lib_file_deps.deps_of_entries ~opaque ~cm_kind libs) +``` + +... the cctx-wide glob, exactly equivalent to the prior behaviour. The +narrowing applies only when all conditions hold. + +## The `has_virtual_impl` early-out + +When `can_filter` holds, the next check is + +```ocaml +let* has_virtual_impl = + Resolve.Memo.read (Compilation_context.has_virtual_impl cctx) +in +if has_virtual_impl then + Action_builder.return + (cctx_includes_for_cm_kind (), + Lib_file_deps.deps_of_entries ~opaque ~cm_kind libs) +else ... +``` + +`has_virtual_impl` is true when any library in the cctx's +`requires_compile ∪ requires_hidden` is itself a virtual-library +implementation. The BFS does not currently analyse virtual impls' +contribution to the cctx's namespace; for soundness, all such cctxs +fall back to the cctx-wide glob. The decision is cctx-level, computed +once at `Compilation_context.create` and stored as a +`bool Resolve.t Memo.Lazy.t`. + +## The narrowing pipeline + +When `can_filter` holds and `has_virtual_impl` is false, the function +proceeds to the narrowing pipeline. + +### Step 1: read ocamldep raw refs for consumer + trans deps + +```ocaml +let* trans_deps = Dep_graph.deps_of dep_graph m in +let need_impl_deps_of dep_m ~is_consumer = + if is_consumer + then (match ml_kind with Impl -> true | Intf -> false) + else + (not (Module.has dep_m ~ml_kind:Intf)) + || + match cm_kind with + | Ocaml Cmx -> not opaque + | Ocaml (Cmi | Cmo) | Melange _ -> false +in +``` + +The function reads `M`'s impl + intf raw refs, plus the impl + intf +raw refs of every module in `M`'s transitive within-stanza +dependencies. The selectivity table (which `.ml`-side refs to read) +matches the compiler's own conditional behaviour: + +| `dep_m` is | `cm_kind` | `opaque` | read `.ml`? | +| ----------------------- | ----------- | -------- | ------------ | +| consumer (`M` itself) | any | any | iff `Impl` | +| trans_dep, no `.mli` | any | any | yes | +| trans_dep, has `.mli` | `Cmx` | false | yes (inline) | +| trans_dep, has `.mli` | `Cmx` | true | no | +| trans_dep, has `.mli` | `Cmi`/`Cmo` | any | no | + +Each ocamldep raw-ref read is wrapped through +`Compilation_context.cached_raw_refs` (L8), keyed on +`Raw_refs.Key.t = Consumer { obj_name; ml_kind } | Transitive +{ obj_name; cm_kind }`. The cache short-circuits before allocating the +underlying `Action_builder.t`, so two siblings sharing a transitive +dep only construct the dependency-reading builder once per cctx per +key. + +The intf side is always read; the impl side is gated by +`need_impl_deps_of`. The union of impl+intf for each module is the +module's *raw references* — every module name that ocamldep saw it +mention. + +### Step 2: compute `referenced` + +```ocaml +let* m_raw = read_dep_m_raw m ~is_consumer:true in +let* trans_raw = + union_module_name_sets_mapped trans_deps + ~f:(read_dep_m_raw ~is_consumer:false) +in +let all_raw = Module_name.Set.union m_raw trans_raw in +let* flags = Ocaml_flags.get (Compilation_context.flags cctx) mode in +let open_modules = Ocaml_flags.extract_open_module_names flags in +let referenced = Module_name.Set.union all_raw open_modules in +``` + +`open_modules` are module names brought into scope by `-open` flags +that ocamldep does not see (because they are command-line, not source +syntax). Without them the consumer can mention a module by short name +that the raw refs would miss, opening a soundness gap. Their union +with `all_raw` produces `referenced` — the set of module names this +compile of `M` can possibly resolve. + +### Step 3: first lib classification + +```ocaml +let { Lib_file_deps.Lib_index.tight; non_tight } = + Lib_file_deps.Lib_index.filter_libs_with_modules + lib_index + ~referenced_modules:referenced +in +``` + +The cctx-level `Lib_index` (built once at `cctx` creation by +`Compilation_context.build_lib_index`) is a structure indexing each +library's entry modules with each module name. `filter_libs_with_ +modules` walks `referenced` against the index and partitions: + +- `tight : Module.t list Lib.Map.t` — libraries whose `Some`-entry + modules appear in `referenced`. The mapped list is the specific + set of entries referenced. These are candidates for per-module + deps (`deps_of_entry_modules`). +- `non_tight : Lib.Set.t` — libraries whose `None`-entry modules + appear (wrapped locals, externals, or unwrapped locals with some + staged-pps / instrumentation-only entries). These must glob. + +A library with mixed `Some`/`None` entries can appear in both — the +caller globs it from the `non_tight` side, since the `None` entries +require glob coverage. + +### Step 4: include `pps_runtime_libs` and compute `Lib.closure` + +```ocaml +let* pps_runtime_libs = + Resolve.Memo.read (Compilation_context.pps_runtime_libs cctx) +in +let direct_libs = + List.sort_uniq ~compare:Lib.compare + (Lib.Map.keys tight @ Lib.Set.to_list non_tight @ pps_runtime_libs) +in +let* all_libs = + Resolve.Memo.read (Lib.closure direct_libs ~linking:false ~for_) +in +``` + +`pps_runtime_libs` introduce module references through post-pp source +that ocamldep cannot see — they are folded into the closure input so +the classification fold sees them. `sort_uniq` canonicalises the +input list because `Lib.closure`'s memo key (L9) is order- and +multiplicity-sensitive. + +`Lib.closure` adds libraries reachable through transparent aliases +(re-exports) that ocamldep does not surface — for instance, when a +library `A` re-exports modules from `B` via an alias, references to +`B.M` in source compile to references to `A.M` and ocamldep records +only `A`. The closure pulls `B` in. + +### Step 5: compute `must_glob_set` + +```ocaml +let wrapped_referenced = + Lib_file_deps.Lib_index.wrapped_libs_referenced + lib_index ~referenced_modules:referenced +in +let* must_glob_libs = + Resolve.Memo.read + (Lib.closure + (List.sort_uniq ~compare:Lib.compare + (Lib.Set.to_list wrapped_referenced @ pps_runtime_libs)) + ~linking:false ~for_) +in +let must_glob_set = Lib.Set.of_list must_glob_libs in +``` + +Two classes of libraries must always glob even when their entry +modules appear in `tight`: + +- **Wrapped library wrappers reached transparently.** When the + consumer mentions `Lib.M`, the source-level reference is to the + wrapper module `Lib`, but the compiled output of `M` lives at + `lib__M`. The cross-library walker reads ocamldep on the wrapper + module, which only surfaces references to the wrapper's direct + children. A consumer that reaches `Lib`'s wrapper can reach any + module of `Lib` via Foo's `(open ...)` or `Lib.Bar.x` syntax; the + walker cannot enumerate those at narrowing time. Solution: glob + `Lib.closure(wrapped_referenced)` — every library transitively + reachable from any referenced wrapper. + +- **`ppx_runtime_libraries`.** ppx-rewritten output may reference + modules from a runtime library that the consumer never names in + its source. ocamldep on the pre-pp source does not see those + references. Glob the closure to cover. + +The union `Lib.Set.of_list must_glob_libs` is the *must-glob set*; +every member is forced onto the glob path regardless of its entry +classification. + +### Step 6: BFS to fixpoint + +```ocaml +let* tight_set = + cross_lib_tight_set + ~sandbox ~sctx ~lib_index ~initial_refs:referenced +in +``` + +`cross_lib_tight_set` runs a BFS over the cross-library reference +graph: + +```ocaml +let rec loop ~seen ~frontier = + if Module_name.Set.is_empty frontier then return seen + else + let pairs = + Module_name.Set.fold frontier ~init:[] ~f:(fun name acc -> + Lib_file_deps.Lib_index.lookup_tight_entries lib_index name @ acc) + in + let* discovered = + union_module_name_sets_mapped pairs ~f:read_entry_deps + in + let seen = Module_name.Set.union seen frontier in + let frontier = Module_name.Set.diff discovered seen in + loop ~seen ~frontier +``` + +Each frontier iteration: + +1. For each module name in the frontier, look up the `(lib, entry)` + tight-eligible pairs via `Lib_file_deps.Lib_index.lookup_tight_ + entries`. The lookup skips entries with `None`-module (wrapped + locals, externals) — those are non-tight-eligible and terminate + chains through them. +2. For each `(lib, entry)`, read the entry module's impl + intf + ocamldep raw refs. +3. Union all the discovered names into `discovered`. +4. Add the current frontier to `seen`; the new frontier is + `discovered \ seen`. + +The post-pp module map (built by `Compilation_context.build_lib_ +index`) ensures that the entry module passed to ocamldep is the form +the dep lib's own compile pipeline produces — so the walker reads +either the `.pp.ml` (for action-style preprocessors or non-staged +pps), or the source directly (for no-preprocessing and future-syntax +extensions, which Dune handles at parse time). + +The walker terminates because the universe of module names is +finite (bounded by the cctx's transitive library set) and `seen` +only grows. The cost is proportional to the number of distinct +entry modules transitively reachable from `referenced`. + +### Step 7: second lib classification + +```ocaml +let { Lib_file_deps.Lib_index.tight = tight_modules; + non_tight = non_tight_set } = + Lib_file_deps.Lib_index.filter_libs_with_modules + lib_index ~referenced_modules:tight_set +in +``` + +Re-partition the index using the converged `tight_set` (which is a +superset of `referenced`). This is a separate classification from +step 3 because the BFS may have brought new module names into scope +that flip a library from "unreached" to "tight" or change the +specific entries referenced. + +### Step 8: classify and emit per-lib deps + +```ocaml +let tight_deps, glob_libs, kept_libs = + List.fold_left all_libs ~init:(Dep.Set.empty, [], Lib.Set.empty) + ~f:(fun (td, gl, kl) lib -> + if Lib.Set.mem must_glob_set lib || Lib.Set.mem non_tight_set lib + then td, lib :: gl, Lib.Set.add kl lib + else ( + match Lib.Map.find tight_modules lib with + | Some modules -> + ( Dep.Set.union td + (Lib_file_deps.deps_of_entry_modules + ~opaque ~cm_kind lib modules) + , gl + , Lib.Set.add kl lib ) + | None -> + if Lib_file_deps.Lib_index.is_tight_eligible lib_index lib + then td, gl, kl + else td, lib :: gl, Lib.Set.add kl lib)) +in +let glob_deps = Lib_file_deps.deps_of_entries ~opaque ~cm_kind glob_libs in +``` + +Each library in `all_libs` (the closure-expanded set) is bucketed: + +- **Must-glob**: lib is in `must_glob_set` (wrapped-referenced or + pps-runtime), or has a `None`-entry referenced → emit the full lib + glob via `deps_of_entries`. Added to `kept_libs`. + +- **Tight**: lib has a `Some`-entry referenced and is not must-glob → + emit `deps_of_entry_modules` for just those referenced entries. + Added to `kept_libs`. + +- **Unreached + tight-eligible**: lib has no referenced entries and is + tight-eligible (has at least one `Some`-entry) → drop entirely. The + link rule still pulls it in via `requires_link` for runtime + linkage. The compile rule does not need it. + +- **Unreached + not tight-eligible**: lib has no referenced entries + but is not tight-eligible (wrapped local, external) → emit glob. + Added to `kept_libs`. + +The `kept_libs` accumulator captures every library that contributes +*any* dep to this compile rule. It is the precise set against which +include flags should be scoped (step 9). + +### Step 9: filtered include flags + +```ocaml +let+ include_flags = + Compilation_context.filtered_include_flags cctx ~cm_kind ~kept_libs +in +include_flags, Dep.Set.union tight_deps glob_deps +``` + +`Compilation_context.filtered_include_flags` (L6+L7) restricts the +cctx's `requires_compile` and `requires_hidden` lists to libraries in +`kept_libs`, then computes `-I` / `-H` over those subsets. The result +is memoised per `(lib_mode, kept_libs : Lib.Set.t)` (L7) so that two +modules with the same `kept_libs` share evaluation. + +The final return value is the include args paired with the total dep +set. `Module_compilation.lib_cm_deps` wraps the function with +`Action_builder.dyn_deps`, which registers `dep_set` as the rule's +dynamic deps and returns just the include args for the command line. + +## Data structures + +### `Lib_file_deps.Lib_index.t` + +(`src/dune_rules/lib_file_deps.{ml,mli}`.) Built once per cctx by +`Compilation_context.build_lib_index ~super_context ~libs ~for_`. The +input is a list of `(Module_name.t, Lib.t, Module.t option)` triples: +the entry module's name, the owning library, and the entry's +`Module.t` if available (`None` for wrapped locals and externals). + +The index supports: + +- `filter_libs_with_modules ~referenced_modules` — partition into + tight (per-lib `Module.t list` for the referenced entries) and + non_tight (libs with `None`-entry references). +- `lookup_tight_entries name` — the `(lib, entry)` pairs whose entry + name is `name`, skipping `no_ocamldep`-tagged libs and `None`-entry + rows. Used by the BFS frontier expansion. +- `is_tight_eligible lib` — true iff lib has at least one `Some`- + entry. Used in step 8 to distinguish "drop" from "must glob". +- `wrapped_libs_referenced ~referenced_modules` — the set of wrapped + local libraries whose wrapper module name appears. Used to compute + the wrapped-soundness must-glob input in step 5. + +### `Compilation_context.cached_raw_refs` (L8) + +A per-cctx `Hashtbl` keyed by `Raw_refs.Key.t` that memoises the +`Module_name.Set.t Action_builder.t` for a given module's ocamldep +raw refs. Two distinct keys: + +- `Consumer { obj_name; ml_kind }` — used when reading the consumer + module `M`'s own refs (`ml_kind` distinguishes `Impl` from `Intf` + contexts; the choice of what to read varies per + `need_impl_deps_of`). +- `Transitive { obj_name; cm_kind }` — used when reading a + `trans_deps` member's refs. + +Within a cctx, two compile rules computing narrowing for siblings +that share a transitive dep both hit the same key, paying the +underlying ocamldep read once. + +### `Compilation_context.filtered_includes` (L7) + +A per-cctx `Hashtbl` keyed by `Filtered_includes.Key.t = +{ lib_mode : Lib_mode.t; kept_libs : Lib.Set.t }`. Caches the include +flags Action_builder per `(lib_mode, kept_libs)` tuple. Two modules +within the same cctx whose narrowing produces identical `kept_libs` +share the include-flag computation. + +### `Lib.closure` memo (L9) + +`Lib.closure` is wrapped in a memo keyed by `(linking, for_, libs)` +where `libs` is the input lib list. Two narrowing calls within the +same cctx (or across cctxs) that produce identical `direct_libs` +share the closure computation. + +### `Dep_graph` per cctx, per `Ml_kind` + +(`src/dune_rules/dep_rules.ml`.) Built per cctx as +`Ml_kind.Dict.t Dep_graph.t`. `Dep_graph.deps_of dep_graph m` returns +the within-cctx transitive `Module.t list` for module `m`. Used at +step 1 to enumerate trans_deps before reading their ocamldep raw +refs. + +## Soundness recovery and known edge cases + +Three categories of fallback to the glob: + +1. **Module kinds outside `module_kind_is_filterable`**: `Root`, + `Wrapped_compat`, `Impl_vmodule`, `Virtual`, `Parameter`. Each has + a bespoke dep story handled elsewhere; the `can_filter` gate + rejects them. + +2. **Consumer cctx with `has_virtual_impl = true`**: the cctx itself + has a virtual-impl in `requires`. The BFS currently lacks the + analysis to safely narrow under that condition; the entire cctx + falls back. + +3. **Per-library wrapped or pps-runtime soundness**: handled within + the narrowing by `must_glob_set` (step 5 + step 8). The + classification fold force-globs these libraries even when their + tight classification would otherwise narrow them. + +`Melange` paths bypass the narrowing entirely at the `can_filter` +check; the cm_kind machinery there is different and L9 leaves it +unchanged. + +## Cost characteristics + +Per consumer module, the narrowing does: + +- One ocamldep raw-ref read for the consumer (cached by L8). +- One ocamldep raw-ref read per trans-dep module (cached by L8 — + the read is shared with sibling consumers that share a trans-dep). +- One `Lib.closure` (memoised by L9 — shared with sibling consumers + whose `direct_libs` are identical). +- One BFS over the cross-library reference graph (state machine runs + per module; the individual ocamldep reads inside it hit L8 across + modules). +- Two `filter_libs_with_modules` calls (step 3 and step 7). +- One `filtered_include_flags` lookup (memoised by L7 — shared with + siblings whose `kept_libs` match). + +The visible per-module Dune-level functions are individually small. +The cost is in their *downstream effects* — `Module_name.Set` unions, +`Lib.Set` / `Lib.Map` operations, `Dep.Set` constructions, +`Action_builder` bind chains, and the minor-GC oldification of the +intermediate sets. + +On a null build, the narrowing yields no benefit (no `.cmi` changed, +so no module would have rebuilt either way). The work runs anyway. +The narrowing trades null-build wall time for incremental-build +recompile reduction. + +## Layer-by-layer construction (#14492) + +For reference, the algorithm above is the cumulative result of: + +- **L1** (#14513): test infrastructure prerequisite (merged). +- **L2** (#14514): cctx fields `Lib_index`, `has_virtual_impl`, + `pps_runtime_libs`. +- **L3** (#14515): scaffold — route lib file deps through a per- + module function (`lib_deps_for_module`) that initially returns the + same cctx-wide glob, preparing the call site. +- **L4** (#14516): the BFS itself (`cross_lib_tight_set`, + `lib_deps_for_module` body); the `can_filter` gate is added. +- **L5** (#14517): soundness recovery — + `has_virtual_impl` early-out, `pps_runtime_libs` fold-in, + `must_glob_set` from wrapped-referenced libs. +- **L6** (#14518): `filtered_include_flags` per `kept_libs`. +- **L7** (#14519): `Filtered_includes.t` per-cctx cache keyed by + `(lib_mode, kept_libs)`. +- **L8** (#14520): `cached_raw_refs` per-cctx cache for ocamldep + raw-ref `Action_builder.t`s. +- **L9** (#14521): `Lib.closure` memo keyed by `(linking, for_, + libs)`. + +Each layer can be read in isolation in the PR stack. From 356728c25715418394851a914fa3e42caa09b81b Mon Sep 17 00:00:00 2001 From: Robin Bate Boerop Date: Sun, 17 May 2026 16:42:05 -0700 Subject: [PATCH 8/8] doc: extend narrowing reference with stanza-open BFS edge Documents the fourth class of soundness recovery added in L5: the BFS's per-iteration step now extends the frontier with the modules named by each visited library's stanza [-open] flags, in addition to the entry's impl + intf ocamldep raw refs. The reachability rule the BFS computes becomes a three-way disjunction (consumer references it; reached module's ocamldep names it; reached module's owning lib stanza-opens it). Updates the [cross_lib_tight_set] code snippet's signature (now threads [~mode]), the per-iteration description, the "Soundness recovery and known edge cases" list (adds a fourth class), the L5 layer-summary line, and the cost-characteristics list (adds the per-visited-lib stanza-flags expansion). --- doc/dev/per-module-narrowing.md | 51 ++++++++++++++++++++++++++++++--- 1 file changed, 47 insertions(+), 4 deletions(-) diff --git a/doc/dev/per-module-narrowing.md b/doc/dev/per-module-narrowing.md index 7ac3b2afc18..340453e90fe 100644 --- a/doc/dev/per-module-narrowing.md +++ b/doc/dev/per-module-narrowing.md @@ -312,7 +312,7 @@ classification. ```ocaml let* tight_set = cross_lib_tight_set - ~sandbox ~sctx ~lib_index ~initial_refs:referenced + ~sandbox ~sctx ~lib_index ~mode ~initial_refs:referenced in ``` @@ -342,12 +342,37 @@ Each frontier iteration: entries`. The lookup skips entries with `None`-module (wrapped locals, externals) — those are non-tight-eligible and terminate chains through them. -2. For each `(lib, entry)`, read the entry module's impl + intf - ocamldep raw refs. +2. For each `(lib, entry)`, the helper `read_entry_deps` returns the + union of three contributions: + - the entry module's impl ocamldep raw refs; + - the entry module's intf ocamldep raw refs; + - the modules named by `-open` flags injected through the + entry's owning library's stanza `(flags (...))`. The stanza + flags are expanded via `Ocaml_flags_db.ocaml_flags sctx ~dir` + against the consumer's `mode`, and the `-open M` tokens are + extracted with `Ocaml_flags.extract_open_module_names`. The + short-circuit `Dune_lang.Ocaml_flags.Spec.equal spec standard` + avoids the expansion for libraries that carry the default + (`Spec.standard`), which includes every external lib. 3. Union all the discovered names into `discovered`. 4. Add the current frontier to `seen`; the new frontier is `discovered \ seen`. +The stanza-opens contribution closes a soundness gap that the three +must-glob recoveries (step 5) and the syntactic ocamldep walk +together cannot cover. When a dependency library's stanza injects +`-open Foo`, its source can reference `Foo`'s identifiers without +ever naming `Foo` syntactically, so ocamldep on the dep lib's source +produces no token to drive the walker — and `Foo`'s defining +library may then be silently dropped from the consumer's compile +even though the consumer transitively needs its `.cmi`. Treating +the stanza opens as a third class of frontier edge ensures the BFS +reaches those libraries through the very same fixpoint that handles +ocamldep-visible references. The reachability rule the BFS computes +is: a module is reachable from the consumer iff the consumer +references it, OR some reached module's ocamldep names it, OR some +reached module's owning lib stanza-opens it. + The post-pp module map (built by `Compilation_context.build_lib_ index`) ensures that the entry module passed to ocamldep is the form the dep lib's own compile pipeline produces — so the walker reads @@ -526,6 +551,16 @@ Three categories of fallback to the glob: classification fold force-globs these libraries even when their tight classification would otherwise narrow them. +4. **Stanza-injected `-open` on dep libraries**: handled inside the + BFS itself (step 6). When the walker visits a module of library + `L`, it extends the frontier with the modules named by `L`'s + stanza `-open` flags in addition to the module's ocamldep raw + refs. Without this, a consumer that pattern-matches a type from + a library reached only via an intermediate's stanza `-open` (the + type being inherited into the intermediate's `.mli` via the + open) would silently drop the leaf library's `.cmi` from its + compile rule. + `Melange` paths bypass the narrowing entirely at the `can_filter` check; the cm_kind machinery there is different and L9 leaves it unchanged. @@ -542,6 +577,12 @@ Per consumer module, the narrowing does: - One BFS over the cross-library reference graph (state machine runs per module; the individual ocamldep reads inside it hit L8 across modules). +- Per visited entry, an `Ocaml_flags_db.ocaml_flags` expansion of the + owning lib's stanza `(flags ...)` followed by + `Ocaml_flags.extract_open_module_names`, when the lib's + `stanza_flags` differs from `Spec.standard`. External libs always + carry `Spec.standard`, so the expansion is skipped for them. Each + expansion is memoised through `Ocaml_flags_db`'s underlying Memo. - Two `filter_libs_with_modules` calls (step 3 and step 7). - One `filtered_include_flags` lookup (memoised by L7 — shared with siblings whose `kept_libs` match). @@ -571,7 +612,9 @@ For reference, the algorithm above is the cumulative result of: `lib_deps_for_module` body); the `can_filter` gate is added. - **L5** (#14517): soundness recovery — `has_virtual_impl` early-out, `pps_runtime_libs` fold-in, - `must_glob_set` from wrapped-referenced libs. + `must_glob_set` from wrapped-referenced libs, plus stanza-open + BFS expansion (each visited lib's stanza `-open` modules join + the frontier). - **L6** (#14518): `filtered_include_flags` per `kept_libs`. - **L7** (#14519): `Filtered_includes.t` per-cctx cache keyed by `(lib_mode, kept_libs)`.