
Do Not Merge: Integration Branch for GT4Py Next#1

Draft
philip-paul-mueller wants to merge 59 commits into main from gt4py-next-integration

@philip-paul-mueller philip-paul-mueller commented Apr 30, 2025

This is the PR/branch that GT4Py.Next uses to pull DaCe.
It is essentially DaCe main together with our fixes that, for various reasons, have not yet made it into DaCe main.

The process for updating this branch is as follows; there are no exceptions:

  • You start with current DaCe main.
  • Then you include the PR that enables automatic Python index update, by squash-merging it.
  • Then squash-merge the PRs listed below. Check whether each has been merged into DaCe proper and, if so, remove it from the list.
  • Then update the version.py file in the dace/ subfolder. Make sure that there is no newline at the end of the file. For next we use the epoch 43; cartesian would use 42. The date is used as the version number, so the version (for next) looks something like '43!YYYY.MM.DD'.
  • Force push your changes to this branch (gt4py-next-integration).
  • Create a tag with the pattern __gt4py-next-integration_YYYY_MM_DD and push it as well.
  • Make sure that the workflow has been triggered.
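
The version string and tag name described in the steps above can be derived from a single date. The following is only a sketch: the helper names `integration_version` and `integration_tag` are hypothetical and are not part of DaCe or GT4Py, and the date is an arbitrary example.

```python
from datetime import date

NEXT_EPOCH = 43  # epoch used for GT4Py Next; Cartesian would use 42


def integration_version(day: date, epoch: int = NEXT_EPOCH) -> str:
    """Build a version string of the form '43!YYYY.MM.DD' (hypothetical helper)."""
    return f"{epoch}!{day:%Y.%m.%d}"


def integration_tag(day: date) -> str:
    """Build a tag of the form '__gt4py-next-integration_YYYY_MM_DD' (hypothetical helper)."""
    return f"__gt4py-next-integration_{day:%Y_%m_%d}"


release_day = date(2025, 7, 24)  # example date only
version = integration_version(release_day)
tag = integration_tag(release_day)
```

Deriving both strings from one date keeps the version in version.py and the pushed tag consistent with each other.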

Afterwards you have to update GT4Py's pyproject.toml file.
For this you have to update the version requirement of DaCe in the dace-next group at the beginning of the file to the version you just created, i.e. change it to dace==43!YYYY.MM.DD.
Then you have to update the source in the uv-specific parts of the file; there you have to change the source to the new tag you have just created.
Then you have to update the uv lock by running uv sync --extra next --group dace-next; if you have installed the pre-commit hooks, this will be done automatically.

NOTE: Once PR#2423 has been merged, the second step, i.e. adapting the tag in the uv-specific parts, is no longer needed.

On top of DaCe/main we are using the following PRs:
No open PRs currently, all changes are in dace main.

No Longer Needed

@philip-paul-mueller philip-paul-mueller marked this pull request as draft April 30, 2025 10:04
philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request Apr 30, 2025
Instead of pulling directly from the official DaCe repo, we now (for the
time being) pull from [this
PR](GridTools/dace#1).
This became necessary as we have a lot of open PRs in DaCe and need some
custom fixes (that cannot be merged into DaCe in their current form).
In the long term however, we should switch back to the main DaCe repo.
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 3 times, most recently from 964e84b to 2d85437 Compare May 26, 2025 05:22
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 2 times, most recently from 88c99f4 to d779cd1 Compare June 10, 2025 11:50
@edopao edopao force-pushed the gt4py-next-integration branch from d779cd1 to 4f40029 Compare June 12, 2025 12:46
@edopao edopao force-pushed the gt4py-next-integration branch from 87c77ef to c2a4e42 Compare June 27, 2025 14:08
@edopao edopao force-pushed the gt4py-next-integration branch from 178037a to 9114985 Compare July 14, 2025 08:42
@philip-paul-mueller philip-paul-mueller changed the title Do Not Merge: Integration Branch for GT4Py Do Not Merge: Integration Branch for GT4Py Next Jul 15, 2025
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 33b63a1 to 2417e09 Compare July 21, 2025 07:42
@philip-paul-mueller philip-paul-mueller force-pushed the gt4py-next-integration branch 4 times, most recently from 3472895 to bed3b0e Compare July 24, 2025 07:24
romanc and others added 4 commits January 30, 2026 21:28
`MapFusion` is deprecated and `MapFusionVertical` should be used
instead. `MapFusion` remains in the codebase as a compatibility layer for
backwards compatibility.

This PR changes how `MapFusion` is deprecated. Previously, a
warning would be emitted whenever the class was loaded (e.g. when accessed
from `dace/transformation/dataflow/__init__.py`). Depending on how
verbosely pytest is configured, one would thus see a deprecation warning
even if `MapFusion` was never actually used/instantiated. The PR
suggests emitting a warning only upon class init.
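
A minimal sketch of the pattern, warning at instantiation rather than at import time. The two classes below are stand-ins defined only for this illustration; they are not the actual DaCe transformation classes, and the warning text is an assumption.

```python
import warnings


class MapFusionVertical:
    """Stand-in for the replacement class (illustration only)."""


class MapFusion(MapFusionVertical):
    """Compatibility alias that warns only when it is instantiated."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "`MapFusion` is deprecated, use `MapFusionVertical` instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias = MapFusion  # merely referencing the class emits nothing
    warnings_before_init = len(caught)
    MapFusion()  # instantiating it emits the warning
    warnings_after_init = len(caught)
```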

The unrelated change in `tests/buffer_tiling_test.py` (unused import) is
there because I used a test in that file to make sure the warning disappears.

/cc @philip-paul-mueller FYI

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
In particular, this gets rid of a bunch of if/else blocks with breaking
changes in Python 3.8 and 3.9.

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Roman Cattaneo <>
The changes include reimplementing `DepthCounter` class for AST
traversal, updating tasklet analysis functions to support both work and
depth metrics, extending the SDFG analysis to include interstate edge
work/depth, and adding tests to validate the new logic.
This includes memlets that target symbols, symbol assignment, and
reading/writing arrays or scalars without an appropriate memlet.
romanc and others added 10 commits February 3, 2026 15:18
…2291)

Upstream tests in NDSL started showing the following warning

```none
UserWarning: You passed `None` as `argnames` to `CompiledSDFG`, but the
SDFG you passed has positional arguments. This is allowed but deprecated.
```

I started digging and discovered that the same warning is also present
in DaCe tests (e.g. `test_nested_duplicate_callback()` in test file
`callback_autodetect_test.py`).

This PR passes along the `argnames` from the `DaceProgram` of the parser
through `load_precompiled_sdfg` such that the information is available
in the `CompiledSDFG` class and the warning is silenced.

Co-authored-by: Roman Cattaneo <>
…#2294)

This PR enables enumerations to contain attributes via definition as a
dataclass. It is also better than the previous `aenum`-based approach
for type checkers and IDEs, as it transparently keeps the enumeration
members. This feature will be useful for nesting attributes and methods
into the classes themselves, improving extensibility. Also enables
support for dataclass serialization/deserialization, and removes `aenum`
as a requirement.

The syntax is as follows (for example):

```python
from dataclasses import dataclass
from dace.attr_enum import ExtensibleAttributeEnum
from enum import auto

class ScheduleType(ExtensibleAttributeEnum):
    Default = auto()  #: Scope-default parallel schedule
    Sequential = auto()  #: Sequential code (single-thread)
    MPI = auto()  #: MPI processes

    @dataclass(frozen=True)
    class CPU_Multicore:
        omp_schedule_type: OMPScheduleType = OMPScheduleType.Default

    # ...
```
Setting `CPU_Multicore = CPUData` to an external dataclass is also
possible.

As a result, `ScheduleType.CPU_Multicore` is now a _template_ enum
member, and `CPU_Multicore(OMPScheduleType.Static)` is an instance.
Registering a new template externally looks like:
```python
ScheduleType.register_template("CPU_Multicore", CPUData)
```
A student had a problem because `np.int8` maps to `char`. Char can be
either unsigned or signed according to the C++ standard
(https://en.cppreference.com/w/cpp/language/types.html). I propose we
either use `int8_t` directly or `signed char`; I updated the dictionary
according to this proposal.
After a brief discussion, `subsets.Indices` was deprecated last week
with PR spcl#2282. Since then, many DaCe
tests emit warnings because of remaining usage of `Indices` in the
offset member functions of `subsets.Range`.

This PR suggests adapting `Range.from_indices()` to add support for a
sequence of (symbolic) numbers or strings (as suggested in Mattermost).
This allows removing the remaining usage of `subsets.Indices`
constructors in the DaCe codebase, which gets rid of a bunch of warnings
emitted in test or upstream/user code.

The only hiccup I had doing this was the function `_add_read_slice()`,
called from `visit_Subscript()` of the `ProgramVisitor` in `newast.py`.
That function would check for subsets to be either ranges or indices,
and if subsets were indices, we'd take a different path. That code path
separation is apparently loosely tied to some other place in the
codebase, because we'd get errors if we took the sub-optimal
ranges path with indices. I now check whether ranges are indices and set
the flag accordingly. That seems to fix the issues in tests.

I've also checked (manually) all other cases where we'd go a different
code path in case subsets are indices. There are some and the remaining
ones all "upgrade" indices to ranges. They can be removed once we remove
the deprecated `Indices` class.

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Tal Ben-Nun <tbennun@gmail.com>
This PR replaces `is_start_state` with `is_start_block` because the former
is deprecated. The PR is part of an ongoing effort to reduce warnings
emitted in tests.

Unrelated to this change, the PR removes unused imports and fixes a
couple of typos in changed files.

---------

Co-authored-by: Roman Cattaneo <>
Updated GitHub Actions dependencies to the latest versions. No breakage
is expected. I've checked the logs and most of them just updated to
node20, which is a breaking change because it requires an up-to-date
runner. Since we rely on GitHub's runners, this should be no problem.

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
Reduces the number of warnings
The default configuration of CUDA MPS does not support the number of
pytest workers (32) used by the CI job. Besides, CUDA MPS is not needed
because the GPU is not configured in exclusive mode.
A student gets a CMake error on some older version.

```
CMake Error at CMakeLists.txt:191 (if):
if given arguments:
     "3.28.3" "VERSION_LESS"
Unknown arguments specified
```

I think it is better to just check beforehand whether the variable is
defined, so as not to get a missing-argument error.
`MapFusionVertical` must create new data (reduced versions of the
intermediate data) and hence name it. Previously, it used a naming
scheme based on the node id, which might not be stable.
The new scheme uses the name of the intermediate data and guarantees
stable names for exclusive intermediate nodes, and for shared
intermediate nodes under the condition that they are only involved in
one `MapFusionVertical` operation.

---------

Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>
@edopao edopao force-pushed the gt4py-next-integration branch from 87cc16f to c702c0a Compare February 12, 2026 10:11
romanc and others added 13 commits February 12, 2026 10:12
spcl#2298)

Get the number of warnings in tests down by avoiding usage of
`state.add_*` functions like `state.add_array(...)`.

---------

Co-authored-by: Roman Cattaneo <>
The `ControlFlowReachability` pass gets prohibitively expensive for
particular graphs. Updating from `v1/maintenance` to current `main`, we
have seen `simplify()` runtimes of 10-15 minutes where the previous runtime
was on the order of tens of seconds.

The slowdown turned out to be caused by not caching closures per region.
Some of our graphs generate large, nested control flow regions (if
statements) from iterative solvers with conditional returns that we map
to if/else blocks with a boolean mask. In such a scenario, four
layers of nesting are easily reached, and then `_region_closure()`
gets called again and again for previously calculated closures of
regions. Because of the transitive requirement of these closures,
deeply nested control flow regions have to "go up" and re-evaluate the
same closures for "upper" regions again and again. This PR suggests a
simple cache of closures per region to avoid this duplicate evaluation.
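
The idea of a per-region cache can be sketched with a toy model. Everything here is an assumption made for illustration: `Region`, `ClosureComputer` and the "closure = own blocks plus the parent's closure" rule are simplifications and do not reflect the actual `ControlFlowReachability` implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(eq=False)  # identity-based hashing, so regions can be dict keys
class Region:
    name: str
    local_blocks: FrozenSet[str]
    parent: Optional["Region"] = None


class ClosureComputer:
    def __init__(self) -> None:
        self._cache: Dict[Region, FrozenSet[str]] = {}
        self.evaluations = 0  # counts uncached computations only

    def closure(self, region: Region) -> FrozenSet[str]:
        cached = self._cache.get(region)
        if cached is not None:
            return cached
        self.evaluations += 1
        result = region.local_blocks
        if region.parent is not None:
            # Transitive part: reuse the (cached) closure of the outer region.
            result = result | self.closure(region.parent)
        self._cache[region] = result
        return result


# Four layers of nesting, as in the scenario described above.
root = Region("root", frozenset({"a"}))
level1 = Region("if_1", frozenset({"b"}), root)
level2 = Region("if_2", frozenset({"c"}), level1)
level3 = Region("if_3", frozenset({"d"}), level2)

computer = ClosureComputer()
deep_closure = computer.closure(level3)
repeat_closure = computer.closure(level2)  # served from the cache
```

With the cache, each region is evaluated once regardless of how many nested regions ask for its closure.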

Co-authored-by: Roman Cattaneo <>
Reduces SDFG size when serialized, using the following methods:

* Non-human-readable JSON dumping by default
* Consolidating file names in DebugInfo to be per-SDFG
* Reducing the size of the DebugInfo JSON object based on fields
* Saving transformation history set to off by default
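
As a rough illustration of the first point only, the standard `json` module shows the size difference between indented and compact output. The dictionary below is toy data, not a real SDFG, and this is not necessarily how DaCe performs the dump.

```python
import json

# Toy stand-in for a serialized graph (not a real SDFG).
sdfg_like = {
    "nodes": [{"id": i, "label": f"state_{i}"} for i in range(3)],
    "edges": [],
}

pretty = json.dumps(sdfg_like, indent=2)  # human-readable
compact = json.dumps(sdfg_like, separators=(",", ":"))  # no extra whitespace
```

Both strings decode to the same object; only the whitespace differs.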
This PR suggests writing a `CACHEDIR.TAG` file into the program folder.
The tag is an attempt to signal (e.g. to backup software) that the
containing folder has no archival value, see
http://www.brynosaurus.com/cachedir/.

While the convention started with cache directories for things like
thumbnails of a web browser, I'd argue that the same reasoning (no
archival value, frequent changes, unsuitable to be located in
`/var/cache` or `/tmp`) applies to build folders.

Instead of writing the file by hand, we could also use a library like
https://pypi.org/project/cachedir-tag/.
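
Writing the tag by hand is short. The sketch below follows the convention from the linked specification, where the file must start with a fixed signature line; the function name `write_cachedir_tag` is hypothetical and the temporary directory merely stands in for a program folder.

```python
import tempfile
from pathlib import Path

# Fixed signature required by the Cache Directory Tagging convention.
CACHEDIR_TAG_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def write_cachedir_tag(folder: Path) -> Path:
    """Create CACHEDIR.TAG in `folder` unless it already exists (hypothetical helper)."""
    tag_file = folder / "CACHEDIR.TAG"
    if not tag_file.exists():
        tag_file.write_text(
            CACHEDIR_TAG_SIGNATURE + "\n"
            "# This file is a cache directory tag.\n"
            "# For information about cache directory tags, see:\n"
            "#\thttp://www.brynosaurus.com/cachedir/\n",
            encoding="ascii",
        )
    return tag_file


with tempfile.TemporaryDirectory() as tmp:
    tag_path = write_cachedir_tag(Path(tmp))
    tag_name = tag_path.name
    first_line = tag_path.read_text(encoding="ascii").splitlines()[0]
```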

---------

Co-authored-by: Roman Cattaneo <>
`Set` is not hashable and therefore doesn't work with the `lru_cache`
decorator. I propose using `FrozenSet` here.
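
The underlying issue can be reproduced with the standard library alone. The function `count_shared` is a made-up example and is not taken from the DaCe code that was changed.

```python
from functools import lru_cache
from typing import FrozenSet


@lru_cache(maxsize=None)
def count_shared(symbols: FrozenSet[str], other: FrozenSet[str]) -> int:
    """Made-up cached function: arguments must be hashable to serve as cache keys."""
    return len(symbols & other)


try:
    count_shared({"i", "j"}, {"j"})  # plain sets are unhashable
    set_is_rejected = False
except TypeError:
    set_is_rejected = True

shared = count_shared(frozenset({"i", "j"}), frozenset({"j"}))
```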
Some parts of DaCe currently rely on `six`, a Python 2 / Python 3
compatibility library. Given that DaCe only supports Python 3.10 -
3.14 now, I think we don't need the `six` dependency anymore.
Following up on PR spcl#2312, this PR
proposes to save compressed SDFGs in the program folder. As discussed in
the last meeting:

- no change to the API, i.e. we keep keyword arguments of `sdfg.save()`
as they are.
- save `program.sdfg` as compressed `program.sdfgz` inside the program
folder
- make sure that changes are backwards compatible and that the program
folder is still found regardless of `program.sdfg` or `program.sdfgz`
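
The backwards-compatible lookup from the last point might look roughly like this. It assumes that `.sdfgz` is gzip-compressed JSON; the helper names `find_program_file` and `read_program_json` are hypothetical, and the payload is toy data rather than a real SDFG.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Optional


def find_program_file(folder: Path) -> Optional[Path]:
    """Prefer the compressed file, fall back to the uncompressed one (hypothetical helper)."""
    for name in ("program.sdfgz", "program.sdfg"):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


def read_program_json(path: Path) -> dict:
    """Read either variant; assumes `.sdfgz` is gzip-compressed JSON."""
    opener = gzip.open if path.suffix == ".sdfgz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


payload = {"type": "SDFG", "name": "program"}  # toy content only

with tempfile.TemporaryDirectory() as new_dir, tempfile.TemporaryDirectory() as old_dir:
    # New layout: compressed file in the program folder.
    with gzip.open(Path(new_dir) / "program.sdfgz", "wt", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # Old layout: plain JSON file.
    (Path(old_dir) / "program.sdfg").write_text(json.dumps(payload), encoding="utf-8")

    new_name = find_program_file(Path(new_dir)).name
    old_name = find_program_file(Path(old_dir)).name
    loaded_new = read_program_json(find_program_file(Path(new_dir)))
    loaded_old = read_program_json(find_program_file(Path(old_dir)))
```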

---------

Co-authored-by: Roman Cattaneo <>
If we unroll a top-level for CFG, the connectivity might be broken.
I added a unit test and a fix for it.

Replace dict on a loop does not properly update the init statement, which
can be exposed by loop unrolling when a parent loop parameter is used
inside an inner loop. I fixed it and added a unit test in loop unroll
that exposes it.
`Range` subsets have a `reorder()` function that re-orders the
dimensions in-place. So far, it only re-ordered the ranges, but not the
tile sizes (which are stored in a separate list). This PR makes sure
that both ranges and tile sizes are re-ordered according to the given
permutation. The PR adds a simple test case.
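
The fix amounts to applying the same permutation to both lists. The class below is a simplified stand-in, not `subsets.Range`; in particular, the convention that entry `k` of the result comes from position `order[k]` is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


class TiledRange:
    """Simplified stand-in with ranges and tile sizes kept in separate lists."""

    def __init__(self, ranges: List[Tuple[int, int, int]], tile_sizes: List[int]) -> None:
        assert len(ranges) == len(tile_sizes)
        self.ranges = list(ranges)
        self.tile_sizes = list(tile_sizes)

    def reorder(self, order: Sequence[int]) -> None:
        """Re-order dimensions in place; both lists follow the same permutation."""
        self.ranges = [self.ranges[i] for i in order]
        self.tile_sizes = [self.tile_sizes[i] for i in order]


subset = TiledRange(
    ranges=[(0, 9, 1), (0, 19, 2), (5, 7, 1)],  # (begin, end, step) per dimension
    tile_sizes=[1, 4, 8],
)
subset.reorder([2, 0, 1])
```

After the call, each tile size still sits next to the range it originally belonged to.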
@edopao edopao force-pushed the gt4py-next-integration branch from c702c0a to 4efefa7 Compare March 31, 2026 06:35

10 participants