
Add fesom2_ci image with OASIS and XIOS preinstalled #3

Merged: JanStreffing merged 11 commits into master from feat/fesom2-ci on May 1, 2026.


JanStreffing commented on May 1, 2026

Summary

Adds a new fesom2_ci/Dockerfile so the FESOM/fesom2 build-test workflow can move off the externally-owned ghcr.io/suvarchal/fesom2-ci image onto a FESOM-org-owned one published by this repo's existing docker-publish.yml.

The image reproduces the toolchain currently provided by ghcr.io/suvarchal/fesom2-ci (Ubuntu 22.04 + GCC + OpenMPI + NetCDF/HDF5 + LAPACK + OASIS3-MCT prebuilt to /oasis) and additionally installs XIOS 2.5 at /xios. The XIOS layer is the actual reason for the new image: FESOM/fesom2 PR #901 adds an I/O cadence path driven by io_xios_field_is_active, which currently has no CI coverage. Once this image exists, that PR can enable a new xios matrix entry that runs cmake --preset xios with XIOS_ROOT=$PWD/../xios.

The recipe (apt package list, env vars, OASIS clone + cmake) was reverse-engineered from the config layer of the existing ghcr.io/suvarchal/fesom2-ci:latest image to keep behavior identical for the five existing matrix cells (default, coupled, coupled_yac, recom, ifs_interface). XIOS is built with make_xios --arch GCC_LINUX --netcdf_lib netcdf4_seq. Sequential NetCDF is sufficient because CI only builds and link-tests the XIOS path; it does not run XIOS at runtime.

Notes

  • OASIS source: gitlab.dkrz.de/ec-earth/oasis3-mct.git (matches what the existing image is presumed to use).
  • XIOS source: gitlab.dkrz.de/ec-earth/xios-2.5-ece.git, the FESOM-validated 2.5 fork; git ls-remote confirms it is publicly readable.
  • This PR does not delete or alter the existing fesom2_test/, fesom2_test_refactoring/, or pyfesom2/ directories.

Reproduces the toolchain currently provided by ghcr.io/suvarchal/fesom2-ci
(Ubuntu 22.04 + gfortran/openmpi/netcdf/hdf5/lapack + OASIS3-MCT cmake-built
to /oasis) and adds an XIOS 2.5 install at /xios so the FESOM build path with
XIOS enabled can be exercised in CI. The repo's existing docker-publish.yml
auto-discovers this directory and publishes ghcr.io/fesom/fesom2_docker:fesom2_ci-*.
JanStreffing requested a review from koldunovn on May 1, 2026 17:05
JanStreffing marked this pull request as draft on May 1, 2026 17:06
JanStreffing self-assigned this on May 1, 2026

The upstream arch-GCC_LINUX.env hard-codes $HOME/hdf5 and $HOME/netcdf4,
which don't exist in this image. Because make_xios sources that file, it
overrides any ENV we set in the Dockerfile, so we rewrite the file in place
to point at the Ubuntu serial library paths before invoking make_xios.
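The in-place rewrite boils down to a prefix substitution applied before make_xios runs. A minimal sketch, assuming the Ubuntu 22.04 layouts of libhdf5-dev (serial tree under /usr/lib/x86_64-linux-gnu/hdf5/serial) and libnetcdf-dev (default /usr prefix); the helper names are hypothetical, and the sample lines only illustrate the $HOME/... pattern rather than reproducing the upstream file:

```python
from pathlib import Path

# Assumed Ubuntu 22.04 prefixes (hypothetical mapping, adjust per distro/arch).
ARCH_ENV_REWRITES = {
    "$HOME/hdf5": "/usr/lib/x86_64-linux-gnu/hdf5/serial",
    "$HOME/netcdf4": "/usr",
}


def rewrite_arch_env(text: str, rewrites: dict = ARCH_ENV_REWRITES) -> str:
    """Replace the hard-coded $HOME prefixes with system library prefixes."""
    for old, new in rewrites.items():
        text = text.replace(old, new)
    return text


def rewrite_arch_env_file(path: Path) -> None:
    """Rewrite the arch .env file in place, before make_xios sources it."""
    path.write_text(rewrite_arch_env(path.read_text()))


# Illustrative excerpt, not the verbatim upstream arch-GCC_LINUX.env.
sample = (
    "export HDF5_INC_DIR=$HOME/hdf5/include\n"
    "export NETCDF_LIB_DIR=$HOME/netcdf4/lib\n"
)
print(rewrite_arch_env(sample), end="")
```

Editing the file rather than setting ENV is the point: whatever the Dockerfile exports is overwritten once make_xios sources the .env file.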

GHA runners can't reach gitlab.dkrz.de (the original Dockerfile failed at
'git clone --depth 1 ...oasis3-mct.git' with exit code 128). Suvarchal's image
worked around this by COPYing the sources from the build context, so do the
same: commit OASIS3-MCT (4.7 MB) and a trimmed XIOS 2.5 tree (20 MB) next to
the Dockerfile, and switch from 'git clone' to 'COPY'.

XIOS trimming to fit the repo:
  - extern/boost (65 MB) -> use system libboost-dev via --use_extern_boost
  - extern/blitz (3 MB)  -> use system libblitz0-dev via --use_extern_blitz
  - tools/FCM_OLD        -> use FCM_NEW via --fcm new
  - tools/archive/, inputs/, doc/, generic_testcase/, xios_test_suite/
    -> dead weight for a build-only image

Source: gitlab.dkrz.de/ec-earth/{oasis3-mct,xios-2.5-ece} HEAD as of today,
mirrored from a Levante checkout (which can reach DKRZ).

The DKRZ ec-earth/oasis3-mct.git fork is the older Makefile-only OASIS3-MCT
(no CMakeLists.txt at the root), which is why the previous Dockerfile build
failed with 'cmake ../' exit 1. The CMake-enabled OASIS used by the awiesm3
stack lives at git.smhi.se/jan.streffing/oasis3-mct-5; this bakes in its
local_combined_fixes branch (HEAD 6ca32854), since that is what FESOM2 in
awiesm3-develop is currently linked against.

Net commit size for fesom2_ci/: 25 MB -> 31 MB.
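The 'cmake ../' failure comes down to whether the source root has a CMakeLists.txt. A tiny pre-flight check along these lines could fail with a clearer message than CMake's exit 1; pick_oasis_build is a hypothetical helper, not part of the Dockerfile:

```python
import tempfile
from pathlib import Path


def pick_oasis_build(src: Path) -> str:
    """Hypothetical helper: choose a build driver for an OASIS3-MCT checkout.

    'cmake ../' from a build/ subdirectory needs CMakeLists.txt at the
    source root; the Makefile-only fork has none, so CMake exits with 1.
    """
    if (src / "CMakeLists.txt").is_file():
        return "cmake"
    return "make"  # legacy Makefile-driven tree


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    legacy = pick_oasis_build(root)  # no CMakeLists.txt yet
    (root / "CMakeLists.txt").write_text("cmake_minimum_required(VERSION 3.10)\n")
    modern = pick_oasis_build(root)

print(legacy, modern)  # make cmake
```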

XIOS's bld.cfg hard-codes -I${PWD}/extern/{boost,blitz}/include in cppflags,
fppflags and cflags. --use_extern_boost/--use_extern_blitz only replace those
dirs with symlinks to .void_dir without rewriting bld.cfg, so the -I flags
point at empty directories and the build fails. Baking the headers in
(extern/boost is 65 MB of header-only Boost; extern/blitz is 3 MB) is easier
than monkey-patching bld.cfg.
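To see why the build breaks, one can expand ${PWD} in each -I flag and check whether the directory actually has content. A minimal sketch of that check; dead_include_dirs is hypothetical, the flags line is illustrative (the real bld.cfg uses FCM syntax), and the temporary tree only mimics the .void_dir symlink described above:

```python
import os
import re
import tempfile
from pathlib import Path

INCLUDE_FLAG = re.compile(r"-I(\S+)")


def dead_include_dirs(cfg_text: str, root: Path) -> list:
    """Return -I paths (after expanding ${PWD}) that are missing or empty."""
    dead = []
    for raw in INCLUDE_FLAG.findall(cfg_text):
        path = Path(raw.replace("${PWD}", str(root)))
        if not path.is_dir() or not any(path.iterdir()):
            dead.append(raw)
    return dead


# Illustrative flags line only; not the actual bld.cfg contents.
CFG = "cppflags = -I${PWD}/extern/boost/include -I${PWD}/extern/blitz/include"

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "extern" / "blitz" / "include" / "blitz").mkdir(parents=True)
    (root / ".void_dir").mkdir()
    # Mimic --use_extern_boost: the bundled dir becomes a link to .void_dir.
    os.symlink(root / ".void_dir", root / "extern" / "boost")
    dead = dead_include_dirs(CFG, root)

print(dead)  # ['${PWD}/extern/boost/include']
```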

Net commit size for fesom2_ci/: 31 MB -> 99 MB. GitHub's 100 MB limit applies
per file, not per commit, and the largest file here is ~80 KB, so this is
well clear of it.
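Because GitHub's hard limit is per file (pushes containing a file over 100 MiB are rejected), the number worth checking is the largest file, not the tree total. A small sketch that reports both; the helper names are hypothetical, and working-tree bytes are only a proxy for what git stores, since git compresses objects:

```python
import os
from pathlib import Path

# GitHub blocks any single file larger than 100 MiB.
GITHUB_MAX_FILE_BYTES = 100 * 1024 * 1024


def summarize(sizes):
    """Given (relative_path, bytes) pairs, return (total, largest, offenders)."""
    total = 0
    largest = ("", 0)
    offenders = []
    for path, size in sizes:
        total += size
        if size > largest[1]:
            largest = (path, size)
        if size > GITHUB_MAX_FILE_BYTES:
            offenders.append(path)
    return total, largest, offenders


def walk_sizes(root: Path):
    """Yield (relative_path, bytes) for regular files, skipping symlinks."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = Path(dirpath) / name
            if not p.is_symlink():
                yield str(p.relative_to(root)), p.stat().st_size
```

For example, summarize(walk_sizes(Path("fesom2_ci"))) would give the tree total alongside the single largest file, and only the latter is compared against the limit.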

The xios root has a .gitignore meant for the upstream developer build flow,
excluding 'lib/', 'inc/', 'obj/', etc. Those patterns are unanchored and
match at any depth, so 'git add fesom2_ci/xios' silently dropped
tools/FCM_NEW/lib/, which contains the FCM::CLI Perl module that fcm needs.
Result: fcm failed with 'Can't locate FCM/CLI.pm in @INC' inside docker.

Drop the .gitignore entirely (irrelevant for a baked-in source), restore
FCM_NEW/lib from a fresh upstream clone, and pick up a handful of
extern/blitz Makefiles that the same .gitignore had hidden.
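The key gitignore rule here is that a pattern with no slash other than a trailing one matches at any depth, while a leading '/' anchors it to the .gitignore's own directory. A deliberately simplified model of just those two rules (no negation, no '**', no escaping; ignored_by is hypothetical) shows why 'lib/' swallowed tools/FCM_NEW/lib/ and '/lib/' would not have:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def ignored_by(path: str, pattern: str) -> bool:
    """Simplified gitignore check of one file path against one pattern."""
    parts = PurePosixPath(path).parts
    dir_only = pattern.endswith("/")
    pat = pattern.rstrip("/")
    # A 'name/' pattern can only match parent directories of the file;
    # otherwise the file name itself is a candidate too.
    candidates = parts[:-1] if dir_only else parts
    if "/" in pat:
        # Anchored: compare against the leading path components only.
        pat_parts = PurePosixPath(pat.lstrip("/")).parts
        if len(pat_parts) > len(candidates):
            return False
        return all(fnmatchcase(c, p) for c, p in zip(candidates, pat_parts))
    # Unanchored: any single component at any depth may match.
    return any(fnmatchcase(c, pat) for c in candidates)


print(ignored_by("tools/FCM_NEW/lib/FCM/CLI.pm", "lib/"))   # True
print(ignored_by("tools/FCM_NEW/lib/FCM/CLI.pm", "/lib/"))  # False
```

In a real checkout, git check-ignore -v <path> reports which .gitignore line matched a given path, which is a quick way to catch this kind of silent drop.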

…cancel siblings

The pyfesom2 image's py37 conda env no longer resolves (the workflow had
been auto-disabled for more than 60 days and its dependencies have moved on),
and that failure was cancelling the in-flight fesom2_ci build. Making the
matrix cells independent makes the CI signal usable again.

XIOS 2.5 uses C++11 lambdas as std::sort comparators (e.g. in
distribute_file_server2.cpp:60). Upstream arch-GCC_LINUX.fcm sets
'-ansi -w', which forces C++98 mode, and the compiler then rejects the
lambda with 'template argument uses local type'. Mirror what
arch-ESMTOOLS_generic_oasis_gcc does and add -std=c++11: the trailing -std=
overrides the C++ side of -ansi while remaining harmless for .c files.
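The flag interaction can be modelled in a few lines. This is a simplified sketch of GCC's behavior, not its actual option parser: -ansi means -std=c90 for C and -std=c++98 for C++, the last applicable standard option wins, and a C++ -std= value is ignored (with a warning, silenced here by -w) when compiling C:

```python
def effective_std(flags, lang):
    """Simplified model of which language standard GCC ends up using."""
    std = None  # None: the compiler's default standard
    for flag in flags:
        if flag == "-ansi":
            std = "c++98" if lang == "c++" else "c90"
        elif flag.startswith("-std="):
            value = flag[len("-std="):]
            # Only honor values meant for the language being compiled.
            if ("++" in value) == (lang == "c++"):
                std = value
    return std


upstream = ["-ansi", "-w"]
patched = upstream + ["-std=c++11"]
print(effective_std(upstream, "c++"), effective_std(patched, "c++"), effective_std(patched, "c"))
# c++98 c++11 c90
```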

…3.11

The cloned ci/requirements-py37.yml could no longer be solved by mamba:
fsspec==2021.06.1 has no conda-forge build for any actively supported
Python, and python<=3.9 + scipy<=1.9 fall outside what the channel still
carries. Overwrite the file in place with an unpinned env targeting Python
3.11 before running mamba env update. This keeps the image's conda+mamba+pip
layout intact for downstream users.
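One way to derive such an unpinned env is to strip the version constraint from each conda dependency string and re-pin only Python. This is a sketch of that idea, not necessarily how the commit writes the file. unpin is hypothetical, numpy is an illustrative entry, and channel-prefixed specs or the nested pip: section are not handled:

```python
import re

# Package name at the start of a conda match spec, e.g. 'scipy' in 'scipy<=1.9'.
_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def unpin(deps, python="3.11"):
    """Drop version constraints from simple 'name<op>version' dependency strings."""
    out = []
    for spec in deps:
        name = _NAME.match(spec.strip()).group(0)
        out.append(f"python={python}" if name == "python" else name)
    return out


print(unpin(["python<=3.9", "scipy<=1.9", "fsspec==2021.06.1", "numpy"]))
# ['python=3.11', 'scipy', 'fsspec', 'numpy']
```

In a conda match spec, python=3.11 accepts any 3.11.x build, which leaves the solver room to choose a current patch release.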

Recent conda versions refuse to install from repo.anaconda.com/pkgs/main
and pkgs/r non-interactively until the user accepts the Terms of Service,
which is impossible during docker build (CondaToSNonInteractiveError).
Miniforge is the community installer that defaults to conda-forge, has no
ToS gate, and ships mamba in the base env, so 'conda install -y mamba' is
no longer needed.

JanStreffing marked this pull request as ready for review on May 1, 2026 19:33
JanStreffing merged commit 1964431 into master on May 1, 2026
5 checks passed