diff --git a/bfabric_app_runner/doc-fragments/01-deploying-python.md b/bfabric_app_runner/doc-fragments/01-deploying-python.md
new file mode 100644
index 000000000..674b354f5
--- /dev/null
+++ b/bfabric_app_runner/doc-fragments/01-deploying-python.md
@@ -0,0 +1,45 @@
+## Deploying Python apps
+
+To deploy a Python app using uv:
+
+```bash
+uv lock -U
+uv export --no-emit-project --format pylock.toml > pylock.toml
+uv build
+```
+
+- This creates a .whl file and a pylock.toml file.
+- For a reproducible environment you can now specify these two files.
+- The .whl file will contain your code and no dependencies.
+- The pylock.toml file will reproducibly specify the dependencies. -> Caveat: the file has to be named `pylock.toml` or otherwise follow the naming rules of the pylock standard. This might be improved later to give us more flexibility on our end.
+
+These files should be copied into a versioned directory in the server/repo.
+
+These files can now be referenced in the app YAML. For instance (you will have to change paths and variables for your use case):
+
+```yaml
+bfabric:
+  app_runner: 0.1.0
+versions:
+  - version:
+      - 4.7.8.dev2
+    commands:
+      dispatch:
+        type: python_env
+        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
+        local_extra_deps:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
+        command: -m mzmine_app.integrations.bfabric.dispatch
+      process:
+        type: python_env
+        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
+        local_extra_deps:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
+        command: -m mzmine_app.integrations.bfabric.process
+        env:
+          MZMINE_CONTAINER_TAG: "4.7.8.p1"
+          MZMINE_DATA_PATH: /home/bfabric/mzmine
+        prepend_paths:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/bin
+          - /home/bfabric/slurmworker/bin
+```
diff --git a/bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md b/bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md
new file mode 100644
index 000000000..cec6b9a89
--- /dev/null
+++ b/bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md
@@ -0,0 +1,150 @@
+Note: this snippet was LLM-generated
+
+# BFabric App Runner Input Handling Architecture
+
+## Overview
+
+The bfabric_app_runner implements a two-phase input processing pipeline designed to handle diverse input types in a consistent and extensible manner. This document analyzes the current architecture and provides guidelines for extending it.
+
+## Architecture Design
+
+### Two-Phase Pipeline
+
+The input handling system follows a clear separation of concerns:
+
+1. **Resolution Phase** (`inputs/resolve/`): Converts various input specifications to standardized resolved types
+2. **Preparation Phase** (`inputs/prepare/`): Takes resolved inputs and prepares them in the working directory
+
+This design provides several benefits:
+- Clean separation between "what to process" and "how to process it"
+- Consistent handling across different input types
+- Easy testing and validation at each phase
+- Clear extension points for new input types
+
+### Type System
+
+The system uses discriminated unions with Pydantic for robust type checking:
+
+```python
+ResolvedInput = ResolvedFile | ResolvedStaticFile | ResolvedDirectory
+```
+
+Each resolved type contains:
+- `type`: Literal discriminator for type safety
+- `filename`: Target path in working directory
+- Type-specific metadata for processing
+
+### Current Resolved Types
+
+1. **`ResolvedFile`**: Regular files with source locations
+   - Supports local and SSH sources
+   - Handles file copying/linking operations
+   - Checksum validation support
+
+2. **`ResolvedStaticFile`**: In-memory content written to files
+   - String or bytes content
+   - Direct file writing
+   - No source location needed
+
+3. **`ResolvedDirectory`**: Directory inputs (partially implemented)
+   - Supports local and SSH sources
+   - Archive extraction (zip)
+   - File filtering (include/exclude patterns)
+   - Directory structure manipulation (strip_root)
+
+## Implementation Patterns
+
+### Resolver Pattern
+
+The `Resolver` class uses a consistent pattern for handling different input types:
+
+```python
+def resolve_inputs(self, inputs_spec: InputsSpec) -> ResolvedInputs:
+    resolved_inputs = {}
+
+    # Group specs by type and delegate to specialized resolvers
+    for spec_type, specs in self._group_specs_by_type(inputs_spec).items():
+        match spec_type:
+            case "file":
+                resolved_inputs.update(self._resolve_file_specs(specs))
+            case "static_file":
+                resolved_inputs.update(self._resolve_static_file_specs(specs))
+            # Pattern continues for each type...
+```
+
+### Preparation Pattern
+
+The preparation phase uses pattern matching for type-safe dispatch:
+
+```python
+def _prepare_input_files(input_files: ResolvedInputs, working_dir: Path, ssh_user: str | None):
+    for input_file in input_files.inputs.values():
+        match input_file:
+            case ResolvedFile():
+                prepare_resolved_file(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
+            case ResolvedStaticFile():
+                prepare_resolved_static_file(file=input_file, working_dir=working_dir)
+            case ResolvedDirectory():
+                prepare_resolved_directory(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
+```
+
+## Extensibility Design
+
+### Adding New Input Types
+
+The architecture is designed for easy extension. To add a new input type:
+
+1. **Define Input Spec**: Create a new spec class in `specs/inputs/`
+2. **Add Resolved Type**: Define the resolved representation in `resolved_inputs.py`
+3. **Implement Resolver**: Add resolver function following the established pattern
+4. **Implement Preparation**: Add preparation function for the new type
+5. **Update Dispatch**: Add pattern matching cases in resolver and preparation
+
+### Design Principles
+
+1. **Consistency**: All input types follow the same processing pattern
+2. **Type Safety**: Discriminated unions prevent runtime type errors
+3. **Separation**: Clear boundaries between resolution and preparation
+4. **Extensibility**: New types can be added without modifying existing type handlers; only the dispatch needs a new case
+5. **Testability**: Each phase can be tested independently
+
+## Current Implementation Status
+
+### Completed Components
+
+- **Type System**: All resolved types are defined
+- **Dispatch Infrastructure**: Pattern matching is in place
+- **File Types**: `ResolvedFile` and `ResolvedStaticFile` are fully implemented
+- **Integration**: All components work together seamlessly
+
+### Directory Support Status
+
+The directory support infrastructure is largely complete:
+
+- ✅ **`ResolvedDirectory` Type**: Fully defined with rich metadata
+- ✅ **Preparation Dispatch**: Pattern matching case exists
+- ✅ **Preparation Function**: Stub exists but raises `NotImplementedError`
+- ❌ **Input Spec**: No directory input spec type
+- ❌ **Resolver**: No resolver for directory specs
+- ❌ **Implementation**: Preparation function not implemented
+
+This indicates that directory support was planned from the beginning but never completed.
+
+## Complexity Assessment
+
+### Current Complexity
+
+The system handles moderate complexity well:
+
+- **Input Spec Types**: 7 different spec types
+- **Source Types**: Local files, SSH, bfabric resources, static content
+- **Operations**: Copy, link, write, checksum validation
+- **Error Handling**: Comprehensive validation and error reporting
+
+### Design Quality Indicators
+
+1. **Consistent Patterns**: All input types follow the same processing flow
+2. **Clear Abstractions**: Well-defined interfaces between components
+3. **Type Safety**: Strong typing prevents common errors
+4. **Extensible Design**: Easy to add new input types
+5. **Testable**: Each component can be tested in isolation
diff --git a/bfabric_app_runner/doc-fragments/03-command-python-env.md b/bfabric_app_runner/doc-fragments/03-command-python-env.md
new file mode 100644
index 000000000..a2b080001
--- /dev/null
+++ b/bfabric_app_runner/doc-fragments/03-command-python-env.md
@@ -0,0 +1,96 @@
+# CommandPythonEnv System Documentation
+
+## Overview
+
+CommandPythonEnv is a system for creating, managing, and executing Python virtual environments. It supports both cached (persistent) and ephemeral (temporary) environments, with mechanisms for dependency installation, environment provisioning, and command execution.
+
+## Environment Paths
+
+### Base Cache Directory
+
+- Primary location: `$XDG_CACHE_HOME/bfabric_app_runner/` (defaults to `~/.cache/bfabric_app_runner/` if XDG_CACHE_HOME is not set)
+
+### Environment Types
+
+1. **Cached Environments**
+
+   - Path: `$XDG_CACHE_HOME/bfabric_app_runner/envs/`
+   - The environment hash is generated based on:
+     - Hostname
+     - Python version
+     - Absolute path to pylock file
+     - Modification time of pylock file
+     - Absolute paths of any local extra dependencies (if present)
+
+2. **Ephemeral Environments**
+
+   - Path: `$XDG_CACHE_HOME/bfabric_app_runner/ephemeral/env_`
+   - Created as temporary directories
+   - Cleaned up after use
+
+### Environment Structure
+
+- Python executable: `/bin/python`
+- Bin directory: `/bin/`
+- Provisioned marker: `/.provisioned`
+- Lock file (for cached envs): `.lock`
+
+## Core Logic Flow
+
+1. **Environment Selection**
+
+   - If `refresh` flag is enabled → Use ephemeral environment
+   - Otherwise → Use cached environment
+
+2. **Environment Resolution**
+
+   - For cached environments:
+
+     - Generate environment hash
+     - Check if environment exists at hash path
+     - Use file locking to prevent race conditions
+     - Provision if not already provisioned
+
+   - For ephemeral environments:
+
+     - Create a new temporary directory
+     - Always provision from scratch
+     - Clean up after use
+
+3. **Environment Provisioning Process**
+
+   - Create virtual environment using `uv venv` with specified Python version
+   - Install dependencies from pylock file using `uv pip install`
+   - Install any local extra dependencies with `--no-deps` (if specified)
+   - Mark as provisioned by creating `.provisioned` file
+
+4. **Command Execution**
+
+   - Use the environment's Python executable to run the command
+   - Add the environment's bin directory to the PATH
+   - Execute with any additional arguments passed to the command
+
+## Key Behaviors
+
+1. **Caching Strategy**
+
+   - Environments are identified by their hash, allowing reuse
+   - File locking prevents concurrent provisioning of the same environment
+   - The `.provisioned` marker ensures partially-provisioned environments are not used
+
+2. **Refresh Mode**
+
+   - When enabled, creates a new ephemeral environment for each execution
+   - Ensures clean environments for testing or when dependencies need updating
+   - Automatically cleans up after execution
+
+3. **Path Management**
+
+   - Environment's bin directory is prepended to PATH during execution
+   - Additional prepend_paths can be specified in the command
+
+4. **Dependency Installation**
+
+   - Uses `uv pip install` for fast, reliable dependency installation
+   - Supports reinstallation of packages with the refresh flag
+   - Handles local extra dependencies separately with `--no-deps`
diff --git a/bfabric_app_runner/doc-fragments/04-app-config.md b/bfabric_app_runner/doc-fragments/04-app-config.md
new file mode 100644
index 000000000..99b42463a
--- /dev/null
+++ b/bfabric_app_runner/doc-fragments/04-app-config.md
@@ -0,0 +1,90 @@
+# BFabric App Configuration
+
+This document describes how to specify a bfabric-app-runner application in an app.yml file. This file is understood by the bfabric-app-runner submitter integration in B-Fabric, and the path to it is specified as the "program" in the B-Fabric executable.
+
+## Configuration Structure
+
+The configuration is defined in YAML format with the following main sections:
+
+### App Runner Version
+
+```yaml
+bfabric:
+  app_runner: 0.2.1
+```
+
+The `app_runner` version (e.g., `0.2.1`) specifies which version to pull from PyPI.
+
+### Application Versions
+
+Multiple application versions can be defined, each with its own command definitions. The version can be specified in B-Fabric with the `application_version` key parameter.
+
+#### Release Versions
+
+The YAML defines versions of the application, where each version identifier should be unique. To avoid configuration duplication, multiple versions can share the same definition, with template variables available:
+
+```yaml
+versions:
+  - version:
+      - 4.7.8.dev3
+      - 4.7.8.dev4
+      - 4.7.8.dev8
+      - 4.7.8.dev9
+```
+
+For release versions, the application uses pre-built wheel files and pylock dependency specifications located in the distribution directory. The `${app.version}` variable is substituted with the actual version number in file paths.
+
+#### Development Version
+
+It can be very useful to add a development version for testing purposes. This version can be named anything (not just `devel`), and each person can have their own development version:
+
+```yaml
+  - version:
+      - devel
+```
+
+The development version loads the application directly from the source code path and includes the `refresh: True` option so that the environment is rebuilt on every run during development.
+
+### Commands
+
+Each version defines two main commands:
+
+- **dispatch**: Handles job dispatching operations
+- **process**: Executes the actual processing tasks
+
+Both commands use Python environments with specified dependency locks and can include environment variables and path modifications.
+
+## Build Process
+
+Application packages are created using the following uv commands:
+
+1. **Lock dependencies**: `uv lock -U`
+2. **Export pylock**: `uv export --format pylock.toml --no-emit-project > pylock.toml`
+3. **Build wheel**: `uv build`
+
+The resulting wheel and pylock files are then copied into the slurmworker configuration directory and managed with git-lfs.
+
+## Validation
+
+The slurmworker repository contains a noxfile, so running `nox` validates all app YAML files, which is useful when updating configurations.
+
+## Configuration Parameters
+
+### Command Types
+
+Commands can be of different types:
+
+- `python_env`: Recommended for reproducible Python environments. This ensures that the app will be deployed exactly as developed without further modifications.
+- `exec`: For simple shell scripts (refer to the app runner documentation for details)
+
+### Parameters for python_env Commands
+
+- `pylock`: Path to the Python dependency lock file
+- `command`: Python module command to execute
+- `local_extra_deps`: Additional local dependencies (wheels or source paths)
+
+### Optional Parameters
+
+- `refresh`: Rebuild the environment for each execution (development only)
+- `env`: Environment variables to set (can include application-specific variables)
+- `prepend_paths`: Additional paths to prepend to the PATH environment variable
diff --git a/bfabric_app_runner/doc-fragments/05-app-runner-overview-20250724.md b/bfabric_app_runner/doc-fragments/05-app-runner-overview-20250724.md
new file mode 100644
index 000000000..4fa2d0ed6
--- /dev/null
+++ b/bfabric_app_runner/doc-fragments/05-app-runner-overview-20250724.md
@@ -0,0 +1,23 @@
+**App Runner Notes:**
+
+- **Update recommended**: If you need to debug an App Runner app, I recommend defining `app_runner: 0.2.1` (under `bfabric:`) in the app's app.yml if this isn't already the case (spectronaut, mzmine).
+  - If you have an old Makefile, it should be sufficient to run: "uv venv -p 3.13 && source .venv/bin/activate && uv pip install bfabric-app-runner" and then continue with make help.
+- With version 0.2.1, the Makefile automatically provides the correct version of bfabric-app-runner using uv.
+  - "make help" gives you information about all of the following targets
+  - "make dispatch" loads initial information from B-Fabric, determines which resources are needed, and generates the inputs.yml files
+  - "make inputs" loads the required files based on inputs.yml
+  - "make process" does the actual work, e.g. snakemake
+  - "make stage" uploads the results
+- In slurmworker you can run nox to validate that the YAML files are correctly structured (`nox`, or `uv tool run nox`).
+- For problems:
+  - A step failed temporarily: rerun "make run-all", or the individual step ("make process", "make stage", etc.).
+  - A change is needed in an intermediate step: edit the file and run "make process". Sometimes you need to delete more files.
+  - The app needs to be completely changed: in this case I recommend adapting the "devel" version (see mzmine or spectronaut app.yml), pointing its path to your own checkout and editing the code there. Then either create a new workunit, or run "uv tool run bfabric-app-runner@0.2.1 prepare workunit --force-app-version devel" (replacing "devel" with the name of your version).
+- App YAML Format
+  - https://github.com/fgcz/bfabricPy/blob/8584e2b17b1c560f43699db2215944798e5500bb/bfabric_app_runner/doc-fragments/04-app-config.md
+  - (this link is not permanent, because I'm currently collecting snippets for the new documentation)
+- App Runner Interface:
+  - "uv tool run bfabric-app-runner@0.2.1" or "uv tool run bfabric-app-runner@0.2.1 --help" should already explain quite a bit
+  - "uv tool run bfabric-app-runner@0.2.1 prepare workunit" prepares the folder with the Makefile
+  - "uv tool run bfabric-app-runner@0.2.1 run workunit" is used in the Slurm script and runs the app from start to finish.
+  - There are more commands, but I personally only use these and the Makefile.