fgcz · leoschwarz · Jun 27, 2025
diff --git a/bfabric_app_runner/doc-fragments/01-deploying-python.md b/bfabric_app_runner/doc-fragments/01-deploying-python.md
@@ -0,0 +1,45 @@
+## Deploying Python apps
+
+To deploy a Python app using uv:
+
+```bash
+uv lock -U
+uv export --no-emit-project --format pylock.toml > pylock.toml
+uv build
+```
+
+- These commands create a `.whl` file and a `pylock.toml` file.
+- Together, these two files specify a reproducible environment.
+- The `.whl` file contains your code but none of its dependencies.
+- The `pylock.toml` file pins the dependencies reproducibly.
+- Caveat: the lock file must be named `pylock.toml` or follow the `pylock.<name>.toml` pattern defined by the standard (PEP 751). This restriction might be relaxed later to give us more flexibility on our end.
+
+These files should be copied into a versioned directory in the server/repo.
+
+These files can now be referenced in the app YAML. The following is an example; you will have to change the paths and variables for your use case:
+
+```yaml
+bfabric:
+  app_runner: 0.1.0
+versions:
+  - version:
+      - 4.7.8.dev2
+    commands:
+      dispatch:
+        type: python_env
+        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
+        local_extra_deps:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
+        command: -m mzmine_app.integrations.bfabric.dispatch
+      process:
+        type: python_env
+        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
+        local_extra_deps:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
+        command: -m mzmine_app.integrations.bfabric.process
+        env:
+          MZMINE_CONTAINER_TAG: "4.7.8.p1"
+          MZMINE_DATA_PATH: /home/bfabric/mzmine
+        prepend_paths:
+          - /home/bfabric/slurmworker/config/A375_MZMINE/bin
+          - /home/bfabric/slurmworker/bin
+```
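The copy step in `01-deploying-python.md` ("These files should be copied into a versioned directory") could be scripted roughly as follows. This is a minimal sketch, not part of bfabric-app-runner: `stage_release`, `project_dir` and `deploy_root` are hypothetical names, and it assumes `uv build` wrote the wheel to the default `dist/` directory.

```python
import shutil
from pathlib import Path


def stage_release(project_dir: Path, deploy_root: Path, version: str) -> Path:
    """Copy pylock.toml and the wheel for `version` into deploy_root/<version>/.

    Hypothetical helper for illustration only.
    """
    pylock = project_dir / "pylock.toml"
    if not pylock.is_file():
        raise FileNotFoundError(f"{pylock} not found; run `uv export` first")

    # `uv build` writes its artifacts to dist/ by default.
    wheels = sorted((project_dir / "dist").glob(f"*-{version}-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no wheel for version {version} in dist/; run `uv build` first")

    target = deploy_root / version
    # exist_ok=False: refuse to overwrite a version that was already staged.
    target.mkdir(parents=True, exist_ok=False)
    shutil.copy2(pylock, target / "pylock.toml")
    for wheel in wheels:
        shutil.copy2(wheel, target / wheel.name)
    return target
```

Refusing to overwrite an existing version directory keeps a deployed version immutable, which matches the goal of a reproducible environment per version.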
diff --git a/bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md b/bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md
@@ -0,0 +1,150 @@
+Note: this snippet was LLM-generated.
+
+# BFabric App Runner Input Handling Architecture
+
+## Overview
+
+The bfabric_app_runner implements a robust, two-phase input processing pipeline designed for handling diverse input types in a consistent and extensible manner. This document analyzes the current architecture and provides guidelines for extending it.
+
+## Architecture Design
+
+### Two-Phase Pipeline
+
+The input handling system follows a clear separation of concerns:
+
+1. **Resolution Phase** (`inputs/resolve/`): Converts various input specifications to standardized resolved types
+2. **Preparation Phase** (`inputs/prepare/`): Takes resolved inputs and prepares them in the working directory
+
+This design provides several benefits:
+- Clean separation between "what to process" and "how to process it"
+- Consistent handling across different input types
+- Easy testing and validation at each phase
+- Clear extension points for new input types
+
+### Type System
+
+The system uses discriminated unions with Pydantic for robust type checking:
+
+```python
+ResolvedInput = ResolvedFile | ResolvedStaticFile | ResolvedDirectory
+```
+
+Each resolved type contains:
+- `type`: Literal discriminator for type safety
+- `filename`: Target path in working directory
+- Type-specific metadata for processing
+
+### Current Resolved Types
+
+1. **`ResolvedFile`**: Regular files with source locations
+   - Supports local and SSH sources
+   - Handles file copying/linking operations
+   - Checksum validation support
+
+2. **`ResolvedStaticFile`**: In-memory content written to files
+   - String or bytes content
+   - Direct file writing
+   - No source location needed
+
+3. **`ResolvedDirectory`**: Directory inputs (type defined, not yet usable end to end)
+   - Intended to support local and SSH sources
+   - Archive extraction (zip)
+   - File filtering (include/exclude patterns)
+   - Directory structure manipulation (strip_root)
+
+## Implementation Patterns
+
+### Resolver Pattern
+
+The `Resolver` class uses a consistent pattern for handling different input types:
+
+```python
+def resolve_inputs(self, inputs_spec: InputsSpec) -> ResolvedInputs:
+    resolved_inputs = {}
+
+    # Group specs by type and delegate to specialized resolvers
+    for spec_type, specs in self._group_specs_by_type(inputs_spec).items():
+        match spec_type:
+            case "file":
+                resolved_inputs.update(self._resolve_file_specs(specs))
+            case "static_file":
+                resolved_inputs.update(self._resolve_static_file_specs(specs))
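+            # Pattern continues for each type...
+
+    # Wrap the collected mapping in the declared return type
+    # (constructor shape assumed from the `.inputs` attribute used during preparation).
+    return ResolvedInputs(inputs=resolved_inputs)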
+```
+
+### Preparation Pattern
+
+The preparation phase uses pattern matching for type-safe dispatch:
+
+```python
+def _prepare_input_files(input_files: ResolvedInputs, working_dir: Path, ssh_user: str | None):
+    for input_file in input_files.inputs.values():
+        match input_file:
+            case ResolvedFile():
+                prepare_resolved_file(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
+            case ResolvedStaticFile():
+                prepare_resolved_static_file(file=input_file, working_dir=working_dir)
+            case ResolvedDirectory():
+                prepare_resolved_directory(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
+```
+
+## Extensibility Design
+
+### Adding New Input Types
+
+The architecture is designed for easy extension. To add a new input type:
+
+1. **Define Input Spec**: Create a new spec class in `specs/inputs/`
+2. **Add Resolved Type**: Define the resolved representation in `resolved_inputs.py`
+3. **Implement Resolver**: Add resolver function following the established pattern
+4. **Implement Preparation**: Add preparation function for the new type
+5. **Update Dispatch**: Add pattern matching cases in resolver and preparation
+
+### Design Principles
+
+1. **Consistency**: All input types follow the same processing pattern
+2. **Type Safety**: Discriminated unions prevent runtime type errors
+3. **Separation**: Clear boundaries between resolution and preparation
+4. **Extensibility**: New types can be added with small, localized changes (a new spec, a resolved type, and the dispatch cases)
+5. **Testability**: Each phase can be tested independently
+
+## Current Implementation Status
+
+### Completed Components
+
+- **Type System**: All resolved types are defined
+- **Dispatch Infrastructure**: Pattern matching is in place
+- **File Types**: `ResolvedFile` and `ResolvedStaticFile` are fully implemented
+- **Integration**: Resolution and preparation are wired together for the implemented file types
+
+### Directory Support Status
+
+The directory support infrastructure is only partially complete:
+
+- ✅ **`ResolvedDirectory` Type**: Fully defined with rich metadata
+- ✅ **Preparation Dispatch**: Pattern matching case exists
+- ⚠️ **Preparation Function**: Only a stub exists; it raises `NotImplementedError`
+- ❌ **Input Spec**: No directory input spec type
+- ❌ **Resolver**: No resolver for directory specs
+
+This suggests that directory support was planned from the beginning but has not been completed.
+
+## Complexity Assessment
+
+### Current Complexity
+
+The system handles moderate complexity well:
+
+- **Input Spec Types**: 7 different spec types
+- **Source Types**: Local files, SSH, bfabric resources, static content
+- **Operations**: Copy, link, write, checksum validation
+- **Error Handling**: Comprehensive validation and error reporting
+
+### Design Quality Indicators
+
+1. **Consistent Patterns**: All input types follow the same processing flow
+2. **Clear Abstractions**: Well-defined interfaces between components
+3. **Type Safety**: Strong typing prevents common errors
+4. **Extensible Design**: Easy to add new input types
+5. **Testable**: Each component can be tested in isolation
diff --git a/bfabric_app_runner/doc-fragments/03-command-python-env.md b/bfabric_app_runner/doc-fragments/03-command-python-env.md
@@ -0,0 +1,96 @@
+# CommandPythonEnv System Documentation
+
+## Overview
+
+CommandPythonEnv is a system for creating, managing, and executing Python virtual environments. It supports both cached (persistent) and ephemeral (temporary) environments, with mechanisms for dependency installation, environment provisioning, and command execution.
+
+## Environment Paths
+
+### Base Cache Directory
+
+- Primary location: `$XDG_CACHE_HOME/bfabric_app_runner/` (defaults to `~/.cache/bfabric_app_runner/` if XDG_CACHE_HOME is not set)
+
+### Environment Types
+
+1. **Cached Environments**
+
+    - Path: `$XDG_CACHE_HOME/bfabric_app_runner/envs/<environment_hash>`
+    - The environment hash is generated based on:
+        - Hostname
+        - Python version
+        - Absolute path to pylock file
+        - Modification time of pylock file
+        - Absolute paths of any local extra dependencies (if present)
+
+2. **Ephemeral Environments**
+
+    - Path: `$XDG_CACHE_HOME/bfabric_app_runner/ephemeral/env_<random_suffix>`
+    - Created as temporary directories
+    - Cleaned up after use
+
+### Environment Structure
+
+- Python executable: `<env_path>/bin/python`
+- Bin directory: `<env_path>/bin/`
+- Provisioned marker: `<env_path>/.provisioned`
+- Lock file (for cached envs): `<env_path>.lock`
+
+## Core Logic Flow
+
+1. **Environment Selection**
+
+    - If `refresh` flag is enabled → Use ephemeral environment
+    - Otherwise → Use cached environment
+
+2. **Environment Resolution**
+
+    - For cached environments:
+
+        - Generate environment hash
+        - Check if environment exists at hash path
+        - Use file locking to prevent race conditions
+        - Provision if not already provisioned
+
+    - For ephemeral environments:
+
+        - Create a new temporary directory
+        - Always provision from scratch
+        - Clean up after use
+
+3. **Environment Provisioning Process**
+
+    - Create virtual environment using `uv venv` with specified Python version
+    - Install dependencies from pylock file using `uv pip install`
+    - Install any local extra dependencies with `--no-deps` (if specified)
+    - Mark as provisioned by creating `.provisioned` file
+
+4. **Command Execution**
+
+    - Use the environment's Python executable to run the command
+    - Add the environment's bin directory to the PATH
+    - Execute with any additional arguments passed to the command
+
+## Key Behaviors
+
+1. **Caching Strategy**
+
+    - Environments are identified by their hash, allowing reuse
+    - File locking prevents concurrent provisioning of the same environment
+    - The `.provisioned` marker ensures partially-provisioned environments are not used
+
+2. **Refresh Mode**
+
+    - When enabled, creates a new ephemeral environment for each execution
+    - Ensures clean environments for testing or when dependencies need updating
+    - Automatically cleans up after execution
+
+3. **Path Management**
+
+    - Environment's bin directory is prepended to PATH during execution
+    - Additional prepend_paths can be specified in the command
+
+4. **Dependency Installation**
+
+    - Uses `uv pip install` for fast, reliable dependency installation
+    - Supports reinstallation of packages with the refresh flag
+    - Handles local extra dependencies separately with `--no-deps`
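The cached environment path described in `03-command-python-env.md` could be derived along these lines. This is a sketch under stated assumptions: the fragment lists the hash inputs but not the algorithm or encoding, so SHA-256 over newline-joined strings is a stand-in, and `cache_base`, `environment_hash` and `cached_env_path` are hypothetical names.

```python
import hashlib
import os
import socket
from collections.abc import Sequence
from pathlib import Path


def cache_base() -> Path:
    """Return $XDG_CACHE_HOME/bfabric_app_runner, falling back to ~/.cache."""
    xdg = os.environ.get("XDG_CACHE_HOME")
    root = Path(xdg) if xdg else Path.home() / ".cache"
    return root / "bfabric_app_runner"


def environment_hash(
    python_version: str, pylock: Path, local_extra_deps: Sequence[Path] = ()
) -> str:
    """Hash the inputs the fragment lists; the hash algorithm is an assumption."""
    pylock = pylock.resolve()
    parts = [
        socket.gethostname(),
        python_version,
        str(pylock),  # absolute path to the pylock file
        str(pylock.stat().st_mtime),  # changes whenever the lock file is rewritten
    ]
    parts += [str(dep.resolve()) for dep in local_extra_deps]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


def cached_env_path(
    python_version: str, pylock: Path, local_extra_deps: Sequence[Path] = ()
) -> Path:
    """Location of the cached environment for this combination of inputs."""
    return cache_base() / "envs" / environment_hash(python_version, pylock, local_extra_deps)
```

Because the modification time is part of the hash, re-exporting `pylock.toml` in place yields a new environment path instead of silently reusing a stale one.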
diff --git a/bfabric_app_runner/doc-fragments/04-app-config.md b/bfabric_app_runner/doc-fragments/04-app-config.md
@@ -0,0 +1,90 @@
+# BFabric App Configuration
+
+This document describes how to specify a bfabric-app-runner application in an `app.yml` file. This file is read by the bfabric-app-runner submitter integration in B-Fabric, and its path is specified as the "program" of the B-Fabric executable.
+
+## Configuration Structure
+
+The configuration is defined in YAML format with the following main sections:
+
+### App Runner Version
+
+```yaml
+bfabric:
+  app_runner: 0.2.1
+```
+
+The `app_runner` version (e.g., `0.2.1`) specifies which version of the app runner to pull from PyPI.
+
+### Application Versions
+
+Multiple application versions can be defined, each with its own command definitions. The version to run is selected in B-Fabric with the `application_version` key parameter.
+
+#### Release Versions
+
+The YAML defines versions of the application, where each version identifier should be unique. To avoid duplicating configuration, several versions can share one definition, with template variables such as `${app.version}` filling in the version-specific parts:
+
+```yaml
+versions:
+  - version:
+      - 4.7.8.dev3
+      - 4.7.8.dev4
+      - 4.7.8.dev8
+      - 4.7.8.dev9
+```
+
+For release versions, the application uses pre-built wheel files and pylock dependency specifications located in the distribution directory. The `${app.version}` variable is substituted with the actual version number in file paths.
+
+#### Development Version
+
+It can be very useful to add a development version for testing purposes. This version can be named anything (not just `devel`), and each person can have their own development version:
+
+```yaml
+  - version:
+      - devel
+```
+
+The development version loads the application directly from the source code path and sets the `refresh: True` option, so the environment is rebuilt for each run and picks up code changes during development.
+
+### Commands
+
+Each version defines two main commands:
+
+- **dispatch**: Handles job dispatching operations
+- **process**: Executes the actual processing tasks
+
+Both commands use Python environments with specified dependency locks and can include environment variables and path modifications.
+
+## Build Process
+
+Application packages are created using the following uv commands:
+
+1. **Lock dependencies**: `uv lock -U`
+2. **Export pylock**: `uv export --format pylock.toml --no-emit-project > pylock.toml`
+3. **Build wheel**: `uv build`
+
+The resulting wheel and pylock files are then copied into the slurmworker configuration directory and managed with git-lfs.
+
+## Validation
+
+The slurmworker repository contains a noxfile, so running `nox` validates all app YAML files. This is useful when updating configurations.
+
+## Configuration Parameters
+
+### Command Types
+
+Commands can be of different types:
+
+- `python_env`: Recommended for reproducible Python environments. This ensures that the app will be deployed exactly as developed without further modifications.
+- `exec`: For simple shell scripts (refer to the app runner documentation for details)
+
+### Parameters for python_env Commands
+
+- `pylock`: Path to the Python dependency lock file
+- `command`: Python module command to execute
+- `local_extra_deps`: Additional local dependencies (wheels or source paths)
+
+### Optional Parameters
+
+- `refresh`: Rebuild the environment on every run instead of using the cached one (development only)
+- `env`: Environment variables to set (can include application-specific variables)
+- `prepend_paths`: Additional paths to prepend to PATH environment variable
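The version sharing described in `04-app-config.md` (several version identifiers using one definition, with `${app.version}` substituted into paths) can be sketched as below. This is an illustration only: `expand_versions` is a hypothetical helper, plain string replacement stands in for whatever template mechanism the app runner actually uses, and the paths in the usage example are made up.

```python
from typing import Any

PLACEHOLDER = "${app.version}"


def _render(value: Any, version: str) -> Any:
    """Recursively substitute the placeholder in strings, lists and dicts."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, version)
    if isinstance(value, list):
        return [_render(item, version) for item in value]
    if isinstance(value, dict):
        return {key: _render(item, version) for key, item in value.items()}
    return value


def expand_versions(version_blocks: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return one concrete `commands` mapping per version identifier."""
    expanded: dict[str, dict[str, Any]] = {}
    for block in version_blocks:
        for version in block["version"]:
            if version in expanded:
                # Each version identifier should be unique across all blocks.
                raise ValueError(f"duplicate version identifier: {version}")
            expanded[version] = _render(block["commands"], version)
    return expanded


if __name__ == "__main__":
    blocks = [
        {
            "version": ["4.7.8.dev3", "4.7.8.dev4"],
            "commands": {
                "dispatch": {
                    "type": "python_env",
                    "pylock": "/opt/demo/dist/${app.version}/pylock.toml",
                }
            },
        }
    ]
    for version, commands in expand_versions(blocks).items():
        print(version, commands["dispatch"]["pylock"])
```

Each listed version ends up with its own concrete paths, so adding a release only requires appending its identifier to the `version` list and staging the matching files.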