45 changes: 45 additions & 0 deletions bfabric_app_runner/doc-fragments/01-deploying-python.md
@@ -0,0 +1,45 @@
## Deploying Python apps

To deploy a Python app using uv:

```bash
uv lock -U
uv export --no-emit-project --format pylock.toml > pylock.toml
uv build
```

- This creates a `.whl` file and a `pylock.toml` file.
- Together, these two files specify a reproducible environment.
- The `.whl` file contains your code but no dependencies.
- The `pylock.toml` file pins the dependencies reproducibly. Caveat: the file must be named `pylock.toml` or otherwise follow the naming convention of the standard. This may be relaxed later to give us more flexibility.

These files should be copied into a versioned directory in the server/repo.
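The copy step can be scripted. The following is a minimal sketch, not part of bfabric_app_runner: the function name `publish_release` and the layout `<dist_root>/<version>/` are assumptions chosen to match the example paths in this document, and it relies on `uv build` writing wheels to `dist/` in the project directory.

```python
import shutil
from pathlib import Path


def publish_release(project_dir: Path, dist_root: Path, version: str) -> Path:
    """Copy the wheel(s) and pylock.toml for `version` into dist_root/<version>/.

    Hypothetical helper for illustration; adapt paths to your server/repo.
    """
    target = dist_root / version
    target.mkdir(parents=True, exist_ok=True)

    # `uv build` places built wheels in <project>/dist by default.
    wheels = sorted((project_dir / "dist").glob(f"*-{version}-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no wheel for version {version} in {project_dir / 'dist'}")
    for wheel in wheels:
        shutil.copy2(wheel, target / wheel.name)

    # Keep the file name `pylock.toml`, as required by the caveat above.
    shutil.copy2(project_dir / "pylock.toml", target / "pylock.toml")
    return target
```

Calling `publish_release(Path("."), Path("/path/to/dist"), "1.2.3")` after the three uv commands would leave both files in `/path/to/dist/1.2.3/`.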

The wheel and the pylock file can now be referenced in the app YAML, for example (you will have to adapt the paths and variables to your use case):

```yaml
bfabric:
  app_runner: 0.1.0
versions:
  - version:
      - 4.7.8.dev2
    commands:
      dispatch:
        type: python_env
        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
        local_extra_deps:
          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
        command: -m mzmine_app.integrations.bfabric.dispatch
      process:
        type: python_env
        pylock: /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/pylock.toml
        local_extra_deps:
          - /home/bfabric/slurmworker/config/A375_MZMINE/dist/${app.version}/mzmine_app-${app.version}-py3-none-any.whl
        command: -m mzmine_app.integrations.bfabric.process
        env:
          MZMINE_CONTAINER_TAG: "4.7.8.p1"
          MZMINE_DATA_PATH: /home/bfabric/mzmine
        prepend_paths:
          - /home/bfabric/slurmworker/config/A375_MZMINE/bin
          - /home/bfabric/slurmworker/bin
```
150 changes: 150 additions & 0 deletions bfabric_app_runner/doc-fragments/02-app-runner-inputs-architecture.md
@@ -0,0 +1,150 @@
Note: this snippet was LLM-generated.

# BFabric App Runner Input Handling Architecture

## Overview

The bfabric_app_runner implements a robust, two-phase input processing pipeline designed for handling diverse input types in a consistent and extensible manner. This document analyzes the current architecture and provides guidelines for extending it.

## Architecture Design

### Two-Phase Pipeline

The input handling system follows a clear separation of concerns:

1. **Resolution Phase** (`inputs/resolve/`): Converts various input specifications to standardized resolved types
2. **Preparation Phase** (`inputs/prepare/`): Takes resolved inputs and prepares them in the working directory

This design provides several benefits:
- Clean separation between "what to process" and "how to process it"
- Consistent handling across different input types
- Easy testing and validation at each phase
- Clear extension points for new input types
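To make the split concrete, here is a toy sketch of the two phases for a single input kind. It is an illustration only: the dataclass and the `resolve`/`prepare` functions are simplified stand-ins, not the actual classes or signatures in `inputs/resolve/` and `inputs/prepare/`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResolvedStaticFile:
    """Simplified stand-in for a resolved input (assumed fields)."""

    filename: str
    content: str


def resolve(spec: dict) -> ResolvedStaticFile:
    """Phase 1: turn a raw spec into a standardized resolved value (no file I/O)."""
    return ResolvedStaticFile(filename=spec["filename"], content=spec["content"])


def prepare(resolved: ResolvedStaticFile, working_dir: Path) -> Path:
    """Phase 2: materialize the resolved input inside the working directory."""
    target = working_dir / resolved.filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(resolved.content)
    return target
```

Because `resolve` performs no file I/O in this sketch, it can be tested with plain dictionaries, while `prepare` only needs a temporary directory.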

### Type System

The system uses discriminated unions with Pydantic for robust type checking:

```python
ResolvedInput = ResolvedFile | ResolvedStaticFile | ResolvedDirectory
```

Each resolved type contains:
- `type`: Literal discriminator for type safety
- `filename`: Target path in working directory
- Type-specific metadata for processing

### Current Resolved Types

1. **`ResolvedFile`**: Regular files with source locations
   - Supports local and SSH sources
   - Handles file copying/linking operations
   - Checksum validation support

2. **`ResolvedStaticFile`**: In-memory content written to files
   - String or bytes content
   - Direct file writing
   - No source location needed

3. **`ResolvedDirectory`**: Directory inputs (partially implemented)
   - Supports local and SSH sources
   - Archive extraction (zip)
   - File filtering (include/exclude patterns)
   - Directory structure manipulation (strip_root)

## Implementation Patterns

### Resolver Pattern

The `Resolver` class uses a consistent pattern for handling different input types:

```python
def resolve_inputs(self, inputs_spec: InputsSpec) -> ResolvedInputs:
    resolved_inputs = {}

    # Group specs by type and delegate to specialized resolvers
    for spec_type, specs in self._group_specs_by_type(inputs_spec).items():
        match spec_type:
            case "file":
                resolved_inputs.update(self._resolve_file_specs(specs))
            case "static_file":
                resolved_inputs.update(self._resolve_static_file_specs(specs))
            # Pattern continues for each type...
```

### Preparation Pattern

The preparation phase uses pattern matching for type-safe dispatch:

```python
def _prepare_input_files(input_files: ResolvedInputs, working_dir: Path, ssh_user: str | None):
    for input_file in input_files.inputs.values():
        match input_file:
            case ResolvedFile():
                prepare_resolved_file(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
            case ResolvedStaticFile():
                prepare_resolved_static_file(file=input_file, working_dir=working_dir)
            case ResolvedDirectory():
                prepare_resolved_directory(file=input_file, working_dir=working_dir, ssh_user=ssh_user)
```

## Extensibility Design

### Adding New Input Types

The architecture is designed for easy extension. To add a new input type:

1. **Define Input Spec**: Create a new spec class in `specs/inputs/`
2. **Add Resolved Type**: Define the resolved representation in `resolved_inputs.py`
3. **Implement Resolver**: Add resolver function following the established pattern
4. **Implement Preparation**: Add preparation function for the new type
5. **Update Dispatch**: Add pattern matching cases in resolver and preparation

### Design Principles

1. **Consistency**: All input types follow the same processing pattern
2. **Type Safety**: Discriminated unions prevent runtime type errors
3. **Separation**: Clear boundaries between resolution and preparation
4. **Extensibility**: New types can be added without modifying existing code
5. **Testability**: Each phase can be tested independently

## Current Implementation Status

### Completed Components

- **Type System**: All resolved types are defined
- **Dispatch Infrastructure**: Pattern matching is in place
- **File Types**: `ResolvedFile` and `ResolvedStaticFile` are fully implemented
- **Integration**: All components work together seamlessly

### Directory Support Status

The directory support infrastructure is only partly in place:

- ✅ **`ResolvedDirectory` Type**: Fully defined with rich metadata
- ✅ **Preparation Dispatch**: Pattern matching case exists
- ✅ **Preparation Function**: Stub exists but raises `NotImplementedError`
- ❌ **Input Spec**: No directory input spec type
- ❌ **Resolver**: No resolver for directory specs
- ❌ **Implementation**: Preparation function not implemented

This suggests that directory support was planned but never completed.

## Complexity Assessment

### Current Complexity

The system handles moderate complexity well:

- **Input Spec Types**: 7 different spec types
- **Source Types**: Local files, SSH, bfabric resources, static content
- **Operations**: Copy, link, write, checksum validation
- **Error Handling**: Comprehensive validation and error reporting

### Design Quality Indicators

1. **Consistent Patterns**: All input types follow the same processing flow
2. **Clear Abstractions**: Well-defined interfaces between components
3. **Type Safety**: Strong typing prevents common errors
4. **Extensible Design**: Easy to add new input types
5. **Testable**: Each component can be tested in isolation
96 changes: 96 additions & 0 deletions bfabric_app_runner/doc-fragments/03-command-python-env.md
@@ -0,0 +1,96 @@
# CommandPythonEnv System Documentation

## Overview

CommandPythonEnv is a system for creating, managing, and executing Python virtual environments. It supports both cached (persistent) and ephemeral (temporary) environments, with mechanisms for dependency installation, environment provisioning, and command execution.

## Environment Paths

### Base Cache Directory

- Primary location: `$XDG_CACHE_HOME/bfabric_app_runner/` (defaults to `~/.cache/bfabric_app_runner/` if XDG_CACHE_HOME is not set)

### Environment Types

1. **Cached Environments**

   - Path: `$XDG_CACHE_HOME/bfabric_app_runner/envs/<environment_hash>`
   - The environment hash is generated based on:
     - Hostname
     - Python version
     - Absolute path to pylock file
     - Modification time of pylock file
     - Absolute paths of any local extra dependencies (if present)

2. **Ephemeral Environments**

   - Path: `$XDG_CACHE_HOME/bfabric_app_runner/ephemeral/env_<random_suffix>`
   - Created as temporary directories
   - Cleaned up after use

### Environment Structure

- Python executable: `<env_path>/bin/python`
- Bin directory: `<env_path>/bin/`
- Provisioned marker: `<env_path>/.provisioned`
- Lock file (for cached envs): `<env_path>.lock`

## Core Logic Flow

1. **Environment Selection**

   - If `refresh` flag is enabled → Use ephemeral environment
   - Otherwise → Use cached environment

2. **Environment Resolution**

   - For cached environments:

     - Generate environment hash
     - Check if environment exists at hash path
     - Use file locking to prevent race conditions
     - Provision if not already provisioned

   - For ephemeral environments:

     - Create a new temporary directory
     - Always provision from scratch
     - Clean up after use

3. **Environment Provisioning Process**

   - Create virtual environment using `uv venv` with specified Python version
   - Install dependencies from pylock file using `uv pip install`
   - Install any local extra dependencies with `--no-deps` (if specified)
   - Mark as provisioned by creating `.provisioned` file

4. **Command Execution**

   - Use the environment's Python executable to run the command
   - Add the environment's bin directory to the PATH
   - Execute with any additional arguments passed to the command
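The cached-environment path of this flow (lock, check the marker, provision, mark) can be sketched as below. It is an illustration under assumptions: `ensure_provisioned` and the `provision` callback are hypothetical names, and `fcntl.flock` (POSIX only) stands in for whatever locking mechanism the real implementation uses.

```python
import fcntl
from collections.abc import Callable
from pathlib import Path


def ensure_provisioned(env_path: Path, provision: Callable[[Path], None]) -> Path:
    """Provision env_path at most once, even with concurrent callers."""
    env_path.parent.mkdir(parents=True, exist_ok=True)
    lock_path = env_path.with_name(env_path.name + ".lock")  # <env_path>.lock
    marker = env_path / ".provisioned"

    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until this process holds the lock
        try:
            if not marker.exists():
                provision(env_path)  # e.g. create the venv and install dependencies
                env_path.mkdir(parents=True, exist_ok=True)
                marker.touch()  # written last, so a failed run leaves the env unmarked
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return env_path
```

Writing the marker only after provisioning succeeds is what lets a later caller detect and redo an interrupted installation.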

## Key Behaviors

1. **Caching Strategy**

   - Environments are identified by their hash, allowing reuse
   - File locking prevents concurrent provisioning of the same environment
   - The `.provisioned` marker ensures partially-provisioned environments are not used

2. **Refresh Mode**

   - When enabled, creates a new ephemeral environment for each execution
   - Ensures clean environments for testing or when dependencies need updating
   - Automatically cleans up after execution

3. **Path Management**

   - Environment's bin directory is prepended to PATH during execution
   - Additional prepend_paths can be specified in the command

4. **Dependency Installation**

   - Uses `uv pip install` for fast, reliable dependency installation
   - Supports reinstallation of packages with the refresh flag
   - Handles local extra dependencies separately with `--no-deps`
90 changes: 90 additions & 0 deletions bfabric_app_runner/doc-fragments/04-app-config.md
@@ -0,0 +1,90 @@
# BFabric App Configuration

This document describes how to specify a bfabric-app-runner application in an `app.yml` file. This file is understood by the bfabric-app-runner submitter integration in B-Fabric, and the path to it is specified as the "program" in the B-Fabric executable.

## Configuration Structure

The configuration is defined in YAML format with the following main sections:

### App Runner Version

```yaml
bfabric:
  app_runner: 0.2.1
```

The `app_runner` version (e.g., `0.2.1`) specifies which version to pull from PyPI.

### Application Versions

Multiple application versions can be defined, each with their own command definitions. The version can be specified in bfabric with the `application_version` key parameter.

#### Release Versions

The YAML defines versions of the application, where each version identifier should be unique. To avoid duplicating configuration, several versions can share one definition, within which template variables such as `${app.version}` are available:

```yaml
versions:
  - version:
      - 4.7.8.dev3
      - 4.7.8.dev4
      - 4.7.8.dev8
      - 4.7.8.dev9
```

For release versions, the application uses pre-built wheel files and pylock dependency specifications located in the distribution directory. The `${app.version}` variable is substituted with the actual version number in file paths.
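How a shared definition expands into per-version settings can be sketched as follows. This is not the app runner's implementation: `expand_versions` is a hypothetical helper, plain string replacement stands in for the real template mechanism, and the `commands` key is assumed to sit next to `version` in each entry.

```python
def expand_versions(definition: dict) -> dict[str, dict]:
    """Return one concrete `commands` mapping per version listed in the definition."""

    def substitute(value, version: str):
        # Recursively replace the placeholder in strings, lists, and mappings.
        if isinstance(value, str):
            return value.replace("${app.version}", version)
        if isinstance(value, list):
            return [substitute(item, version) for item in value]
        if isinstance(value, dict):
            return {key: substitute(item, version) for key, item in value.items()}
        return value

    return {
        version: substitute(definition["commands"], version)
        for version in definition["version"]
    }
```

For a definition listing `4.7.8.dev3` and `4.7.8.dev4`, a `pylock` value of `/dist/${app.version}/pylock.toml` would become `/dist/4.7.8.dev3/pylock.toml` and `/dist/4.7.8.dev4/pylock.toml` respectively.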

#### Development Version

It can be very useful to add a development version for testing purposes. This version can be named anything (not just `devel`), and each person can have their own development version:

```yaml
  - version:
      - devel
```

The development version loads the application directly from the source code path and sets `refresh: True`, so that a fresh environment is provisioned for each run and code changes are picked up during development.

### Commands

Each version defines two main commands:

- **dispatch**: Handles job dispatching operations
- **process**: Executes the actual processing tasks

Both commands use Python environments with specified dependency locks and can include environment variables and path modifications.

## Build Process

Application packages are created using the following uv commands:

1. **Lock dependencies**: `uv lock -U`
2. **Export pylock**: `uv export --format pylock.toml --no-emit-project > pylock.toml`
3. **Build wheel**: `uv build`

The resulting wheel and pylock files are then copied into the slurmworker configuration directory and managed with git-lfs.

## Validation

The slurmworker repository contains a noxfile, so running `nox` validates all app YAML files. This is useful when updating configurations.

## Configuration Parameters

### Command Types

Commands can be of different types:

- `python_env`: Recommended for reproducible Python environments. This ensures that the app will be deployed exactly as developed without further modifications.
- `exec`: For simple shell scripts (refer to the app runner documentation for details)

### Parameters for python_env Commands

- `pylock`: Path to the Python dependency lock file
- `command`: Python module command to execute
- `local_extra_deps`: Additional local dependencies (wheels or source paths)

### Optional Parameters

- `refresh`: Provision a fresh environment for each execution instead of reusing the cached one (development only)
- `env`: Environment variables to set (can include application-specific variables)
- `prepend_paths`: Additional paths to prepend to PATH environment variable