199 changes: 164 additions & 35 deletions AGENTS.md
@@ -14,28 +14,30 @@ Osde2e is End-to-end testing framework for Managed services for OSD/ROSA.
- Keep code simple and concise
- Use Go language best practices

## Core Test Workflow
1. Load config (CLI flags → env vars → custom YAML → defaults)
2. Provision cluster (or use existing via CLUSTER_ID)
3. Health check (optional)
4. Run tests
5. Upgrade (optional)
6. Cleanup (optional)

## Key Files
- `pkg/common/config/config.go` - All configuration options (START HERE)
- `cmd/osde2e/main.go` - Entry point
## Configuration
- **Primary file**: `pkg/common/config/config.go` - All configuration options (START HERE)
- **Pattern**: const key + env var + default
- **Access**: `viper.GetString(config.SomeKey)`
- **Precedence**: CLI flags → env vars → custom YAML → defaults
- **Required env vars**: OCM_CLIENT_ID, OCM_CLIENT_SECRET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- **Optional env vars**:
  - `CLUSTER_ID` - Reuse existing cluster
  - `AD_HOC_TEST_IMAGES` - Override test images (comma-separated list)
  - `TEST_SUITES_YAML` - YAML list of test suites with optional slack channels (format: `- image: ...\n slackChannel: ...`)
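
The precedence rule can be illustrated with a minimal, standard-library-only sketch. It does not use viper or reproduce how `config.go` wires things up; the `resolve` helper and the `REGION` and `PROVIDER` keys are hypothetical, chosen only to show that the first source containing a key wins.

```go
package main

import "fmt"

// resolve returns the value for key from the first source that defines it.
// Callers pass sources in precedence order: CLI flags, env vars,
// custom YAML, defaults. (Hypothetical helper, not part of osde2e.)
func resolve(key string, sources ...map[string]string) (string, bool) {
	for _, src := range sources {
		if v, ok := src[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Illustrative values only; REGION and PROVIDER are made-up keys.
	flags := map[string]string{}
	env := map[string]string{"CLUSTER_ID": "from-env"}
	yaml := map[string]string{"CLUSTER_ID": "from-yaml", "REGION": "from-yaml"}
	defaults := map[string]string{"REGION": "from-default", "PROVIDER": "ocm"}

	for _, key := range []string{"CLUSTER_ID", "REGION", "PROVIDER"} {
		v, _ := resolve(key, flags, env, yaml, defaults)
		fmt.Printf("%s=%s\n", key, v)
	}
}
```

Running it prints `CLUSTER_ID=from-env`, `REGION=from-yaml`, and `PROVIDER=ocm`, one per line: each key takes its value from the highest-precedence source that defines it.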

## Key Files & Components
- `cmd/osde2e/test/cmd.go` - CLI entry point
- `pkg/e2e/e2e.go` - Main orchestrator
- `pkg/common/cluster/clusterutil.go` - Cluster lifecycle
- `pkg/common/providers/` - Cloud provider implementations (OCM, ROSA)
- `pkg/common/cluster/healthchecks/` - Health validation logic
- `pkg/common/executor/executor.go` - Ad-hoc executor pods
- `pkg/common/runner/runner.go` - Runner pods
- `pkg/e2e/adhoctestimages/adhoctestimages.go` - Ad-hoc test suite driver
- `internal/llm/` - LLM/AI integration (Gemini)

## Common Patterns

### Configuration
- Everything in `config.go`: const key + env var + default
- Access via `viper.GetString(config.SomeKey)`
- Precedence: CLI > Env > Custom YAML > Default

### Providers
- Interface: `pkg/common/spi/`
- Registered in `main.go`
@@ -56,9 +58,6 @@ Osde2e is End-to-end testing framework for Managed services for OSD/ROSA.
4. Always use `gofumpt`, not `gofmt`
5. Check git status before committing

## Environment
- See `config.go` for complete list

## Architecture
```
osde2e
@@ -70,23 +69,153 @@ osde2e


## Before You Commit
```bash
gofumpt -w . # Format (not gofmt!)
make build # Compile
go test ./... -v # Test (integration tests need credentials)
```
- Run `gofumpt` on all changed files
- Run unit tests, excluding the `test/` folder
- Run `make build`
- Update README and AGENTS.md to reflect the changes made


## Testing Instructions

### E2E Tests
- **CLI**: `go run cmd/osde2e/main.go test --skip-health-check --skip-must-gather --skip-destroy-cluster --configs=rosa,sts,stage,ad-hoc-image`
- **IDE debugger**: VSCode (use `configs/local/example-launch.json`), GoLand (use `configs/local/example-e2e.run.xml`)

### Unit Tests
- Add or update unit tests for concrete implementation changes and new functionality
- Fix any test or type errors until the whole suite is green

## E2E-Suite Execution Flow

### Overview
The e2e-suite execution follows a multi-stage pipeline from test command initialization through cluster provisioning, test execution in isolated pods, multi-level artifact gathering, and cleanup.

### Execution Pipeline

1. **Command Initialization** (`cmd/osde2e/test/cmd.go:160-216`)
- Load configuration (CLI flags → env vars → YAML → defaults)
- Call `e2e.RunTests()` orchestrator

2. **Orchestrator Setup** (`pkg/e2e/e2e.go:42-97`)
```
NewOrchestrator() → Provision() → Execute() → AnalyzeLogs() →
PostProcessCluster() → Report() → Cleanup()
```

3. **Cluster Provisioning** (`pkg/common/cluster/clusterutil.go`)
- Load/reuse existing cluster or provision new via OCM provider
- Run health checks (CVO, nodes, operators, certs, daemonsets)
- Retrieve kubeconfig and configure cluster access

4. **Test Execution** (Two Patterns)

**Pattern A: Runner Pods** (`pkg/common/runner/runner.go`) - Traditional OpenShift test suites
- Get test image from ImageStream
- Create Job pod with git init containers for repo cloning
- Stream logs from all containers to `{reportDir}/{phase}/containerLogs/`
- Wait for completion with timeout

**Pattern B: Ad-Hoc Executor Pods** (`pkg/common/executor/executor.go`) - Modern test suites
- Create isolated namespace per test suite
- Deploy 2-container pod:
  - `e2e-suite`: Runs test image, writes results to `/test-run-results`
  - `pause-for-artifacts`: Keeps pod alive (`tail -f /dev/null`) for artifact collection
- Inject cluster metadata as env vars (OCM_CLUSTER_ID, CLOUD_PROVIDER_ID, etc.)
- Wait for e2e-suite container completion

5. **Multi-Level Artifact Gathering**

**Level 1: Pod Logs** (`pkg/common/runner/service.go:95-135`)
- Stream logs from all containers to individual files

**Level 2: Test Suite Results** (`pkg/common/executor/executor.go:302-398`)
- Fetch pod logs from e2e-suite container
- Execute `tar cf - /test-run-results` in pause container
- Stream and extract tar archive via SPDY protocol
- Process JUnit XML results

**Level 3: Cluster Diagnostics** (`pkg/common/cluster/clusterutil.go`)
- Run `oc adm must-gather` → `{reportDir}/must-gather`
- Inspect cluster state (projects, OLM)

**Level 4: Reports & Analysis** (`pkg/e2e/e2e.go:579-643`)
- Generate JUnit XML: `{phaseDir}/junit_{suffix}.xml`
- Create Konflux JSON report (if configured)
- Run AI-powered log analysis on failures → `{reportDir}/analysis/`

6. **Reporting & Notifications** (`pkg/e2e/e2e.go:271-302`)
- Upload all artifacts to S3 with presigned URLs
- Send Slack notifications with links:
  - Per-test-image notifications (ad-hoc) with analysis results
  - General failure notifications with aggregated results

7. **Cleanup** (`pkg/e2e/e2e.go:559-571`)
- Delete test namespaces (executor pods)
- Run must-gather and cluster inspection
- Update OCM cluster properties
- Delete cluster (unless the `--skip-destroy-cluster` flag is set)
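
The stage order listed under Orchestrator Setup can be sketched as a simple sequential runner. This is only an illustration: the `stage` type, `runPipeline`, and `errSimulated` are hypothetical names, and stopping at the first failing stage is an assumption, since the text above does not say how `pkg/e2e/e2e.go` handles a stage error.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a stand-in failure used only by this example.
var errSimulated = errors.New("simulated failure")

// stage pairs a pipeline step name with the function that performs it.
// (Hypothetical type; the real orchestrator is not shown here.)
type stage struct {
	name string
	run  func() error
}

// runPipeline executes stages in order and stops at the first failure
// (an assumption of this sketch). It returns the names of the stages
// that were started.
func runPipeline(stages []stage) ([]string, error) {
	var started []string
	for _, s := range stages {
		started = append(started, s.name)
		if err := s.run(); err != nil {
			return started, fmt.Errorf("%s: %w", s.name, err)
		}
	}
	return started, nil
}

func main() {
	ok := func() error { return nil }
	stages := []stage{
		{"Provision", ok},
		{"Execute", ok},
		{"AnalyzeLogs", func() error { return errSimulated }},
		{"PostProcessCluster", ok},
		{"Report", ok},
		{"Cleanup", ok},
	}
	started, err := runPipeline(stages)
	fmt.Println(started, err)
}
```

Keeping each stage behind the same `func() error` signature makes the order explicit in one slice and lets a test substitute no-op stages.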

### Configuration Levels

**Level 1: Main osde2e Test Command** (CLI/Environment)
- `configs/ad-hoc-image.yaml`: Test suite images, timeouts, Slack channels
- See "Configuration" section above for env vars and precedence

**Level 2: Executor E2E-Suite Job** (Pod - Injected by `pkg/common/executor/executor.go:186-268`)
- `OCM_CLUSTER_ID`: Target cluster identifier
- `OCM_ENV`: OCM environment (stage/production)
- `CLOUD_PROVIDER_ID`: Cloud provider (aws/gcp/azure)
- `CLOUD_PROVIDER_REGION`: Cluster region
- `OCM_CCS`: Customer Cloud Subscription flag
- `GINKGO_NO_COLOR`: Disable colored output
- Shared volume: EmptyDir at `/test-run-results` (e2e-suite writes, pause container serves)
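
As a rough sketch of the Level 2 injection, the function below builds the list of environment variables named above from cluster metadata. The `clusterInfo` and `envVar` types and the `executorEnv` function are hypothetical stand-ins (real code would presumably use the Kubernetes `corev1.EnvVar` type), and the value encodings, such as `"true"` for `GINKGO_NO_COLOR`, are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// clusterInfo holds the metadata handed to the test container.
// (Hypothetical type for illustration.)
type clusterInfo struct {
	ID       string
	OCMEnv   string
	Provider string
	Region   string
	CCS      bool
}

// envVar mirrors the name/value shape of a container env entry.
type envVar struct {
	Name  string
	Value string
}

// executorEnv returns the env vars listed under Level 2.
// The string encodings of the values are assumptions of this sketch.
func executorEnv(c clusterInfo) []envVar {
	return []envVar{
		{Name: "OCM_CLUSTER_ID", Value: c.ID},
		{Name: "OCM_ENV", Value: c.OCMEnv},
		{Name: "CLOUD_PROVIDER_ID", Value: c.Provider},
		{Name: "CLOUD_PROVIDER_REGION", Value: c.Region},
		{Name: "OCM_CCS", Value: strconv.FormatBool(c.CCS)},
		{Name: "GINKGO_NO_COLOR", Value: "true"},
	}
}

func main() {
	// Example values are made up.
	c := clusterInfo{ID: "example-id", OCMEnv: "stage", Provider: "aws", Region: "us-east-1", CCS: true}
	for _, e := range executorEnv(c) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```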

## Exception Tests (Non-Executor Pattern)

A few legacy tests exist directly in `pkg/e2e/` that run without test-suite executor jobs. These are exceptions to the standard execution pattern.

## E2E Testing instructions
- Do not run e2e tests unless the following env vars are set: `AD_HOC_TEST_IMAGES`, `CLUSTER_ID`, `OCM_CLIENT_ID`, `OCM_CLIENT_SECRET`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- To run via the CLI:
```bash
go run cmd/osde2e/main.go test --skip-health-check --skip-must-gather --skip-destroy-cluster --configs=rosa,sts,stage,ad-hoc-image
```
- To run via an IDE debugger: in VSCode, use `configs/local/example-launch.json`; in GoLand, use `configs/local/example-e2e.run.xml`
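
The env var precondition can be checked with a small guard like the one below. It is a sketch rather than code from osde2e: `requiredEnv` and `missingEnv` are hypothetical names, and treating an empty value as missing is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv lists the variables named in the instructions above.
var requiredEnv = []string{
	"AD_HOC_TEST_IMAGES",
	"CLUSTER_ID",
	"OCM_CLIENT_ID",
	"OCM_CLIENT_SECRET",
	"AWS_ACCESS_KEY_ID",
	"AWS_SECRET_ACCESS_KEY",
}

// missingEnv returns the names that are unset or empty according to lookup.
// Passing the lookup function in keeps the check testable without touching
// the real process environment.
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredEnv, os.LookupEnv); len(missing) > 0 {
		fmt.Println("refusing to run e2e tests; missing env vars:", missing)
		return
	}
	fmt.Println("all required env vars are set")
}
```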

## Unit Testing instructions
- Add or update unit tests for concrete implementation changes and new functionality, even if nobody asked.
- Fix any test or type errors until the whole suite is green.
### How They Work

**Direct Execution**
- Tests live in `pkg/e2e/` directory (e.g., `workloads`, `verify`, `operators`)
- Run as part of the main osde2e Ginkgo suite
- Execute directly in the runner pod without spawning executor pods
- Test output and results are fetched directly from the runner pod
- No need for artifact collection via pause containers

**Key Differences from Executor Pattern**
- No isolated namespace per test
- No 2-container pod architecture (e2e-suite + pause)
- No tar-based artifact fetching via SPDY
- Results captured directly through Ginkgo's reporting mechanisms

**Future Direction**
- Users should prefer test suites (executor pattern) for new tests
- These direct tests are maintained for backward compatibility
- New platform component tests should use the boilerplate pattern instead

**Files**: `pkg/e2e/workloads/`, `pkg/e2e/verify/`, `pkg/e2e/operators/`

## Secret Management

Secrets from vault volumes are automatically loaded and propagated from the top-level osde2e pod to second-level test suite executor pods.

### How It Works

**In Prow Jobs & Progressive Delivery**
- Vault secrets are mounted as volumes via `--secret-file-locations` flag (e.g., `/secrets/vault`)
- At startup, osde2e reads all files from these mounted directories
- Each file becomes a secret: filename becomes the key, file contents become the value
- Secrets are stored in viper configuration and combined into a passthrough map

**Propagation to Test Pods**
- Top-level osde2e pod collects all secrets into a single map
- For each test suite execution, a Kubernetes Secret named `ci-secrets` is created in the test namespace
- The executor pod references this secret via `envFrom.secretRef`
- All secrets automatically become environment variables in the test suite container
- Test suites access secrets as standard environment variables

**Files**: `pkg/common/load/load.go` (vault loading), `pkg/common/executor/executor.go:158-177` (pod injection)
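
The "filename becomes the key, file contents become the value" step can be sketched with the standard library alone. This is not the code in `pkg/common/load/load.go`; `loadSecrets` and `runExample` are hypothetical names, the sample secret names and values are made up, and skipping subdirectories is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// loadSecrets reads every non-directory entry in dir and returns a map of
// filename -> file contents. (Hypothetical helper for illustration.)
func loadSecrets(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	secrets := make(map[string]string, len(entries))
	for _, entry := range entries {
		path := filepath.Join(dir, entry.Name())
		// os.Stat follows symlinks, so links to directories are skipped too.
		info, err := os.Stat(path)
		if err != nil {
			return nil, err
		}
		if info.IsDir() {
			continue
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		secrets[entry.Name()] = string(data)
	}
	return secrets, nil
}

// runExample writes two throwaway files to a temp dir and loads them back.
func runExample() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "secrets-example")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	files := map[string]string{"API_TOKEN": "abc123", "DB_PASSWORD": "hunter2"}
	for name, value := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(value), 0o600); err != nil {
			return nil, err
		}
	}
	return loadSecrets(dir)
}

func main() {
	secrets, err := runExample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(secrets), "secrets loaded")
}
```

A map of this shape is what would then be copied into the `ci-secrets` Kubernetes Secret so that each key surfaces as an environment variable in the test suite container.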

## PR instructions
- Title format: `[<jira-ID>] <Title>`
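
A title check along these lines could be automated as below. The document only gives the `[<jira-ID>] <Title>` shape, so the Jira ID pattern (uppercase project key, hyphen, digits) and the `validTitle` helper are assumptions, and `ABC-123` is a made-up ID.

```go
package main

import (
	"fmt"
	"regexp"
)

// titlePattern assumes a Jira ID looks like an uppercase project key,
// a hyphen, and a number, followed by a space and a non-empty title.
var titlePattern = regexp.MustCompile(`^\[[A-Z][A-Z0-9]*-[0-9]+\] \S.*$`)

// validTitle reports whether title matches the assumed PR title format.
func validTitle(title string) bool {
	return titlePattern.MatchString(title)
}

func main() {
	for _, title := range []string{
		"[ABC-123] Fix flaky health check",
		"Fix flaky health check",
	} {
		fmt.Printf("%q valid: %v\n", title, validTitle(title))
	}
}
```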