14 changes: 0 additions & 14 deletions config/305-config-observability.yaml
@@ -52,20 +52,6 @@ data:
# Only applicable for grpc and http/protobuf protocols.
# metrics-export-interval: "30s"

# tracing-protocol specifies the trace export protocol.
# Supported values: "grpc", "http/protobuf", "none".
# Default is "none" (tracing disabled).
# tracing-protocol: "none"

# tracing-endpoint specifies the OTLP collector endpoint.
# Required when tracing-protocol is "grpc" or "http/protobuf".
# The OTEL_EXPORTER_OTLP_ENDPOINT env var takes precedence if set.
# tracing-endpoint: "http://otel-collector.observability.svc.cluster.local:4317"

# tracing-sampling-rate controls the fraction of traces sampled.
# 0.0 = none, 1.0 = all. Default is 0 (none).
# tracing-sampling-rate: "1.0"

Comment on lines -55 to -68

Member

While the ConfigMap keys are removed, I see no change in the controller. Does that mean ConfigMap-based tracing continues to work even with these changes?

Contributor Author

The Knative tracing wrapper that consumed these _example keys is replaced; the new tracer in pkg/tracing/provider.go reads the OTel-standard env vars directly. The _example block was just documentation for keys nothing reads anymore.

This is not the same ConfigMap as pipelines-as-code (the main one), which holds the tracing-label-action|application|component operator label-name mappings.

On the "no change in controller" observation - we verified end-to-end that with neither OTEL_EXPORTER_OTLP_ENDPOINT nor OTEL_TRACES_SAMPLER set, PaC falls back to a noop tracer and emits no spans. If you're seeing tracing behavior unchanged from the pre-PR state, could you share how to reproduce so we can dig in?
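
The fallback described above can be sketched in isolation. This is a minimal stdlib-only model, not the code in pkg/tracing/provider.go: `tracingMode` and `fromMap` are hypothetical names, and the lookup function is injected so the decision can be exercised without touching the real process environment.

```go
package main

import "fmt"

// tracingMode reports which startup path would be taken.
// lookup is injected (rather than calling os.Getenv directly) so the
// decision can be checked without modifying the real environment.
func tracingMode(lookup func(string) string) string {
	if lookup("OTEL_EXPORTER_OTLP_ENDPOINT") == "" {
		return "passthrough: OTLP endpoint missing"
	}
	if lookup("OTEL_TRACES_SAMPLER") == "" {
		return "passthrough: sampler missing"
	}
	return "otel"
}

// fromMap adapts a map to the lookup signature; absent keys read as "".
func fromMap(env map[string]string) func(string) string {
	return func(key string) string { return env[key] }
}

func main() {
	// Hypothetical collector address, used only for illustration.
	endpoint := "http://otel-collector.observability.svc.cluster.local:4317"

	fmt.Println(tracingMode(fromMap(nil)))
	fmt.Println(tracingMode(fromMap(map[string]string{
		"OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
	})))
	fmt.Println(tracingMode(fromMap(map[string]string{
		"OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
		"OTEL_TRACES_SAMPLER":         "parentbased_always_on",
	})))
}
```

Only the last call reports `otel`; the first two stay on the passthrough path because one of the two required variables is unset.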

Member

By "changes in controller", I meant to ask whether the Knative base (eventing/adapter) that PaC uses reads these observability ConfigMap keys for any other configuration or operations. Ref.

Contributor Author

Yes - the Knative eventing-adapter still consumes config-observability for non-tracing observability (metrics/profiling/etc. via evadapter.NewObservabilityConfiguratorFromConfigMap() at the line you linked, logging is read from config-logging separately). What this PR changes is specifically the tracing portion: PaC's old Knative tracing wrapper (added in bd9f468) read tracing-protocol / tracing-endpoint / tracing-sampling-rate from this ConfigMap; the new pkg/tracing/provider.go reads the OTel-standard env vars directly and that wrapper is gone, so those three keys went with the _example block. The Knative-base reads for non-tracing observability are unchanged.

Member

@ci-operator since this looks like a breaking change, and @theakshaypant is concerned about it, can't we introduce env-based configuration and keep the ConfigMap as a fallback, so that both are supported? Other users may still want to use the ConfigMap.

Contributor Author

If we read those keys, the tracing-sampling-rate value would have to map to either traceidratio (which preserves the flat-sampling behavior this PR is fixing) or parentbased_traceidratio (which silently changes existing users' behavior). It's a breaking change either way - a ConfigMap fallback wouldn't actually preserve the old behavior. We're going env-vars-only and calling out the break in the tracing doc.
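
To make the difference between the two mappings concrete, here is a small stdlib-only model. It is a sketch under stated assumptions, not the OpenTelemetry SDK: `ratioDecision` and `parentBased` are hypothetical helpers, and the threshold arithmetic is only loosely modeled on how trace-ID-based ratio samplers derive a decision from the trace ID.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ratioDecision models a flat trace-ID ratio sampler: the trace is kept
// when the low 8 bytes of its ID fall under a ratio-derived threshold.
// It never looks at what the parent span decided.
func ratioDecision(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	upper := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < upper
}

// parentBased models a parent-based wrapper: when a parent exists, its
// sampled flag is final; the ratio is consulted only for root spans.
func parentBased(hasParent, parentSampled bool, traceID [16]byte, ratio float64) bool {
	if hasParent {
		return parentSampled
	}
	return ratioDecision(traceID, ratio)
}

func main() {
	// A trace ID whose low 8 bytes sit between the 10% and 50% thresholds.
	id := [16]byte{8: 0x20}

	fmt.Println("upstream at 0.5 keeps root span:", ratioDecision(id, 0.5))
	fmt.Println("flat 0.1 keeps child span:", ratioDecision(id, 0.1))
	fmt.Println("parent-based 0.1 keeps child span:", parentBased(true, true, id, 0.1))
}
```

For this trace ID the upstream service at 0.5 keeps the root span, a flat 0.1 sampler downstream drops its child, and the parent-based variant keeps it because the parent was sampled. That is the behavior difference existing users would see under either mapping.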

@theakshaypant theakshaypant Jun 25, 2026

Member

What happens if a user still sets these keys in their configmap?
IIUC knative's sharedmain.SetupObservabilityOrDie is still called by the watcher which starts tracing. Ref sharedmain, Ref watcher
How would these "conflicting" tracing configs be running? Would this cause a leak or is it a functional bug? Or have I completely missed the mark 🤨

Contributor Author

Yeah, this would be an issue. Both tracers initialize when an operator has set the Knative tracing keys. OpenTelemetry's setup runs after Knative's and overrides the global tracer provider, so spans flow through OpenTelemetry; Knative's tracer is configured but receives no spans.

There's now a startup warning when both are configured, the noopProvider helper was renamed to passthroughProvider (it never installed a noop globally), and the info-log messages no longer claim otherwise. The docs cover all this in a new When both are configured subsection.

Whether it counts as a leak or a functional bug, a tracer that receives no spans wastes resources either way, hence the startup warning, the clearer log messages, and the dedicated docs section for this case.

# runtime-profiling enables/disables the pprof profiling server on port 8008.
# Supported values: "enabled", "disabled" (default).
# runtime-profiling: "disabled"
55 changes: 32 additions & 23 deletions docs/content/docs/operations/tracing.md
theakshaypant marked this conversation as resolved.
@@ -3,39 +3,48 @@ title: Distributed Tracing
weight: 5
---

This page describes how to enable OpenTelemetry distributed tracing for Pipelines-as-Code. When enabled, PaC emits trace spans for webhook event processing and PipelineRun lifecycle timing.
Pipelines-as-Code emits trace spans for webhook event processing and PipelineRun lifecycle timing.

## Enabling tracing

The ConfigMap `pipelines-as-code-config-observability` controls tracing configuration. It must exist in the same namespace as the Pipelines-as-Code controller and watcher deployments. See [config/305-config-observability.yaml](https://github.com/tektoncd/pipelines-as-code/blob/main/config/305-config-observability.yaml) for the full example.
Two configuration paths can enable tracing.

It contains the following tracing fields:
### Via OpenTelemetry environment variables

* `tracing-protocol`: Export protocol. Supported values: `grpc`, `http/protobuf`, `none`. Default is `none` (tracing disabled).
* `tracing-endpoint`: OTLP collector endpoint. Required when protocol is not `none`. The `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable takes precedence if set.
* `tracing-sampling-rate`: Fraction of traces to sample. `0.0` = none, `1.0` = all. Default is `0`.
Set on the controller and watcher pods:

### Example
* `OTEL_EXPORTER_OTLP_ENDPOINT` - OTLP collector endpoint URL. Required.
* `OTEL_TRACES_SAMPLER` - Sampler family. Required. Supported: `always_on`, `always_off`, `traceidratio`, `parentbased_always_on`, `parentbased_always_off`, `parentbased_traceidratio`.
* `OTEL_TRACES_SAMPLER_ARG` - Numeric argument for ratio samplers. Example: `0.1` with `parentbased_traceidratio` samples 10% of root traces while keeping the chain coherent.
* `OTEL_EXPORTER_OTLP_PROTOCOL` (or traces-specific `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`) - OTLP transport: `grpc` or `http/protobuf`. Default: `grpc`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pipelines-as-code-config-observability
  namespace: pipelines-as-code
data:
  tracing-protocol: grpc
  tracing-endpoint: "http://otel-collector.observability.svc.cluster.local:4317"
  tracing-sampling-rate: "1.0"
```
Both `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_TRACES_SAMPLER` must be set. Inbound `traceparent` headers on webhook requests are honored via the W3C TraceContext propagator. Changes take effect on the next pod restart.

Changes to `tracing-protocol`, `tracing-endpoint`, and `tracing-sampling-rate` require restarting the controller and watcher pods. The trace exporter is created once at startup from the ConfigMap values at that time. Set `tracing-protocol` to `none` or remove the tracing keys to disable tracing.
#### Sampler choice and chain coherency

The controller and watcher locate this ConfigMap by name via the `CONFIG_OBSERVABILITY_NAME` environment variable set in their deployment manifests. Operator-based installations may manage this differently; consult the operator documentation for details.
The `parentbased_*` sampler family honors the parent span's sample decision carried in the W3C `traceparent` flag bit. When every service in the delivery chain uses parent-based samplers, the root span's sampling decision propagates end to end: each service either keeps its spans or drops them based on what the root chose. Flat-rate samplers (`traceidratio` without parent-based) cause each service to roll independently, which at fractional sampling fragments the chain into orphaned spans whose `parent_spanID` references a span that was dropped. `parentbased_always_on` keeps everything; `parentbased_traceidratio` with a numeric argument samples a coherent fraction.

### Via Knative observability ConfigMap

Set in `pipelines-as-code-config-observability`:

* `tracing-protocol` - `grpc`, `http/protobuf`, `stdout`, or `none`. Default: `none`.
* `tracing-endpoint` - Collector endpoint for `grpc` or `http/protobuf`.
* `tracing-sampling-rate` - Sample fraction. Per-component independent.

Changes to Knative's tracing config require restarting the controller and watcher pods. The tracer is built once at startup.

### When both are configured

OpenTelemetry takes precedence: all spans flow through the OpenTelemetry exporter. The Knative tracer is initialized at startup but unused.

To use only OpenTelemetry, set `tracing-protocol: none` in `pipelines-as-code-config-observability`.

To use only Knative, unset `OTEL_EXPORTER_OTLP_ENDPOINT` on the controller and watcher pods.

## Emitted spans

The controller emits a `PipelinesAsCode:ProcessEvent` span for each webhook event. The watcher emits `waitDuration` and `executeDuration` spans for completed PipelineRuns.
The controller emits a `PipelinesAsCode:ProcessEvent` span for each webhook event. The watcher emits `waitDuration` and `executeDuration` spans for completed PipelineRuns. The OTel resource attribute `service.name` on all emitted spans is `pipelines-as-code`.

### Webhook event span (`PipelinesAsCode:ProcessEvent`)

@@ -103,13 +112,13 @@ Unlike the observability ConfigMap above (which requires a pod restart), changes

## Trace context propagation

When Pipelines-as-Code creates a PipelineRun, it sets the `tekton.dev/pipelinerunSpanContext` annotation with a JSON-encoded OTel TextMapCarrier containing the W3C `traceparent`. PaC tracing works independently — you get PaC spans regardless of whether Tekton Pipelines has tracing enabled.
When Pipelines-as-Code creates a PipelineRun, it sets the `tekton.dev/pipelinerunSpanContext` annotation with a JSON-encoded OTel TextMapCarrier containing the W3C `traceparent`. PaC tracing works independently - you get PaC spans regardless of whether Tekton Pipelines has tracing enabled.

If Tekton Pipelines is also configured with tracing pointing at the same collector, its reconciler spans appear as children of the PaC span, providing a single end-to-end trace from webhook receipt through task execution. See the [Tekton Pipelines tracing documentation](https://github.com/tektoncd/pipeline/blob/main/docs/developers/tracing.md) for Tekton's independent tracing setup.
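
As an illustration of what a consumer of that annotation can read from it, the sketch below decodes a JSON carrier and extracts the sampled flag from a version `00` `traceparent`. It uses only the standard library; `sampledFromCarrier` is a hypothetical helper, the sample IDs are the illustrative values from the W3C Trace Context specification, and the carrier is assumed to be a flat JSON object of string keys and values.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// sampledFromCarrier decodes a JSON-encoded carrier (assumed to be a flat
// string map) and reports whether its traceparent has the sampled flag set.
func sampledFromCarrier(annotation string) (bool, error) {
	var carrier map[string]string
	if err := json.Unmarshal([]byte(annotation), &carrier); err != nil {
		return false, fmt.Errorf("decode carrier: %w", err)
	}
	tp, ok := carrier["traceparent"]
	if !ok {
		return false, errors.New("carrier has no traceparent entry")
	}
	// Version 00 layout: version-traceid-parentid-flags, all hex.
	parts := strings.Split(tp, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return false, fmt.Errorf("malformed traceparent %q", tp)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return false, fmt.Errorf("parse flags: %w", err)
	}
	// Bit 0 of the flags byte is the sampled flag.
	return flags&0x01 == 0x01, nil
}

func main() {
	annotation := `{"traceparent":"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}`
	sampled, err := sampledFromCarrier(annotation)
	fmt.Println(sampled, err)
}
```

A parent-based sampler in a downstream component acts on exactly this bit, which is why the flag has to survive the hop through the annotation.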

## Deploying a trace collector

Pipelines-as-Code exports traces using the standard OpenTelemetry Protocol (OTLP). You need a running OTLP-compatible collector for the `tracing-endpoint` to point to. Common options include:
Pipelines-as-Code exports traces using the standard OpenTelemetry Protocol (OTLP). You need a running OTLP-compatible collector for `OTEL_EXPORTER_OTLP_ENDPOINT` to point to. Common options include:

* [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) -- the vendor-neutral reference collector
* [Jaeger](https://www.jaegertracing.io/docs/latest/getting-started/) -- supports OTLP ingestion natively since v1.35
4 changes: 2 additions & 2 deletions go.mod
@@ -30,6 +30,8 @@ require (
	github.com/tektoncd/pipeline v1.13.1
	gitlab.com/gitlab-org/api/client-go v1.46.0
	go.opentelemetry.io/otel v1.44.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0
	go.opentelemetry.io/otel/metric v1.44.0
	go.opentelemetry.io/otel/sdk v1.43.0
	go.opentelemetry.io/otel/sdk/metric v1.43.0
@@ -90,8 +92,6 @@ require (
	go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.43.0 // indirect
	go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp v1.43.0 // indirect
	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.43.0 // indirect
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0 // indirect
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.43.0 // indirect
	go.opentelemetry.io/otel/exporters/prometheus v0.65.0 // indirect
	go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.43.0 // indirect
	go.opentelemetry.io/proto/otlp v1.10.0 // indirect
9 changes: 9 additions & 0 deletions pkg/adapter/adapter.go
@@ -79,6 +79,15 @@ func New(run *params.Run, k *kubeinteraction.Interaction) adapter.AdapterConstru
}

func (l *listener) Start(ctx context.Context) error {
	tp := tracing.New(l.logger)
	defer func() {
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if err := tp.Shutdown(shutdownCtx); err != nil {
			l.logger.Errorw("failed to shut down tracer provider", "error", err)
		}
	}()

	adapterPort := globalAdapterPort
	envAdapterPort := os.Getenv("PAC_CONTROLLER_PORT")
	if envAdapterPort != "" {
13 changes: 13 additions & 0 deletions pkg/reconciler/controller.go
@@ -3,6 +3,7 @@ package reconciler
import (
	"context"
	"path"
	"time"

	"github.com/openshift-pipelines/pipelines-as-code/pkg/apis/pipelinesascode"
	"github.com/openshift-pipelines/pipelines-as-code/pkg/apis/pipelinesascode/keys"
@@ -14,6 +15,7 @@ import (
	"github.com/openshift-pipelines/pipelines-as-code/pkg/params/info"
	prmetrics "github.com/openshift-pipelines/pipelines-as-code/pkg/pipelinerunmetrics"
	queuepkg "github.com/openshift-pipelines/pipelines-as-code/pkg/queue"
	"github.com/openshift-pipelines/pipelines-as-code/pkg/tracing"
	tektonv1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1"
	tektonPipelineRunInformerv1 "github.com/tektoncd/pipeline/pkg/client/injection/informers/pipeline/v1/pipelinerun"
	tektonPipelineRunReconcilerv1 "github.com/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1/pipelinerun"
@@ -30,6 +32,17 @@ func NewController() func(context.Context, configmap.Watcher) *controller.Impl {
		ctx = info.StoreNS(ctx, system.Namespace())
		log := logging.FromContext(ctx)

		tp := tracing.New(log)
		// linter false positive: fresh Background is required because outer ctx is cancelled past <-ctx.Done().
		go func() { //nolint:gosec
			<-ctx.Done()
			shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			if err := tp.Shutdown(shutdownCtx); err != nil {
				log.Errorw("failed to shut down tracer provider", "error", err)
			}
		}()

		run := params.New()
		err := run.Clients.NewClients(ctx, &run.Info)
		if err != nil {
156 changes: 156 additions & 0 deletions pkg/tracing/provider.go
@@ -0,0 +1,156 @@
package tracing

import (
	"context"
	"os"
	"strconv"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.40.0"
	"go.opentelemetry.io/otel/trace/noop"
	"go.uber.org/zap"
	knativetracing "knative.dev/pkg/observability/tracing"
)

const (
	EnvOTLPEndpoint       = "OTEL_EXPORTER_OTLP_ENDPOINT"
	EnvOTLPProtocol       = "OTEL_EXPORTER_OTLP_PROTOCOL"
	EnvOTLPTracesProtocol = "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL"
	EnvTracesSampler      = "OTEL_TRACES_SAMPLER"
	EnvTracesSamplerArg   = "OTEL_TRACES_SAMPLER_ARG"

	protocolGRPC = "grpc"
	protocolHTTP = "http/protobuf"
)

type TracerProvider struct {
	shutdown func(context.Context) error
}

func New(logger *zap.SugaredLogger) *TracerProvider {
	otelConfigured := os.Getenv(EnvOTLPEndpoint) != "" && os.Getenv(EnvTracesSampler) != ""
	if otelConfigured && !globalIsNoop() {
		logger.Warn("OpenTelemetry and Knative tracing both configured; spans go through OpenTelemetry, Knative's tracer is unused. Set `tracing-protocol: none` in `pipelines-as-code-config-observability` to disable Knative, or unset `OTEL_EXPORTER_OTLP_ENDPOINT` to disable OpenTelemetry.")
	}

	if os.Getenv(EnvOTLPEndpoint) == "" {
		logger.Info("OpenTelemetry not configured (OTLP endpoint missing)")
		return passthroughProvider()
	}
	if os.Getenv(EnvTracesSampler) == "" {
		logger.Info("OpenTelemetry not configured (sampler missing)")
		return passthroughProvider()
	}

	proto := protocolFromEnv()
	exporter, err := newExporter(context.Background(), logger, proto)

Member

As per some AI review: is timeout handling needed here?

Contributor Author

No - otlptracegrpc.New doesn't actually open a connection, so there's nothing here that can hang. The gRPC connection is lazy (opens on the first span export), and shutdown already has a 5s timeout.
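
The bounded-shutdown pattern mentioned above can be tried without any exporter. In this stdlib-only sketch, `shutdownWithin`, `stuckFlush` and `flushTimesOut` are hypothetical names; `stuckFlush` stands in for an exporter whose final flush never completes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// shutdownWithin calls shutdown with a fresh context bounded by limit.
// context.Background is deliberate: the caller's own context is usually
// already cancelled by the time shutdown runs.
func shutdownWithin(limit time.Duration, shutdown func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return shutdown(ctx)
}

// stuckFlush simulates a flush that never finishes on its own; it returns
// only once the context expires.
func stuckFlush(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// flushTimesOut reports whether a stuck flush is cut off by the deadline.
func flushTimesOut() bool {
	err := shutdownWithin(50*time.Millisecond, stuckFlush)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(flushTimesOut()) // prints "true"
}
```

The deadline bounds only the shutdown path. Exporter construction is left unbounded here, matching the reasoning above that it does not block.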

	if err != nil {
		logger.Errorw("failed to create OTLP exporter", "error", err)
		return passthroughProvider()
	}

	res, err := resource.Merge(
		resource.Default(),
		resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName(TracerName),
		),
	)
	if err != nil {
		logger.Errorw("failed to create resource", "error", err)
		res = resource.Default()
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
		sdktrace.WithSampler(samplerFromEnv(logger)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.TraceContext{})

	logger.Infow("tracing initialized", "endpoint", os.Getenv(EnvOTLPEndpoint), "protocol", proto)

	return &TracerProvider{shutdown: tp.Shutdown}
}

func passthroughProvider() *TracerProvider {
	return &TracerProvider{shutdown: func(context.Context) error { return nil }}
}

func globalIsNoop() bool {
	tp := otel.GetTracerProvider()
	if _, ok := tp.(noop.TracerProvider); ok {
		return true
	}
	// Knative wraps noop in its own TracerProvider when tracing-protocol is none/absent.
	if knativeProvider, ok := tp.(*knativetracing.TracerProvider); ok {
		_, isNoop := knativeProvider.TracerProvider.(noop.TracerProvider)
		return isNoop
	}
	return false
}

func protocolFromEnv() string {
	if v := os.Getenv(EnvOTLPTracesProtocol); v != "" {
		return v
	}
	if v := os.Getenv(EnvOTLPProtocol); v != "" {
		return v
	}
	return protocolGRPC
}

func newExporter(ctx context.Context, logger *zap.SugaredLogger, proto string) (sdktrace.SpanExporter, error) {
	endpoint := os.Getenv(EnvOTLPEndpoint)
	switch proto {
	case protocolHTTP:
		return otlptracehttp.New(ctx, otlptracehttp.WithEndpointURL(endpoint))
	case protocolGRPC:
		return otlptracegrpc.New(ctx, otlptracegrpc.WithEndpointURL(endpoint))
	default:
		logger.Errorw("unsupported OTLP protocol; falling back to grpc", "protocol", proto)
		return otlptracegrpc.New(ctx, otlptracegrpc.WithEndpointURL(endpoint))
	}
}

func (tp *TracerProvider) Shutdown(ctx context.Context) error {
	if tp.shutdown != nil {
		return tp.shutdown(ctx)
	}
	return nil
}

func samplerFromEnv(logger *zap.SugaredLogger) sdktrace.Sampler {
	name := os.Getenv(EnvTracesSampler)
	argStr := os.Getenv(EnvTracesSamplerArg)
	arg, err := strconv.ParseFloat(argStr, 64)
	if err != nil && argStr != "" {
		logger.Errorw("ignoring malformed sampler argument; defaulting to 0% sampling", "env", EnvTracesSamplerArg, "value", argStr)
	}
	if argStr == "" && (name == "traceidratio" || name == "parentbased_traceidratio") {
		logger.Infow("ratio sampler selected without "+EnvTracesSamplerArg+"; defaulting to 0% sampling", "env", EnvTracesSampler, "value", name)
	}
	switch name {
	case "always_on":
		return sdktrace.AlwaysSample()
	case "always_off":
		return sdktrace.NeverSample()
	case "traceidratio":
		return sdktrace.TraceIDRatioBased(arg)
	case "parentbased_always_on":
		return sdktrace.ParentBased(sdktrace.AlwaysSample())
	case "parentbased_always_off":
		return sdktrace.ParentBased(sdktrace.NeverSample())
	case "parentbased_traceidratio":
		return sdktrace.ParentBased(sdktrace.TraceIDRatioBased(arg))
	}
	logger.Warnw("unrecognized OTEL_TRACES_SAMPLER value; falling back to never sample", "value", name)
	return sdktrace.NeverSample()
}
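
The protocol precedence in this file (traces-specific variable, then the generic one, then `grpc`) can be checked in isolation with a pure function. This is a hedged sketch, not part of the PR: `resolveProtocol` is a hypothetical helper that takes the two values as arguments instead of reading the environment, and it reports unsupported values rather than logging them.

```go
package main

import "fmt"

const (
	protoGRPC = "grpc"
	protoHTTP = "http/protobuf"
)

// resolveProtocol applies the precedence: the traces-specific value wins,
// then the generic value, then grpc. The second result is false when the
// chosen value is not a supported transport, which is the case where the
// caller would log and fall back to grpc.
func resolveProtocol(tracesProto, genericProto string) (string, bool) {
	chosen := protoGRPC
	switch {
	case tracesProto != "":
		chosen = tracesProto
	case genericProto != "":
		chosen = genericProto
	}
	supported := chosen == protoGRPC || chosen == protoHTTP
	return chosen, supported
}

func main() {
	for _, tc := range [][2]string{
		{"", ""},
		{"http/protobuf", "grpc"},
		{"", "http/json"},
	} {
		proto, ok := resolveProtocol(tc[0], tc[1])
		fmt.Printf("traces=%q generic=%q -> %q supported=%v\n", tc[0], tc[1], proto, ok)
	}
}
```

Keeping the decision free of `os.Getenv` makes it straightforward to cover with a table-driven test.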