
docs: specification detailing startup failure behaviour in the SDKs#112

Open
sighphyre wants to merge 2 commits into main from chore/config-fail-spec

Conversation

@sighphyre
Member

Details for how an SDK should degrade on startup.

Some details up for discussion

  • I've explicitly opted not to go for an offline mode. Simply because rolling that out to 25+ SDKs would make me cry and because it's a subtly complex feature to get right. Graceful degradation will give us 95% of the value there, with the tradeoff that errors will be a bit ugly
  • On SDKs that support backoff + retry, degradation like this would mean the SDK effectively goes dormant some time after a noisy startup sequence. This should be enough for SREs to pick up the problem while not causing much in the way of network issues
  • Explicitly added hard initialization because it's a feature in some of our SDKs already
  • Ready event is something I'd like some opinions on. Emitting this event means that users would actually have to read their logs and react. Not emitting it means deadlocks in production. The latter is more scary to me, so I've chosen to spec that it's emitted. Open to challenges here
  • I've included an escape hatch for ecosystems like Node/Rust where it's possible to semi-reliably detect development configuration and react to it. Soft fail is an ergonomic degradation, so anything we can do to improve developer happiness while keeping production safe would be grand

@@ -0,0 +1,146 @@
# Initialization & Configuration Validation Specification

This document specifies how Unleash SDKs MUST behave during initialization when configuration is missing or invalid.
Contributor

should it be called "required configuration parameters" and internally link to the definitions section?

Member Author

Sure that seems like a good suggestion


## Overview

Unleash SDKs are frequently configured via environment variables. At runtime, required configuration such as url and token may be null, undefined, or otherwise invalid.
Contributor

did you use "such as" because you expect more required configs in future?

Contributor

isn't undefined too JS specific to be part of the spec? maybe "absent"?

Contributor

does invalid mean non conformant to the token and url spec?

This comment was marked as outdated.

Member Author

did you use "such as" because you expect more required configs in future?

Because it's late in the day and I don't have the entire list in my mind 😅 Need to do a teensy bit of digging tomorrow and figure out what exactly is required. I think it's just this but I'm not 100% sure

isn't undefined too JS specific to be part of the spec? maybe "absent"?

Cheers! Good feedback. Absent seems like the right thing here

does invalid mean non conformant to the token and url spec?

I think I'm being a bit too vague here. Invalid in my mind means "can't talk to unleash"/"unleash doesn't like your token". I specifically don't want to go down the path of validating URLs because it's weird and hard to do. I'd like to reuse existing failure paths that we have in the SDK - we're already pretty robust around not crashing when upstream is unreachable

If possible it would be great to specify clearly what's required.

Agree, I'll figure out what this is and update

Contributor

I think the full list is a nice-to-have, it can be evolved or changed over time. Happy having it if it's cheap.

Required configuration parameters:

- **URL** - A URL specifying the base path of the Unleash client API
- **Token** - An Unleash token that allows access to the relevant client API
Contributor

I know we have very unfortunate name for our backend APIs that we call client API but maybe we can say "backend client API" to be less confusing to newcomers?

Member Author

Gosh we are being burned - this needs to cover frontend SDKs here too so I think I need to be more clear here. This means "the URL that you put in the config at startup". I think some frontend SDKs do weird things like expect the full path too. Needs some clarity though, agree

Contributor

I think we landed on backend vs frontend. The fact that the backend lives under /client and that we used to call those APIs client is confusing. Maybe we could introduce a /backend url, and deprecate /client, but I don't want to go this route now. I think what Mateusz suggests makes sense for now, just be clearer with words :)

- **URL** - A URL specifying the base path of the Unleash client API
- **Token** - An Unleash token that allows access to the relevant client API

These parameters are required for communication with Unleash.
Contributor

It's already said on line 22


These parameters are required for communication with Unleash.

**Soft initialization**:
Contributor

Opinion. I find "soft" too vague. I'd prefer "lenient" because it means it tolerates misconfiguration. The opposite would be "strict".

Member Author

Agree, that's a nice suggestion

I'm not sure about "lenient". After looking for some alternatives I quite like best-effort, e.g.: Best-effort initialization (default).


**Hard initialization**:

An explicit, non-default initialization method that enforces successful initial synchronization before returning control.
Contributor

non-default -> opt-in

Contributor

I am not sure what returning control means here. Some APIs have both sync and async init.

Member Author

Agree, nice suggestion

- Soft validation MUST check only for presence of required configuration parameters.
- SDKs MUST NOT reject URLs purely based on format validation. Communication success is determined by whether Unleash responds successfully.
- Missing or invalid required configuration MUST be surfaced through existing logging or error-event mechanisms.
- Configuration that is not required but missing should default to safe, known values.
Contributor

do we want to touch non required params in this spec? If so, what should happen with optional but incorrect parameters?

Member Author

Configuration that is not required but missing should default to safe, known values.

Is this too vague? I did want to cover these, apparently I'm doing a bad job of it 😅

Contributor

I think Mateusz's point is that this increases the surface area of this definition; we could address optional params in a different contract. Maybe it's fine if we don't even mention them, since it doesn't alter this document's scope


When using a hard initialization method:

1) url and token MUST be treated as required.
Contributor

@kwasniew, Feb 23, 2026

this will be a breaking change for some SDKs since it's a default behavior nowadays. Are we gonna bump major versions for them?

Member Author

If we need to then yeah we will. But I don't intend to add hard/strict initialization to SDKs that don't have it until someone cries for it. This is more to clarify the behavior of SDKs that do have this right now. If we gotta break them, then so be it


## Debug / Development Mode Behavior

SDK ecosystems that provide a clear debug or development mode (e.g., Rust debug_assertions, Node non-production environments) MAY enforce stricter validation semantics during soft initialization.
Contributor

Does it mean we have 3 modes?

  • soft/lenient
  • hard/strict
  • soft in dev mode

If so then maybe the reader should know that in the beginning of the document. I have to read this section again tomorrow.


An explicit, non-default initialization method that enforces successful initial synchronization before returning control.

**Debug/Development mode**:
Contributor

I'd call it development mode (although Rust would disagree). IMO development mode is the local-dev mode, while debug mode is when you set LOG_LEVEL=debug (that can be done in development or in production for reasons that I'm not going to name). Feel free to disagree, cause in Rust debug_assertions suggest that the more accurate name is Debug mode...

Comment on lines +56 to +57
`ready` in this context means:
The SDK has initialized and public API methods are callable.
Contributor

I think we can steal some terminology from K8s here... liveness vs readiness. I think when you finish constructing the SDK, it's alive (liveness => true), while after you load from upstream, or backup, or bootstrap, it is "ready" (we could argue there are different levels of readiness, particularly if you have bootstrap>backup>upstream, maybe it's a level of freshness, IDK).

But agree, after we attempted to fetch from Unleash, we should move into the ready state, because we did our best effort to retrieve toggles.

Contributor

I agree that spec should distinguish between "fetched" successfully vs "gave up and loaded backup" so that users know if they're running on real data or defaults
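To make the liveness vs readiness idea in this thread a bit more concrete, here is a rough sketch of a state record that also remembers where the toggle data came from. Everything in it is an assumption for illustration: `createState`, `markLoaded`, `isRunningOnFreshData` and the `source` values are hypothetical names, not part of any Unleash SDK or of the spec under review.

```javascript
// Hypothetical model of SDK state; none of these names exist in an Unleash SDK.
// "alive" is true as soon as the client object has been constructed.
// "ready" becomes true once the load attempts have run to completion.
function createState() {
  return { alive: true, ready: false, source: "none" };
}

// Record the outcome of loading. `source` is one of
// "upstream", "backup", "bootstrap", or "none" (every source failed).
function markLoaded(state, source) {
  return { ...state, ready: true, source };
}

// Lets users tell real upstream data apart from fallbacks.
function isRunningOnFreshData(state) {
  return state.ready && state.source === "upstream";
}

const afterFetch = markLoaded(createState(), "upstream");
const afterGiveUp = markLoaded(createState(), "backup");
console.log(isRunningOnFreshData(afterFetch));  // true
console.log(isRunningOnFreshData(afterGiveUp)); // false
```

With a shape like this, "ready" keeps meaning "we did our best effort", while `source` carries the fetched vs gave-up distinction asked for above.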


1) url and token MUST be treated as required.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.
3) The method MUST NOT resolve/return until at least one successful toggle fetch has occurred.
Contributor

Is it the same behavior in the presence of a backup?

@krzychukula left a comment

Instructions how to deal with my review:

  • I lean on the side of commenting even if I'm not super sure because that's what helped me a lot in the past. But, I don't expect for people to respond to all my comments. I usually flag nitpicky comments, but this time please treat all of them as nitpicky. I wanted to better understand this part of Unleash and I don't expect my comments to make a lot of sense.
  • Feel free to resolve/close comments without replying.
  • If you find I'm misunderstanding something critical then please let me know :)


This specification defines:

- Default (soft) initialization behavior

I don't find "soft" and "hard" clarifying here. Disclaimer: I'm reading from top to bottom so maybe it becomes clearer later. If I forget to clarify please ping me.

Edit:

  1. After reading the document I got even more convinced about using "strict" instead of "hard".
  2. I'm not sure about default initialization. Maybe just Default initialization? It could also be Safe initialization to better explain that it won't throw? 🤔 Or a mix of the two like: Default (safe) initialization 🤷
Suggested change
- Default (soft) initialization behavior
- Default initialization
- Strict initialization

2) The SDK MUST start and enter a running state, even if url and/or token are missing or invalid.
3) The SDK MUST emit its normal ready (or equivalent) event once initialization completes.

`ready` in this context means:

Is it possible to add information about the degraded state to the event in some way? Even if only for users to create their own custom logs I think this could be helpful.


1) The SDK MUST NOT throw, panic, or terminate the host process due to missing or invalid required configuration.
2) The SDK MUST start and enter a running state, even if url and/or token are missing or invalid.
3) The SDK MUST emit its normal ready (or equivalent) event once initialization completes.

once initialization completes
Can we better define "when" this will happen in some way?
Especially for invalid values I think it might be helpful to understand what happens and how long it can take.
But, I'm not really sure if this wouldn't overcomplicate this doc.
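For discussion, the three default-initialization rules quoted above, plus the earlier idea of attaching degraded-state information to the ready event, might look roughly like the sketch below. It is not taken from any SDK: `SketchClient`, the `degraded` and `missing` payload fields, and the placeholder URL are all assumptions, and a real SDK would emit ready only after its fetch, backup and bootstrap attempts have finished.

```javascript
// Minimal sketch of default (non-strict) initialization.
// SketchClient and the event payload fields are hypothetical.
class SketchClient {
  constructor(config) {
    this.listeners = { ready: [], error: [] };
    this.config = config || {};
    this.running = false;
  }

  on(event, handler) {
    this.listeners[event].push(handler);
    return this;
  }

  emit(event, payload) {
    for (const handler of this.listeners[event]) handler(payload);
  }

  // Never throws on missing configuration: it reports the problem and keeps going.
  start() {
    // Treats null, undefined and empty strings as absent.
    const missing = ["url", "token"].filter((key) => !this.config[key]);
    if (missing.length > 0) {
      this.emit("error", new Error(`Missing required configuration: ${missing.join(", ")}`));
    }
    this.running = true;
    // "ready" is always emitted; `degraded` tells listeners whether to trust the data.
    this.emit("ready", { degraded: missing.length > 0, missing });
  }
}

const events = [];
const client = new SketchClient({ url: "https://unleash.example.com/api" }); // token is absent
client.on("error", (err) => events.push(`error: ${err.message}`));
client.on("ready", (info) => events.push(`ready (degraded=${info.degraded})`));
client.start();
console.log(events); // one error entry followed by one ready entry
```

The point of the sketch is ordering: the error is surfaced first, the client still enters a running state, and listeners waiting on ready are never left hanging.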


### Validation Semantics

- Soft validation MUST check only for presence of required configuration parameters.

nit: "soft" here may be confused/conflated with "Soft initialization"

What about words like static or maybe basic? Unless it's intentional and I'm missing the connection.

Suggested change
- Soft validation MUST check only for presence of required configuration parameters.
- Static validation MUST check only for presence of required configuration parameters.

- Soft validation MUST check only for presence of required configuration parameters.
- SDKs MUST NOT reject URLs purely based on format validation. Communication success is determined by whether Unleash responds successfully.
- Missing or invalid required configuration MUST be surfaced through existing logging or error-event mechanisms.
- Configuration that is not required but missing should default to safe, known values.

I'm not sure what safe, known values are in practice. If they are default values then maybe it can be rephrased in some way?

Suggested change
- Configuration that is not required but missing should default to safe, known values.
- Missing optional configuration should fall back to their default values.
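As a rough illustration of the validation bullets quoted above, a presence-only check could be as small as the sketch below. The helper names `validateRequired` and `withDefaults` and the values in `DEFAULTS` are made up for this example and are not Unleash SDK APIs or real SDK defaults. Note that it deliberately accepts a malformed URL, since the spec text says format alone must not cause rejection.

```javascript
// Presence-only validation: hypothetical helper, not an Unleash SDK function.
function validateRequired(config) {
  const problems = [];
  for (const key of ["url", "token"]) {
    const value = config[key];
    // "Absent" here means null, undefined, or an empty/whitespace-only string.
    if (value === null || value === undefined || String(value).trim() === "") {
      problems.push(`${key} is absent`);
    }
  }
  return problems;
}

// Missing optional settings fall back to defaults; these values are invented.
const DEFAULTS = { refreshIntervalMs: 15000, appName: "unnamed-app" };
function withDefaults(config) {
  return { ...DEFAULTS, ...config };
}

console.log(validateRequired({ url: "not a url at all", token: "abc" })); // no problems reported
console.log(validateRequired({ url: "", token: null }));                  // both reported as absent
console.log(withDefaults({ url: "x", token: "y" }).refreshIntervalMs);    // 15000
```

The returned list would then be passed to whatever logging or error-event mechanism the SDK already has, rather than thrown.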


## Degraded Public API Behavior

If the SDK has not successfully fetched toggle state from Unleash and bootstrapping and backup loading have failed:

I don't see a section on bootstrapping. What does it mean?

Also I would find it clearer if you used Network in this sentence as the section describing it is called "Network Behavior". Or maybe add a small (network) clarifier?

Suggested change
If the SDK has not successfully fetched toggle state from Unleash and bootstrapping and backup loading have failed:
If the SDK has not successfully fetched toggle state from Unleash (network) and bootstrapping and backup loading have failed:


## Hard Initialization

SDKs MAY provide an explicit, non-default initialization method that enforces strict validation.

that enforces strict validation

I would follow this explanation and rename "Hard Initialization" to "Strict Initialization".

When using a hard initialization method:

1) url and token MUST be treated as required.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.

Is "fetch" cross platform? How about "network request" or something similar?

If retries are supported in the SDK then the retry limit must be exceeded.

If there is a retry-limit then I think that Default Initialization should explicitly deal with it. Maybe I missed it, but I think it would fall under "Rate limiting and retry behavior are governed by existing SDK mechanisms and MUST NOT be bypassed." and I'm not sure how that would interact with "Periodic Warnings". But, that may be about different things that I just don't know much about.

Maybe retries and retry-limit could be a new point 4)?


1) url and token MUST be treated as required.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.
3) The method MUST NOT resolve/return until at least one successful toggle fetch has occurred.

Sorry if I'm missing something, but doesn't this contradict the "The method MUST reject, throw, or return an error."?
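One possible reading that keeps both statements true is sketched below: the method does not resolve before a successful fetch, and it rejects once the retry budget is spent. This is only an interpretation, not the spec's definition. `strictInitialize`, the injected `fetchToggles` function and `maxAttempts` are hypothetical, and backoff between attempts is left out for brevity.

```javascript
// Hypothetical strict initialization; `fetchToggles` and `maxAttempts`
// are assumptions for this sketch, not part of any Unleash SDK.
async function strictInitialize(config, fetchToggles, maxAttempts = 3) {
  // Rule 1: url and token are required, so fail before any network call.
  if (!config.url || !config.token) {
    throw new Error("url and token are required for strict initialization");
  }

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Rule 3: resolve only after one successful fetch.
      return await fetchToggles(config);
    } catch (err) {
      lastError = err; // keep trying until the retry budget is spent
    }
  }
  // Retry budget exhausted: reject instead of waiting forever.
  throw new Error(`initial fetch failed after ${maxAttempts} attempts: ${lastError.message}`);
}

// Usage: a fetch that fails twice and then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error("unreachable");
  return ["toggle-a"];
};

strictInitialize({ url: "https://unleash.example.com/api", token: "t" }, flaky)
  .then((toggles) => console.log(toggles, calls)); // resolves on the third attempt
```

Under this reading, a missing URL never triggers a network request, which also addresses the "isn't it wasteful" concern raised further down the thread.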


## Debug / Development Mode Behavior

SDK ecosystems that provide a clear debug or development mode (e.g., Rust debug_assertions, Node non-production environments) MAY enforce stricter validation semantics during soft initialization.

MAY enforce stricter validation semantics during soft initialization.

I don't like "soft" as it seems like it isn't proper initialization for some reason.

But, if we use default then I wonder if it wouldn't be confused with default values 🤔

MAY enforce stricter validation semantics during default initialization.
🤷 Not sure.

When using a hard initialization method:

1) url and token MUST be treated as required.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.
Contributor

Is there a missing 'not' here?

Suggested change
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must not be exceeded.

@gastonfournier gastonfournier moved this from New to In Progress in Issues and PRs Feb 24, 2026

Timeout behavior and retry semantics are governed by existing SDK implementation.

## Debug / Development Mode Behavior
Contributor

@kwasniew, Feb 24, 2026

Two modes (soft + hard):

  • Simpler mental model, each mode behaves consistently regardless of environment
  • Developer chooses behavior explicitly, no surprises
  • Easier to implement, test, and document across SDKs
  • Tradeoff: devs who always use soft init won't get automatic fail-fast during development

Three modes (soft + hard + debug):

  • Catches misconfig automatically during local dev without requiring hard init
  • Tradeoff: soft init becomes context-dependent, harder to reason about
  • Every SDK must detect and document environment semantics
  • More surface area to implement, test, and maintain across SDK ecosystem

Basically: three modes adds a small (maybe big?) developer convenience at the cost of a meaningful increase in complexity and a less predictable soft init contract.

Contributor

Opinion: I'm in explicit over implicit or configuration over convention camp. Two modes is configuration, three modes relies on environment convention. Works like magic until it doesn't match your actual intent

Opinion: developer mode is great and adds a lot of developer convenience. But, I wonder if we can have that just by changing our examples in the README of the SDKs that support that.

For example node sdk could have:

import { initialize } from 'unleash-client';

const unleash = initialize({
  url: 'https://YOUR-API-URL',
  appName: 'my-node-name',
  customHeaders: { Authorization: '<YOUR_API_TOKEN>' },
  strictInitialization: process.env.NODE_ENV === 'development' // proposed option: throws on errors
});

It also has a clear advantage that each customer can tweak how NODE_ENV test or others should behave.
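If the explicit-flag route were taken, the per-environment tweak mentioned here could live in a tiny helper like the sketch below. It is hypothetical throughout: `STRICT_BY_ENV` and `shouldInitStrictly` are invented names, and `strictInitialization` is the option proposed in this comment, not a confirmed `unleash-client` option.

```javascript
// Hypothetical helper: decide strictness from an explicit per-environment table.
// Nothing here is part of unleash-client.
const STRICT_BY_ENV = { development: true, test: false, production: false };

function shouldInitStrictly(nodeEnv, overrides = {}) {
  const table = { ...STRICT_BY_ENV, ...overrides };
  // Unknown or unset environments stay lenient, so production never fails hard by accident.
  return table[nodeEnv] === true;
}

console.log(shouldInitStrictly("development"));          // true
console.log(shouldInitStrictly("test", { test: true })); // true
console.log(shouldInitStrictly(undefined));              // false
```

The result would be passed as the value of the proposed `strictInitialization` option, which keeps the choice explicit in user code instead of being inferred inside the SDK.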

Co-authored-by: Thomas Heartman <thomas@getunleash.ai>
When using a hard initialization method:

1) url and token MUST be treated as required.
2) The SDK MUST attempt a full fetch from Unleash. If retries are supported in the SDK then the retry limit must be exceeded.
Contributor

If the URL is missing, should it still attempt a fetch? Isn't it wasteful with an empty URL?

@krzychukula, Feb 24, 2026

I thought the first point was about preventing that?

  1. url and token MUST be treated as required.

Projects

Status: In Progress

5 participants