docs: config #375
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,170 @@ | ||
| # Provider Config Proposal | ||
|
|
||
| ## Requirements | ||
|
|
||
| - Provider should be configured by one config. | ||
| - Single source of truth that can be watched by provider; changes in the source must be propagated to all watching providers. | ||
| - Hot reload - provider should apply config changes on the fly. | ||
| - Config must not be public. | ||
| - Follow the inventory operator config style. | ||
|
|
||
| ## Questions to clarify | ||
|
|
||
| - Do we want to share a single config between providers located in different K8s clusters? (If yes, ConfigMap is not suitable.) | ||
|
|
||
| ## Terminology | ||
|
|
||
| - **Ops** (human operator): Person who runs and maintains the provider. Receives notifications, decides when to restart. | ||
| - **Startup config**: Values (e.g. cluster.k8s, manifest_namespace) that require a restart to take effect. | ||
| - **Runtime config**: Values that can be reloaded on the fly without restart. | ||
|
|
||
|
|
||
| ## Proposed solution | ||
|
|
||
| ### Config format | ||
|
|
||
| YAML format with subsections per module (like inventory operator). | ||
|
|
||
| <details> | ||
| <summary>Expand to see config example</summary> | ||
|
|
||
| ```yaml | ||
| version: v1 | ||
| cluster: | ||
|   k8s: true | ||
|   manifest_namespace: lease | ||
|   public_hostname: "" | ||
|   node_port_quantity: 1 | ||
|   wait_ready_duration: 5s | ||
|   overcommit: | ||
|     cpu: 0 | ||
|     memory: 0 | ||
|     storage: 0 | ||
|   deployment: | ||
|
Member
What is this relating to specifically? Provider deployment?
Contributor
Author
That section relates to these flags: link. As I understand it, these are settings for tenant workloads (how the provider deploys leases to K8s), not the provider's own deployment.
||
|     ingress_static_hosts: false | ||
|     ingress_domain: "" | ||
|     ingress_expose_lb_hosts: false | ||
|     network_policies_enabled: true | ||
|     runtime_class: gvisor | ||
|     blocked_hostnames: [] | ||
|     docker_image_pull_secrets: "" | ||
|
|
||
| bidengine: | ||
|   pricing_strategy: scale | ||
|   deposit: "5000000uakt" | ||
|   timeout: 5m | ||
|   scale: | ||
|     cpu: "0" | ||
|     memory: "0" | ||
|     storage: "0" | ||
|     endpoint: "0" | ||
|     ip: "0" | ||
|
|
||
| gateway: | ||
|   listen_address: "0.0.0.0:8443" | ||
|   grpc_listen_address: "0.0.0.0:8444" | ||
|   tls: | ||
|     cert: "" | ||
|     key: "" | ||
|
|
||
| monitor: | ||
|   max_retries: 40 | ||
|   retry_period: 4s | ||
|   retry_period_jitter: 15s | ||
|   healthcheck_period: 10s | ||
|
Member
Health checks are performed by readiness/liveness probes. What specifically does this setting control?
Contributor
Author
This corresponds to this flag: link. These are not K8s readiness/liveness probes, but the check conducted by the deployment against tenant workloads.
||
|   healthcheck_period_jitter: 5s | ||
|
|
||
| balance_checker: | ||
|   withdrawal_period: 24h | ||
|   lease_funds_check_interval: 10m | ||
|
|
||
| cert_issuer: | ||
|   enabled: false | ||
|
|
||
| # ... other sections | ||
|
Member
Future Gateway API settings
||
| ``` | ||
|
|
||
| </details> | ||
|
|
||
| ### Hot reload | ||
|
|
||
| Decisions: | ||
|
|
||
| 1. **Auto-restart on config change?** No - to avoid unexpected downtime. Notify Ops; they restart when ready. | ||
|
|
||
| 2. **Mixed change (runtime + startup):** Apply runtime values only. Ops must be notified that restart is required. | ||
|
|
||
| 3. **Module re-init without full process restart?** Possibly yes, in a later iteration. Cluster and bidengine have shared state, so a clean restart is recommended for those modules. Some values (e.g. listen address) could be applied without a restart after a redesign: start a new server, then close the old one. | ||
|
|
||
| **Restart notification** (when startup config changes, notify Ops; they restart when ready) | ||
|
|
||
| - **Flow**: | ||
| - Provider loads config, watches or polls for changes | ||
| - Provider detects config change; runtime config is applied immediately | ||
| - If there is a startup config change, Provider emits `provider_config_restart_required=1` metric (or K8s Event when in-cluster) - passive marker, no traffic drain | ||
| - Prometheus or other monitoring tool alerts Ops (Slack, PagerDuty, etc.) | ||
| - Ops restarts when ready | ||
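The flow above depends on telling startup changes apart from runtime changes. A minimal sketch of that classification follows, assuming the config has already been flattened into dotted paths. The `startupKeys` set, the `Diff` type, and the helper names are hypothetical, and the path `cluster.manifest_namespace` is an assumption about where that value sits in the final format.

```go
package main

import (
	"fmt"
	"sort"
)

// startupKeys lists dotted config paths that need a restart to take effect.
// The entries follow the Terminology section; the exact paths are assumptions.
var startupKeys = map[string]bool{
	"cluster.k8s":                true,
	"cluster.manifest_namespace": true,
}

// Diff holds the changed keys, split by how they can be applied.
type Diff struct {
	Runtime []string // can be applied on the fly
	Startup []string // require a restart
}

// classify compares two flattened configs (dotted path -> value) and sorts
// every added, removed, or modified key into Runtime or Startup.
func classify(oldCfg, newCfg map[string]string) Diff {
	changed := map[string]bool{}
	for k, v := range newCfg {
		if old, ok := oldCfg[k]; !ok || old != v {
			changed[k] = true
		}
	}
	for k := range oldCfg {
		if _, ok := newCfg[k]; !ok {
			changed[k] = true
		}
	}
	var d Diff
	for k := range changed {
		if startupKeys[k] {
			d.Startup = append(d.Startup, k)
		} else {
			d.Runtime = append(d.Runtime, k)
		}
	}
	sort.Strings(d.Runtime)
	sort.Strings(d.Startup)
	return d
}

// restartRequired returns the value a gauge such as
// provider_config_restart_required would report for this diff.
func restartRequired(d Diff) int {
	if len(d.Startup) > 0 {
		return 1
	}
	return 0
}

func main() {
	oldCfg := map[string]string{"cluster.manifest_namespace": "lease", "bidengine.timeout": "5m"}
	newCfg := map[string]string{"cluster.manifest_namespace": "lease2", "bidengine.timeout": "10m"}
	d := classify(oldCfg, newCfg)
	fmt.Println("apply now:", d.Runtime)
	fmt.Println("needs restart:", d.Startup)
	fmt.Println("restart required:", restartRequired(d))
}
```

With a mixed change like the one in `main`, only the runtime keys would be applied, while the startup keys only raise the restart marker.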
|
|
||
| ## Solution comparison | ||
|
|
||
| ### By scenario | ||
|
|
||
| | Scenario | Best fit | | ||
| |----------|----------| | ||
| | **Single cluster** | ConfigMap + K8s watch | | ||
| | **Multi-cluster, minimal infra** | S3 + poll | | ||
|
Member
I'm not sure I understand the multi-cluster setup in this context. Does it mean a shared configuration between providers?
Contributor
Author
I can imagine an operator that has two providers in physically different clusters. I am separating this case because K8s ConfigMaps work only within a single cluster, and a ConfigMap is the simplest, K8s-native solution, which I would consider if we go with the single-cluster approach.
||
| | **Multi-cluster, near-instant updates** | Redis, Consul or Vault | | ||
|
|
||
|
|
||
| ### S3 vs Other Config Sources | ||
|
|
||
| | Criteria | S3 + poll | ConfigMap + K8s watch | Redis | Consul | HTTP + poll | Vault | | ||
| |----------|-----------|------------------------|-------|--------|-------------|-------| | ||
| | **Single source of truth** | Yes | Per cluster | Yes | Yes | Yes | Yes | | ||
| | **Multi-cluster** | Yes | No | Yes* | Yes | Yes | Yes | | ||
| | **Watch / push** | No (poll) | Yes | Yes (pub/sub) | Yes | No (poll) | Yes (KV watch) | | ||
| | **Auth** | Access key or IAM | K8s SA + RBAC | Password | ACL token | Bearer, mTLS, OAuth2 | AppRole, K8s auth | | ||
| | **Auth: set once** | Yes (key) | Yes (SA) | Yes (password) | Yes (token) | Depends | Yes (AppRole) | | ||
| | **Auth: outside cloud** | Access key | N/A (in-cluster) | Password | Token | Bearer, mTLS | AppRole | | ||
| | **Infra to run** | None (managed) | None | Redis | Consul | HTTP server | Vault | | ||
| | **Provider deps** | AWS SDK | K8s client (existing) | redis client | consul client | net/http | vault client | | ||
| | **Complexity** | Low | Low | Medium | Medium | Low-Medium | High | | ||
| | **Max config delay** | Poll interval (e.g. 30s) | Seconds | Seconds | Seconds | Poll interval | Seconds | | ||
|
|
||
| \* Redis must be reachable from all clusters (shared instance or replication). | ||
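For the poll-based sources (S3, HTTP), the provider has to decide on each tick whether the fetched config actually changed. One possible approach, sketched here under assumptions, is to hash the fetched bytes and compare with the previous hash. The `Fetcher` type and the `Poller` struct are hypothetical names; a real S3 or HTTP client would sit behind `Fetcher`.

```go
package main

import (
	"context"
	"crypto/sha256"
	"fmt"
	"time"
)

// Fetcher returns the raw config bytes from a remote source (hypothetical).
type Fetcher func(ctx context.Context) ([]byte, error)

// Poller remembers the hash of the last config version it observed.
type Poller struct {
	last [sha256.Size]byte
	seen bool
}

// Observe reports whether data differs from the previously observed config.
// The first call always reports a change.
func (p *Poller) Observe(data []byte) bool {
	sum := sha256.Sum256(data)
	if p.seen && sum == p.last {
		return false
	}
	p.last, p.seen = sum, true
	return true
}

// Run fetches immediately and then once per interval until ctx is done,
// calling onChange for every new config version.
func (p *Poller) Run(ctx context.Context, interval time.Duration, fetch Fetcher, onChange func([]byte)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// On a fetch error the previously applied config stays in effect.
		if data, err := fetch(ctx); err == nil && p.Observe(data) {
			onChange(data)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// Fake source that cycles through a few versions.
	versions := []string{"a: 1", "a: 1", "a: 2"}
	i := 0
	fetch := func(context.Context) ([]byte, error) {
		v := versions[i%len(versions)]
		i++
		return []byte(v), nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	var p Poller
	p.Run(ctx, 10*time.Millisecond, fetch, func(b []byte) {
		fmt.Printf("new config: %s\n", b)
	})
}
```

The worst-case delay in this sketch is one poll interval plus the fetch time, which matches the "Max config delay" row above.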
|
|
||
|
|
||
|
|
||
| ### Trade-offs | ||
|
|
||
| | Solution | Pros | Cons | | ||
| |----------|------|------| | ||
| | **S3** | No extra infra, managed, multi-cloud, simple auth | Polling only, config delay up to poll interval | | ||
| | **ConfigMap** | Native K8s, real-time watch, no secrets | Single cluster only | | ||
| | **Redis** | Pub/sub, fast updates, simple auth | Run and operate Redis | | ||
| | **Consul** | KV + watch, multi-datacenter, ACL | Run and operate Consul | | ||
| | **HTTP** | Flexible, any backend | Need server + watch/poll strategy | | ||
| | **Vault** | Strong auth, KV watch | Heavy, more setup | | ||
|
|
||
| ## Migration plan | ||
|
|
||
| 1. **Phase 1 - Struct + loader**: Define Go structs for config, implement YAML loader. Keep flags; map flags to struct fields during transition. | ||
| 2. **Phase 2 - Remote source**: Add S3/ConfigMap backend as primary config source. Flags override remote values (backward compat). | ||
| 3. **Phase 3 - Remove flags**: Deprecate individual flags; remote config becomes the only input. Env vars for secrets only (e.g. `AKASH_PROVIDER_KEY`). | ||
| 4. **Phase 4 (optional) - File fallback**: Add optional local file for dev; used when remote is not configured or unreachable. | ||
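Phase 2 says flags override remote values. A rough sketch of that precedence follows, using the standard library `flag` package for brevity; the provider's real flag handling (cobra/pflag) would differ. The `flagToKey` mapping and the flag names are illustrative assumptions, not the actual flag set.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// flagToKey maps legacy flag names to dotted config paths (hypothetical).
var flagToKey = map[string]string{
	"bid-timeout": "bidengine.timeout",
	"bid-deposit": "bidengine.deposit",
}

// withFlagOverrides returns a copy of remote in which every flag that was
// explicitly set in args replaces the remote value.
func withFlagOverrides(remote map[string]string, args []string) (map[string]string, error) {
	fs := flag.NewFlagSet("provider", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	for name, key := range flagToKey {
		fs.String(name, remote[key], "legacy flag for "+key)
	}
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(remote))
	for k, v := range remote {
		out[k] = v
	}
	// Visit walks only the flags that were set on the command line,
	// so unset flags never shadow remote values.
	fs.Visit(func(f *flag.Flag) {
		out[flagToKey[f.Name]] = f.Value.String()
	})
	return out, nil
}

func main() {
	remote := map[string]string{
		"bidengine.timeout": "5m",
		"bidengine.deposit": "5000000uakt",
	}
	cfg, err := withFlagOverrides(remote, []string{"--bid-timeout=10m"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg["bidengine.timeout"], cfg["bidengine.deposit"])
}
```

The key design point is distinguishing "flag set explicitly" from "flag left at its default", so that defaults do not silently win over the remote config.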
|
|
||
| ## Go implementation | ||
|
|
||
| - **YAML parsing**: `gopkg.in/yaml.v3` (already in go.mod) | ||
| - **Config struct + merge**: Custom structs with `mapstructure` tags; `github.com/go-viper/mapstructure/v2` for YAML-to-struct | ||
| - **File watch**: `github.com/fsnotify/fsnotify` for local file; K8s watch for ConfigMap; S3 poll | ||
| - **No Viper for new config**: Current code uses `spf13/viper` with flags. New design: explicit load (YAML unmarshal + optional merge), no Viper. Simplifies precedence and avoids flag/config coupling. | ||
|
|
||
| ## Local override of global config | ||
|
|
||
| Use case: global config (S3/ConfigMap) shared by providers; one provider needs different values (e.g. dev, debugging, cluster-specific). | ||
|
|
||
| | Solution | How it works | Pros | Cons | | ||
| |----------|--------------|------|------| | ||
| | **Override file** | Load global first, then `config.local.yaml` (or path from `--config-override`). Deep-merge; local wins. | Simple, explicit, no extra infra | Two files to manage; override path must be passed | | ||
| | **Env per field** | `CLUSTER_DEPLOYMENT_INGRESS_DOMAIN=dev.example.com` overrides `cluster.deployment.ingress_domain`. | No extra files, 12-factor | Verbose for nested keys; env proliferation | | ||
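Both override options in the table reduce to small helpers. The sketch below shows a deep merge where local values win, and the path-to-env-name mapping implied by the `CLUSTER_DEPLOYMENT_INGRESS_DOMAIN` example. It assumes the YAML has already been decoded into `map[string]any`; `deepMerge` and `envKey` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// deepMerge returns a new map with override applied on top of base.
// Nested maps are merged key by key; any other value in override replaces
// the base value. Nested maps that are not overridden are shared with base,
// not copied.
func deepMerge(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		baseMap, baseIsMap := out[k].(map[string]any)
		overMap, overIsMap := v.(map[string]any)
		if baseIsMap && overIsMap {
			out[k] = deepMerge(baseMap, overMap)
		} else {
			out[k] = v
		}
	}
	return out
}

// envKey converts a dotted config path into an environment variable name.
func envKey(path string) string {
	return strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

func main() {
	global := map[string]any{
		"cluster": map[string]any{
			"deployment": map[string]any{
				"ingress_domain": "",
				"runtime_class":  "gvisor",
			},
		},
	}
	local := map[string]any{
		"cluster": map[string]any{
			"deployment": map[string]any{
				"ingress_domain": "dev.example.com",
			},
		},
	}
	merged := deepMerge(global, local)
	fmt.Println(merged["cluster"])
	fmt.Println(envKey("cluster.deployment.ingress_domain"))
}
```

Lists such as `blocked_hostnames` are replaced wholesale in this sketch; whether lists should instead be appended is a design decision the proposal leaves open.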
This config example is general rather than precise, but it shows what the future config format may look like.