ADR-0002 · Secret references and SecretSource adapters

Status: accepted v1.1 (2026-05-03; v1.0 — 2026-05-03) · Full normative text

Amends: ADR-0001 §6 (Secrets handling — Phase 2 placeholder), §8 (Source adapters).

Why a second ADR

ADR-0001 fixed Phase 1 secrets handling as ${ENV_VAR} interpolation plus value masking driven by _meta/secret_patterns.yaml. The env-only model is sufficient for single-process deployments where the operator is willing to pre-stage every credential into the process environment, but operator feedback flagged three real-world gaps:

  1. Centralised secret lifecycle — three services sharing one OPENAI_API_KEY need three env-injection points; rotating the key means rolling all three.
  2. Audit and access control — env vars are visible to sibling processes under the same UID, leak into core dumps and /proc/<pid>/environ, and carry no audit trail.
  3. Per-environment scoping without per-environment YAML — today the operator either commits app-config.staging.yaml with ${OPENAI_API_KEY_STAGING} per environment, or maintains parallel secret stores with identical key names.

ADR-0002 closes those gaps with two parallel interfaces under a single Config.loadFrom([...]) call: ConfigSource keeps returning whole trees; SecretSource resolves single keys lazily. The two are first-class peers; the loader dispatches by interface.

Six key decisions

1. Reference syntax — ${secret:<scheme>[:<path>]}

A new normative interpolation token, parallel to ADR-0001 §2 ${VAR}:

${secret:<scheme>:<path>} → resolved value of <scheme>+<path>
${secret:<scheme>:<path>:-<default>} → literal fallback if reference does not resolve

Examples:

llm:
  api_key: "${secret:env:OPENAI_API_KEY}" # passthrough (default scheme = env)
  fallback: "${secret:env:OPENAI_API_KEY:-sk-dev-placeholder}"
database:
  password: "${secret:vault:secret/dagstack/prod/db#password}"
external_api:
  token: "${secret:awssm:arn:aws:secretsmanager:eu-west-1:...:secret/openai-key}"
  regional: "${secret:gcpsm:projects/foo/secrets/openai/versions/latest}"

Grammar (normative). From _meta/secret_ref_grammar.yaml:

secret_ref := "${" "secret" ":" scheme ":" path_with_query [field_proj] [":-" default] "}"
scheme := [a-z][a-z0-9_]*
path_with_query := path ["?" query]
path := <any chars except "}", "#"; literal "?" is "??", literal "#" is "##", ":-" is "::-">
query := query_kv ("&" query_kv)*
query_kv := query_key "=" query_value
query_key := [a-z][a-z0-9_]*
query_value := <percent-encoded per RFC 3986; literal "&", "=", "}", "#" MUST be %-encoded>
field_proj := "#" field
field := <any chars except "}">
default := <any chars except "}">

The ?key=value query block is reserved for backend-specific options. Phase 2 normative key: version (Vault only — ?version=3 selects a specific KV v2 version). Bindings MUST reject unknown keys with secret_unresolved. Token order is fixed: path → ?query → #field → :-default → }.

Escape rules. Literal #, ? and :- inside a path segment are escaped by doubling (##, ??, ::-). Literal &, =, }, # inside a query value MUST be percent-encoded per RFC 3986. Standard-library URL helpers (urllib.parse.quote / encodeURIComponent / url.QueryEscape) produce correct encodings.
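The grammar and escape rules above can be sketched in Python as follows. This is a minimal illustration, not the normative parser: the names `escape_path`, `parse_secret_ref` and `ParsedRef` are hypothetical, the sketch assumes the first unescaped `:-` after the path starts the default, it raises a plain `ValueError` where a binding would raise its own error, and it does not reject unknown query keys.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import quote, unquote

SCHEME_RE = re.compile(r"[a-z][a-z0-9_]*")
# Doubled sequences that stand for a literal inside the path segment.
ESCAPES = (("::-", ":-"), ("??", "?"), ("##", "#"))


@dataclass
class ParsedRef:
    scheme: str
    path: str
    query: Dict[str, str]
    field: Optional[str]
    default: Optional[str]


def escape_path(raw: str) -> str:
    """Double the sequences that would otherwise end the path segment."""
    return raw.replace(":-", "::-").replace("#", "##").replace("?", "??")


def parse_secret_ref(token: str) -> ParsedRef:
    prefix = "${secret:"
    if not (token.startswith(prefix) and token.endswith("}")):
        raise ValueError(f"not a secret reference: {token!r}")
    scheme, sep, rest = token[len(prefix):-1].partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"bad scheme in {token!r}")

    # Scan the path, undoing doubled escapes, and stop at the first
    # unescaped "?", "#" or ":-".
    chars, i = [], 0
    while i < len(rest):
        for doubled, literal in ESCAPES:
            if rest.startswith(doubled, i):
                chars.append(literal)
                i += len(doubled)
                break
        else:
            if rest[i] in "?#" or rest.startswith(":-", i):
                break
            chars.append(rest[i])
            i += 1

    # Fixed token order: ?query, then #field, then :-default.
    tail, has_default, default = rest[i:].partition(":-")
    tail, has_field, field = tail.partition("#")
    query: Dict[str, str] = {}
    if tail.startswith("?"):
        for pair in tail[1:].split("&"):
            key, _, value = pair.partition("=")
            query[key] = unquote(value)
    return ParsedRef(
        scheme,
        "".join(chars),
        query,
        field if has_field else None,
        default if has_default else None,
    )


ref = parse_secret_ref("${secret:vault:secret/dagstack/prod/db?version=3#password}")
print(ref.scheme, ref.path, ref.query, ref.field)

# Query values are percent-encoded with the standard-library helper.
print(quote("a&b=c", safe=""))
```

Doubling keeps the path readable in YAML, while percent-encoding is reserved for the query block, where the standard URL helpers already apply.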

1.1 The env scheme — backwards compatibility

${secret:env:OPENAI_API_KEY} is semantically identical to ${OPENAI_API_KEY}. The env scheme is a degenerate case of secret resolution implemented by an EnvSecretSource that the loader auto-registers if no explicit one is passed. Migration from Phase 1 is a mechanical sed; no behavioural change.
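The mechanical rewrite can be sketched in Python instead of sed. The function name `to_secret_env` is hypothetical, and the pattern assumes Phase 1 variable names are identifier-like (letters, digits, underscore); an operator would normally apply it only to the keys that hold credentials.

```python
import re

# Matches Phase 1 tokens such as ${VAR} or ${VAR:-default}. Tokens already in
# the ${secret:...} form do not match, because after the name only ":-" or "}"
# is accepted.
PHASE1_TOKEN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:-[^}]*)?\}")


def to_secret_env(text: str) -> str:
    """Rewrite ${VAR} to ${secret:env:VAR}, keeping any :-default suffix."""
    return PHASE1_TOKEN.sub(
        lambda m: "${secret:env:" + m.group(1) + (m.group(2) or "") + "}",
        text,
    )


print(to_secret_env('api_key: "${OPENAI_API_KEY}"'))
```

Running the rewrite twice leaves the text unchanged, which makes it safe to apply across a whole config tree.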

1.2 Sub-key projection via #field

Most secret managers store a "secret" as a JSON object with multiple fields (Vault KV v2, AWS-SM JSON-typed secrets). The #field syntax projects one sub-key from a multi-key envelope:

${secret:vault:secret/dagstack/prod/db#password}
${secret:vault:secret/dagstack/prod/db#username}

Both references hit the same Vault read; the loader caches by <scheme>:<path-up-to-#> so one round-trip serves both. If #field is omitted and the resolved value is an object, the binding raises secret_unresolved rather than auto-stringifying.
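A minimal Python sketch of the shared read and the projection rule, under stated assumptions: the backend returns either a plain string or a flat mapping of strings, and `CachedReader`, `project` and `SecretUnresolved` are hypothetical names standing in for the binding's own types.

```python
from typing import Callable, Dict, Mapping, Optional, Union

Envelope = Union[str, Mapping[str, str]]


class SecretUnresolved(Exception):
    """Stand-in for ConfigError(reason=secret_unresolved)."""


def project(envelope: Envelope, field: Optional[str]) -> str:
    """Pick one sub-key from a multi-key envelope; never auto-stringify."""
    if isinstance(envelope, str):
        if field is not None:
            raise SecretUnresolved(f"#{field} requested but the value is not an object")
        return envelope
    if field is None:
        raise SecretUnresolved("value is an object; add a #field projection")
    if field not in envelope:
        raise SecretUnresolved(f"field {field!r} is absent")
    return envelope[field]


class CachedReader:
    """Caches backend reads by <scheme>:<path up to '#'>."""

    def __init__(self, scheme: str, read: Callable[[str], Envelope]) -> None:
        self.scheme = scheme
        self._read = read
        self._cache: Dict[str, Envelope] = {}
        self.round_trips = 0

    def resolve(self, path: str, field: Optional[str] = None) -> str:
        key = f"{self.scheme}:{path}"
        if key not in self._cache:
            self._cache[key] = self._read(path)
            self.round_trips += 1
        return project(self._cache[key], field)


# A dict lookup stands in for the Vault read.
backend = {"secret/dagstack/prod/db": {"username": "app", "password": "s3cret"}}
reader = CachedReader("vault", backend.__getitem__)
print(reader.resolve("secret/dagstack/prod/db", "password"))
print(reader.resolve("secret/dagstack/prod/db", "username"))
print(reader.round_trips)
```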

2. The SecretSource contract — separate from ConfigSource

Pseudocode for the contract (each implementation realises it idiomatically):

SecretSource {
    scheme: string    # short scheme name (matches ${secret:<scheme>:...})
    id: string        # human-readable identifier (URI-style)

    resolve(path: string, ctx: ResolveContext): SecretValue    # binding picks sync/async idiom
    close?(): void                                             # release resources
    watch?(path: string, callback: (SecretChangeEvent) => void): Subscription    # Phase 3
}

SecretValue {
    value: string           # always string at the wire level
    version?: string        # opaque version id from the backend
    expires_at?: ISO-8601   # if the backend returns a TTL
    source_id: string       # echoed from SecretSource.id for diagnostics
}

ResolveContext {
    cancellation?: <binding-native cancellation handle>
    deadline?: ISO-8601 / native deadline
    attempt: int            # 1-based attempt counter
}

Why two interfaces rather than a marker capability on ConfigSource:

  • Type safety: ConfigSource.load() -> ConfigTree is total; SecretSource.resolve(path) -> SecretValue is keyed and partial.
  • Watch semantics — config watch is a tree-level event; secret rotation is a key-level versioned event. Two signals, two shapes.
  • Cache lifecycle — config sources cache for the process lifetime; secrets MAY cache with TTL or per-lease.

Sync vs async — per-binding choice (same rule as ADR-0001 §4):

  • Go: Resolve(ctx, path) (SecretValue, error).
  • TypeScript: resolve(path, ctx): Promise<SecretValue>.
  • Python: def resolve(self, path, ctx) — sync by default; a parallel AsyncSecretSource protocol with async def resolve_async is provided for non-blocking event loops.
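One way the synchronous Python shape of the contract could look, sketched with `typing.Protocol`. The field names follow the pseudocode above; the `env:process` identifier, the use of `LookupError` for a missing variable and the omission of `close`, `watch` and the cancellation handle are assumptions of this sketch rather than part of the spec.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class SecretValue:
    value: str                          # always a string at the wire level
    source_id: str                      # echoed from SecretSource.id
    version: Optional[str] = None
    expires_at: Optional[datetime] = None


@dataclass(frozen=True)
class ResolveContext:
    attempt: int = 1                    # 1-based attempt counter
    deadline: Optional[datetime] = None


@runtime_checkable
class SecretSource(Protocol):
    scheme: str
    id: str

    def resolve(self, path: str, ctx: ResolveContext) -> SecretValue: ...


class EnvSecretSource:
    """Degenerate source: the path is the environment variable name."""

    scheme = "env"
    id = "env:process"                  # hypothetical identifier

    def resolve(self, path: str, ctx: ResolveContext) -> SecretValue:
        try:
            return SecretValue(value=os.environ[path], source_id=self.id)
        except KeyError:
            raise LookupError(f"environment variable {path!r} is not set") from None
```

Because the protocol is structural, a backend adapter does not need to inherit from anything; it only needs the two attributes and a `resolve` method.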

SecretValue.value is always a string. Type coercion happens at the Config.get* call site, exactly like for env-interpolated values (ADR-0001 §4.4). The binding MUST NOT JSON-parse the value into a sub-tree.

3. SecretRef — opaque placeholder and resolution timing

A ${secret:...} token does not trigger a secret-manager round-trip at Source.load() time. The file source emits a SecretRef placeholder at the corresponding tree leaf:

SecretRef {
    scheme: string
    path: string            # full path including any #field projection
    default?: string        # the literal after ":-", if any
    origin_source: string   # ConfigSource.id where this token was found
}

The merged tree may contain SecretRef instances mixed with regular scalars. Resolution happens at one of three points:

| Trigger | Behaviour |
| --- | --- |
| config.get(path) returns a SecretRef | The binding MUST resolve transparently and return the resolved string. |
| config.get_string / get_int etc. | Resolve transparently; apply primitive coercion per ADR-0001 §4.3. |
| config.get_section(path, schema) | Resolve every SecretRef inside the subsection, then run the schema validator. |
| config.snapshot() | Replace every SecretRef with [MASKED] per _meta/secret_patterns.yaml. The reference itself is never resolved by snapshot(). An audit-mode opt-in (include_secrets=True / { includeSecrets: true } / WithIncludeSecrets()) MAY resolve and mask by suffix-pattern only. |
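The trigger rules can be illustrated with a toy Python config object. `MiniConfig` is a hypothetical stand-in for the real `Config`; it covers only `get`, `get_int` and the default `snapshot()` behaviour, and it masks SecretRef leaves only, without applying the pattern-based masking from _meta/secret_patterns.yaml.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class SecretRef:
    scheme: str
    path: str
    origin_source: str
    default: Optional[str] = None


class MiniConfig:
    def __init__(self, tree: Dict[str, Any], resolve: Callable[[SecretRef], str]) -> None:
        self._tree = tree
        self._resolve = resolve

    def _lookup(self, dotted: str) -> Any:
        node: Any = self._tree
        for part in dotted.split("."):
            node = node[part]
        return node

    def get(self, dotted: str) -> Any:
        # A SecretRef leaf is resolved transparently; other leaves pass through.
        leaf = self._lookup(dotted)
        return self._resolve(leaf) if isinstance(leaf, SecretRef) else leaf

    def get_int(self, dotted: str) -> int:
        # Coercion happens at the call site, after resolution.
        return int(self.get(dotted))

    def snapshot(self) -> Any:
        # Placeholders are masked and never resolved.
        def mask(node: Any) -> Any:
            if isinstance(node, dict):
                return {key: mask(value) for key, value in node.items()}
            return "[MASKED]" if isinstance(node, SecretRef) else node

        return mask(self._tree)
```

Keeping the resolver behind `get` means a consumer cannot tell a resolved secret from an ordinary scalar, while `snapshot()` can be logged without a backend round-trip.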

Lazy by default with eager opt-in. Per-binding choice:

  • Python — lazy by default; Config.load_from(..., eager_secrets=True) walks the merged tree at load time.
  • TypeScript — eager by default (loadFrom is async; pays the cost up-front).
  • Go — eager by default (same rationale as TypeScript).

Pilot consumer recommendation (long-lived servers): eager mode. Surfacing secret_unresolved at startup is observably better than a 5xx on the first inbound request.
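Eager mode amounts to walking the merged tree once at load time. A minimal sketch, assuming a resolver callable that raises `LookupError` on failure; `resolve_eagerly` and the trimmed-down `SecretRef` are hypothetical, and a real binding would raise `ConfigError(reason=secret_unresolved)` rather than `RuntimeError`.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class SecretRef:
    scheme: str
    path: str


def resolve_eagerly(tree: Any, resolve: Callable[[SecretRef], str]) -> Any:
    """Return a copy of the tree with every SecretRef replaced by its value."""
    failures: List[str] = []

    def walk(node: Any, where: str) -> Any:
        if isinstance(node, dict):
            return {
                key: walk(value, f"{where}.{key}" if where else key)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [walk(value, f"{where}[{i}]") for i, value in enumerate(node)]
        if isinstance(node, SecretRef):
            try:
                return resolve(node)
            except LookupError as exc:
                failures.append(f"{where}: {exc}")
                return None
        return node

    resolved = walk(tree, "")
    if failures:
        # Report every failing key at startup instead of stopping at the first.
        raise RuntimeError("unresolved secrets at load time: " + "; ".join(failures))
    return resolved
```

Collecting all failures before raising lets the operator fix every broken reference in one restart cycle.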

Caching. A binding MUST cache resolved secrets in-process for the lifetime of the Config object, keyed by <scheme>:<full path>. The cache MUST honour expires_at from SecretValue (a value with expires_at in the past is treated as a cache miss).

Forced refresh. config.refresh_secrets() / config.refreshSecrets() / config.RefreshSecrets() drops the cache and triggers re-resolution on next access — the manual rotation hook for Phase 2. Push-based rotation is deferred to Phase 3.
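The caching and refresh rules can be sketched as a small in-process cache. `SecretCache`, its `fetch` callable and the injectable `now` clock are hypothetical names for illustration; the sketch assumes `expires_at` is a timezone-aware `datetime`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SecretValue:
    value: str
    source_id: str
    expires_at: Optional[datetime] = None


class SecretCache:
    def __init__(
        self,
        fetch: Callable[[str, str], SecretValue],
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._fetch = fetch
        self._now = now
        self._entries: Dict[str, SecretValue] = {}

    def get(self, scheme: str, full_path: str) -> str:
        key = f"{scheme}:{full_path}"
        hit = self._entries.get(key)
        # An entry whose expires_at lies in the past counts as a miss.
        expired = hit is not None and hit.expires_at is not None and hit.expires_at < self._now()
        if hit is None or expired:
            hit = self._fetch(scheme, full_path)
            self._entries[key] = hit
        return hit.value

    def refresh_secrets(self) -> None:
        # Manual rotation hook: drop everything; the next access re-resolves.
        self._entries.clear()
```

Injecting the clock keeps expiry behaviour testable without sleeping.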

4. Loader integration

Config.load_from / loadFrom accepts a heterogeneous list of ConfigSource and SecretSource instances. The loader dispatches by interface.

Normative loader rules:

  1. Source ordering. ConfigSource order continues to define merge priority (ADR-0001 §3). SecretSource order does not define priority, because each scheme has at most one registered source. Registering two SecretSource instances with the same scheme is a programming error (ConfigError(reason=validation_failed, details="duplicate SecretSource scheme")).
  2. Implicit env source. The loader MUST register a default EnvSecretSource if none is passed explicitly.
  3. Unknown scheme at load time. If a ${secret:<scheme>:...} token uses a scheme with no registered source AND no :-default, the binding raises ConfigError(reason=secret_unresolved) at load time, not at first read.
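The three loader rules can be sketched as two small functions. This is an illustration under assumptions, not the loader itself: sources are distinguished by duck typing (`resolve` versus `load`), and `register_sources`, `check_schemes` and the simplified `ConfigError` are hypothetical.

```python
import os
from typing import Any, Dict, Iterable, List, Tuple


class ConfigError(Exception):
    def __init__(self, reason: str, details: str = "") -> None:
        super().__init__(f"{reason}: {details}")
        self.reason = reason
        self.details = details


class EnvSecretSource:
    scheme = "env"

    def resolve(self, path: str) -> str:
        return os.environ[path]


def register_sources(sources: Iterable[Any]) -> Tuple[List[Any], Dict[str, Any]]:
    """Split a mixed list into ordered config sources and a scheme registry."""
    config_sources: List[Any] = []
    secret_sources: Dict[str, Any] = {}
    for src in sources:
        if hasattr(src, "resolve"):        # SecretSource: keyed, lazy
            if src.scheme in secret_sources:
                raise ConfigError("validation_failed", "duplicate SecretSource scheme")
            secret_sources[src.scheme] = src
        elif hasattr(src, "load"):         # ConfigSource: order is merge priority
            config_sources.append(src)
        else:
            raise TypeError(f"not a ConfigSource or SecretSource: {src!r}")
    # Rule 2: the env scheme is always available.
    secret_sources.setdefault("env", EnvSecretSource())
    return config_sources, secret_sources


def check_schemes(refs: Iterable[Tuple[str, bool]], secret_sources: Dict[str, Any]) -> None:
    """Rule 3: refs is a sequence of (scheme, has_default) pairs found in the tree."""
    for scheme, has_default in refs:
        if scheme not in secret_sources and not has_default:
            raise ConfigError("secret_unresolved", f"no SecretSource for scheme {scheme!r}")
```

Checking schemes against the registry needs no backend call, which is why an unknown scheme can fail at load time even in lazy mode.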

5. Error model — three new reasons

Three new entries in _meta/error_reasons.yaml:

| name | value | When |
| --- | --- | --- |
| SECRET_UNRESOLVED | secret_unresolved | Reference cannot be resolved (no source, key missing, ?version= destroyed, #field absent). |
| SECRET_BACKEND_UNAVAILABLE | secret_backend_unavailable | Backend unreachable (network, DNS, auth at connect time). |
| SECRET_PERMISSION_DENIED | secret_permission_denied | Backend rejected the read with an authorisation error (Vault 403, AWS-SM AccessDeniedException). |

Three reasons (not one) because operators react differently:

  • secret_unresolved → check the YAML and the backend key spelling.
  • secret_backend_unavailable → check network / DNS / credentials.
  • secret_permission_denied → check the Vault policy / AWS IAM.

source_id on these errors is the SecretSource.id (e.g. vault:https://vault.example.com), distinct from a ConfigSource.id. The details string also references the YAML file the token came from:

ConfigError(
    path = "llm.api_key",
    reason = secret_unresolved,
    details = "vault:secret/dagstack/prod/openai → 404 Not Found "
              "(referenced from yaml:app-config.yaml)",
    source_id = "vault:https://vault.example.com",
)
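An adapter might map backend outcomes to the three reasons along these lines. The `Reason` enum values come from the table above; the `classify` helper, its HTTP-status input and the treatment of 5xx responses as backend-unavailable are assumptions of this sketch, and the hints paraphrase the operator reactions listed earlier.

```python
from enum import Enum
from typing import Optional


class Reason(str, Enum):
    SECRET_UNRESOLVED = "secret_unresolved"
    SECRET_BACKEND_UNAVAILABLE = "secret_backend_unavailable"
    SECRET_PERMISSION_DENIED = "secret_permission_denied"


OPERATOR_HINT = {
    Reason.SECRET_UNRESOLVED: "check the YAML and the backend key spelling",
    Reason.SECRET_BACKEND_UNAVAILABLE: "check network / DNS / credentials",
    Reason.SECRET_PERMISSION_DENIED: "check the Vault policy / AWS IAM",
}


def classify(http_status: Optional[int]) -> Reason:
    """Map a backend outcome to a reason; None means the request never completed."""
    if http_status is None:
        return Reason.SECRET_BACKEND_UNAVAILABLE
    if http_status == 403:
        return Reason.SECRET_PERMISSION_DENIED
    if http_status == 404:
        return Reason.SECRET_UNRESOLVED
    if 500 <= http_status < 600:
        # Assumption of this sketch: server-side failures count as unavailable.
        return Reason.SECRET_BACKEND_UNAVAILABLE
    raise ValueError(f"no secret error reason for HTTP status {http_status}")


reason = classify(404)
print(reason.value, "->", OPERATOR_HINT[reason])
```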

6. Pilot adapter — VaultSource (HashiCorp Vault KV v2)

The first adapter shipped in all three bindings.

  • KV version: KV v2 only in Phase 2. KV v1 lacks versioning and soft-delete; ships in Phase 3 if requested.
  • Auth methods (Phase 2 mandatory): Token, AppRole. Optional: Kubernetes ServiceAccount. Phase 3 adds AWS IAM, JWT/OIDC, TLS client cert.
  • Namespace: passed at construction time (namespace="dagstack/prod"); the adapter prepends automatically.
  • Versioning: ${secret:vault:secret/.../db?version=3#password}. Cache key includes the version.
  • #field projection: pluck a sub-key from a JSON envelope.

SDK choice per binding:

| Binding | Library | Packaging |
| --- | --- | --- |
| Python | hvac>=2.0,<3.0 | pip install 'dagstack-config[vault]' |
| TypeScript | node-vault>=0.10 | npm install @dagstack/config node-vault (optional peer-dep) |
| Go | github.com/hashicorp/vault/api (official) | go get go.dagstack.dev/config/vault (separate sub-module) |

Each binding records its SDK choice in a per-language ADR (adr/0001-vault-source.md) with version constraints and deprecation policy.

Migration story

# Phase 1
llm:
  api_key: "${OPENAI_API_KEY}"

Three steps, in operator-effort order:

Step 0 — no change. ${OPENAI_API_KEY} is identical to ${secret:env:OPENAI_API_KEY} (§1.1). Operators with no Vault do nothing.

Step 1 — opt into the secret namespace, still using env.

llm:
  api_key: "${secret:env:OPENAI_API_KEY}"

Step 2 — point at Vault.

llm:
  api_key: "${secret:vault:secret/dagstack/prod/openai#api_key}"

The YAML wire format keeps ${VAR} working indefinitely. Future bindings may emit a deprecation warning under a strict-mode lint; the syntax remains normative.

A configuration example with secrets

app-config.yaml

llm:
  api_key: "${secret:vault:secret/dagstack/prod/openai#api_key}"
database:
  host: "${DB_HOST:-localhost}"
  password: "${secret:vault:secret/dagstack/prod/db#password}"
  fallback_key: "${secret:env:OPENAI_API_KEY:-sk-dev-placeholder}"

main.py

import os

from dagstack.config import Config, YamlFileSource
from dagstack.config.vault import AppRoleAuth, VaultSource

cfg = Config.load_from(
    [
        YamlFileSource("app-config.yaml"),
        VaultSource(
            addr="https://vault.example.com",
            auth=AppRoleAuth(
                role_id=os.environ["VAULT_ROLE_ID"],
                secret_id=os.environ["VAULT_SECRET_ID"],
            ),
            namespace="dagstack/prod",
        ),
        # EnvSecretSource auto-registered for ${secret:env:...}
    ],
    eager_secrets=True,
)
api_key = cfg.get_string("llm.api_key")

Consequences

Positive:

  • Operator-grade secrets — Vault (and later cloud secret managers) as first-class config sources.
  • No breaking change: ${VAR} keeps working; the new syntax is strictly additive.
  • Type safety preserved: get_int / get_string / get_section continue to work transparently; secret resolution is invisible to the consumer.
  • Pluggability — the SecretSource interface is what every backend implements; new schemes ship without changing the loader.
  • Audit-ready — every resolution carries source_id and the original YAML location.

Trade-offs:

  • Operational complexity — Vault adds a process dependency. Mitigated by SDK opt-in extras (Python [vault], TS peer-dep, Go sub-module).
  • Spec surface area — three new _meta/*.yaml files, three new error reasons, two new interfaces. The cost of solving the real problem.
  • Resolution-timing surprise — Python's lazy default means a secret error surfaces at first request. Mitigated by recommending eager_secrets=True for long-lived servers and making TS/Go eager-by-default.

Spec-distributed artefacts

New files in _meta/:

Existing _meta/types.yaml gains rows for SecretSource, AsyncSecretSource, SecretRef, SecretValue, SecretChangeEvent, ResolveContext, EnvSecretSource, VaultSource.

Normative source

The full text with all six decisions in detail, the conformance fixture catalogue, cross-binding CI extension, and the open questions tracking sheet: config-spec/adr/0002-secret-references-and-sources.md.