8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
@@ -21,6 +21,14 @@ repos:
hooks:
- id: terraform_fmt
- id: terraform_validate
# `terraform_validate` fails on first run when a module source is
# pinned to a new commit SHA and `.terraform/modules/` still holds
# the previous install. `--retry-once-with-cleanup` makes the hook
# purge `.terraform/` and re-init on first validate failure, then
# re-run validate — the documented escape hatch from
# antonbabenko/pre-commit-terraform for SHA-pinned module sources.
args:
- --hook-config=--retry-once-with-cleanup=true
- id: terraform_tflint
args:
- --args=--only=terraform_deprecated_interpolation
52 changes: 52 additions & 0 deletions AGENTS.md
@@ -82,6 +82,58 @@ any operator who runs `tofu apply` without overrides). Enforce explicitly:
tofu apply -var markdown_lint_enforcement=active
```

## State backend

State for the org-rulesets stack lives in **its own dedicated S3 bucket
and is gated by its own scoped IAM role.** Cross-stack state-bucket
sharing is not used; every Terraform stack in this account gets its own
bucket via the `terraform-aws-template` module, on the principle that
"a misapply against this repo cannot reach another stack's state and
its IAM grants don't widen as new stacks come online."

| Component | Value |
| --- | --- |
| State bucket | `tfstate-github-<account-id>` (us-east-2) |
| State key | `github/terraform.tfstate` |
| Bootstrap state key | `_bootstrap/terraform.tfstate` (same bucket) |
| Lock | S3 native (`use_lockfile = true` — no DynamoDB) |
| Encryption | AES256 / SSE-S3 (no KMS) |
| IAM role | `tf-github` — scoped to that one bucket only |

Identity flow:

1. Operator's underlying IAM user (e.g. `terraform`) sits in `~/.aws/config`
as a `source_profile`. MFA is required on this user — the role's
trust policy denies sessions without `aws:MultiFactorAuthPresent`.
2. `aws-vault exec tf-github -- <cmd>` calls AWS STS to assume
`tf-github`. aws-vault prompts for MFA once per session and caches
the STS credentials.
3. The STS credentials reach `tofu` / `terragrunt` via environment
variables. The github provider never uses them (it authenticates with
`GITHUB_TOKEN`, not AWS), but the S3 backend does, and the role's
access is scoped to the one bucket.
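
The identity flow above can be guarded in a wrapper script. The following is a minimal sketch, not part of this repo: `in_tf_github_session` is a hypothetical helper, and it assumes aws-vault exports `AWS_VAULT=<profile>` together with the STS credentials (including `AWS_SESSION_TOKEN`) into the child process environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only, not part of this repo).
# Assumption: `aws-vault exec <profile>` sets AWS_VAULT to the profile name
# and AWS_SESSION_TOKEN to the temporary STS session token.
in_tf_github_session() {
  [ "${AWS_VAULT:-}" = "tf-github" ] && [ -n "${AWS_SESSION_TOKEN:-}" ]
}

# Example guard for a wrapper script (left commented out so that sourcing
# this file has no side effects):
#   in_tf_github_session || {
#     echo "run via: aws-vault exec tf-github -- <cmd>" >&2
#     exit 1
#   }
```

A guard like this only checks that the shell looks like a `tf-github` session; the trust policy and the scoped inline policy remain the actual enforcement.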

The aws-vault profile name (`tf-github`) **matches the role name** by
convention. Sibling stacks bootstrapped via `terraform-aws-template`
follow the same pattern: profile name = `tf-<project>` = role name.

Future CI uses GitHub OIDC instead of MFA AssumeRole. The role's trust
policy already accepts `repo:<github_org>/<github_repo>` on push to
the default branch and on pull_request events — no operator user
involvement. `.github/workflows/terragrunt.yml` is not in this repo
yet; when added, it uses `aws-actions/configure-aws-credentials@v4`
with `role-to-assume: arn:aws:iam::<account>:role/tf-github`.

**Never** run this stack with the elevated bootstrap credentials
(`iam-user` or any admin identity). Those are only for one-time
`bootstrap/` applies. All ongoing operations — `terragrunt init`,
`plan`, `apply` — go through `aws-vault exec tf-github`.

First-time setup walkthrough lives in
[`bootstrap/README.md`](bootstrap/README.md). Re-run only when the
template's pinned ref bumps or the role / bucket configuration
intentionally changes.

## Cost policy

**Never apply a policy or enable a feature that costs money unless the
42 changes: 29 additions & 13 deletions README.md
@@ -64,37 +64,53 @@ upstream via `data "http"`, not committed as a local template.

## Requirements

- **OpenTofu** (>= 1.6) and the `integrations/github` provider, pinned in
`versions.tf`. The dev shell supplies the toolchain via direnv:
- **OpenTofu** (>= 1.10 for S3 native locking) and the
`integrations/github` provider, pinned in `versions.tf`. Dev shell via
direnv:

```bash
git clone git@github.com:dryvist/terraform-github.git
cd terraform-github && direnv allow # provides tofu, terraform, terragrunt
cd terraform-github && direnv allow # provides tofu, terraform, terragrunt, aws-vault
```

- **AWS state backend bootstrapped.** The dedicated state bucket
(`tfstate-github-<account-id>`) and scoped IAM role (`tf-github`) must
exist. First-time setup runs once with elevated AWS admin creds —
see [`bootstrap/README.md`](bootstrap/README.md). After bootstrap, all
ongoing operations use the scoped role via `aws-vault`.

- **`aws-vault` profile `tf-github`** in `~/.aws/config`. Profile shape
is documented in `bootstrap/README.md` → "Hand off to the operator".
Verify with `aws-vault exec tf-github -- aws sts get-caller-identity`
before running terragrunt.

- **`GITHUB_TOKEN` with `admin:org`** (the ORG_ADMIN token tier,
`gh-claude-org-admin`) to create or modify org rulesets. The provider reads it
from the environment.
- **S3 state backend** access — bucket / key / region supplied at init (see Usage).
`gh-claude-org-admin`) for apply. The github provider reads it from
the environment; the default `DRYVIST` tier is read-only on org
rulesets and will `403` on apply.

## Usage

State lives in S3 (org convention). Backend values (bucket / key / region) are
supplied at init — never committed, because the bucket name embeds the AWS
account ID:
Daily flow (after bootstrap):

```bash
tofu init -backend-config=bucket=<state-bucket> \
-backend-config=key=terraform-github/terraform.tfstate \
-backend-config=region=us-east-2
aws-vault exec tf-github -- terragrunt init # one-time per worktree
aws-vault exec tf-github -- terragrunt plan
aws-vault exec tf-github -- terragrunt apply
```

Validation needs no backend or credentials:
`terragrunt.hcl` resolves the state bucket name from
`get_aws_account_id()` at runtime, so no account identifier is committed.
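
To see which bucket name that resolves to for the current credentials, the same derivation can be reproduced in the shell. This is a sketch under assumptions: `state_bucket_name` is a hypothetical helper, and the `tfstate-github-<account-id>` shape is taken from the naming described in this repo's docs, not read from `terragrunt.hcl` itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): build the state bucket name
# from a 12-digit AWS account ID, following the documented
# `tfstate-github-<account-id>` convention.
state_bucket_name() {
  local account_id="$1"
  # Reject empty input and anything containing a non-digit.
  case "$account_id" in
    '' | *[!0-9]*) return 1 ;;
  esac
  # AWS account IDs are exactly 12 digits long.
  [ "${#account_id}" -eq 12 ] || return 1
  printf 'tfstate-github-%s\n' "$account_id"
}

# Usage (needs credentials, so it is left commented out):
#   state_bucket_name "$(aws-vault exec tf-github -- \
#     aws sts get-caller-identity --query Account --output text)"
```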

Validation only (no backend, no credentials):

```bash
tofu init -backend=false && tofu validate
```

First-time setup (one-time, elevated AWS creds): see
[`bootstrap/README.md`](bootstrap/README.md).

**Rolling out a rule safely.** Org-wide enforcement can block merges everywhere
at once. For `markdown_lint_enforcement` (legacy default `evaluate`), use the
dry-run gate before enforcing:
164 changes: 164 additions & 0 deletions bootstrap/README.md
@@ -0,0 +1,164 @@
# bootstrap

One-time provisioning of the AWS state backend (S3 bucket) and scoped IAM
role (`tf-github`) that the parent terraform-github stack uses for all
subsequent operations. Calls the canonical
[dryvist/terraform-aws-template](https://github.com/dryvist/terraform-aws-template)
module at the pinned `v0.1.0` ref.

Run this **once**, with elevated AWS admin credentials. After the apply
succeeds and state is migrated into the new bucket, this directory is
effectively read-only — re-runs only happen if the template's pinned ref
bumps or if the IAM role / bucket need a documented configuration change.

```mermaid
graph LR
EU["Elevated AWS user<br/>(temporary, admin)"] -->|tofu apply<br/>local state| BS[bootstrap/]
BS -->|creates| OIDC[aws_iam_openid_<br/>connect_provider.github]
BS -->|calls module| MOD["terraform-aws-template v0.1.0"]
MOD -->|creates| BKT["S3 bucket<br/>tfstate-github-&lt;account&gt;"]
MOD -->|creates| ROLE["IAM role tf-github"]
ROLE -.->|trust + MFA| OP["IAM user terraform<br/>(operator)"]
ROLE -.->|trust OIDC| GHA["GitHub Actions<br/>repo:&lt;org&gt;/&lt;repo&gt;"]
```

## Requirements

Run `aws sts get-caller-identity` from this directory's shell. The returned
ARN must belong to an admin / elevated user with permission to:

- Create / read S3 buckets and bucket policies
- Create IAM roles, inline policies, and OIDC providers
- Read OIDC providers (for the import path if one already exists)

The operator IAM user that goes into `operator_user_arns` must have MFA
enabled in IAM. The role's trust policy enforces `aws:MultiFactorAuthPresent`,
so MFA-less sessions cannot AssumeRole regardless of the operator's other
permissions.

Tooling:

- OpenTofu ≥ 1.10 (for `use_lockfile`)
- `aws` CLI ≥ 2.x

## Usage

End-to-end flow: one-time bootstrap apply, then state migration, then a
one-time `~/.aws/config` profile addition. After that, the operator runs the
parent stack via `aws-vault exec tf-github -- terragrunt …`.

### 1. Apply the bootstrap

```bash
cd bootstrap
cp terraform.tfvars.example terraform.tfvars # then edit with real values
tofu init
tofu apply
```

Expected resources on a clean account:

| Resource | What it is |
| --- | --- |
| `aws_iam_openid_connect_provider.github` | Account-wide GitHub Actions OIDC provider |
| `module.state_backend.aws_s3_bucket.state` | `tfstate-github-<account>` (AES256, versioning, public-access-blocked, TLS-only, 90-day noncurrent expiry) |
| `module.state_backend.aws_iam_role.terraform` | `tf-github` with combined trust (OIDC for the repo + MFA AssumeRole from the operator user) |
| `module.state_backend.aws_iam_role_policy.state` | Inline policy scoped to the new bucket only |

**OIDC provider already exists?** If another stack created it, the apply
errors with `EntityAlreadyExists`. Import once, then re-run apply:

```bash
tofu import aws_iam_openid_connect_provider.github \
arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com
tofu apply
```
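
To find out before the first apply whether the import is needed, the provider ARN can be built and looked up. This is a minimal sketch: `github_oidc_provider_arn` is a hypothetical helper, and the lookup shown in the comment assumes the elevated credentials are already active in the shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): build the ARN of the
# account-wide GitHub Actions OIDC provider for a given account ID.
github_oidc_provider_arn() {
  printf 'arn:aws:iam::%s:oidc-provider/token.actions.githubusercontent.com\n' "$1"
}

# Usage (needs credentials, so it is left commented out). The lookup exits
# non-zero when no such provider exists in the account:
#   account_id="$(aws sts get-caller-identity --query Account --output text)"
#   aws iam get-open-id-connect-provider \
#     --open-id-connect-provider-arn "$(github_oidc_provider_arn "$account_id")"
```

If the lookup succeeds, run the `tofu import` above first; if it fails with a not-found error, a plain `tofu apply` should create the provider.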

### 2. Migrate state into the new bucket

After the first apply, the bootstrap state is still local
(`bootstrap/terraform.tfstate`). Move it into the bucket it just created:

1. Capture the backend config from outputs:

```bash
tofu output -raw backend_config
```

2. In `versions.tf`, uncomment the `backend "s3"` block and replace the
`<account-id>` placeholder in the bucket name with your real account id
(or paste the entire block from step 1's output).

3. Migrate:

```bash
tofu init -migrate-state
```

Confirm `yes` when prompted. The local `terraform.tfstate` is copied to
`s3://tfstate-github-<account>/_bootstrap/terraform.tfstate`. Delete
the local file once migration succeeds; the gitignored backup is fine
to keep until you trust the migration.

### 3. Hand off to the operator

Once the role exists, the operator (whose IAM user ARN was in
`operator_user_arns`) needs a matching `aws-vault` profile. Add to
`~/.aws/config`:

```ini
[profile tf-github]
role_arn = arn:aws:iam::<account-id>:role/tf-github
source_profile = <operator-source-profile>
mfa_serial = arn:aws:iam::<account-id>:mfa/<operator-user>
region = us-east-2
```

`<operator-source-profile>` is whatever profile holds the operator IAM
user's static keys (e.g. `terraform`). `<operator-user>` matches the
user portion of `operator_user_arns[i]`. The MFA serial is found at
IAM → Users → operator → Security credentials → "Assigned MFA device".

Verify:

```bash
aws-vault exec tf-github -- aws sts get-caller-identity
```

The returned ARN should be `arn:aws:sts::<account-id>:assumed-role/tf-github/<session>`,
not the operator user.
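
That check can be scripted. The following sketch assumes the ARN shape quoted above; `is_tf_github_session_arn` is a hypothetical helper, not something this repo ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): succeed only when the given ARN
# is an assumed-role session of `tf-github`, not an IAM user or another role.
is_tf_github_session_arn() {
  case "$1" in
    arn:aws:sts::[0-9]*:assumed-role/tf-github/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs the aws-vault profile, so it is left commented out):
#   arn="$(aws-vault exec tf-github -- \
#     aws sts get-caller-identity --query Arn --output text)"
#   is_tf_github_session_arn "$arn" || echo "unexpected identity: $arn" >&2
```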

### 4. Run the parent stack

From the repo root (one level up):

```bash
aws-vault exec tf-github -- terragrunt init
aws-vault exec tf-github -- terragrunt plan
aws-vault exec tf-github -- terragrunt apply
```

`terragrunt.hcl` at the root is already configured to point at
`tfstate-github-<account>` under key `github/terraform.tfstate`. No
further edits required after the bootstrap.

### 5. What to do (or not) on subsequent changes

- Bumping the template's pinned ref → edit `module.state_backend.source`, run
`tofu plan` to see the diff, apply if benign.
- Adding more operator users → append ARNs to `operator_user_arns` in
`terraform.tfvars`, apply. The role's trust policy updates in place.
- Adding `.github/workflows/terragrunt.yml` to the parent stack → no
bootstrap change. The role already trusts `repo:<github_org>/<github_repo>`
on push to `main` and on pull_request.
- Wanting to widen `branch_pattern` → edit `terraform.tfvars`, apply.
- Wanting to widen the role's permissions → DON'T. Template is scoped to
one bucket on purpose. New AWS responsibilities = new role from the
template, not policy creep on this one.

## Cost

Free for this size. See the parent stack's
[`AGENTS.md` cost-policy section](../AGENTS.md#cost-policy) for the
matrix. S3 storage for one tiny state file + a few noncurrent versions is
~$0/month, no KMS, no DynamoDB.
43 changes: 43 additions & 0 deletions bootstrap/main.tf
@@ -0,0 +1,43 @@
# GitHub Actions OIDC provider — prerequisite for the IAM role's OIDC
# trust statement. Account-wide singleton: one provider per AWS account
# serves every GitHub repo that needs OIDC. If another stack in this
# account already created this provider, this resource's create will
# fail with "EntityAlreadyExists"; resolve by importing once:
#
# tofu import aws_iam_openid_connect_provider.github \
# arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com
#
# Then re-run apply. Subsequent applies are no-ops on this resource.
resource "aws_iam_openid_connect_provider" "github" {
url = "https://token.actions.githubusercontent.com"
client_id_list = ["sts.amazonaws.com"]
}

# State backend + scoped IAM role, provisioned via the canonical template.
# Project name is intentionally short ("github") — the template derives
# the bucket name as `tfstate-${project}-${account}` and the role name
# as `tf-${project}`, so the final names are `tfstate-github-<account>`
# and `tf-github`. The consuming repo's terragrunt.hcl points its state
# at this bucket under key `github/terraform.tfstate`; the bootstrap's
# own state lives at `_bootstrap/terraform.tfstate` after migration.
#
# depends_on ensures the OIDC provider exists before the role's trust
# policy references it.
module "state_backend" {
# Pinned to the commit SHA that v0.1.0 points to, per CKV_TF_1: module
# sources from a git URL should pin to an immutable commit, not a tag
# name (tags can be force-pushed). The trailing comment records the
# human-readable tag this SHA materialized from so future bumps are
# traceable. Update both together when bumping.
source = "git::https://github.com/dryvist/terraform-aws-template.git?ref=c85894b3667cc753a3d5ac07b50e9a7be9302331" # v0.1.0

project = "github"
github_org = var.github_org
github_repo = var.github_repo
branch_pattern = var.branch_pattern
aws_region = var.aws_region

operator_user_arns = var.operator_user_arns

depends_on = [aws_iam_openid_connect_provider.github]
}
29 changes: 29 additions & 0 deletions bootstrap/outputs.tf
@@ -0,0 +1,29 @@
output "state_bucket" {
description = "S3 bucket the consuming repo writes its state to."
value = module.state_backend.state_bucket
}

output "state_bucket_arn" {
description = "ARN of the state bucket."
value = module.state_backend.state_bucket_arn
}

output "tf_role_arn" {
description = "IAM role ARN the operator assumes (via aws-vault + MFA) and CI assumes (via GitHub OIDC) to run terraform-github."
value = module.state_backend.tf_role_arn
}

output "aws_region" {
description = "Region where the state bucket lives."
value = module.state_backend.aws_region
}

output "state_key_prefix" {
description = "Prefix the consuming repo writes its state objects under."
value = module.state_backend.state_key_prefix
}

output "backend_config" {
description = "Ready-to-paste `terraform { backend \"s3\" {} }` block for the consuming repo's backend configuration. Use this to fill the commented-out backend block in `versions.tf` before running `tofu init -migrate-state`."
value = module.state_backend.backend_config
}
3 changes: 3 additions & 0 deletions bootstrap/providers.tf
@@ -0,0 +1,3 @@
provider "aws" {
region = var.aws_region
}