Terraform Standards

An opinionated set of standards for authoring, structuring, testing, and publishing Terraform modules and workspace configurations. Every opinion here is grounded in HashiCorp’s official style guide and recommended practices. Where this document goes further than the official guidance, it is marked explicitly.

Scope: The module-level rules (file structure, variable patterns, dynamic blocks) apply most strictly to public registry modules. The pipeline and state rules apply to workspace (root module) configurations.

Cloud portability: The examples target Azure (the azurerm, azapi, azuread, and msgraph providers) because that is our primary platform. The standards themselves are provider-agnostic - file layout, variable and validation patterns, for_each over count, module design, state separation, testing, and the PR/CI gates apply unchanged to AWS (aws), Google Cloud (google), Kubernetes, or any other provider. Swap the resources and the provider block to match your target. This is a Terraform standard that uses Azure for its examples, not an “Azure Terraform” standard.

Engine portability: Everything here applies equally to OpenTofu, the open-source fork (same HCL, same module and state model, same CLI surface), and to orchestrators layered on top such as Terragrunt and Atmos. Where a feature is version-gated (for example import blocks or check blocks), use the equivalent OpenTofu release. Pick whichever engine your organisation has standardised on - the rules do not change.

CI portability: The pipeline examples use GitHub Actions, but the model (fmt and validate and lint and scan, then plan, then a gated apply of the reviewed plan) is identical on Azure DevOps, GitLab CI, and any other runner. Only the YAML syntax and the OIDC wiring differ.

Why standards?

Consistent Terraform code is not about aesthetics. As HashiCorp’s recommended practices describe, the two core challenges in infrastructure at scale are technical complexity (different provider APIs) and organisational complexity (multiple teams, parallel work). Standards directly address the second:

Engineers can read and review modules they did not write
Modules can be composed without surprises about their interface
CI pipelines can validate code mechanically without human gatekeeping
Module upgrades are predictable - callers know what to expect

File Structure

Reusable module

A module is a self-contained directory with a defined interface. Every module must contain these files:

PLAINTEXT

terraform-<provider>-<resource-name>/
├── main.tf           # Resource definitions only. No variable or output declarations.
├── variables.tf      # All input variable declarations.
├── outputs.tf        # All output declarations.
├── terraform.tf      # terraform {} block: required_version + required_providers.
├── README.md         # Required for registry publish. Generated by terraform-docs.
└── examples/
    └── complete/
        ├── main.tf   # A working end-to-end example calling the module.
        ├── variables.tf
        └── outputs.tf

Optional files added when needed:

PLAINTEXT

├── locals.tf         # Local value definitions when they are substantial enough to split out.
├── data.tf           # data source lookups that inform resource configuration.
├── moved.tf          # moved {} blocks when refactoring resource addresses without destroy.
├── CHANGELOG.md      # Semantic version history. Strongly recommended for published modules.
└── tests/
    └── *.tftest.hcl  # Native terraform test files.

Rule: Never put variable declarations in main.tf or resource definitions in variables.tf. The file split is the contract - it tells a reader exactly where to look without grepping. (HashiCorp style guide - Files and configuration structure)

Workspace configuration (root module)

A workspace configuration is what actually gets applied against an environment. It is not a reusable module - it calls modules and wires in real environment values.

PLAINTEXT

infra/
├── main.tf           # Module calls and any top-level resource definitions.
├── variables.tf
├── outputs.tf
├── terraform.tf      # required_version + required_providers only.
├── providers.tf      # provider {} blocks with auth and feature flags.
├── backend.tf        # Remote state backend - kept separate for easy override.
├── override.tf       # LOCAL DEV ONLY. Never commit. Add to .gitignore.
└── env/
    ├── dev.tfvars
    ├── staging.tfvars
    └── prod.tfvars   # Non-sensitive defaults only. Never commit secrets.
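Here is a minimal sketch of backend.tf. It assumes state lives in an Azure Storage account, and the resource group, storage account, and container names are hypothetical placeholders. Backend blocks cannot reference variables, so the per-environment state key is deliberately left out and supplied at init time through partial configuration.

```hcl
# backend.tf - remote state in Azure Storage (all names are hypothetical placeholders)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate-uks-prd"
    storage_account_name = "sttfstateuksprd"
    container_name       = "tfstate"

    # Reach the state account with the same OIDC identity as the providers,
    # using Entra ID (data-plane RBAC) instead of a storage account key.
    use_oidc         = true
    use_azuread_auth = true

    # key is omitted on purpose: backend blocks cannot use variables, so pass
    # it per environment at init time, e.g. -backend-config="key=prd.tfstate".
  }
}
```

With use_azuread_auth, the deploying identity needs a blob data-plane role (for example Storage Blob Data Contributor) on the state container, not just management-plane access.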

Provider Pinning

Every terraform.tf must declare required_version and required_providers with explicit version constraints. (HashiCorp - Provider version constraints)

HCL

# terraform.tf
terraform {
  required_version = ">= 1.9.0, < 2.0.0"
 
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.0, < 5.0.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.6.0, < 4.0.0"
    }
  }
}

Constraint style	When to use
`>= X.Y.0, < X+1.0.0`	Reusable modules - accept any compatible patch/minor, block breaking majors
`~> X.Y.0`	Workspace configs - allows only patch updates within `X.Y`, tighter pin
`= X.Y.Z`	Only for a diagnosed provider regression - overly brittle, blocks security fixes

Rule: Never use >= X alone without an upper bound in a module. Callers cannot safely adopt your module if it may silently start using a breaking provider version.

The provider configuration block (features {}, credentials, alias) belongs in the workspace root, not in a reusable module. Pass an aliased provider into a child module via providers = {} if the module genuinely needs it.
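The sketch below passes an aliased provider into a child module. The module path, the alias name hub, and the zone and resource group values are hypothetical, and the child's variables.tf (declaring zone_name and resource_group_name) is omitted. azurerm.secondary is the alias defined in providers.tf. The child only declares the alias it expects through configuration_aliases and never configures the provider itself.

```hcl
# modules/private-dns-zone/terraform.tf (hypothetical child module)
terraform {
  required_providers {
    azurerm = {
      source                = "hashicorp/azurerm"
      version               = ">= 4.0.0, < 5.0.0"
      configuration_aliases = [azurerm.hub] # the caller must supply this alias
    }
  }
}

# modules/private-dns-zone/main.tf - the zone lives in the hub subscription
resource "azurerm_private_dns_zone" "this" {
  provider            = azurerm.hub
  name                = var.zone_name
  resource_group_name = var.resource_group_name
}

# Workspace root main.tf - map root provider configurations onto the module's names
module "private_dns_zone" {
  source = "./modules/private-dns-zone"

  providers = {
    azurerm     = azurerm           # default configuration for any un-aliased resources
    azurerm.hub = azurerm.secondary # alias declared in providers.tf
  }

  zone_name           = "privatelink.blob.core.windows.net"
  resource_group_name = "rg-dns-uks-prd"
}
```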

Provider selection - azurerm vs azapi

Choose a single Azure provider and commit to it across your entire Terraform codebase. Both azurerm and azapi are valid but have different tradeoffs. The decision is primarily: does azurerm support all the resources you need?

Provider	New API support	Validation	Safety	Use case
azurerm	Slower - new ARM features arrive only once they are written into the provider	Strong - validates against provider schema	Safer - fewer edge cases	Stable, well-tested resources; most production workloads
azapi	Faster - calls any ARM API version directly	Weak - minimal validation	Riskier - less stable, more edge cases	Resources not yet in azurerm, cutting-edge features

How to decide:

Start with azurerm. It covers the vast majority of Azure services with stable, well-tested resource definitions.
Switch to azapi only if azurerm lacks a resource you need. Check the azurerm documentation and GitHub issues first.
Once chosen, commit to it. Mixing providers within the same workspace or module creates cognitive overhead, increases testing burden, and makes code harder to review.

When mixing is necessary:

If a service is available in both providers, always use azurerm. If a service exists only in azapi (e.g. a very new Azure feature), use azapi only for that resource and azurerm for everything else. Document the exception in a code comment.

HCL

# Example: using azurerm for all resources except one new service only in azapi
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    azapi = {
      source  = "azure/azapi"
      version = "~> 2.0"
    }
  }
}
 
# Standard resources with azurerm
resource "azurerm_resource_group" "this" {
  name     = "rg-ldo-uks-prd"
  location = "uksouth"
}
 
# New resource only available in azapi - document why
resource "azapi_resource" "new_service" {
  # This service is not yet available in the azurerm provider (v4.x)
  type = "Microsoft.NewService/services@2024-01-01"
  name = "new-service-instance"
  # ...
}

Entra ID with the azuread and msgraph providers

Resource Manager resources (Microsoft.*) live in azurerm/azapi. Entra ID (Azure AD) directory objects - app registrations, service principals, groups, directory roles, conditional access - live in a separate API (Microsoft Graph) and need a Graph-aware provider. Terraform has two, and the choice mirrors the azurerm-vs-azapi decision exactly:

Provider	Source	Role	Use case
azuread	`hashicorp/azuread`	High-level, validated, schema-backed	Common directory objects - applications, service principals, users, groups. Start here.
msgraph	`Microsoft/msgraph`	Thin, generic layer over the Graph REST API	Graph features `azuread` does not yet model (PIM, some M365/SharePoint Graph resources). Drop to this only when needed.

Rule: msgraph is to azuread what azapi is to azurerm - a low-level escape hatch with automatic support for new resource types but weaker validation. Start with azuread; reach for msgraph (or azapi, which can also create some Graph resources) only for what azuread cannot yet express, and document the exception in a comment.

HCL

# terraform.tf - both Graph providers alongside azurerm
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 3.0"
    }
    msgraph = {
      # Public preview at time of writing - pin tightly and review before bumping.
      source  = "Microsoft/msgraph"
      version = "~> 0.1"
    }
  }
}

Both Graph providers authenticate the same way as azurerm - OIDC in CI, no stored secret - and the provider configuration block belongs in the workspace root, never in a reusable module:

HCL

# providers.tf - workspace root only
provider "azuread" {
  use_oidc = true   # same ARM_/AZURE_ OIDC env vars as the azurerm provider
}
 
provider "msgraph" {
  use_oidc = true
}
 
# High-level object via azuread (preferred)
resource "azuread_application" "api" {
  display_name = "app-ldo-api-prd"
}
 
# A Graph resource azuread does not model yet, via the generic msgraph provider.
# Documented because it is the lower-level escape hatch.
resource "msgraph_resource" "pim_role_setting" {
  url = "policies/roleManagementPolicies"
  # body = { ... } - raw Microsoft Graph payload
}

Note: Granting the deploying identity Graph permissions (for example Application.ReadWrite.All) is a privileged operation. Scope it tightly, and keep Entra ID changes behind the same CODEOWNER and plan-review gates as everything else - a directory change has a wider blast radius than a single resource group.

Version management on developer machines

The required_version constraint in terraform.tf is the authoritative version declaration: Terraform refuses to run if the installed binary does not satisfy it. It does not install or switch versions, though, so managing multiple versions across different projects still needs a version manager on the machine itself.

tenv is the recommended version manager. It supports Terraform, OpenTofu, Terragrunt, and Atmos in a single tool, reads .terraform-version files to auto-switch on directory change, and installs binaries from official HashiCorp releases. Prefer it over the older tfenv, which is Terraform-only and no longer actively maintained.

Bash

# Install (macOS)
brew install tenv
 
# Install (Debian/Ubuntu .deb package - adjust arch as needed)
TENV_VERSION=$(curl -s https://api.github.com/repos/tofuutils/tenv/releases/latest | grep tag_name | cut -d'"' -f4)
curl -Lo /tmp/tenv.deb \
    "https://github.com/tofuutils/tenv/releases/download/${TENV_VERSION}/tenv_${TENV_VERSION}_amd64.deb"
sudo dpkg -i /tmp/tenv.deb
 
# Install a specific Terraform version
tenv terraform install 1.9.8
 
# Pin globally
tenv terraform use 1.9.8
 
# Pin to the current directory only (writes .terraform-version)
tenv terraform use 1.9.8 --working-dir
 
# Auto-detect: reads required_version from terraform.tf and installs if missing
tenv terraform detect

Pin the version per-repo by committing a .terraform-version file at the repo root:

PLAINTEXT

# .terraform-version
1.9.8

tenv reads this file and switches automatically. The hashicorp/setup-terraform GitHub Actions action does not read it on its own. Pass the same value through its terraform_version input (for example by reading .terraform-version in an earlier step), so that one file pins the version both in CI and on developer machines.

For HCP Terraform and Terraform Stacks customers: The workspace or stack configuration controls the Terraform version centrally. Runs that execute in HCP do not need a local version manager - the platform handles version selection. tenv remains useful for local plan previews and module development outside of HCP runs.

Legacy alternative: tfenv works but is Terraform-only and unmaintained. Migrate to tenv for new setups and existing repos that need multi-tool version management.

Provider authentication - standardise on OIDC

All CI/CD pipelines must authenticate to Azure using OIDC (federated identity) - never a client secret, certificate, or long-lived key. OIDC credentials are ephemeral tokens issued per job, with no secret stored anywhere. (HashiCorp - OIDC with Azure)

HCL

# providers.tf - provider config in root module only, never in reusable modules
provider "azurerm" {
  features {}
 
  # OIDC auth - ARM_CLIENT_ID, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID set via env vars.
  # ARM_USE_OIDC=true set in the pipeline environment.
  # No client secret, no certificate, no ARM_CLIENT_SECRET.
  use_oidc = true
}

Required pipeline environment variables (no secrets):

Bash

ARM_USE_OIDC=true
ARM_CLIENT_ID=<app-registration-client-id>
ARM_TENANT_ID=<tenant-id>
ARM_SUBSCRIPTION_ID=<subscription-id>
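# On GitHub Actions the providers request the OIDC token themselves: ACTIONS_ID_TOKEN_REQUEST_URL
# and ACTIONS_ID_TOKEN_REQUEST_TOKEN are present when the job has id-token: write.
# Set ARM_OIDC_TOKEN only when you obtain the token yourself. Azure DevOps has its own
# OIDC settings - see the provider's OIDC guide.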

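GitHub Actions - OIDC workflow configuration:

YAML

permissions:
  id-token: write   # Required for OIDC token request
  contents: read
 
env:
  # Read by the azurerm, azapi, and azuread providers. All non-secret - use vars, not secrets.
  ARM_USE_OIDC: "true"
  ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
  ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
 
steps:
  # Only needed if later steps call the az CLI; Terraform itself reads the ARM_* variables above.
  - name: Azure login (OIDC - no secrets)
    uses: azure/login@v2
    with:
      client-id: ${{ vars.AZURE_CLIENT_ID }}       # non-secret - use vars not secrets
      tenant-id: ${{ vars.AZURE_TENANT_ID }}
      subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}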

The federated credential on the App Registration in Entra ID must match the exact subject claim of the runner (repo:org/repo:ref:refs/heads/main for GitHub, or equivalent for Azure DevOps).
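If the App Registration is itself managed in Terraform, the federated credential can be declared with the azuread provider. Do this in a separate bootstrap workspace, because the pipeline cannot create its own trust. This is a sketch that assumes azuread 3.x argument names. The display names and the repo:org/repo subject are placeholders.

```hcl
# Bootstrap workspace - creates the CI identity and its GitHub OIDC trust
resource "azuread_application" "terraform_ci" {
  display_name = "app-ldo-terraform-ci" # hypothetical name
}

resource "azuread_service_principal" "terraform_ci" {
  client_id = azuread_application.terraform_ci.client_id
}

resource "azuread_application_federated_identity_credential" "github_main" {
  application_id = azuread_application.terraform_ci.id # azuread 3.x: /applications/<object-id>
  display_name   = "github-main"
  audiences      = ["api://AzureADTokenExchange"]
  issuer         = "https://token.actions.githubusercontent.com"
  # Must match the runner's subject claim exactly - placeholder repo and branch
  subject = "repo:org/repo:ref:refs/heads/main"
}
```

The pipeline's ARM_CLIENT_ID is then azuread_application.terraform_ci.client_id. The service principal still needs an Azure RBAC role assignment on the scopes it deploys to.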

Why not client secrets:

Method	Expiry	Rotation	Risk on leak	Verdict
OIDC / federated	Per-job (~10 min)	Automatic	Low - token is short-lived and scoped to one job	Use this
Managed Identity	N/A	Automatic	Minimal - no stored credential; tokens are issued only to the host	Use for Azure-hosted runners
Client secret	0-24 months	Manual	Everything the identity can access, until rotated or revoked	Never
Certificate	0-24 months	Manual	Everything the identity can access, until rotated or revoked	Never

Rule: If the pipeline runner is itself an Azure resource (self-hosted agent on an AKS node, Azure VM, Container App), use Managed Identity instead of OIDC - no credential of any kind is needed. OIDC is for external runners (GitHub-hosted, Azure DevOps Microsoft-hosted) that cannot carry a managed identity.
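Below is a sketch of the provider block for a self-hosted runner on an Azure VM or VM scale set with a user-assigned managed identity. var.runner_identity_client_id is a hypothetical variable, and var.subscription_id is assumed to be declared as in the providers.tf example. Omit client_id entirely when the VM uses a system-assigned identity.

```hcl
# providers.tf - self-hosted runner on an Azure VM / VM scale set
provider "azurerm" {
  features {}

  subscription_id = var.subscription_id
  use_msi         = true
  # Client ID of the runner's user-assigned identity (hypothetical variable).
  # Remove this line when the VM uses a system-assigned identity.
  client_id = var.runner_identity_client_id
}
```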

`providers.tf` - provider configuration

All provider {} blocks live in providers.tf in the workspace root. This separates provider configuration (credentials, feature flags, aliases) from the version requirements in terraform.tf and the resource definitions in main.tf.

providers.tf is the right place for:

The provider "azurerm" {} block with features {} and auth settings
Additional provider configurations (azapi, azuread, etc.)
Provider aliases for multi-subscription or multi-region deployments

providers.tf is not the right place for:

required_providers constraints - those belong in terraform.tf
Resource definitions - those belong in main.tf

HCL

# providers.tf
 
provider "azurerm" {
  features {
    # Prevent accidental resource group deletion when it still has resources
    resource_group {
      prevent_deletion_if_contains_resources = true
    }
    # Never purge Key Vaults on destroy - allows recovery from accidental deletes
    key_vault {
      purge_soft_delete_on_destroy    = false
      recover_soft_deleted_key_vaults = true
    }
    # VM destroy behaviour: delete the OS disk with the VM, and shut the VM down
    # before deleting it rather than force-deleting it
    virtual_machine {
      delete_os_disk_on_deletion     = true
      skip_shutdown_and_force_delete = false
    }
  }
 
  subscription_id = var.subscription_id
  use_oidc        = true
}
 
provider "azapi" {
  subscription_id = var.subscription_id
  use_oidc        = true
}
 
provider "azuread" {
  use_oidc = true
}
 
# Multi-subscription alias - reference with provider = azurerm.secondary
provider "azurerm" {
  alias           = "secondary"
  subscription_id = var.secondary_subscription_id
  use_oidc        = true
  features {}
}

The features {} block values above are opinionated production defaults. Every organisation should decide their own features {} policy and keep it consistent across all workspace configurations - this is one of the few places where organisation-wide defaults make sense as a shared module or template.

`override.tf` - local development

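override.tf is a Terraform language feature that merges over the top of existing configuration without modifying any committed file. Terraform loads override.tf and any *_override.tf files last and merges them into the matching blocks. An argument in an override block replaces the same argument in the original block, and a nested block (such as features {}) replaces every nested block of that type. Arguments the override does not mention keep their original values. (HashiCorp - Override files)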

override.tf must be in .gitignore. It is machine-specific and will cause CI failures or auth errors if committed.

GITIGNORE

# .gitignore - Terraform override files
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Local state written when backend_override.tf is in use
terraform.tfstate.local

Common local dev use case - replace OIDC with CLI auth:

In CI the workspace uses OIDC. Locally, engineers authenticate with the Azure CLI (az login). Rather than modifying providers.tf, create an override.tf that overrides the authentication arguments of the provider blocks:

HCL

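# override.tf - LOCAL DEV ONLY - gitignored
# Switches the providers in providers.tf from OIDC to Azure CLI auth.
# Run `az login` and `az account set --subscription <id>` before applying.
# Override files merge argument by argument, so use_oidc must be set to false
# explicitly - omitting it would keep use_oidc = true from providers.tf.
# There is no features {} block here, so the committed features policy still applies.
 
provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"  # your dev subscription
  use_oidc        = false  # azurerm then falls back to Azure CLI auth
}
 
provider "azapi" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  use_oidc        = false
}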

Common local dev use case - replace remote backend with local state:

CI uses the remote azurerm backend. Locally, that backend needs access to the state storage account and takes a state lock on every plan and apply. Override it with a local backend to iterate quickly:

HCL

# backend_override.tf - LOCAL DEV ONLY - gitignored
# Switches to local state so you can plan/apply without remote backend access.
# WARNING: do not mix local and remote state for the same environment.
 
terraform {
  backend "local" {
    path = "terraform.tfstate.local"
  }
}

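After adding a backend override, re-run terraform init -reconfigure. Without -reconfigure, init stops because the backend configuration has changed, and choosing -migrate-state instead would copy remote state into the local file. The local state file (terraform.tfstate.local) must also be in .gitignore.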

What override.tf is not for:

Permanent configuration differences between environments - use tfvars files for those
Hiding sensitive configuration from code review - all permanent config must be committed
Switching between fundamentally different architectures - that is a different workspace, not an override

Rule: If you find yourself writing an override.tf that is more than 20-30 lines, it is a sign the committed configuration has too many environment-specific assumptions baked in. Fix the parameterisation in the committed files instead.

Variables

Naming

Use snake_case for all names. (HashiCorp style guide)
Use descriptive nouns: storage_account_name, not sa or name.
Do not repeat the resource type when the module only manages one resource type: name not logic_app_name in a single-resource module. Qualify when there are multiple types: storage_account_name vs key_vault_name.
Booleans: prefix with enable_, create_, or is_: enable_https_only, create_public_ip, is_zone_redundant.
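Here is how the naming rules apply to a module that manages both an App Service Plan and a Key Vault. The variable names are illustrative:

```hcl
# Qualified by resource type because the module manages more than one type
variable "service_plan_name" {
  description = "Name of the App Service Plan that hosts the Logic App."
  type        = string
}

variable "key_vault_name" {
  description = "Name of the Key Vault that holds application secrets."
  type        = string
}

# Boolean with an is_ prefix so the call site reads as a yes/no question
variable "is_zone_redundant" {
  description = "When true, deploy zone-redundant resources where the SKU supports it."
  type        = bool
  default     = false
}

# ❌ Avoid: abbreviations (asp, kv) and unprefixed booleans (zone_redundant)
```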

Declaration order and required fields

Every variable must have description and type. Omit default only for genuinely required inputs.

HCL

# Required input - no default
variable "name" {
  description = "The name of the Logic App Standard instance."
  type        = string
}
 
variable "location" {
  description = "Azure region in which to deploy all resources. Example: uksouth."
  type        = string
}
 
# Optional input - sensible default provided
variable "enable_https_only" {
  description = "When true, only HTTPS connections are accepted. Strongly recommended."
  type        = bool
  default     = true
}
 
variable "tags" {
  description = "A map of tags to assign to all resources created by this module."
  type        = map(string)
  default     = {}
}

Rule: Include type and description for every variable without exception. (HashiCorp style guide - Variables)

Validation blocks

Add validation blocks to catch bad inputs at plan time, before any API call.

HCL

variable "environment" {
  description = "Short environment code used in resource naming."
  type        = string
 
  validation {
    condition     = contains(["dev", "tst", "uat", "ppd", "prd"], var.environment)
    error_message = "environment must be one of: dev, tst, uat, ppd, prd."
  }
}
 
variable "min_tls_version" {
  description = "Minimum TLS version for the Logic App. Must be 1.2 or 1.3."
  type        = string
  default     = "1.2"
 
  validation {
    condition     = contains(["1.2", "1.3"], var.min_tls_version)
    error_message = "min_tls_version must be 1.2 or 1.3."
  }
}
 
variable "sku_name" {
  description = "App Service Plan SKU for the Logic App Standard."
  type        = string
 
  validation {
    condition     = can(regex("^(WS[123]|EP[123]|Y1)$", var.sku_name))
    error_message = "sku_name must be a valid Logic App SKU: WS1, WS2, WS3, EP1, EP2, EP3, or Y1."
  }
}
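Validation pays off most where Azure enforces a naming rule that would otherwise surface only as an API error partway through an apply. The sketch below covers storage account names, which Azure requires to be 3-24 characters of lowercase letters and digits:

```hcl
variable "storage_account_name" {
  description = "Name of the storage account backing the Logic App. 3-24 lowercase letters and digits."
  type        = string

  validation {
    # Fails at plan time instead of mid-apply with an ARM error
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "storage_account_name must be 3-24 characters long and contain only lowercase letters and digits."
  }
}
```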

The `list(object)` pattern for multi-resource modules

When a module provisions multiple instances of the same logical resource, accept a single list(object({...})) variable rather than one flat variable per attribute. This keeps the module interface stable as new optional attributes are added without breaking existing callers.

HCL

variable "logic_apps" {
  description = "List of Logic App Standard instances to create."
  type = list(object({
    # Required
    name                       = string
    sku_name                   = string
    storage_account_name       = string
    storage_account_access_key = string
 
    # Optional - all have defaults declared here
    enabled       = optional(bool, true)
    https_only    = optional(bool, true)
    identity_type = optional(string, "SystemAssigned")
    identity_ids  = optional(list(string), [])
    app_settings  = optional(map(string), {})
    tags          = optional(map(string), {})
 
    site_config = optional(object({
      always_on       = optional(bool, false)
      http2_enabled   = optional(bool, false)
      min_tls_version = optional(string, "1.2")
      ftps_state      = optional(string, "Disabled")
 
      ip_restriction = optional(list(object({
        name                      = string
        action                    = string
        priority                  = number
        ip_address                = optional(string)
        service_tag               = optional(string)
        virtual_network_subnet_id = optional(string)
      })), [])
    }), {})
  }))
 
  # Validate nested attributes of every element. This is what the
  # `expect_failures = [var.logic_apps]` test below asserts against.
  validation {
    condition = alltrue([
      for app in var.logic_apps : can(regex("^(WS[123]|EP[123]|Y1)$", app.sku_name))
    ])
    error_message = "Each logic_apps[*].sku_name must be a valid Logic App SKU: WS1, WS2, WS3, EP1, EP2, EP3, or Y1."
  }
}

Rules for optional():

Always provide a sensible default as the second argument. optional(bool) produces null on omission, which forces null-checks everywhere the value is used. optional(bool, true) produces true directly - no null-check needed.
Use optional(list(...), []) for nested lists. An empty list means no entries, which a dynamic block consumes cleanly via for_each.
Use optional(object({...}), {}) for optional nested objects. Terraform constructs the object from each attribute’s own optional() default when the caller omits the block entirely.
A new optional() attribute added to an existing object type is non-breaking - existing callers that don’t provide it receive the declared default. This is the main advantage of the pattern over flat variables.

Locals

Locals derive computed values that would otherwise be repeated. They are not a replacement for variables (which have a type, a description, and can be overridden by callers).

HCL

# locals.tf (or inline in main.tf if small)
locals {
  # Convert the list to a map keyed by name - the canonical for_each input
  logic_app_map = { for app in var.logic_apps : app.name => app }
 
  # Merge module-managed labels into every resource's tags
  common_tags = merge(var.tags, {
    managed-by = "terraform"
    module     = "terraform-azurerm-logic-app"
  })
}

Rule: Avoid over-using locals. Write values inline if they are used only once. A local that wraps a single variable reference (local.name = var.name) adds nothing but a layer of indirection for readers. (HashiCorp style guide - Locals)

A small locals block can live at the top of main.tf. Once it grows beyond about 10 entries, move it to locals.tf.

Resources

Resource label naming

A resource label is the Terraform-internal identifier - not the cloud resource name. Use a noun that describes the role.

HCL

# ✅ Single resource of this type - use "this"
resource "azurerm_logic_app_standard" "this" { ... }
resource "azurerm_service_plan" "this" { ... }
 
# ✅ Multiple resources of the same type in the same module - qualify by role
resource "azurerm_role_assignment" "storage_contributor" { ... }
resource "azurerm_role_assignment" "keyvault_reader" { ... }
 
# ❌ Do not echo the resource type in the label
resource "azurerm_logic_app_standard" "logic_app_standard" { ... }
resource "azurerm_service_plan" "azurerm_service_plan" { ... }

When a module manages exactly one resource of a given type, name it "this". (HashiCorp style guide)

`for_each` over `count` for named resources

for_each produces stable resource addresses keyed by a meaningful string. count produces integer-indexed addresses that shift when items are inserted or removed mid-list, causing unexpected destroy-and-recreate cycles.

HCL

# ✅ for_each - removing "app-b" does not affect "app-a" or "app-c"
resource "azurerm_logic_app_standard" "this" {
  for_each = local.logic_app_map   # map keyed by app name
 
  name                = each.key
  resource_group_name = var.rg_name
  location            = var.location
}
 
# ❌ count - removing index 1 shifts index 2 → 1, triggering a replace of app-c
resource "azurerm_logic_app_standard" "this" {
  count = length(var.logic_apps)
  name  = var.logic_apps[count.index].name
}

The canonical list → map conversion:

HCL

locals {
  logic_app_map = { for app in var.logic_apps : app.name => app }
}
 
resource "azurerm_logic_app_standard" "this" {
  for_each = local.logic_app_map
  # Reference values via each.key (name) and each.value (the full object)
}
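Because the conversion keys the map by name, two list entries with the same name make the for expression fail with a "Duplicate object key" error. A validation block on the logic_apps variable reports the problem in the caller's terms instead. The sketch below would sit alongside the existing sku_name validation:

```hcl
  # Inside variable "logic_apps" { ... }
  validation {
    condition     = length(distinct([for app in var.logic_apps : app.name])) == length(var.logic_apps)
    error_message = "Each logic_apps[*].name must be unique; names become for_each keys."
  }
```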

Use count only for genuinely boolean resource existence: “create this resource or not”. Use for_each for any named collection. (HashiCorp style guide)
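Here is a sketch of the boolean pattern. It assumes var.resource_group_name and var.location are declared elsewhere in the module, and the public IP name is a hypothetical placeholder. one() turns the zero-or-one-element list into a single value, or null, for the output:

```hcl
variable "create_public_ip" {
  description = "When true, create a public IP address for the gateway."
  type        = bool
  default     = false
}

resource "azurerm_public_ip" "this" {
  count = var.create_public_ip ? 1 : 0

  name                = "pip-gateway" # hypothetical name
  resource_group_name = var.resource_group_name
  location            = var.location
  allocation_method   = "Static"
}

output "public_ip_address" {
  description = "Public IP address of the gateway, or null when create_public_ip is false."
  value       = one(azurerm_public_ip.this[*].ip_address)
}
```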

Argument ordering within a resource block

HCL

resource "azurerm_logic_app_standard" "this" {
  # 1. Meta-arguments first, separated by a blank line from the rest
  for_each = local.logic_app_map
 
  # 2. Required arguments
  name                       = each.key
  location                   = var.location
  resource_group_name        = var.rg_name
  app_service_plan_id        = azurerm_service_plan.this[each.key].id
  storage_account_name       = each.value.storage_account_name
  storage_account_access_key = each.value.storage_account_access_key
 
  # 3. Optional arguments
  enabled    = each.value.enabled
  https_only = each.value.https_only
  tags       = each.value.tags
 
  # 4. Nested dynamic blocks
  dynamic "site_config" { ... }
  dynamic "identity" { ... }
 
  # 5. Meta-argument blocks last, separated by blank line
  lifecycle {
    ignore_changes = [app_settings["WEBSITE_RUN_FROM_PACKAGE"]]
  }
}

Avoid redundant null checks

When optional() is declared with a default, the value is never null at resource evaluation time. Do not null-check it:

HCL

# ❌ Redundant - optional(bool, true) cannot be null
https_only = each.value.https_only != null ? each.value.https_only : true
 
# ✅ Direct - the default is already enforced in the variable declaration
https_only = each.value.https_only

Only null-check attributes declared with bare optional(type) (no default) if you need distinct behaviour between null and the zero value (e.g. null vs false for a bool that the API treats differently).
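The sketch below shows a case where a bare optional() is deliberate, using the client_affinity_enabled argument of azurerm_logic_app_standard. An explicit true or false is sent as given, while null means the caller made no choice. No null check is needed either way, because assigning null to a resource argument is the same as omitting it.

```hcl
# In the logic_apps object type: no default, so null means "caller did not decide"
client_affinity_enabled = optional(bool)

# In the resource block: passed straight through. null -> argument treated as unset,
# so the provider's behaviour for an omitted argument applies; true/false -> sent as given.
client_affinity_enabled = each.value.client_affinity_enabled
```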

Dynamic Blocks

Use dynamic blocks for optional nested configuration, not for unconditionally repeating identical structure.

HCL

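# ✅ Optional nested block: rendered only when site_config is non-null.
# With the optional(object({...}), {}) default declared on the variable it is never
# null, so the block always renders with the declared defaults. Drop the {} default
# (bare optional(object({...}))) if callers must be able to omit the block entirely.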
dynamic "site_config" {
  for_each = each.value.site_config != null ? [each.value.site_config] : []
  content {
    always_on       = site_config.value.always_on
    http2_enabled   = site_config.value.http2_enabled
    min_tls_version = site_config.value.min_tls_version
    ftps_state      = site_config.value.ftps_state
 
    # Nested dynamic for a repeated sub-block
    dynamic "ip_restriction" {
      for_each = site_config.value.ip_restriction
      content {
        name                      = ip_restriction.value.name
        action                    = ip_restriction.value.action
        priority                  = ip_restriction.value.priority
        ip_address                = ip_restriction.value.ip_address
        service_tag               = ip_restriction.value.service_tag
        virtual_network_subnet_id = ip_restriction.value.virtual_network_subnet_id
      }
    }
  }
}

Identity block consolidation

A common anti-pattern uses three separate dynamic "identity" blocks gated on the identity type string. Consolidate into a single block:

HCL

# ❌ Three separate dynamic blocks - verbose, harder to read, harder to extend
dynamic "identity" {
  for_each = each.value.identity_type == "SystemAssigned" ? [1] : []
  content { type = "SystemAssigned" }
}
dynamic "identity" {
  for_each = each.value.identity_type == "UserAssigned" ? [1] : []
  content {
    type         = "UserAssigned"
    identity_ids = each.value.identity_ids
  }
}
dynamic "identity" {
  for_each = each.value.identity_type == "SystemAssigned, UserAssigned" ? [1] : []
  content {
    type         = "SystemAssigned, UserAssigned"
    identity_ids = each.value.identity_ids
  }
}
 
# ✅ Single block - identity_type drives all variation
dynamic "identity" {
  for_each = each.value.identity_type != null ? [each.value.identity_type] : []
  content {
    type = identity.value
    identity_ids = contains(
      ["UserAssigned", "SystemAssigned, UserAssigned"],
      identity.value
    ) ? each.value.identity_ids : []
  }
}

Rule: Never repeat the same dynamic block pattern multiple times in the same resource unless the provider genuinely defines distinct block types with different schemas. If you are doing this, the variable type probably needs consolidating with optional() defaults.

Outputs

Declaration rules

HCL

# ✅ Map output preserving the for_each key structure
output "logic_app_ids" {
  description = "Map of Logic App name to resource ID."
  value       = { for k, v in azurerm_logic_app_standard.this : k => v.id }
}
 
output "logic_app_identities" {
  description = "Map of Logic App name to managed identity block (object_id, tenant_id, etc.)."
  value       = { for k, v in azurerm_logic_app_standard.this : k => v.identity }
}
 
# Credentials must be marked sensitive
output "logic_app_site_credentials" {
  description = "Map of Logic App name to site-level publishing credentials."
  sensitive   = true
  value       = { for k, v in azurerm_logic_app_standard.this : k => v.site_credential }
}

Rules:

Include description for every output. (HashiCorp style guide)
Mark sensitive = true for any output containing credentials, private keys, connection strings, or SAS tokens. Sensitive outputs are still persisted in state - sensitive only suppresses CLI display.
Output maps keyed by resource name when the module creates multiple resources via for_each. The output shape mirrors the input shape.
Do not re-output values the caller already passed in. If the caller provided name, there is no need to output it.

What to expose

Output everything a caller may reasonably need to chain into another resource:

Resource IDs (for role assignments, diagnostic settings, references)
Principal / object IDs (for role assignments to managed identities)
Hostnames, endpoints, FQDNs
Private endpoint IP addresses
Generated names (when the module generates the name internally)
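A sketch of identity and endpoint outputs, and of a caller chaining one of them into a role assignment. It assumes the module's resource is `azurerm_logic_app_standard.this`, as in the outputs above. It also assumes callers instantiate the module under the hypothetical name `module.logic_apps`, and that `azurerm_key_vault.this` exists in the caller. `try()` returns null for apps without a managed identity, because `identity` is an empty list in that case.

```hcl
# Inside the module: outputs.tf
output "logic_app_principal_ids" {
  description = "Map of Logic App name to managed identity principal ID (null if no identity)."
  value       = { for k, v in azurerm_logic_app_standard.this : k => try(v.identity[0].principal_id, null) }
}

output "logic_app_default_hostnames" {
  description = "Map of Logic App name to default hostname."
  value       = { for k, v in azurerm_logic_app_standard.this : k => v.default_hostname }
}

# In the caller: chain the principal ID into a role assignment.
# module.logic_apps and azurerm_key_vault.this are hypothetical names.
resource "azurerm_role_assignment" "logic_app_kv_secrets" {
  scope                = azurerm_key_vault.this.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = module.logic_apps.logic_app_principal_ids["logic-ldo-uks-prd-01"]
}
```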

Check Blocks

check blocks (Terraform 1.5+) run their assertions as the final step of every plan and apply. A failing assertion emits a warning but does not abort the run, which makes check blocks appropriate for invariants that depend on real runtime state. (HashiCorp - check blocks)

HCL

# Verify the deployed Logic App health endpoint responds
check "logic_app_healthy" {
  data "http" "health" {
    url = "https://${azurerm_logic_app_standard.this["my-app"].default_hostname}/api/health"
  }
 
  assert {
    condition     = data.http.health.status_code == 200
    error_message = "Logic App health endpoint returned ${data.http.health.status_code}, expected 200."
  }
}
 
# Warn when a certificate is within 30 days of expiry
check "cert_not_expiring" {
  assert {
    # plantimestamp() (Terraform 1.5+) is known during plan; timestamp() stays unknown until apply
    condition     = timecmp(azurerm_app_service_certificate.this.expiration_date, timeadd(plantimestamp(), "720h")) > 0
    error_message = "TLS certificate expires within 30 days - renew immediately."
  }
}
 
# Confirm a DNS record resolves to the expected value after provisioning
check "dns_resolves" {
  data "dns_a_record_set" "app" {
    host = "myapp.example.com"
  }
 
  assert {
    condition     = contains(data.dns_a_record_set.app.addrs, azurerm_public_ip.this.ip_address)
    error_message = "DNS A record for myapp.example.com does not resolve to ${azurerm_public_ip.this.ip_address}."
  }
}

Use check for:

HTTP health endpoints post-deploy
Certificate expiry warnings
DNS propagation confirmation
External dependency availability

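Use lifecycle { precondition {} } instead when a failing condition must stop the run rather than just warn. Terraform evaluates a precondition before it plans the resource and a postcondition after it plans or applies the resource. If either depends on values not yet known, evaluation is deferred to apply time. A false result fails the run.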

HCL

resource "azurerm_logic_app_standard" "this" {
  for_each = local.logic_app_map
  # ... name, location, app_service_plan_id, storage account settings, etc.
 
  lifecycle {
    precondition {
      condition     = each.value.https_only == true
      error_message = "Logic App '${each.key}' must have https_only = true in production."
    }
 
    postcondition {
      condition     = self.enabled == true
      error_message = "Logic App '${each.key}' was not enabled after creation."
    }
  }
}

State Management

Remote backend

Configure a remote backend in backend.tf. Never commit a local terraform.tfstate file to version control.

HCL

# backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate-uks-prd"
    storage_account_name = "satfstateldouksprd01"
    container_name       = "tfstate"
    key                  = "myapp/prod/terraform.tfstate"
    use_azuread_auth     = true   # Prefer Entra ID auth - no storage key in CI
  }
}

Rule: Use use_azuread_auth = true and assign the CI/CD workload identity the Storage Blob Data Contributor role on the container. Never use storage account access keys in pipelines.
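One way to express that role assignment in Terraform, as a sketch. It assumes the CI identity's object ID and the container's ARM resource ID are supplied through hypothetical variables. It also assumes the assignment lives in a separate bootstrap configuration, since the backend has to exist before any workspace can store state in it.

```hcl
# Bootstrap configuration, not the workspace that uses this backend.
variable "ci_principal_id" {
  type        = string
  description = "Object ID of the CI/CD workload identity (hypothetical input)."
}

variable "tfstate_container_id" {
  type        = string
  description = "ARM ID of the tfstate container: .../storageAccounts/<sa>/blobServices/default/containers/tfstate (hypothetical input)."
}

# Data-plane role scoped to the state container only - no access keys involved
resource "azurerm_role_assignment" "ci_tfstate" {
  scope                = var.tfstate_container_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.ci_principal_id
}
```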

One state file per environment per application

PLAINTEXT

tfstate/
├── platform/prod/terraform.tfstate       # Shared networking, DNS, platform RBAC
├── platform/dev/terraform.tfstate
├── myapp/prod/terraform.tfstate          # Application stack - production
├── myapp/staging/terraform.tfstate
└── myapp/dev/terraform.tfstate

The blast radius of a plan or apply must be a single application in a single environment. Never put multiple unrelated applications in one state file.
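To keep one state file per environment without copying backend.tf for each environment, Terraform's partial backend configuration lets you omit the key from backend.tf and supply it at init. A sketch, assuming a hypothetical env/<environment>.tfbackend file per environment:

```hcl
# backend.tf - shared settings only; the per-environment key is supplied at init
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate-uks-prd"
    storage_account_name = "satfstateldouksprd01"
    container_name       = "tfstate"
    use_azuread_auth     = true
  }
}

# env/prod.tfbackend (hypothetical file) uses bare attributes, no block wrapper:
#   key = "myapp/prod/terraform.tfstate"
#
# Initialise against that environment's state:
#   terraform init -backend-config=env/prod.tfbackend
```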

State inspection (read-only - acceptable)

These commands are safe and do not modify state:

Bash

terraform state list                                          # list all managed resources
terraform state show 'azurerm_logic_app_standard.this["logic-ldo-uks-prd-01"]'  # inspect one resource
terraform plan                                               # always the first diagnostic step

State surgery - strongly discouraged

Direct state manipulation (terraform state mv, terraform state rm, terraform state push, terraform force-unlock) is a last resort. It bypasses Terraform’s dependency graph, produces state that can diverge from reality, and leaves no reviewable audit trail. Mistakes can trigger unexpected destroys on the next plan.

Before reaching for state commands, exhaust the declarative alternatives:

Goal	Declarative approach	State surgery (avoid)
Rename a resource address	`moved {}` block in `moved.tf`	`terraform state mv`
Remove a resource Terraform no longer manages	`removed {}` block (Terraform 1.7+)	`terraform state rm`
Import an unmanaged resource	`import {}` block (Terraform 1.5+) or `terraform import`	Manual state edit
Split a state file	Redesign workspace boundaries; see HCP Stacks below	`terraform state pull` / `push`

HCL

# moved.tf - rename a resource address without destroying it
moved {
  from = azurerm_logic_app_standard.logic_app["logic-ldo-uks-prd-01"]
  to   = azurerm_logic_app_standard.this["logic-ldo-uks-prd-01"]
}
 
# import block (Terraform 1.5+) - bring an existing resource under management
import {
  to = azurerm_logic_app_standard.this["logic-ldo-uks-prd-01"]
  id = "/subscriptions/.../resourceGroups/.../providers/Microsoft.Web/sites/logic-ldo-uks-prd-01"
}

Both moved {} and import {} blocks are reviewed in pull requests, leave an audit trail in git history, and are applied idempotently - applying the same block twice is safe.

After adding an import {} block or running terraform import, run terraform plan and confirm it shows only the import and no further changes (zero drift) before committing the configuration. (HashiCorp - moved blocks)
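The table also lists removed {} and bulk imports, which the examples above do not show. A sketch assuming Terraform 1.7+, the release that added both removed blocks and for_each on import blocks. It also assumes a hypothetical `azurerm_logic_app_standard.legacy` resource being handed off, and a hypothetical `local.existing_logic_app_ids` map of name to Azure resource ID.

```hcl
# removed.tf - stop managing a resource without destroying it in Azure.
# The resource "azurerm_logic_app_standard" "legacy" block must also be deleted.
removed {
  from = azurerm_logic_app_standard.legacy # whole resource; instance keys are not allowed

  lifecycle {
    destroy = false # drop it from state, leave the Azure resource running
  }
}

# imports.tf - bulk import with for_each (Terraform 1.7+).
# local.existing_logic_app_ids is a hypothetical map of name => Azure resource ID.
import {
  for_each = local.existing_logic_app_ids
  to       = azurerm_logic_app_standard.this[each.key]
  id       = each.value
}
```

If the imported resources have no configuration yet, `terraform plan -generate-config-out=generated.tf` (Terraform 1.5+) writes a starting point. Review it and fold it into the module rather than committing it verbatim.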

State separation and HCP Terraform / Stacks

If you are reaching for state surgery because a monolithic state file has grown too large or you need to split resources across teams, the problem is workspace design - not state manipulation.

Workspace redesign (always first)

One state file per application component per environment. Resources that change together belong together; resources with different owners or blast radii belong in separate workspaces. This is the simplest and most portable solution.

HCP Terraform (if you have an HCP subscription)

HCP Terraform is HashiCorp’s managed Terraform platform. It provides:

Remote state - state stored in HCP, encrypted at rest, with rollback and audit logs
Runs - VCS-triggered plans and applies with cost estimation, policy enforcement (Sentinel or OPA), and approval gates
Variable scope - organisation, project, and workspace-level variables with sensitive value masking
Private registry - publish modules (and providers) privately, versioned from VCS tags so consumers can pin version constraints
Team management - role-based access control (RBAC), team overrides, and audit logging
Run tasks - call external services (e.g. cost threshold checks, compliance or security scanners) at defined run stages such as post-plan, optionally blocking the run on failure

HCP Terraform is not required to use Terraform, but it is strongly recommended for production. It eliminates local state management, provides a single source of truth, and enforces a review-and-approve workflow.
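A configuration that uses HCP Terraform declares a cloud block (Terraform 1.1+) in place of a backend block. A minimal sketch; the organisation and workspace names are placeholders:

```hcl
# A configuration uses either cloud {} or backend {}, never both
terraform {
  cloud {
    organization = "example-org" # placeholder

    workspaces {
      name = "myapp-prod" # placeholder; one workspace per application per environment
    }
  }
}
```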

Terraform Stacks (GA - HCP Terraform feature)

Terraform Stacks is an HCP Terraform feature (requires HCP subscription) that composes multiple configurations into a single deployable unit. Each component maintains its own state file and applies independently while sharing outputs declaratively through the Stack deployment configuration.

When to use Stacks:

Your infrastructure consists of multiple loosely-coupled components (e.g. networking, platform, observability) that should deploy independently but coordinate via outputs
Teams own different components and need independent apply workflows but shared variable management
You need environment-specific composition (e.g. prod uses high-availability components, dev uses single-instance variants)
You want to avoid terraform_remote_state data source coupling, which creates invisible dependencies
You have upstream and downstream Stacks that need to coordinate (e.g. a networking stack feeds VNet IDs to application stacks)

Stacks architecture:

A Stack is a directory of *.tfcomponent.hcl files (the stack configuration - providers, variables, components, outputs) plus one or more *.tfdeploy.hcl files (the deployments, one per environment). All .tfcomponent.hcl files merge into a single configuration and all .tfdeploy.hcl files merge into a single deployment file. Each component sources an ordinary Terraform module (registry or local).

PLAINTEXT

infrastructure-stack/
├── providers.tfcomponent.hcl     # required_providers + provider config (OIDC)
├── variables.tfcomponent.hcl     # stack-level variable declarations
├── components.tfcomponent.hcl    # component blocks (each sources a module)
├── outputs.tfcomponent.hcl       # stack outputs (type is mandatory)
├── deployments.tfdeploy.hcl      # deployment blocks: production, staging, ...
└── modules/                      # the modules the components reference
    ├── networking/   # main.tf, variables.tf, outputs.tf
    └── platform/     # main.tf, variables.tf, outputs.tf

The GA file extensions are .tfcomponent.hcl (configuration) and .tfdeploy.hcl (deployments). The beta .tfstack.hcl / .deployment.hcl extensions are no longer used.

Stack configuration (.tfcomponent.hcl):

The configuration declares required_providers, the stack variables, the provider blocks, the components (each sourcing a module), and the stack outputs. Split across files for readability - they all merge into one configuration. There is no terraform {} block, and every output requires an explicit type.

HCL

# providers.tfcomponent.hcl
required_providers {
  azurerm = {
    source  = "hashicorp/azurerm"
    version = "~> 4.0"
  }
}
 
# OIDC token is injected per-deployment (see deployments.tfdeploy.hcl) and is
# marked ephemeral so it is never written to state.
variable "identity_token" {
  type      = string
  ephemeral = true
}
variable "azure_client_id"       { type = string }
variable "azure_tenant_id"       { type = string }
variable "azure_subscription_id" { type = string }
 
provider "azurerm" "this" {
  config {
    features {}
    use_cli         = false
    use_oidc        = true
    oidc_token      = var.identity_token
    client_id       = var.azure_client_id
    tenant_id       = var.azure_tenant_id
    subscription_id = var.azure_subscription_id
  }
}

HCL

# variables.tfcomponent.hcl
variable "environment" { type = string }
variable "location"    { type = string }
variable "cidr_block"  { type = string }

HCL

# components.tfcomponent.hcl - each component sources a module and is handed the provider
component "networking" {
  source = "./modules/networking"
 
  inputs = {
    environment = var.environment
    location    = var.location
    cidr_block  = var.cidr_block
  }
 
  providers = {
    azurerm = provider.azurerm.this
  }
}

HCL

# outputs.tfcomponent.hcl - reference component outputs as component.<name>.<output> (no .outputs)
output "vnet_id" {
  type  = string
  value = component.networking.vnet_id
}
 
output "subnet_ids" {
  type  = map(string)
  value = component.networking.subnet_ids
}

Deployments (.tfdeploy.hcl):

Deployments are concrete instances of the stack (one per environment), each supplying input values. Azure auth is OIDC: an identity_token is exchanged for short-lived credentials and store blocks pull the ARM identifiers from an HCP Terraform variable set - no client secret is stored. A deployment provides inputs to the whole stack; it does not point at sub-stacks via a path.

HCL

# deployments.tfdeploy.hcl
identity_token "azurerm" {
  audience = ["api://AzureADTokenExchange"]
}
 
# Pull ARM_* identifiers from an HCP Terraform variable set
store "varset" "azure" {
  id       = "varset-xxxxxxxxxxxxxxxx"
  category = "env"
}
 
deployment "production" {
  inputs = {
    environment = "production"
    location    = "uksouth"
    cidr_block  = "10.0.0.0/16"
 
    # OIDC: short-lived, never persisted to state
    identity_token        = identity_token.azurerm.jwt
    azure_client_id       = store.varset.azure.ARM_CLIENT_ID
    azure_tenant_id       = store.varset.azure.ARM_TENANT_ID
    azure_subscription_id = store.varset.azure.ARM_SUBSCRIPTION_ID
  }
}
 
deployment "staging" {
  inputs = {
    environment = "staging"
    location    = "ukwest"
    cidr_block  = "10.1.0.0/16"
 
    identity_token        = identity_token.azurerm.jwt
    azure_client_id       = store.varset.azure.ARM_CLIENT_ID
    azure_tenant_id       = store.varset.azure.ARM_TENANT_ID
    azure_subscription_id = store.varset.azure.ARM_SUBSCRIPTION_ID
  }
}

The Microsoft Entra ID application (or user-assigned managed identity) needs a federated identity credential with HCP Terraform (https://app.terraform.io) as the issuer and audience api://AzureADTokenExchange. Its subject must match the HCP Terraform organisation/project/stack/deployment.
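If the app registration is itself managed in Terraform, the azuread provider can create that credential. A sketch, assuming azuread provider 3.x, where the argument is `application_id` (older 2.x releases used `application_object_id`). The variables are hypothetical. Copy the exact subject strings from the HCP Terraform documentation rather than guessing them. Entra ID matches the subject exactly, so each distinct subject needs its own credential.

```hcl
# Assumes azuread provider 3.x. Variable names are hypothetical.
variable "stack_app_id" {
  type        = string
  description = "Resource ID (/applications/<object-id>) of the Entra ID app the Stack authenticates as."
}

variable "stack_oidc_subjects" {
  type        = map(string)
  description = "Label => exact OIDC subject HCP Terraform presents for a Stack deployment."
}

resource "azuread_application_federated_identity_credential" "hcp_stack" {
  for_each = var.stack_oidc_subjects

  application_id = var.stack_app_id
  display_name   = "hcp-stack-${each.key}"
  issuer         = "https://app.terraform.io"
  audiences      = ["api://AzureADTokenExchange"]
  subject        = each.value # Entra ID compares this string exactly
}
```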

Cross-stack composition (publish_output / upstream_input):

A stack exposes values to other stacks with publish_output in its .tfdeploy.hcl, and consumes another stack’s published outputs with upstream_input. This replaces terraform_remote_state, makes the dependency explicit, and lets HCP Terraform re-run downstream stacks automatically when an upstream output changes.

HCL

# networking stack - deployments.tfdeploy.hcl: publish outputs for downstream stacks
publish_output "vnet_id" {
  value = deployment.production.vnet_id
}
 
publish_output "subnet_ids" {
  value = deployment.production.subnet_ids
}

HCL

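# application stack - deployments.tfdeploy.hcl: consume the networking stack
# (also declares the same identity_token "azurerm" and store "varset" "azure" blocks as the networking stack)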
upstream_input "networking" {
  type   = "stack"
  source = "app.terraform.io/myorg/myproject/networking"
}
 
deployment "production" {
  inputs = {
    environment = "production"
    vnet_id     = upstream_input.networking.vnet_id
    subnet_ids  = upstream_input.networking.subnet_ids
 
    identity_token        = identity_token.azurerm.jwt
    azure_client_id       = store.varset.azure.ARM_CLIENT_ID
    azure_tenant_id       = store.varset.azure.ARM_TENANT_ID
    azure_subscription_id = store.varset.azure.ARM_SUBSCRIPTION_ID
  }
}

Example scenario: a networking Stack publishes VNet and subnet IDs. An application Stack (downstream) consumes them via upstream_input and passes them to components that deploy Logic Apps or App Service Plans into that VNet. When the networking Stack is updated, HCP Terraform automatically triggers the application Stack run to incorporate the changes.
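On the application side, values arriving through upstream_input are ordinary stack variables that components pass into modules. A sketch of the application stack's configuration, assuming a hypothetical ./modules/logic-apps module with a subnet_id input and a hypothetical snet-logic key in subnet_ids. The provider and identity variables follow providers.tfcomponent.hcl above.

```hcl
# application stack - variables.tfcomponent.hcl
variable "environment" { type = string }
variable "vnet_id" { type = string } # declared so the deployment's vnet_id input is accepted
variable "subnet_ids" { type = map(string) }

# application stack - components.tfcomponent.hcl
component "logic_apps" {
  source = "./modules/logic-apps" # hypothetical module

  inputs = {
    environment = var.environment
    subnet_id   = var.subnet_ids["snet-logic"] # hypothetical subnet key
  }

  providers = {
    azurerm = provider.azurerm.this
  }
}
```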

Stacks best practices:

Each component is independently deployable and testable; reference component outputs within the stack as component.<name>.<output> (there is no .outputs indirection in GA).
Authenticate with OIDC via an identity_token; mark the token variable as ephemeral = true so it never lands in state, and pull ARM identifiers from a store "varset" rather than hardcoding them.
Every Stacks output needs an explicit type, and the configuration has no terraform {} block - required_providers is declared at the top level of a .tfcomponent.hcl file.
Use publish_output / upstream_input for cross-stack data instead of terraform_remote_state; keep downstream stacks loose by depending on the shape of the data, not on which upstream provides it.
Limits: a stack can reference up to 20 upstream stacks and expose to up to 25 downstream stacks, and all related stacks must live in the same HCP Terraform project.
Plan and apply at the stack level; HCP Terraform orchestrates the per-component operations and re-runs downstream stacks when upstream outputs change.

Comparison: Stacks vs alternatives

Approach	State files	Inter-component coupling	Blast radius	Workflow
Monolithic (bad)	1 large file	Tight - all resources depend on each other	Entire application	Single apply affects everything
Separate workspaces + `terraform_remote_state`	Multiple (one per workspace)	Implicit (data source dependency not visible in code)	Per-workspace, but downstream consumers affected	Manual coordination between applies
Terraform Stacks	Multiple (one per component), explicit composition	Explicit (stack config shows all inputs/outputs)	Per-component, with clear dependency graph	Orchestrated apply across components

`terraform_remote_state` (if not using Stacks)

For referencing outputs from one workspace in another without the full composition of Stacks. Use sparingly - it creates an implicit dependency between workspaces that is not visible in either codebase.

HCL

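data "terraform_remote_state" "networking" {
  backend = "remote"
  config = {
    organization = var.tfe_organization
    workspaces = {
      name = "networking-${var.environment}"
    }
  }
}
 
# azurerm_app_service_plan was removed in azurerm 4.x; use azurerm_service_plan
resource "azurerm_service_plan" "this" {
  name                = "asp-${var.environment}"
  resource_group_name = var.rg_name
  location            = var.location
  os_type             = "Windows"
  sku_name            = "WS1"
}
 
# Reference a remote output - the networking workspace must export subnet_ids
resource "azurerm_logic_app_standard" "this" {
  # ... name, location, resource_group_name, storage account settings, etc.
  app_service_plan_id       = azurerm_service_plan.this.id
  virtual_network_subnet_id = data.terraform_remote_state.networking.outputs.subnet_ids["snet-logic"] # hypothetical key
}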
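On HCP Terraform, the tfe provider's tfe_outputs data source is a narrower alternative to terraform_remote_state. It reads only the upstream workspace's outputs, so the consuming workspace does not need access to the whole upstream state. A sketch, assuming the hashicorp/tfe provider is configured with a token that can read the networking workspace, and a hypothetical snet-logic key in its subnet_ids output:

```hcl
data "tfe_outputs" "networking" {
  organization = var.tfe_organization
  workspace    = "networking-${var.environment}"
}

resource "azurerm_logic_app_standard" "this" {
  # ... name, location, app_service_plan_id, storage account settings, etc.
  # values is sensitive as a whole; nonsensitive_values holds outputs not marked sensitive
  virtual_network_subnet_id = data.tfe_outputs.networking.nonsensitive_values.subnet_ids["snet-logic"]
}
```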

Never use state surgery

Never use terraform state pull + edit + terraform state push to split or merge state files manually. This approach is error-prone, un-reviewable, and will eventually cause a state conflict or an accidental destroy. If state separation is unavoidable, fix it via workspace redesign or use HCP Terraform / Stacks instead.

Testing

Native terraform test (Terraform 1.6+)

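Write .tftest.hcl files alongside your module (by convention in a tests/ directory). Runs with command = plan exercise plan-time logic without creating real infrastructure. The azurerm provider is still configured, though, and needs credentials unless you replace it with a mock_provider (Terraform 1.7+). (HashiCorp - terraform test)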

HCL

# tests/defaults.tftest.hcl
 
variables {
  location = "uksouth"
  rg_name  = "rg-test"
  tags     = {}
  logic_apps = [{
    name                       = "logic-test-01"
    sku_name                   = "WS1"
    storage_account_name       = "satestldoukstst01"
    storage_account_access_key = "placeholder"
  }]
}
 
# Plan-only - no real resources created
run "defaults_are_applied" {
  command = plan
 
  assert {
    condition     = azurerm_logic_app_standard.this["logic-test-01"].https_only == true
    error_message = "https_only should default to true."
  }
 
  assert {
    condition     = azurerm_logic_app_standard.this["logic-test-01"].enabled == true
    error_message = "enabled should default to true."
  }
}
 
run "invalid_sku_rejected" {
  command = plan
 
  variables {
    logic_apps = [{
      name                       = "logic-bad-sku"
      sku_name                   = "INVALID"
      storage_account_name       = "satestldoukstst01"
      storage_account_access_key = "placeholder"
    }]
  }
 
  expect_failures = [var.logic_apps]
}

Bash

terraform test                                   # Run all .tftest.hcl files
terraform test -filter=tests/defaults.tftest.hcl # Run a single test file (-filter takes a file path)
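To run module tests with no Azure credentials, Terraform 1.7+ can swap the real provider for a mock. With a mock, even command = apply "creates" fake resources whose computed attributes are generated. A sketch reusing the variable values from tests/defaults.tftest.hcl. It assumes the module exposes the logic_app_ids output from the Outputs section; the mocked ID is an arbitrary placeholder.

```hcl
# tests/mocked.tftest.hcl - runs offline (Terraform 1.7+)
mock_provider "azurerm" {
  # Computed attributes get generated values unless overridden here
  mock_resource "azurerm_logic_app_standard" {
    defaults = {
      id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.Web/sites/logic-test-01"
    }
  }
}

variables {
  location = "uksouth"
  rg_name  = "rg-test"
  tags     = {}
  logic_apps = [{
    name                       = "logic-test-01"
    sku_name                   = "WS1"
    storage_account_name       = "satestldoukstst01"
    storage_account_access_key = "placeholder"
  }]
}

# command = apply against a mock provider creates nothing real
run "outputs_keyed_by_name" {
  command = apply

  assert {
    condition     = contains(keys(output.logic_app_ids), "logic-test-01")
    error_message = "logic_app_ids should be keyed by Logic App name."
  }
}
```

Depending on the module, other resources may need their own mock_resource defaults where a generated value does not satisfy the format another resource expects.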

Terratest (Go - integration testing)

For tests that deploy real infrastructure, use Terratest. Place tests in a test/ directory at the repo root. (Terratest - getting started)

Go

// test/logic_app_test.go
package test
 
import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)
 
func TestLogicAppModuleComplete(t *testing.T) {
    t.Parallel()
 
    opts := &terraform.Options{
        TerraformDir: "../examples/complete",
        Vars: map[string]interface{}{
            "location": "uksouth",
        },
    }
 
    // Always defer destroy before apply - ensures cleanup on test failure
    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)
 
    ids := terraform.OutputMap(t, opts, "logic_app_ids")
    assert.NotEmpty(t, ids, "logic_app_ids output should not be empty")
    assert.Contains(t, ids, "logic-test-complete")
}

Testing strategy

Test type	Tool	Scope	When to run
Format check	`terraform fmt -check -recursive`	Style	Every commit
Static analysis	`tflint`	Correctness	Every commit
Security scan	`trivy`, `checkov`	Misconfigurations	Every commit
Unit / plan test	`terraform test`	Module logic, no real infra	Every commit
Integration test	Terratest	Full deploy + assertions	PR merge, nightly
Compliance	`terraform-compliance`	Naming / tagging policy	PR merge

Security & Compliance

Treat the state backend as a secret store

Terraform stores every managed attribute in state as plaintext - including generated passwords, connection strings, and any value an API returns. Anyone who can read the state file can read those secrets.

Restrict the backend container to the apply identity and break-glass admins only (Storage Blob Data Contributor via use_azuread_auth = true, never an access key).
Enable blob versioning and soft delete on the state container, and rely on Azure Storage encryption at rest (add a customer-managed key where the data classification requires it).
Never run terraform state pull on a workstation or paste state into a ticket; doing so exfiltrates every secret it contains.
Prefer ephemeral resources and write-only arguments (see Ephemeral Values) so high-value secrets never land in state at all.

Rule: The state file is as sensitive as the most sensitive secret it contains. Lock down backend RBAC to the smallest possible set, keep versioning and soft delete on, and keep secrets out of state with write-only arguments wherever the provider supports them.

Commit the dependency lock file

.terraform.lock.hcl pins provider versions and their checksums - the supply-chain control that makes terraform init verify it is downloading the exact providers you reviewed. Commit it, and generate hashes for every platform your developers and CI run on, or init fails on the platform whose hash is missing:

Bash

terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=windows_amd64

Rule: Commit .terraform.lock.hcl with hashes for every platform in use. In CI, run terraform init -lockfile=readonly so init fails closed instead of rewriting the lock file when it and the configuration disagree. Never run terraform init -upgrade in an automated pipeline; bump providers deliberately in a reviewed PR.

Static analysis and policy-as-code gate the pipeline

The pipeline already runs tflint and a Trivy config scan (see Pipelines). Treat those findings, and any organisational policy, as merge-blocking:

Misconfiguration scanning - Trivy / Checkov for insecure defaults (public network access, unencrypted storage, permissive NSGs). Fail the build on HIGH/CRITICAL.
Policy-as-code - Conftest/OPA against the plan JSON for org rules (required tags, allowed regions, naming), backed by Azure Policy at the platform so drift created outside Terraform is caught too.
Least-privilege apply identity - the CI workload identity authenticates with OIDC (no stored secret) and holds only the roles its stack needs, scoped to its resource group or subscription, never tenant-wide Owner.

Rule: Static security scanning runs before plan, and policy checks against the plan JSON run immediately after plan and before apply; both fail closed. A HIGH/CRITICAL misconfiguration or a policy violation blocks the merge - it is never a warning to be merged past.

Code Review & Merge Gates

Infrastructure changes reach main through a pull request, never a direct push. The branch protection rules, the CODEOWNERS file, and a mandatory plan are what make the workflow production-ready: every change is reviewed by an owner of that code and every reviewer can see exactly what will change in Azure before approving.

Terraform pull-request workflow. A change in the protected main repo is raised as a pull request from a feature branch. The pull request triggers the required status-check gates: CI checks (fmt, validate, tflint, scan) followed by terraform plan, whose output is posted to the pull request. The reviewed plan reaches a CODEOWNER approval gate that also requires all checks to be green. On approval the branch is squash-merged to main, which runs terraform apply against the reviewed plan and provisions the Azure subscription. If a reviewer requests changes, the flow loops back to the pull request.

Branch protection - the non-negotiable gates

Configure these on main (GitHub: Settings > Branches > branch protection rule, or a ruleset; Azure DevOps: branch policies):

Require a pull request before merging - no direct pushes to main, including for administrators.
Require at least 1 approving review from a code owner - tick Require review from Code Owners. One CODEOWNER approval is the floor; high-blast-radius repos (platform, networking, identity) should require 2.
Require status checks to pass - fmt-check, validate, tflint, security-scan, and crucially plan must all be green before the merge button unlocks.
Require branches to be up to date before merging - forces a re-plan against the current main, so an approval cannot be applied on top of a stale base.
Dismiss stale approvals when new commits are pushed - a re-review is required if the author changes the code after approval.
Require conversation resolution - no unresolved review threads at merge.
Require linear history / squash merge - keep main history clean and each change atomic.

Rule: Every PR must produce a successful terraform plan as a required status check, and the plan output must be posted to the PR. Reviewers approve a specific plan, not just a diff of HCL. A PR that cannot plan cannot merge.

CODEOWNERS

CODEOWNERS maps paths to the teams or individuals that must approve changes to them. Combined with Require review from Code Owners, it guarantees the right people gate the right code. Place it at .github/CODEOWNERS (GitHub) or the repo root.

PLAINTEXT

# .github/CODEOWNERS
# Syntax: <path pattern>   <owner> [<owner> ...]
# Later matches win, so order from general to specific.
 
# Default owner for everything in the repo
*                       @libre-devops/platform-engineering
 
# Reusable modules - require a module maintainer
/modules/               @libre-devops/terraform-maintainers
 
# Environment roots - require the owning team per environment
/env/prod/              @libre-devops/platform-leads
/env/dev/               @libre-devops/platform-engineering
 
# Pipeline and policy definitions - tightly held
/.github/workflows/     @libre-devops/platform-leads
/policy/                @libre-devops/security
 
# Sensitive files can have their own owner - including this file
/.github/CODEOWNERS     @libre-devops/platform-leads
backend.tf              @libre-devops/platform-leads

Rules that keep CODEOWNERS effective:

Owners must be teams, not individuals, wherever possible - a team survives someone leaving; a named person becomes a bottleneck and a single point of failure.
Every owning team needs write access to the repo, or its approval will not count.
The last matching pattern wins (not the most specific), so list general patterns first and more specific ones below them.
Protect the protection - make CODEOWNERS, /.github/workflows/, and backend.tf owned by a restricted team so the gates themselves cannot be quietly weakened in a PR.

Production-ready PR workflow

End to end, a change moves like this:

Branch off main (feat/add-storage-account). Never commit to main directly.
Open a draft PR early. On PR open, CI runs fmt-check > validate > tflint > security-scan and a speculative plan (a plan that is never applied). The plan output is posted back as a PR comment so reviewers and the author see the proposed Azure changes inline.
Iterate. Each push re-runs the checks and refreshes the plan comment. Pushing new commits dismisses any stale approval.
Ready for review. Mark the PR ready; CODEOWNERS automatically requests the owning team(s). At least one code owner reviews the HCL and the rendered plan.
Gates must be green. Required status checks (including plan) pass, the branch is up to date with main, conversations are resolved, and at least one CODEOWNER has approved. Only then does merge unlock.
Merge (squash) to main.
Apply. The merge to main starts a new run. The plan job produces a production plan and saves it as an artifact. The production environment approval gate then pauses the run so an approver can review that plan, and the apply job downloads that exact artifact and runs terraform apply tfplan. The PR plan was speculative; the plan approved at the gate is the plan that is applied - never a re-plan. See Pipelines for the job definitions.

Rule: The approved plan is the applied plan. The apply job consumes the saved plan artifact produced earlier in the same pipeline run; it must not run a fresh terraform plan. Re-planning on apply discards the human review and can apply changes nobody approved.

TACOS and HCP Terraform - native git-driven workflows

The gates above can be assembled from raw GitHub/Azure DevOps branch policies plus pipeline steps, but the TACOS category (Terraform Automation and Collaboration Software) implements this VCS-driven model natively. Commercial platforms with free tiers include HCP Terraform/Terraform Cloud, Spacelift, env0, and Scalr. Fully open-source / self-hosted options that run inside your own CI or cluster include:

Atlantis - the original open-source Terraform PR automation. Listens on webhooks, runs plan on PR, posts the output as a comment, and applies on an atlantis apply comment after approval. Self-hosted, Apache-2.0.
Digger - open-source; runs the plan/apply inside your existing GitHub Actions / Azure DevOps / GitLab CI runners (no separate compute) and orchestrates locking and PR comments. Apache-2.0.
Terrateam - open-source GitHub-native workflow engine (plan on PR, apply on comment, OPA policies, layered environments). MPL-2.0, self-hostable.
Terramate - open-source orchestration and code generation that layers change detection and ordering onto your own pipeline.

Whichever you pick, the native model is the same:

Speculative plans on every PR, posted back to the PR as a status check and comment, with no custom pipeline glue.
Policy-as-code gates (Sentinel or OPA) evaluated against the plan before apply, as a first-class run stage.
Apply gated on merge to the tracked branch, with the run reusing the reviewed plan and adding its own approval step, remote state, run locking, and a full audit log.
Drift detection and scheduled health checks out of the box.

Rule: If you use HCP Terraform or a TACOS, lean on its native VCS integration for the plan-on-PR and gated-apply flow rather than rebuilding it in pipeline YAML. Anything those platforms do not cover natively - bespoke approval routing, custom policy engines, cross-tool orchestration - requires custom automation, and that custom glue is where most workflow bugs and security gaps appear. Keep it minimal and own it deliberately.

Pipelines

Standard CI/CD stage order

PLAINTEXT

fmt-check → validate → tflint → security-scan → test → plan → [approval gate] → apply

Never apply without a prior plan artifact. Never re-plan inside the apply job.

GitHub Actions reference

GitHub Actions is the worked example below, but the pipeline shape is the target, not the tool. Azure DevOps (YAML pipelines with a Workload Identity Federation service connection), GitLab CI (id_tokens for OIDC), and any other runner implement the identical flow: validate and scan, then plan to an artifact, then a gated apply of that same artifact. Only the YAML syntax and the OIDC wiring change.

YAML

name: Terraform
 
on:
  push:
    branches: [main]
  pull_request:
 
permissions:
  id-token: write   # Required for OIDC
  contents: read
 
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.9"
 
      - name: Format check
        run: terraform fmt -check -recursive
 
      - name: Init (no backend)
        run: terraform init -backend=false
 
      - name: Validate
        run: terraform validate
 
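      - name: Setup TFLint
        uses: terraform-linters/setup-tflint@v4

      - name: TFLint
        run: |
          tflint --init        # install plugins declared in .tflint.hcl (e.g. the azurerm ruleset)
          tflint --recursive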
 
      - name: Trivy security scan
        # Pin third-party actions to a full commit SHA, never a branch or tag.
        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
        with:
          scan-type: config
          scan-ref: .
          severity: HIGH,CRITICAL
          exit-code: "1"       # fail the job on findings; the action exits 0 by default
 
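  plan:
    needs: validate
    runs-on: ubuntu-latest
    env:
      # Let the azurerm provider and backend request their own GitHub OIDC token
      # rather than relying on the Azure CLI session. No client secret is stored.
      ARM_USE_OIDC: "true"
      ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
      ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # Same constraint in every job: a saved plan can only be applied
          # by the Terraform version that created it.
          terraform_version: "~1.9"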
 
      - name: Azure login (OIDC - no secrets stored)
        uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}         # not a secret - store as a var
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
 
      - name: Init
        run: terraform init
 
      - name: Plan
        run: |
          terraform plan \
            -out=tfplan \
            -var-file=env/${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}.tfvars
 
      - name: Upload plan artifact
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan
          retention-days: 1
 
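  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production   # Requires a manual approval gate configured in GitHub
    if: github.ref == 'refs/heads/main'
    env:
      # Let the azurerm provider and backend request their own GitHub OIDC token
      # rather than relying on the Azure CLI session. No client secret is stored.
      ARM_USE_OIDC: "true"
      ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
      ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # Must match the plan job: a saved plan can only be applied
          # by the Terraform version that created it.
          terraform_version: "~1.9"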
 
      - name: Azure login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
 
      - name: Init
        run: terraform init
 
      - name: Download plan artifact
        uses: actions/download-artifact@v4
        with:
          name: tfplan
 
      - name: Apply
        run: terraform apply -auto-approve tfplan

Key rules:

Use OIDC (azure/login with client-id, tenant-id and subscription-id, plus the id-token: write permission) - never store a client secret as a pipeline variable.
Upload the plan artifact and download it in the apply job - never re-plan on apply. The plan that was reviewed is the plan that gets applied. Saved plan files can contain sensitive values in plain text, so keep artifact retention short.
Require a named GitHub environment with protection rules on any job that runs apply against production or staging.
Run terraform fmt -check in CI (not terraform fmt) - the formatter should not auto-fix in CI. Fix locally, commit the fix.
Run terraform init -backend=false in the validate job so that validate does not need backend credentials.
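The apply job executes exactly the reviewed plan file, so it is worth printing a short summary of that file before the approval gate or posting one on the PR. The sketch below assumes an earlier step ran terraform show -json tfplan > plan.json. The helper names are hypothetical. Only the resource_changes[].change.actions structure comes from Terraform's JSON plan format. The wrapper that hashicorp/setup-terraform installs by default can add extra lines to stdout, so set terraform_wrapper: false in any job that redirects Terraform's JSON output to a file.

```python
import json
from collections import Counter


def summarise_plan(plan: dict) -> Counter:
    """Count planned actions in the JSON from `terraform show -json tfplan`.

    Each resource_changes[].change.actions list is, for example, ["create"],
    ["update"], ["delete"], ["delete", "create"] or ["create", "delete"]
    (a replacement), ["read"] or ["no-op"].
    """
    counts: Counter = Counter()
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if sorted(actions) == ["create", "delete"]:
            counts["replace"] += 1
        elif actions in (["no-op"], ["read"]):
            continue  # nothing changes in real infrastructure
        else:
            counts[actions[0]] += 1
    return counts


def format_summary(counts: Counter) -> str:
    """One line in the style of Terraform's own "Plan:" line, where a
    replacement counts once as an add and once as a destroy."""
    add = counts["create"] + counts["replace"]
    destroy = counts["delete"] + counts["replace"]
    return f"Plan: {add} to add, {counts['update']} to change, {destroy} to destroy."


def load_plan(path: str) -> dict:
    """Read a plan exported with: terraform show -json tfplan > plan.json"""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A follow-up step could print format_summary(summarise_plan(load_plan("plan.json"))) to the job summary. It could also fail the job when counts["delete"] or counts["replace"] is non-zero and nobody has explicitly approved a destroy.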

Module Registry Upload

Module naming

The Terraform Registry enforces this naming pattern:

PLAINTEXT

terraform-<PROVIDER>-<MODULE_NAME>

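The pattern applies to the repository name. <PROVIDER> is the main provider the module creates infrastructure with, written as its provider type name (azurerm, aws, google), not its namespace. <MODULE_NAME> may itself contain hyphens. Examples: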

PLAINTEXT

terraform-azurerm-logic-app
terraform-azurerm-key-vault
terraform-aws-s3-bucket
terraform-github-repository

Required files for registry publish

File	Required	Notes
`README.md`	Yes	Must include inputs, outputs, and a usage example
`main.tf`	Yes	At least one resource
`variables.tf`	Yes	All inputs declared with descriptions
`outputs.tf`	Yes	All outputs declared with descriptions
`terraform.tf`	Yes	`required_providers` with version constraints
`providers.tf`	Workspace only	Provider blocks with feature flags and auth. Not required in reusable modules.
`examples/`	Recommended	At least one working, runnable example
`CHANGELOG.md`	Recommended	Version history for callers
`.github/`	Recommended	CI workflows
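The table translates directly into a layout check that can run in CI before a release. The sketch below targets reusable modules, so providers.tf is deliberately not required. The file lists mirror the table. missing_files and check_module_layout are hypothetical names.

```python
from pathlib import Path

# Mirrors the table above for a reusable module (providers.tf is workspace-only).
REQUIRED = ("README.md", "main.tf", "variables.tf", "outputs.tf", "terraform.tf")
RECOMMENDED = ("examples", "CHANGELOG.md", ".github")


def missing_files(present: set[str]) -> tuple[list[str], list[str]]:
    """Return (missing_required, missing_recommended) given the entry names present."""
    required = [name for name in REQUIRED if name not in present]
    recommended = [name for name in RECOMMENDED if name not in present]
    return required, recommended


def check_module_layout(module_dir: Path) -> tuple[list[str], list[str]]:
    """Inspect the top level of a module directory."""
    return missing_files({entry.name for entry in module_dir.iterdir()})
```

A CI step would fail on a non-empty first list and only warn on the second.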

Semantic versioning

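Use Semantic Versioning. Tag releases from the main branch. The registry reads module versions from tags that are valid semantic versions, optionally prefixed with v (for example v1.2.0).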

Change	Version bump	Example
Bug fix or documentation change with no interface change	Patch	`1.0.0 → 1.0.1`
New optional variable, new `optional()` attribute with a default, new output, new resource type added alongside existing ones	Minor	`1.0.0 → 1.1.0`
Removed variable, renamed resource address without a `moved` block, changed output type, new required variable	Major	`1.0.0 → 2.0.0`

Rule: Adding a new optional() attribute with a default to an existing list(object) variable is non-breaking. Existing callers receive the default automatically and do not need to update. Removing an attribute or changing its type is always a major version bump.
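After every change in a release has been classified, the release takes the most significant bump among them. The sketch below shows that arithmetic for plain X.Y.Z tags, with or without the v prefix. highest_bump and next_version are hypothetical helper names, not part of any registry tooling.

```python
SEMVER_BUMPS = ("patch", "minor", "major")  # least to most significant


def highest_bump(bumps: list[str]) -> str:
    """Most significant bump required by any change in the release."""
    return max(bumps, key=SEMVER_BUMPS.index)


def next_version(current: str, bump: str) -> str:
    """Next tag for a plain X.Y.Z version, keeping an optional leading 'v'.

    Pre-release and build suffixes (1.2.0-rc.1, 1.2.0+build) are not handled.
    """
    prefix = "v" if current.startswith("v") else ""
    major, minor, patch = (int(part) for part in current.removeprefix("v").split("."))
    if bump == "major":
        return f"{prefix}{major + 1}.0.0"
    if bump == "minor":
        return f"{prefix}{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump!r}")
```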

terraform-docs

Install:

Bash

brew install terraform-docs                                        # macOS
go install github.com/terraform-docs/terraform-docs@latest        # cross-platform

Every module must include a .terraform-docs.yml at its root. When this file is present, terraform-docs . picks it up automatically with no flags needed. This is the production-grade config:

YAML

# .terraform-docs.yml
# https://terraform-docs.io/user-guide/configuration/
 
formatter: "markdown table"
 
# Minimum terraform-docs version required to process this config.
version: ">=0.19.0"
 
# Read the module description from the first comment block in main.tf.
# Write /**/ or # comments at the top of main.tf and they appear in the README header.
header-from: main.tf
footer-from: ""
 
recursive:
  enabled: false   # set true + path: modules to also document submodules
 
output:
  file: README.md
  mode: inject     # inject between markers, leave surrounding content untouched
  template: |-
    <!-- BEGIN_TF_DOCS -->
    {{ .Content }}
    <!-- END_TF_DOCS -->
 
output-values:
  enabled: false   # set true + from: path/to/terraform.tfstate to show real output values
  from: ""
 
sort:
  enabled: true
  by: name         # "name" | "required" | "type"
 
settings:
  anchor: true         # generate HTML anchors for every input/output row
  color: true          # only affects the "pretty" formatter; ignored for Markdown
  default: true        # show the default column
  description: true    # show the description column
  escape: true         # escape Markdown special characters in cells
  hide-empty: false    # show sections even when they have no entries
  html: true           # allow HTML tags (<a>, <pre>, <br>) in the generated Markdown
  indent: 2            # heading level for sections (## = 2)
  lockfile: true       # read .terraform.lock.hcl to populate the providers section
  read-comments: true  # fall back to inline comments when a description is missing
  required: true       # show the required column
  sensitive: true      # show the sensitive column
  type: true           # show the type column
 
sections:
  hide: []
  show: []
  # To hide specific sections uncomment and list them:
  # hide:
  #   - modules      # hide if the module calls no child modules
  #   - resources    # hide if callers don't need to know what's provisioned

Add the injection markers to README.md where the generated table should appear:

MARKDOWN

<!-- BEGIN_TF_DOCS -->
<!-- END_TF_DOCS -->
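Inject mode only rewrites what sits between the two markers. The Python sketch below models that contract and may help if you write your own README tooling. inject is a hypothetical helper, not terraform-docs' own code. Requiring each marker to appear exactly once, in order, is this sketch's own check.

```python
BEGIN = "<!-- BEGIN_TF_DOCS -->"
END = "<!-- END_TF_DOCS -->"


def inject(readme: str, generated: str) -> str:
    """Replace whatever sits between the markers with `generated`.

    Text before BEGIN and after END is returned unchanged.
    """
    if readme.count(BEGIN) != 1 or readme.count(END) != 1:
        raise ValueError("README must contain exactly one BEGIN and one END marker")
    start = readme.index(BEGIN) + len(BEGIN)
    stop = readme.index(END)
    if stop < start:
        raise ValueError("END marker appears before BEGIN marker")
    return f"{readme[:start]}\n{generated.strip()}\n{readme[stop:]}"
```

Injecting the same generated text twice returns the same README. That property lets a regenerate-and-diff check in CI stay quiet when nothing has changed.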

Invoke manually:

Bash

terraform-docs .          # uses .terraform-docs.yml automatically

Run via CI (GitHub Actions):

YAML

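# git-push: true commits back to the branch, so this job needs
# `permissions: contents: write`, and on pull_request events the checkout step
# must use `ref: ${{ github.event.pull_request.head.ref }}`.
- name: terraform-docs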
  uses: terraform-docs/gh-actions@v1
  with:
    working-dir: .
    config-file: .terraform-docs.yml
    output-file: README.md
    output-method: inject
    git-push: true
    git-commit-message: "docs: regenerate README via terraform-docs"

Module maintenance scripts

Every module repository should include a maintenance script at its root. Three implementations are provided below - choose one based on your toolchain. All three do the same four jobs: sort variable and output blocks alphabetically, run terraform fmt, write a custom README header, and regenerate the terraform-docs section. None of them performs Git operations; tagging and releases are handled separately in CI.

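All implementations use a brace-depth parser rather than a simple regex, so nested object({}) types and default = {} maps are captured as part of their block. Two limitations apply to all three. First, braces inside string literals or comments are still counted. Second, anything in the file that is not a variable or output block (standalone comments, a stray locals block) is dropped when the sorted file is written. Keep variables.tf and outputs.tf to those blocks only, and put per-variable notes in the description.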
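All three scripts count every { and } on a line, including those inside strings and comments. A description such as "Unbalanced brace in text: }" therefore closes the block a line early. The sketch below is a stricter counter. It assumes single-line double-quoted strings and # or // comments. Heredocs, /* */ comments and quotes nested inside ${...} interpolations are not handled. extract_blocks keeps the same signature and return shape as the function in terraform_sort.py, but it is an illustration, not a tested drop-in replacement.

```python
def brace_delta(line: str) -> int:
    """Net change in brace depth for one line of HCL.

    Braces inside double-quoted strings, and anything after a # or //
    comment marker, are ignored.
    """
    delta = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(line):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#" or line.startswith("//", i):
            break  # the rest of the line is a comment
        elif ch == "{":
            delta += 1
        elif ch == "}":
            delta -= 1
    return delta


def extract_blocks(content: str, keyword: str) -> list[tuple[str, str]]:
    """Return (name, block_text) for each top-level `keyword "name" {` block."""
    prefix = f'{keyword} "'
    lines = content.splitlines()
    blocks: list[tuple[str, str]] = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(prefix):
            i += 1
            continue
        name = lines[i][len(prefix):].split('"', 1)[0]
        start, depth = i, 0
        while i < len(lines):
            depth += brace_delta(lines[i])
            i += 1
            if depth <= 0:
                break
        blocks.append((name, "\n".join(lines[start:i])))
    return blocks
```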

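The header function (Set-ReadmeHeader in PowerShell, set_readme_header in Bash and Python) writes user-authored markdown above the `<!-- BEGIN_TF_DOCS -->` and `<!-- END_TF_DOCS -->` markers. It overwrites README.md, so keep the hand-written part in the header file (for example HEADER.md) rather than editing README.md directly. The resulting README structure is always: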

PLAINTEXT

[Your title, description, usage example - maintained by hand]

<!-- BEGIN_TF_DOCS -->
## Requirements ...  <- auto-generated by terraform-docs
## Inputs ...
## Outputs ...
<!-- END_TF_DOCS -->

PowerShell (`Terraform-Sort.ps1`)

Requires PowerShell 7.2+. Preferred for Windows and cross-platform teams already using PowerShell for Azure automation.

PowerShell

# Sort, format, generate README with a header file
./Terraform-Sort.ps1 -SortVariables -SortOutputs -FormatTerraform -GenerateReadme -ReadmeHeaderFile HEADER.md
 
# Sort and format only (no README generation)
./Terraform-Sort.ps1 -SortVariables -SortOutputs -FormatTerraform
 
# Also process ./examples/module-dev before the root module
./Terraform-Sort.ps1 -SortVariables -SortOutputs -FormatTerraform -GenerateReadme -IncludeExampleDir
 
# Preview what would change without writing any files
./Terraform-Sort.ps1 -SortVariables -SortOutputs -FormatTerraform -GenerateReadme -WhatIf

PowerShell

#Requires -Version 7.2
# .SYNOPSIS
#     Sorts Terraform variable and output blocks, runs terraform fmt, writes a
#     custom README header, and regenerates the terraform-docs section of README.md.
#
# .DESCRIPTION
#     Designed for libre-devops Terraform module repositories. No Git operations -
#     releases are handled separately in CI. Uses a brace-depth parser that handles
#     nested object types and default maps correctly.
#
# .PARAMETER VariablesFile    Path to variables.tf (default: ./variables.tf)
# .PARAMETER OutputsFile      Path to outputs.tf   (default: ./outputs.tf)
# .PARAMETER SortVariables    Sort variable blocks alphabetically
# .PARAMETER SortOutputs      Sort output blocks alphabetically
# .PARAMETER FormatTerraform  Run terraform fmt -recursive
# .PARAMETER GenerateReadme   Regenerate terraform-docs section of README.md
# .PARAMETER ReadmeHeader     Custom markdown header written above terraform-docs output
# .PARAMETER ReadmeHeaderFile Path to a markdown file used as README header
# .PARAMETER IncludeExampleDir Also process ./examples/module-dev
#
# .EXAMPLE
#     ./Terraform-Sort.ps1 -SortVariables -SortOutputs -FormatTerraform
#
# .EXAMPLE
#     ./Terraform-Sort.ps1 -SortVariables -SortOutputs -GenerateReadme -ReadmeHeaderFile HEADER.md
 
[CmdletBinding(SupportsShouldProcess)]
param(
    [string] $VariablesFile    = './variables.tf',
    [string] $OutputsFile      = './outputs.tf',
    [string] $ReadmeHeader     = '',
    [string] $ReadmeHeaderFile = '',
 
    [switch] $SortVariables,
    [switch] $SortOutputs,
    [switch] $FormatTerraform,
    [switch] $GenerateReadme,
    [switch] $IncludeExampleDir
)
 
Set-StrictMode -Version Latest
$ErrorActionPreference = 'Stop'
 
$script:Errors  = [System.Collections.Generic.List[string]]::new()
$script:RootDir = (Get-Location).Path
 
# ── Logging helpers ───────────────────────────────────────────────────────────
 
function Write-Step ([string]$Msg) { Write-Host "  >> $Msg"    -ForegroundColor Cyan  }
function Write-Ok   ([string]$Msg) { Write-Host "  OK  $Msg"   -ForegroundColor Green }
function Write-Fail ([string]$Msg) {
    Write-Host "  FAIL  $Msg" -ForegroundColor Red
    $script:Errors.Add($Msg)
}
 
function Assert-Tool ([string]$Name) {
    if (-not (Get-Command $Name -ErrorAction SilentlyContinue)) {
        throw "$Name not found in PATH"
    }
}
 
# ── HCL block parser (brace-depth) ───────────────────────────────────────────
 
function Get-TerraformBlocks {
    # Extracts complete top-level HCL blocks using brace-depth tracking.
    # Correctly handles nested object types, default maps, and validation blocks.
    param(
        [string] $Content,
        [ValidateSet('variable','output')]
        [string] $Keyword
    )
 
    $blocks = [System.Collections.Generic.List[string]]::new()
    $lines  = $Content -split '\r?\n'
    $i      = 0
 
    while ($i -lt $lines.Count) {
        if ($lines[$i] -match "^${Keyword}\s+`"[^`"]+`"\s*\{") {
            $depth  = 0
            $buffer = [System.Text.StringBuilder]::new()
 
            while ($i -lt $lines.Count) {
                $line = $lines[$i]
                $null = $buffer.AppendLine($line)
                foreach ($ch in $line.ToCharArray()) {
                    if   ($ch -eq '{') { $depth++ }
                    elseif ($ch -eq '}') { $depth-- }
                }
                $i++
                if ($depth -eq 0) { break }
            }
 
            $blocks.Add($buffer.ToString().TrimEnd())
        }
        else { $i++ }
    }
 
    return $blocks
}
 
function Get-BlockName ([string]$Block, [string]$Keyword) {
    if ($Block -match "${Keyword}\s+`"([^`"]+)`"") { return $Matches[1] }
    return [string]::Empty
}
 
# ── Core operations ───────────────────────────────────────────────────────────
 
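function Invoke-TerraformFmt {
    Assert-Tool 'terraform'
    # terraform fmt rewrites files in place, so honour -WhatIf like the other writers.
    if (-not $PSCmdlet.ShouldProcess((Get-Location).Path, 'terraform fmt -recursive')) { return }
    Write-Step 'terraform fmt -recursive'
    & terraform fmt -recursive
    if ($LASTEXITCODE -ne 0) { throw "terraform fmt failed (exit $LASTEXITCODE)" }
    Write-Ok 'terraform fmt'
}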
 
function Invoke-SortFile {
    param(
        [string] $FilePath,
        [ValidateSet('variable','output')]
        [string] $Keyword
    )
 
    if (-not (Test-Path $FilePath -PathType Leaf)) {
        Write-Fail "Not found: $FilePath"
        return
    }
 
    Write-Step "Sorting ${Keyword} blocks in $FilePath"
    $content = Get-Content $FilePath -Raw -Encoding UTF8
    # @() keeps the result an array even when the file holds zero or one block.
    $blocks  = @(Get-TerraformBlocks -Content $content -Keyword $Keyword)
 
    if ($blocks.Count -eq 0) {
        Write-Fail "No ${Keyword} blocks found in $FilePath"
        return
    }
 
    $sorted = ($blocks |
        Sort-Object { Get-BlockName -Block $_ -Keyword $Keyword } |
        ForEach-Object { $_.TrimEnd() }) -join "`n`n"
 
    if ($PSCmdlet.ShouldProcess($FilePath, "Write sorted ${Keyword} blocks")) {
        Set-Content -Path $FilePath -Value ($sorted.TrimEnd() + "`n") -Encoding UTF8 -NoNewline
        Write-Ok "Sorted $($blocks.Count) ${Keyword} block(s) -> $FilePath"
    }
}
 
function Set-ReadmeHeader {
    # Writes a custom markdown header to README.md and appends the terraform-docs
    # injection markers below it. Run Invoke-ReadmeUpdate afterwards to populate
    # the section between the markers.
    #
    # Resulting README.md structure:
    #   [Your custom header - title, description, usage example, etc.]
    #
    #   <!-- BEGIN_TF_DOCS -->
    #   <!-- END_TF_DOCS -->
    #
    # terraform-docs injects Requirements/Inputs/Outputs between the markers
    # without touching the header above.
    #
    # Header: markdown content to place above the markers.
    #         Pass an empty string to write markers-only.
    param([string]$Header)
 
    Write-Step 'Writing README header'
 
    $markers = "<!-- BEGIN_TF_DOCS -->`n<!-- END_TF_DOCS -->`n"
    $body    = if ($Header.Trim()) {
        $Header.TrimEnd() + "`n`n" + $markers
    }
    else {
        $markers
    }
 
    if ($PSCmdlet.ShouldProcess('README.md', 'Write README header')) {
        Set-Content -Path 'README.md' -Value $body -Encoding UTF8 -NoNewline
        Write-Ok 'README.md header written'
    }
}
 
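function Invoke-ReadmeUpdate {
    Assert-Tool 'terraform-docs'
    # terraform-docs rewrites README.md, so honour -WhatIf here too.
    if (-not $PSCmdlet.ShouldProcess('README.md', 'Regenerate terraform-docs section')) { return }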
 
    if (Test-Path '.terraform-docs.yml') {
        Write-Step 'terraform-docs . (using .terraform-docs.yml)'
        & terraform-docs .
    }
    else {
        # No config file - ensure markers exist then inject markdown table
        if (-not (Test-Path 'README.md')) {
            Set-Content 'README.md' `
                -Value "<!-- BEGIN_TF_DOCS -->`n<!-- END_TF_DOCS -->`n" `
                -Encoding UTF8 -NoNewline
        }
        Write-Step 'terraform-docs markdown table --output-mode inject'
        & terraform-docs markdown table --output-file README.md --output-mode inject .
    }
 
    if ($LASTEXITCODE -ne 0) { throw "terraform-docs failed (exit $LASTEXITCODE)" }
    Write-Ok 'README.md docs section updated'
}
 
# ── Directory processor ───────────────────────────────────────────────────────
 
function Invoke-ModuleDirectory {
    param(
        [string] $Directory,
        [string] $HeaderText = ''
    )
 
    if (-not (Test-Path $Directory -PathType Container)) {
        Write-Warning "Directory not found, skipping: $Directory"
        return
    }
 
    Write-Host "`nProcessing: $(Resolve-Path $Directory)" -ForegroundColor White
    Push-Location $Directory
    try {
        if ($FormatTerraform) {
            try   { Invoke-TerraformFmt }
            catch { Write-Fail $_.Exception.Message }
        }
 
        if ($SortVariables) {
            # Honour -VariablesFile / -OutputsFile; relative paths resolve against $Directory.
            try   { Invoke-SortFile -FilePath $VariablesFile -Keyword 'variable' }
            catch { Write-Fail $_.Exception.Message }
        }
 
        if ($SortOutputs) {
            try   { Invoke-SortFile -FilePath $OutputsFile -Keyword 'output' }
            catch { Write-Fail $_.Exception.Message }
        }
 
        if ($GenerateReadme) {
            # Set-ReadmeHeader writes the custom header + markers.
            # Invoke-ReadmeUpdate injects the terraform-docs content between the markers.
            if ($HeaderText) {
                try   { Set-ReadmeHeader -Header $HeaderText }
                catch { Write-Fail $_.Exception.Message }
            }
            try   { Invoke-ReadmeUpdate }
            catch { Write-Fail $_.Exception.Message }
        }
    }
    finally {
        Pop-Location
    }
}
 
# ── Resolve header ────────────────────────────────────────────────────────────
 
$resolvedHeader = ''
 
if ($ReadmeHeaderFile) {
    if (-not (Test-Path $ReadmeHeaderFile -PathType Leaf)) {
        Write-Error "ReadmeHeaderFile not found: $ReadmeHeaderFile"
        exit 1
    }
    $resolvedHeader = Get-Content $ReadmeHeaderFile -Raw -Encoding UTF8
    Write-Host "README header source: $ReadmeHeaderFile" -ForegroundColor Cyan
}
elseif ($ReadmeHeader) {
    $resolvedHeader = $ReadmeHeader
    Write-Host 'README header source: -ReadmeHeader parameter' -ForegroundColor Cyan
}
 
# ── Entry point ───────────────────────────────────────────────────────────────
 
if ($IncludeExampleDir) {
    Invoke-ModuleDirectory `
        -Directory   (Join-Path $script:RootDir 'examples/module-dev') `
        -HeaderText  $resolvedHeader
}
 
Invoke-ModuleDirectory -Directory $script:RootDir -HeaderText $resolvedHeader
 
# ── Summary ───────────────────────────────────────────────────────────────────
 
Write-Host ''
if ($script:Errors.Count -gt 0) {
    Write-Host "Completed with $($script:Errors.Count) error(s):" -ForegroundColor Red
    $script:Errors | ForEach-Object { Write-Host "  - $_" -ForegroundColor Red }
    exit 1
}
 
Write-Host 'Done.' -ForegroundColor Green

Bash (`terraform-sort.sh`)

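Requires Bash 4.0+ and GNU awk (gawk). Many Linux distributions do not install gawk by default (Debian and Ubuntu ship mawk as awk), so install it with your package manager if it is missing. On macOS, install it with brew install gawk; the system Bash there is 3.2, so also install a current Bash with brew install bash. Suitable for Linux CI environments and engineers who prefer shell scripts.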

Bash

# Sort, format, generate README with a header file
./terraform-sort.sh --sort-variables --sort-outputs --format-terraform --generate-readme --readme-header-file HEADER.md
 
# Sort and format only (no README generation)
./terraform-sort.sh --sort-variables --sort-outputs --format-terraform
 
# Also process ./examples/module-dev before the root module
./terraform-sort.sh --sort-variables --sort-outputs --format-terraform --generate-readme --include-example-dir
 
# Show help
./terraform-sort.sh --help

Bash

#!/usr/bin/env bash
# terraform-sort.sh - Sort Terraform blocks, run fmt, write README header,
# regenerate terraform-docs. No git operations.
set -euo pipefail
 
# ── Defaults ──────────────────────────────────────────────────────────────────
 
VARIABLES_FILE='./variables.tf'
OUTPUTS_FILE='./outputs.tf'
README_HEADER=''
README_HEADER_FILE=''
SORT_VARIABLES=false
SORT_OUTPUTS=false
FORMAT_TERRAFORM=false
GENERATE_README=false
INCLUDE_EXAMPLE_DIR=false
 
# ── Usage ─────────────────────────────────────────────────────────────────────
 
usage() {
    cat <<EOF
Usage: $(basename "$0") [OPTIONS]
 
  --variables-file FILE     Path to variables.tf (default: $VARIABLES_FILE)
  --outputs-file FILE       Path to outputs.tf   (default: $OUTPUTS_FILE)
  --sort-variables          Sort variable blocks alphabetically
  --sort-outputs            Sort output blocks alphabetically
  --format-terraform        Run terraform fmt -recursive
  --generate-readme         Regenerate terraform-docs section of README.md
  --readme-header TEXT      Custom markdown header above terraform-docs output
  --readme-header-file FILE Path to a markdown file used as README header
  --include-example-dir     Also process ./examples/module-dev
  -h, --help                Show this help
EOF
}
 
# ── Argument parsing ──────────────────────────────────────────────────────────
 
while [[ $# -gt 0 ]]; do
    case "$1" in
        --variables-file)      VARIABLES_FILE="$2";    shift 2 ;;
        --outputs-file)        OUTPUTS_FILE="$2";      shift 2 ;;
        --sort-variables)      SORT_VARIABLES=true;    shift   ;;
        --sort-outputs)        SORT_OUTPUTS=true;      shift   ;;
        --format-terraform)    FORMAT_TERRAFORM=true;  shift   ;;
        --generate-readme)     GENERATE_README=true;   shift   ;;
        --readme-header)       README_HEADER="$2";     shift 2 ;;
        --readme-header-file)  README_HEADER_FILE="$2";shift 2 ;;
        --include-example-dir) INCLUDE_EXAMPLE_DIR=true; shift ;;
        -h|--help)             usage; exit 0 ;;
        *) printf 'Unknown option: %s\n' "$1" >&2; usage; exit 1 ;;
    esac
done
 
ROOT_DIR="$(pwd)"
ERRORS=()
 
# ── Logging ───────────────────────────────────────────────────────────────────
 
step() { printf '  \033[36m>> %s\033[0m\n' "$*";       }
ok()   { printf '  \033[32mOK  %s\033[0m\n' "$*";      }
fail() { printf '  \033[31mFAIL  %s\033[0m\n' "$*" >&2; ERRORS+=("$*"); }
 
assert_tool() {
    command -v "$1" &>/dev/null || { fail "$1 not found in PATH"; return 1; }
}
 
# ── HCL block parser (gawk brace-depth) ──────────────────────────────────────
 
sort_terraform_blocks() {
    local file="$1" keyword="$2" tmp_dir count=0
 
    [[ -f "$file" ]] || { fail "Not found: $file"; return 1; }
 
    step "Sorting ${keyword} blocks in $file"
    tmp_dir="$(mktemp -d)"
 
    # Extract each block into its own file named by block name.
    # gawk is required for the 3-argument match() form used here.
    gawk -v kw="$keyword" -v tmpdir="$tmp_dir" '
    BEGIN { in_block=0; depth=0; block=""; name="" }
 
    !in_block && match($0, "^" kw " \"([^\"]+)\" \\{", arr) {
        name = arr[1]; in_block = 1; depth = 0; block = $0 "\n"
        for (i=1; i<=length($0); i++) {
            c = substr($0,i,1)
            if (c=="{") depth++; else if (c=="}") depth--
        }
        if (depth==0) {
            sub(/\n$/,"",block); print block > (tmpdir "/" name)
            close(tmpdir "/" name); in_block=0; block=""; name=""
        }
        next
    }
 
    in_block {
        block = block $0 "\n"
        for (i=1; i<=length($0); i++) {
            c = substr($0,i,1)
            if (c=="{") depth++; else if (c=="}") depth--
        }
        if (depth==0) {
            sub(/\n$/,"",block); print block > (tmpdir "/" name)
            close(tmpdir "/" name); in_block=0; block=""; name=""
        }
    }
    ' "$file"
 
    # Files are named exactly after their block (no extension), so a byte-wise
    # sort under LC_ALL=C orders names like the Python version does: "app" sorts
    # before "app-id", independent of the system locale.
    local output='' bf
    while IFS= read -r -d $'\0' bf; do
        if [[ $count -eq 0 ]]; then
            output="$(cat "$bf")"
        else
            output="$output"$'\n\n'"$(cat "$bf")"
        fi
        count=$((count + 1))
    done < <(find "$tmp_dir" -type f -print0 | LC_ALL=C sort -z)
 
    rm -rf "$tmp_dir"
 
    if [[ $count -eq 0 ]]; then
        fail "No ${keyword} blocks found in $file"; return 1
    fi
 
    printf '%s\n' "$output" > "$file"
    ok "Sorted $count ${keyword} block(s) -> $file"
}
 
# ── Core operations ───────────────────────────────────────────────────────────
 
run_terraform_fmt() {
    assert_tool terraform || return
    step 'terraform fmt -recursive'
    terraform fmt -recursive && ok 'terraform fmt' || fail 'terraform fmt failed'
}
 
set_readme_header() {
    local header="$1"
    step 'Writing README header'
    local markers=$'<!-- BEGIN_TF_DOCS -->\n<!-- END_TF_DOCS -->'
    if [[ -n "$header" ]]; then
        printf '%s\n\n%s\n' "$header" "$markers" > README.md
    else
        printf '%s\n' "$markers" > README.md
    fi
    ok 'README header written'
}
 
run_terraform_docs() {
    assert_tool terraform-docs || return
    if [[ -f '.terraform-docs.yml' ]]; then
        step 'terraform-docs . (using .terraform-docs.yml)'
        terraform-docs . && ok 'README.md docs section updated' \
                         || fail 'terraform-docs failed'
    else
        [[ -f 'README.md' ]] \
            || printf '<!-- BEGIN_TF_DOCS -->\n<!-- END_TF_DOCS -->\n' > README.md
        step 'terraform-docs markdown table --output-mode inject'
        terraform-docs markdown table \
            --output-file README.md --output-mode inject . \
            && ok 'README.md docs section updated' \
            || fail 'terraform-docs failed'
    fi
}
 
# ── Directory processor ───────────────────────────────────────────────────────
 
process_directory() {
    local dir="$1" header="$2"
    [[ -d "$dir" ]] || { printf 'WARNING: Not found, skipping: %s\n' "$dir"; return; }
 
    printf '\nProcessing: %s\n' "$(cd "$dir" && pwd)"
    pushd "$dir" > /dev/null
 
    $FORMAT_TERRAFORM  && { run_terraform_fmt || true; }
    $SORT_VARIABLES    && { sort_terraform_blocks "$VARIABLES_FILE" 'variable' || true; }
    $SORT_OUTPUTS      && { sort_terraform_blocks "$OUTPUTS_FILE"   'output'   || true; }
 
    if $GENERATE_README; then
        [[ -n "$header" ]] && { set_readme_header "$header" || true; }
        run_terraform_docs || true
    fi
 
    popd > /dev/null
}
 
# ── Resolve header ────────────────────────────────────────────────────────────
 
resolved_header=''
if [[ -n "$README_HEADER_FILE" ]]; then
    [[ -f "$README_HEADER_FILE" ]] \
        || { printf 'ERROR: --readme-header-file not found: %s\n' "$README_HEADER_FILE" >&2; exit 1; }
    resolved_header="$(cat "$README_HEADER_FILE")"
    printf 'README header source: %s\n' "$README_HEADER_FILE"
elif [[ -n "$README_HEADER" ]]; then
    resolved_header="$README_HEADER"
    printf 'README header source: --readme-header parameter\n'
fi
 
# ── Entry point ───────────────────────────────────────────────────────────────
 
$INCLUDE_EXAMPLE_DIR && process_directory "$ROOT_DIR/examples/module-dev" "$resolved_header"
process_directory "$ROOT_DIR" "$resolved_header"
 
# ── Summary ───────────────────────────────────────────────────────────────────
 
printf '\n'
if [[ ${#ERRORS[@]} -gt 0 ]]; then
    printf '\033[31mCompleted with %d error(s):\033[0m\n' "${#ERRORS[@]}" >&2
    printf '  \033[31m- %s\033[0m\n' "${ERRORS[@]}" >&2
    exit 1
fi
printf '\033[32mDone.\033[0m\n'

Python (`terraform_sort.py`)

Standard library only. Requires Python 3.12+; tested against 3.14. Suitable for teams that already use Python tooling or want a single script that runs on any OS without a PowerShell or Bash dependency.

Bash

# Sort, format, generate README with a header file
python terraform_sort.py --sort-variables --sort-outputs --format-terraform --generate-readme --readme-header-file HEADER.md
 
# Sort and format only (no README generation)
python terraform_sort.py --sort-variables --sort-outputs --format-terraform
 
# Also process ./examples/module-dev before the root module
python terraform_sort.py --sort-variables --sort-outputs --format-terraform --generate-readme --include-example-dir
 
# Show help
python terraform_sort.py --help

Python

#!/usr/bin/env python3
"""
terraform_sort.py
 
Sort Terraform variable/output blocks, run terraform fmt, write a custom README
header, and regenerate terraform-docs output.
 
Standard library only. Requires Python 3.12+.
"""
 
from __future__ import annotations
 
import argparse
import os
import shutil
import subprocess
import sys
import textwrap
from pathlib import Path
 
# ── Logging ───────────────────────────────────────────────────────────────────
 
def _step(msg: str) -> None:
    print(f'  \033[36m>> {msg}\033[0m', flush=True)
 
def _ok(msg: str) -> None:
    print(f'  \033[32mOK  {msg}\033[0m', flush=True)
 
def _fail(msg: str, errors: list[str]) -> None:
    print(f'  \033[31mFAIL  {msg}\033[0m', file=sys.stderr, flush=True)
    errors.append(msg)
 
# ── HCL block parser (brace-depth) ───────────────────────────────────────────
 
def extract_blocks(content: str, keyword: str) -> list[tuple[str, str]]:
    """
    Extract complete top-level HCL blocks using brace-depth tracking.
    Returns (name, block_text) tuples in source order.
    Handles nested object types, default maps, and validation blocks.
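    Limitations shared with the PowerShell and Bash versions: brace characters
    inside string literals or comments are counted, so an unbalanced brace in a
    description can misplace the end of a block (balanced ones such as
    "${var.name}" are harmless), and lines outside the matched blocks are not
    returned, so standalone comments are dropped when the sorted file is written.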
    """
    blocks: list[tuple[str, str]] = []
    lines = content.splitlines(keepends=True)
    i = 0
    prefix = f'{keyword} "'
 
    while i < len(lines):
        stripped = lines[i].lstrip()
        if stripped.startswith(prefix):
            after = stripped[len(prefix):]
            name = after[: after.index('"')]
            depth, buf = 0, []
            while i < len(lines):
                buf.append(lines[i])
                for ch in lines[i]:
                    if ch == '{':
                        depth += 1
                    elif ch == '}':
                        depth -= 1
                i += 1
                if depth == 0:
                    break
            blocks.append((name, ''.join(buf).rstrip('\n')))
        else:
            i += 1
 
    return blocks
 
# ── Core operations ───────────────────────────────────────────────────────────
 
def assert_tool(name: str) -> None:
    if not shutil.which(name):
        raise RuntimeError(f'{name} not found in PATH')
 
 
def _run(cmd: list[str], errors: list[str]) -> bool:
    rc = subprocess.run(cmd).returncode
    if rc != 0:
        errors.append(f'{" ".join(cmd)} exited {rc}')
        return False
    return True
 
 
def sort_file(path: Path, keyword: str, errors: list[str]) -> None:
    if not path.exists():
        _fail(f'Not found: {path}', errors)
        return
 
    _step(f'Sorting {keyword} blocks in {path}')
    blocks = extract_blocks(path.read_text(encoding='utf-8'), keyword)
 
    if not blocks:
        _fail(f'No {keyword} blocks found in {path}', errors)
        return
 
    sorted_blocks = sorted(blocks, key=lambda b: b[0])
    path.write_text(
        '\n\n'.join(text for _, text in sorted_blocks) + '\n',
        encoding='utf-8',
    )
    _ok(f'Sorted {len(blocks)} {keyword} block(s) -> {path}')
 
 
def run_terraform_fmt(errors: list[str]) -> None:
    try:
        assert_tool('terraform')
    except RuntimeError as exc:
        _fail(str(exc), errors)
        return
    _step('terraform fmt -recursive')
    if _run(['terraform', 'fmt', '-recursive'], errors):
        _ok('terraform fmt')
 
 
def set_readme_header(header: str, errors: list[str]) -> None:
    _step('Writing README header')
    markers = '<!-- BEGIN_TF_DOCS -->\n<!-- END_TF_DOCS -->\n'
    body = f'{header.rstrip()}\n\n{markers}' if header.strip() else markers
    try:
        Path('README.md').write_text(body, encoding='utf-8')
        _ok('README header written')
    except OSError as exc:
        _fail(str(exc), errors)
 
 
def run_terraform_docs(errors: list[str]) -> None:
    try:
        assert_tool('terraform-docs')
    except RuntimeError as exc:
        _fail(str(exc), errors)
        return
 
    if Path('.terraform-docs.yml').exists():
        _step('terraform-docs . (using .terraform-docs.yml)')
        if _run(['terraform-docs', '.'], errors):
            _ok('README.md docs section updated')
    else:
        readme = Path('README.md')
        if not readme.exists():
            readme.write_text(
                '<!-- BEGIN_TF_DOCS -->\n<!-- END_TF_DOCS -->\n',
                encoding='utf-8',
            )
        _step('terraform-docs markdown table --output-mode inject')
        if _run(
            ['terraform-docs', 'markdown', 'table',
             '--output-file', 'README.md', '--output-mode', 'inject', '.'],
            errors,
        ):
            _ok('README.md docs section updated')
 
 
def process_directory(
    directory: Path,
    header: str,
    args: argparse.Namespace,
    errors: list[str],
) -> None:
    if not directory.is_dir():
        print(f'WARNING: Not found, skipping: {directory}')
        return
 
    print(f'\nProcessing: {directory.resolve()}', flush=True)
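    # Run every step from inside the target directory so relative paths
    # (variables.tf, outputs.tf, README.md) resolve there; restore cwd after.
    original = Path.cwd()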
    os.chdir(directory)
    try:
        if args.format_terraform:
            run_terraform_fmt(errors)
        if args.sort_variables:
            sort_file(Path(args.variables_file), 'variable', errors)
        if args.sort_outputs:
            sort_file(Path(args.outputs_file), 'output', errors)
        if args.generate_readme:
            if header:
                set_readme_header(header, errors)
            run_terraform_docs(errors)
    finally:
        os.chdir(original)
 

# ── CLI ───────────────────────────────────────────────────────────────────────
 
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog='terraform_sort.py',
        description='Sort Terraform blocks, run fmt, write README header, regenerate docs.',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=textwrap.dedent('''\
            Examples:
              python terraform_sort.py --sort-variables --sort-outputs --format-terraform
              python terraform_sort.py --sort-variables --sort-outputs \\
                  --generate-readme --readme-header-file HEADER.md
        '''),
    )
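    p.add_argument('--variables-file', default='./variables.tf',
                   help='variables file to sort (default: %(default)s)')
    p.add_argument('--outputs-file', default='./outputs.tf',
                   help='outputs file to sort (default: %(default)s)')
    p.add_argument('--readme-header', default='',
                   help='literal text placed above the terraform-docs markers')
    p.add_argument('--readme-header-file', default='',
                   help='file placed above the terraform-docs markers; '
                        'takes precedence over --readme-header')
    p.add_argument('--sort-variables', action='store_true',
                   help='sort variable blocks by name')
    p.add_argument('--sort-outputs', action='store_true',
                   help='sort output blocks by name')
    p.add_argument('--format-terraform', action='store_true',
                   help='run terraform fmt -recursive')
    p.add_argument('--generate-readme', action='store_true',
                   help='overwrite README.md with the header (if given), then run terraform-docs')
    p.add_argument('--include-example-dir', action='store_true',
                   help='also process examples/module-dev before the module root')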
    return p
 
 
def main() -> None:
    args = build_parser().parse_args()
    errors: list[str] = []
    root = Path.cwd()
 
    header = ''
    if args.readme_header_file:
        hf = Path(args.readme_header_file)
        if not hf.is_file():
            print(f'ERROR: --readme-header-file not found: {hf}', file=sys.stderr)
            sys.exit(1)
        header = hf.read_text(encoding='utf-8')
        print(f'README header source: {hf}')
    elif args.readme_header:
        header = args.readme_header
        print('README header source: --readme-header parameter')
 
    if args.include_example_dir:
        process_directory(root / 'examples/module-dev', header, args, errors)
 
    process_directory(root, header, args, errors)
 
    print()
    if errors:
        print(f'\033[31mCompleted with {len(errors)} error(s):\033[0m', file=sys.stderr)
        for e in errors:
            print(f'  \033[31m- {e}\033[0m', file=sys.stderr)
        sys.exit(1)
 
    print('\033[32mDone.\033[0m')
 
 
if __name__ == '__main__':
    main()

Release workflow

Git releases are kept separate from the sort/docs script. Create an annotated tag on main and push it; the public Terraform Registry detects new version tags automatically through the webhook it installs on the connected GitHub repository:

Bash

git tag -a v1.2.0 -m "feat: add ip_restriction support"
git push origin v1.2.0
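The registry only recognises tags that are valid semantic versions, optionally prefixed with v. A tag such as v1.2 or release-1.2.0 is not published as a module version. The sketch below is a hypothetical pre-push check; is_registry_tag and check_tags are illustrative names and are not part of any tool. Its pattern is deliberately stricter than SemVer 2.0.0. It accepts only vMAJOR.MINOR.PATCH or MAJOR.MINOR.PATCH, with no leading zeros and no pre-release or build suffix.

```python
import re

# Deliberately strict: vMAJOR.MINOR.PATCH or MAJOR.MINOR.PATCH only, no leading
# zeros, no pre-release or build suffix. Stricter than full SemVer 2.0.0.
_RELEASE_TAG = re.compile(r'v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)')


def is_registry_tag(tag: str) -> bool:
    """Hypothetical helper: True if the tag matches the release-tag convention."""
    return _RELEASE_TAG.fullmatch(tag) is not None


def check_tags(tags: list[str]) -> list[str]:
    """Return the tags that fail the convention and should not be pushed."""
    return [tag for tag in tags if not is_registry_tag(tag)]
```

For example, check_tags(['v1.2.0', 'v1.2']) returns ['v1.2'], so the malformed tag is caught before git push.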

Anti-patterns

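Using client secrets or certificates for CI/CD authentication - client secrets and certificates are long-lived credentials. They have a fixed expiry, require manual rotation, and give anyone who obtains them the full permissions of the service principal. Use OIDC (workload identity federation) for external runners and Managed Identity for Azure-hosted runners. Neither approach stores a long-lived credential. OIDC exchanges a short-lived token issued to the runner for an access token, and the platform issues Managed Identity tokens at runtime.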
Hardcoding provider credentials in module code - credentials belong in the environment (environment variables, managed identity, OIDC). A module that configures a provider with static credentials cannot be aliased, cannot be used in multi-subscription deployments, and leaks secrets into source control.
State surgery (terraform state mv, rm, pull/push) - direct state manipulation bypasses the plan/apply review cycle, can leave state out of step with real infrastructure, and leaves no reviewable audit trail. Use moved {} blocks for renames, removed {} blocks to stop managing a resource, and import {} blocks for onboarding. All three go through code review and appear in the plan. If you need to split state files, redesign workspace boundaries or adopt HCP Terraform Stacks.
Using count for named resources - count addresses resources by integer index. Inserting or removing a list item shifts the index of every later item, so Terraform plans destroy-and-recreate for resources whose configuration did not change. Use for_each with a stable string key.
Putting provider configuration in a reusable module - callers cannot override or alias a provider that is configured inside a child module. A module that contains its own provider blocks also cannot be called with count, for_each, or depends_on. Configure providers only in root modules, and pass aliased providers via providers = {} only when genuinely required.
Committing override.tf or *_override.tf - in this standard, override files hold machine-specific settings for local development. Terraform merges every override file it finds over the main configuration. A committed override therefore imposes your local auth configuration on everyone else and breaks CI when the pipeline picks up the wrong provider settings. Add override.tf, override.tf.json, *_override.tf, and *_override.tf.json to .gitignore in every workspace.
Putting provider configuration in terraform.tf - terraform.tf holds the terraform {} block only (required_version, required_providers). Provider blocks (provider "azurerm" {}) belong in providers.tf. Mixing them makes local overrides harder and puts version-constraint changes and authentication changes in the same diff.
Committing terraform.tfvars containing secrets - terraform.tfvars is loaded automatically and is easily committed by accident. Supply any sensitive value through environment variables (TF_VAR_*), a secrets manager (Azure Key Vault, AWS Secrets Manager), or a pipeline secret store.
One state file for all environments or all applications - a plan against the production state file that also contains development resources is unnecessarily dangerous. Separate state by environment and by application component. The blast radius of a plan should be exactly one stack in one environment.
depends_on on a module call - depends_on on a module block makes every resource and data source in that module depend on the listed objects. Terraform then plans more conservatively. Data sources inside the module can be deferred until apply, values show as known after apply, and more resources may be replaced than necessary. Instead, pass the specific attribute you depend on (e.g. an ID) as a module input variable so Terraform can infer the precise, narrow dependency.
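Not setting required_version - Terraform CLI releases add language features and change behaviour. Engineers running different versions against the same state can get different plans. Configuration that relies on a newer feature also fails confusingly on an older CLI. Pin required_version in terraform.tf; provider versions are pinned separately in required_providers.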
Redundant null checks on optional() with a default - an attribute declared as optional(bool, true) is never null, because Terraform substitutes the default when the caller omits the attribute or passes null. Writing each.value.foo != null ? each.value.foo : true is noise, and it misleads future readers into thinking null is a possible value.
Importing resources without confirming zero drift - terraform import adds a resource to state but does not write its HCL configuration. Neither it nor an import {} block checks that the HCL you wrote matches the real resource. Always follow an import with terraform plan and confirm it shows no changes before committing the configuration.
Overusing locals as variable aliases - a local that only renames an input (local.name = var.name) computes nothing new and adds a layer of indirection for readers. Use var.foo directly unless the local genuinely derives a new expression.
Not marking sensitive outputs - outputs containing credentials, keys, or connection strings must be marked sensitive = true. Without it, values are printed in plain text at the end of every apply, by terraform output, and in CI logs. Marking an output sensitive only redacts display; the value is still stored in plain text in state.
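The count anti-pattern is easiest to see by comparing resource addresses before and after removing one item. The sketch below models addresses as plain strings and does not call Terraform. The resource address azurerm_resource_group.this and the item names are hypothetical. It assumes that changing the name argument forces replacement, as it does for azurerm_resource_group, so any address whose name changes counts as churn.

```python
def count_addresses(names: list[str]) -> dict[str, str]:
    """Address -> name as count assigns them: by list position."""
    return {f'azurerm_resource_group.this[{i}]': name for i, name in enumerate(names)}


def for_each_addresses(names: list[str]) -> dict[str, str]:
    """Address -> name as for_each = toset(names) assigns them: by string key."""
    return {f'azurerm_resource_group.this["{name}"]': name for name in names}


def churn(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Addresses that would be destroyed or replaced by the change.

    An address churns if it disappears (destroy) or now holds a different
    name (replace, assuming name forces replacement as it does for
    azurerm_resource_group).
    """
    return sorted(addr for addr, name in before.items() if after.get(addr) != name)


before, after = ['app', 'data', 'web'], ['data', 'web']
print(churn(count_addresses(before), count_addresses(after)))
print(churn(for_each_addresses(before), for_each_addresses(after)))
```

Removing 'app' from the list makes churn report all three count indexes: two replacements and one destroy. Under for_each, only the "app" key is reported.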
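The zero-drift confirmation after an import can be scripted with terraform plan -detailed-exitcode. That command exits 0 when the plan is empty, 1 on error, and 2 when there are changes. The sketch below assumes terraform is on PATH and the working directory is already initialised. confirm_zero_drift and interpret_plan_exit are hypothetical helper names.

```python
import subprocess


def interpret_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a verdict.

    0 = succeeded with an empty plan, 1 = error, 2 = succeeded with changes.
    Any other code is treated as an error so the check fails closed.
    """
    verdicts = {0: 'no-drift', 1: 'error', 2: 'drift'}
    return verdicts.get(code, 'error')


def confirm_zero_drift(workdir: str = '.') -> bool:
    """Hypothetical helper: run a plan and return True only if it is empty."""
    result = subprocess.run(
        ['terraform', 'plan', '-detailed-exitcode', '-input=false'],
        cwd=workdir,
    )
    return interpret_plan_exit(result.returncode) == 'no-drift'
```

Treating unexpected exit codes as errors means a crashed or interrupted plan can never be mistaken for a clean import.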
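Missing sensitive = true markings can be caught after apply by reading terraform output -json, which reports a sensitive flag for every output. The sketch below is a name-based heuristic. The SUSPECT_FRAGMENTS list is an assumption to tune per codebase, and the check will miss credentials with unremarkable names. It reads only output names and flags, never values.

```python
import json

# Hypothetical heuristic: name fragments that usually indicate a credential.
SUSPECT_FRAGMENTS = ('password', 'secret', 'connection_string',
                     'access_key', 'primary_key', 'token')


def unmarked_sensitive_outputs(outputs_json: str) -> list[str]:
    """Return output names that look like credentials but are not sensitive.

    outputs_json is the text printed by `terraform output -json`, a mapping of
    output name -> {"sensitive": bool, "type": ..., "value": ...}.
    """
    outputs = json.loads(outputs_json)
    return sorted(
        name
        for name, meta in outputs.items()
        if not meta.get('sensitive', False)
        and any(fragment in name.lower() for fragment in SUSPECT_FRAGMENTS)
    )
```

terraform output -json prints sensitive values in plain text. Pass its stdout straight to unmarked_sensitive_outputs, for example via subprocess.run(..., capture_output=True, text=True).stdout, rather than echoing it into CI logs.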
