Why Terraform Dominates Infrastructure as Code
Terraform is the most widely adopted IaC tool, and the HashiCorp Terraform Associate certification is among the fastest-growing credentials in the DevOps space. Every major cloud provider, SaaS platform, and infrastructure vendor maintains a Terraform provider. The reason is simple: Terraform solved the infrastructure management problem in a way that scales from a single developer to thousands of engineers managing hundreds of environments.
If you're managing AWS infrastructure manually through the console, or writing one-off scripts to create resources, you already know the pain. Configuration drift, inconsistent environments, no audit trail, and the anxiety of "who changed this and when?" Terraform eliminates all of it by treating infrastructure as code — versioned, reviewable, testable, and repeatable.
This guide covers Terraform on AWS from fundamentals to production patterns. Whether you're writing your first main.tf or refactoring a legacy Terraform codebase, you'll find actionable patterns here. For a comparison of how Terraform stacks up against CloudFormation and Pulumi, see our IaC tool comparison.
Terraform Fundamentals
How Terraform Works
Terraform follows a simple lifecycle:
- Write: Define infrastructure in HCL (HashiCorp Configuration Language) files.
- Plan: Terraform compares your code to the current state and shows what will change.
- Apply: Terraform makes the API calls to create, update, or destroy resources.
- State: Terraform records what it created in a state file, which it uses for future plans.
```bash
terraform init      # Download providers, initialize backend
terraform plan      # Preview changes
terraform apply     # Execute changes
terraform destroy   # Tear down everything
```
Your First AWS Resources
```hcl
# Configure the AWS provider
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  required_version = ">= 1.7"
}

provider "aws" {
  region = "us-east-1"
}
```
```hcl
# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

# Create public and private subnets
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${count.index + 1}"
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}
```
The cidrsubnet Function
cidrsubnet(prefix, newbits, netnum) calculates subnet CIDR blocks automatically. cidrsubnet("10.0.0.0/16", 8, 0) returns 10.0.0.0/24, cidrsubnet("10.0.0.0/16", 8, 1) returns 10.0.1.0/24, etc. This is cleaner than hardcoding CIDR blocks and scales when you add more subnets.
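You can verify the expansion yourself with a throwaway output (names here are illustrative), evaluated via `terraform plan` or `terraform console`:

```hcl
# Illustrative only: expand 10.0.0.0/16 into four /24 subnets
locals {
  vpc_cidr     = "10.0.0.0/16"
  subnet_cidrs = [for i in range(4) : cidrsubnet(local.vpc_cidr, 8, i)]
}

output "subnet_cidrs" {
  # Evaluates to ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  value = local.subnet_cidrs
}
```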
State Management: The Most Critical Decision
Terraform state is a JSON file that maps your HCL resources to real infrastructure. Whoever controls the state file controls the infrastructure. Getting state management right is the single most important architectural decision in your Terraform setup.
Remote State with S3
Never store state locally for shared infrastructure. Use S3 with DynamoDB locking:
```hcl
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```
Why this matters:
- S3: Durable storage, versioning for state history, encryption at rest.
- DynamoDB: Prevents concurrent terraform apply runs from corrupting state.
- Key path: Organize state files by environment and component.
State File Organization
Monolithic state (everything in one file) doesn't scale. When a single terraform apply manages your VPC, databases, applications, and DNS, a change to one component plans against everything. Break state into logical components:
```
terraform/
├── networking/    # VPC, subnets, route tables, NAT gateways
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── data/          # RDS, ElastiCache, S3 buckets
│   ├── main.tf
│   └── variables.tf
├── compute/       # ECS, Lambda, Auto Scaling
│   ├── main.tf
│   └── variables.tf
└── dns/           # Route 53 records
    ├── main.tf
    └── variables.tf
```
Each directory has its own state file. Changes to DNS don't require planning against the entire VPC. Use terraform_remote_state data sources or outputs to share information between components.
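As a sketch, a compute component could read the networking component's outputs through a terraform_remote_state data source. The bucket and key below mirror the backend example earlier and are assumptions:

```hcl
# Read outputs exported by the networking component's state
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Consume an output the networking component must declare, e.g.
# output "private_subnet_ids" { value = aws_subnet.private[*].id }
locals {
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```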
Never Edit State Manually
The state file is not meant to be edited by hand. If you need to move resources between state files, rename resources, or import existing infrastructure, use terraform state mv, terraform state rm, and terraform import. Manual edits risk corrupting the state and losing track of infrastructure.
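Since Terraform 1.5, imports can also be expressed declaratively with an import block instead of the terraform import CLI; the resource ID below is a placeholder:

```hcl
# Declarative import (Terraform 1.5+): the next plan/apply adopts the
# existing VPC into state instead of creating a new one.
import {
  to = aws_vpc.main
  id = "vpc-0123456789abcdef0" # placeholder ID of an existing VPC
}
```

The import block is reviewable in a pull request like any other change, which makes it a better fit for the CI/CD workflows described later.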
Modules: Reusable Infrastructure Components
Modules are Terraform's mechanism for reuse. Instead of copy-pasting resource blocks, create a module once and instantiate it with different parameters.
Creating a Module
```hcl
# modules/ecs-service/main.tf
resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.this.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = var.name
    container_port   = var.container_port
  }
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  container_definitions = jsonencode([{
    name      = var.name
    image     = var.image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/${var.name}"
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

# aws_security_group.this and the /ecs/${var.name} log group are defined
# elsewhere in the module (omitted here for brevity).
```
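The module's inputs are declared in a variables.tf alongside main.tf. A minimal sketch for the inputs used above (descriptions and defaults here are illustrative):

```hcl
# modules/ecs-service/variables.tf (sketch)
variable "name" {
  description = "Service and container name"
  type        = string
}

variable "image" {
  description = "Container image URI"
  type        = string
}

variable "subnet_ids" {
  description = "Private subnets for the service"
  type        = list(string)
}

variable "desired_count" {
  description = "Number of running tasks"
  type        = number
  default     = 2
}

# Remaining inputs (cluster_arn, target_group_arn, container_port, cpu,
# memory, execution_role_arn, task_role_arn, region) follow the same pattern.
```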
Using the Module
```hcl
module "api_service" {
  source = "./modules/ecs-service"

  name               = "api"
  image              = "123456789.dkr.ecr.us-east-1.amazonaws.com/api:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.api.arn
  container_port     = 8000
  cpu                = 512
  memory             = 1024
  desired_count      = 3
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.api_task.arn
  region             = "us-east-1"
}

module "worker_service" {
  source = "./modules/ecs-service"

  name               = "worker"
  image              = "123456789.dkr.ecr.us-east-1.amazonaws.com/worker:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.worker.arn
  container_port     = 9000
  cpu                = 256
  memory             = 512
  desired_count      = 2
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.worker_task.arn
  region             = "us-east-1"
}
```
Variables and Environments
Variable Definition Patterns
```hcl
# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_config" {
  description = "EC2 instance configuration"
  type = object({
    instance_type = string
    volume_size   = number
    min_count     = number
    max_count     = number
  })
  default = {
    instance_type = "t3.medium"
    volume_size   = 50
    min_count     = 2
    max_count     = 10
  }
}
```
Environment-Specific Configuration
Use .tfvars files for environment-specific values:
```hcl
# environments/production.tfvars
environment = "production"
instance_config = {
  instance_type = "m6i.xlarge"
  volume_size   = 100
  min_count     = 3
  max_count     = 20
}
```

```hcl
# environments/staging.tfvars
environment = "staging"
instance_config = {
  instance_type = "t3.medium"
  volume_size   = 50
  min_count     = 1
  max_count     = 3
}
```

```bash
terraform plan -var-file="environments/production.tfvars"
terraform apply -var-file="environments/production.tfvars"
```
Workspaces vs. Separate Directories
Terraform workspaces allow multiple state files with the same configuration. They work well for simple environment differences. For production systems with significantly different architectures per environment (e.g., production has multi-AZ RDS, staging has single-AZ), separate directories with shared modules are more maintainable. Most teams evolve from workspaces to separate directories as complexity grows.
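When you do use workspaces, the built-in terraform.workspace value lets one configuration branch on the selected workspace. The sizing map below is illustrative:

```hcl
# Select per-workspace settings from a map keyed by workspace name
locals {
  instance_type_by_workspace = {
    default    = "t3.medium"
    staging    = "t3.medium"
    production = "m6i.xlarge"
  }

  # terraform.workspace is "default" until you run `terraform workspace select`
  instance_type = local.instance_type_by_workspace[terraform.workspace]
}
```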
CI/CD Integration
Terraform should never be applied from a developer's laptop in production. Integrate it into your CI/CD pipeline with proper controls.
GitHub Actions Workflow
```yaml
name: Terraform

on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credential configuration (e.g., OIDC via
      # aws-actions/configure-aws-credentials) omitted for brevity
      - run: terraform init
        working-directory: terraform/
      - run: terraform plan -no-color -out=tfplan
        working-directory: terraform/
      - uses: actions/github-script@v7
        with:
          script: |
            // Post plan output as PR comment

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/
      - run: terraform apply -auto-approve
        working-directory: terraform/
```
Key practices:
- Plan on PRs: Every pull request shows the infrastructure diff before merging.
- Apply on merge: Only the main branch triggers apply. No manual applies.
- Environment protection: GitHub's environment protection rules require approval for production applies.
- State locking: DynamoDB locking prevents concurrent applies.
Production Patterns
Tagging Strategy
Tag every resource consistently for cost allocation, ownership tracking, and automation:
```hcl
locals {
  common_tags = {
    Environment = var.environment
    Project     = "myproject"
    ManagedBy   = "terraform"
    Team        = var.team
  }
}

resource "aws_instance" "app" {
  # ...
  tags = merge(local.common_tags, {
    Name = "app-server-${count.index + 1}"
    Role = "application"
  })
}
```
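The AWS provider can also apply tags to every taggable resource automatically via default_tags, which removes most of the merge() boilerplate; the tag values below mirror the locals above:

```hcl
# Provider-level tags applied to every taggable AWS resource
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      Environment = var.environment
      Project     = "myproject"
      ManagedBy   = "terraform"
    }
  }
}
```

Resource-level tags still take precedence, so per-resource Name and Role tags continue to work as before.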
Lifecycle Rules
Prevent Terraform from destroying critical resources:
```hcl
# Note: the AWS provider resource type for an RDS instance is aws_db_instance
resource "aws_db_instance" "main" {
  # ...

  lifecycle {
    prevent_destroy = true
  }
}
```
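lifecycle also supports create_before_destroy and ignore_changes. The latter is useful when an external process mutates an attribute you don't want Terraform to revert, as in this sketch:

```hcl
resource "aws_autoscaling_group" "app" {
  # ...

  lifecycle {
    # An external scaling process adjusts desired_capacity;
    # don't let Terraform plan it back to the configured value.
    ignore_changes = [desired_capacity]
  }
}
```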
Data Sources for Existing Infrastructure
Reference resources not managed by your Terraform code:
```hcl
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}
```
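A data source is referenced like any resource attribute; the instance below is a minimal sketch:

```hcl
# Launch an instance from the latest matching Amazon Linux 2023 AMI
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.medium"
}
```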
Conclusion
Terraform on AWS is the most marketable IaC skill you can develop. It's the tool that enterprises standardize on, the skill that job postings require, and the technology that the HashiCorp certification ecosystem validates.
The patterns in this guide — remote state, modular architecture, CI/CD integration, and environment management — form the foundation of every production Terraform codebase. Start with a simple configuration, add remote state immediately, extract modules as patterns emerge, and integrate with CI/CD before your team grows beyond one person.
For a hands-on walkthrough of building a complete three-tier AWS architecture with Terraform, check out our three-tier deployment guide. And if you're comparing IaC tools, our Terraform vs. CloudFormation vs. Pulumi comparison will help you make an informed choice.
Want to practice this hands-on?
CloudaQube generates complete labs from a simple description. Try it free.
Get Started Free