Why Terraform Dominates Infrastructure as Code
Terraform is one of the most widely used IaC tools, and the HashiCorp Terraform Associate certification has become a common credential in the DevOps space. Nearly every major cloud provider, SaaS platform, and infrastructure vendor publishes a Terraform provider. The reason is simple: Terraform manages infrastructure in a way that scales from a single developer to thousands of engineers managing hundreds of environments.
If you're managing AWS infrastructure manually through the console, or writing one-off scripts to create resources, you already know the pain: configuration drift, inconsistent environments, no audit trail, and the anxiety of "who changed this and when?" Terraform addresses these problems by treating infrastructure as code: versioned, reviewable, testable, and repeatable.
This guide covers Terraform on AWS from fundamentals to production patterns. Whether you're writing your first main.tf or refactoring a legacy Terraform codebase, you'll find actionable patterns here. For a comparison of how Terraform stacks up against CloudFormation and Pulumi, see our IaC tool comparison.
Terraform Fundamentals
How Terraform Works
Terraform follows a simple lifecycle:
- Write: Define infrastructure in HCL (HashiCorp Configuration Language) files.
- Plan: Terraform compares your code to the current state and shows what will change.
- Apply: Terraform makes the API calls to create, update, or destroy resources.
- State: Terraform records what it created in a state file, which it uses for future plans.
terraform init # Download providers, initialize backend
terraform plan # Preview changes
terraform apply # Execute changes
terraform destroy # Tear down everything
Your First AWS Resources
# Configure the AWS provider
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.7"
}

provider "aws" {
  region = "us-east-1"
}

# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

# Create public and private subnets
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${count.index + 1}"
  }
}

# Look up the Availability Zones currently available in the region
data "aws_availability_zones" "available" {
  state = "available"
}
The cidrsubnet Function
cidrsubnet(prefix, newbits, netnum) calculates subnet CIDR blocks automatically. cidrsubnet("10.0.0.0/16", 8, 0) returns 10.0.0.0/24, cidrsubnet("10.0.0.0/16", 8, 1) returns 10.0.1.0/24, etc. This is cleaner than hardcoding CIDR blocks and scales when you add more subnets.
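To make the arithmetic concrete, here is a minimal sketch that computes the same ranges as the subnet resources above with a for expression. The local and output names are illustrative and not part of the configuration shown earlier.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # Adding 8 bits to a /16 prefix yields /24 subnets; netnum selects which one.
  public_cidrs  = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i)]
  private_cidrs = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i + 10)]
}

output "public_cidrs" {
  value = local.public_cidrs # ["10.0.0.0/24", "10.0.1.0/24"]
}

output "private_cidrs" {
  value = local.private_cidrs # ["10.0.10.0/24", "10.0.11.0/24"]
}
```

Offsetting the private subnets by 10 leaves room to add more public subnets later without renumbering.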
State Management: The Most Critical Decision
Terraform state is a JSON file that maps your HCL resources to real infrastructure. Whoever controls the state file controls the infrastructure. Getting state management right is the single most important architectural decision in your Terraform setup.
Remote State with S3
Never store state locally for shared infrastructure. Use S3 with DynamoDB locking:
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "production/networking/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
Why this matters:
- S3: Durable storage, versioning for state history, encryption at rest.
- DynamoDB: Prevents concurrent terraform apply runs from corrupting state.
- Key path: Organize state files by environment and component.
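The bucket and lock table have to exist before the backend can use them. One way to create them is a small bootstrap configuration, sketched below. The bucket and table names mirror the backend block above; the resource labels and the choice of KMS encryption are assumptions for illustration.

```hcl
# Bootstrap sketch: apply this once, from a separate configuration,
# before pointing the S3 backend at these resources.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

# Keep every previous version of the state file for recovery.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest (assumption: the AWS-managed KMS key is acceptable).
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State often contains secrets, so block all public access.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # The S3 backend expects this exact attribute name.

  attribute {
    name = "LockID"
    type = "S"
  }
}
```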
State File Organization
Monolithic state (everything in one file) doesn't scale. When a single terraform apply manages your VPC, databases, applications, and DNS, a change to one component plans against everything. Break state into logical components:
terraform/
├── networking/ # VPC, subnets, route tables, NAT gateways
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── data/ # RDS, ElastiCache, S3 buckets
│ ├── main.tf
│ └── variables.tf
├── compute/ # ECS, Lambda, Auto Scaling
│ ├── main.tf
│ └── variables.tf
└── dns/ # Route 53 records
    ├── main.tf
    └── variables.tf
Each directory has its own state file. Changes to DNS don't require planning against the entire VPC. To share information between components, expose values as outputs in one component and read them from another with a terraform_remote_state data source.
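As a rough sketch of that sharing pattern, the networking component could publish its subnet IDs and the compute component could read them. The output name and the local are hypothetical; the bucket, key, and region reuse the backend values shown earlier.

```hcl
# networking/outputs.tf
output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# compute/main.tf
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Only root-module outputs of the networking state are readable here.
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```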
Never Edit State Manually
The state file is not meant to be edited by hand. If you need to move resources between state files, rename resources, or import existing infrastructure, use terraform state mv, terraform state rm, and terraform import. Manual edits risk corrupting the state and losing track of infrastructure.
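For renames within a single configuration, Terraform 1.1 and later also accept a declarative moved block as an alternative to terraform state mv. A minimal sketch, with hypothetical resource names and placeholder values:

```hcl
# This resource was previously declared as aws_instance.app.
# The AMI ID and instance type are placeholders.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
}

# Tells Terraform to treat the old address as the new one
# instead of planning a destroy and a create.
moved {
  from = aws_instance.app
  to   = aws_instance.web
}
```

Because the move lives in code, it goes through review and applies the same way in every environment.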
Modules: Reusable Infrastructure Components
Modules are Terraform's mechanism for reuse. Instead of copy-pasting resource blocks, create a module once and instantiate it with different parameters.
Creating a Module
# modules/ecs-service/main.tf
# Note: aws_security_group.this is referenced below but not shown here;
# it must be defined elsewhere in this module.
resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.this.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = var.name
    container_port   = var.container_port
  }
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  container_definitions = jsonencode([{
    name      = var.name
    image     = var.image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/${var.name}"
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}
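The module reads a number of input variables that are not shown. A plausible variables.tf might look like the sketch below; the types and the default are assumptions inferred from how each value is used, and the log group (with its retention period) is a hypothetical addition so that the awslogs-group named in the task definition exists.

```hcl
# modules/ecs-service/variables.tf (sketch; types inferred from usage)
variable "name" {
  description = "Service name, also used for the task family and container name"
  type        = string
}

variable "image" { type = string }
variable "cluster_arn" { type = string }
variable "subnet_ids" { type = list(string) }
variable "target_group_arn" { type = string }
variable "container_port" { type = number }
variable "cpu" { type = number }
variable "memory" { type = number }
variable "execution_role_arn" { type = string }
variable "task_role_arn" { type = string }
variable "region" { type = string }

variable "desired_count" {
  type    = number
  default = 1 # assumed default; callers can override it
}

# Assumption: the module owns its log group. The awslogs driver does not
# create the group by default, so it has to exist before tasks start.
resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30
}
```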
Using the Module
module "api_service" {
  source = "./modules/ecs-service"

  name               = "api"
  image              = "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.api.arn
  container_port     = 8000
  cpu                = 512
  memory             = 1024
  desired_count      = 3
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.api_task.arn
  region             = "us-east-1"
}

module "worker_service" {
  source = "./modules/ecs-service"

  name               = "worker"
  image              = "123456789012.dkr.ecr.us-east-1.amazonaws.com/worker:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.worker.arn
  container_port     = 9000
  cpu                = 256
  memory             = 512
  desired_count      = 2
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.worker_task.arn
  region             = "us-east-1"
}
Variables and Environments
Variable Definition Patterns
# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_config" {
  description = "EC2 instance configuration"
  type = object({
    instance_type = string
    volume_size   = number
    min_count     = number
    max_count     = number
  })
  default = {
    instance_type = "t3.medium"
    volume_size   = 50
    min_count     = 2
    max_count     = 10
  }
}
Environment-Specific Configuration
Use .tfvars files for environment-specific values:
# environments/production.tfvars
environment = "production"
instance_config = {
  instance_type = "m6i.xlarge"
  volume_size   = 100
  min_count     = 3
  max_count     = 20
}

# environments/staging.tfvars
environment = "staging"
instance_config = {
  instance_type = "t3.medium"
  volume_size   = 50
  min_count     = 1
  max_count     = 3
}

terraform plan -var-file="environments/production.tfvars"
terraform apply -var-file="environments/production.tfvars"
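To show how these values might be consumed, here is a sketch of a launch template and Auto Scaling group driven by var.instance_config and var.environment as declared in variables.tf. The ami_id and subnet_ids variables, the resource names, and the /dev/xvda device name are assumptions for illustration.

```hcl
# Hypothetical inputs for this sketch.
variable "ami_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_launch_template" "app" {
  name_prefix   = "app-${var.environment}-"
  image_id      = var.ami_id
  instance_type = var.instance_config.instance_type

  block_device_mappings {
    device_name = "/dev/xvda" # assumed root device name; depends on the AMI

    ebs {
      volume_size = var.instance_config.volume_size
    }
  }
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-${var.environment}-"
  min_size            = var.instance_config.min_count
  max_size            = var.instance_config.max_count
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}
```

The same code then produces a three-to-twenty instance m6i.xlarge fleet in production and a one-to-three instance t3.medium fleet in staging, depending only on which .tfvars file is passed.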
Workspaces vs. Separate Directories
Terraform workspaces allow multiple state files with the same configuration. They work well for simple environment differences. For production systems with significantly different architectures per environment (e.g., production has multi-AZ RDS, staging has single-AZ), separate directories with shared modules are more maintainable. Most teams evolve from workspaces to separate directories as complexity grows.
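With workspaces, per-environment differences are typically keyed off the built-in terraform.workspace value. A minimal sketch, where the map contents and local names are hypothetical:

```hcl
locals {
  # Hypothetical per-workspace sizing; keys are workspace names.
  instance_type_by_workspace = {
    default    = "t3.micro"
    staging    = "t3.medium"
    production = "m6i.xlarge"
  }

  # Fall back to a small instance for any workspace not listed above.
  instance_type = lookup(local.instance_type_by_workspace, terraform.workspace, "t3.micro")
}
```

Once lookups like this spread across many resources, that is usually the point where separate directories become easier to reason about.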
CI/CD Integration
Terraform should never be applied from a developer's laptop in production. Integrate it into your CI/CD pipeline with proper controls.
GitHub Actions Workflow
name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']
jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/
      - run: terraform plan -no-color -out=tfplan
        working-directory: terraform/
      - uses: actions/github-script@v7
        with:
          script: |
            // Post plan output as PR comment
  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/
      - run: terraform apply -auto-approve
        working-directory: terraform/
Key practices:
- Plan on PRs: Every pull request shows the infrastructure diff before merging.
- Apply on merge: Only the main branch triggers apply. No manual applies.
- Environment protection: GitHub's environment protection rules require approval for production applies.
- State locking: DynamoDB locking prevents concurrent applies.
Production Patterns
Tagging Strategy
Tag every resource consistently for cost allocation, ownership tracking, and automation:
locals {
  common_tags = {
    Environment = var.environment
    Project     = "myproject"
    ManagedBy   = "terraform"
    Team        = var.team
  }
}

resource "aws_instance" "app" {
  # ... (other arguments omitted, including the count that count.index relies on)
  tags = merge(local.common_tags, {
    Name = "app-server-${count.index + 1}"
    Role = "application"
  })
}
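The AWS provider also supports a default_tags block, which applies tags to every taggable resource the provider creates. The sketch below is an alternative to merging local.common_tags by hand; it would replace the earlier provider block rather than sit alongside it, and it assumes var.environment and var.team are declared as in the example above.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied automatically to every taggable resource from this provider.
  default_tags {
    tags = {
      Environment = var.environment
      Project     = "myproject"
      ManagedBy   = "terraform"
      Team        = var.team
    }
  }
}
```

Individual resources then only declare the tags that are specific to them, such as Name and Role.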
Lifecycle Rules
Prevent Terraform from destroying critical resources:
resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true
  }
}
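A fuller sketch of a protected database follows, using the AWS provider's aws_db_instance resource. The identifier, engine, sizing, and username are placeholder values, and db_password is a hypothetical variable. It pairs prevent_destroy with the AWS-side deletion_protection flag, so the database is guarded even if someone bypasses Terraform.

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "main" {
  identifier        = "main-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "app"
  password          = var.db_password

  # AWS-side guard: the API refuses to delete the instance while this is set.
  deletion_protection = true

  # If the instance is ever deleted deliberately, keep a final snapshot.
  skip_final_snapshot       = false
  final_snapshot_identifier = "main-db-final"

  lifecycle {
    # Terraform-side guard: any plan that would destroy this resource fails.
    prevent_destroy = true
  }
}
```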
Data Sources for Existing Infrastructure
Reference resources not managed by your Terraform code:
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}
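Data source results are referenced like any other attribute. The sketch below looks up a VPC and its private subnets by tag and launches an instance from the AMI found above (data.aws_ami.amazon_linux). The tag values, resource names, and instance type are hypothetical.

```hcl
# Find a VPC created outside this configuration by its Name tag.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-vpc"
  }
}

# Find that VPC's subnets tagged as private.
data "aws_subnets" "shared_private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }

  tags = {
    Tier = "private"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnets.shared_private.ids[0]
}
```

Because most_recent is set on the AMI lookup, a newly published image will show up as a planned instance replacement, so some teams pin the AMI ID in production.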
Importing Existing Infrastructure
Most teams adopting Terraform don't start from scratch. They have existing AWS resources — VPCs, EC2 instances, RDS databases — that were created manually or by another tool. Terraform's import capabilities let you bring these under management without destroying and recreating them.
Terraform Import
# Import an existing S3 bucket
terraform import aws_s3_bucket.legacy-data my-existing-bucket-name
# Import an existing security group
terraform import aws_security_group.app sg-0123456789abcdef0
Before importing, write the resource block in your Terraform code first. The import command populates the state file with the resource's current settings — it does not generate code for you.
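Terraform 1.5 and later also support config-driven import through an import block, which lets the import go through the normal plan and apply cycle. A sketch for the S3 bucket from the command above, reusing the same resource address:

```hcl
# Declares that the existing bucket should be adopted at this address.
import {
  to = aws_s3_bucket.legacy-data
  id = "my-existing-bucket-name"
}

resource "aws_s3_bucket" "legacy-data" {
  bucket = "my-existing-bucket-name"
}
```

Running terraform plan previews the import and terraform apply performs it. If the resource block is left out, terraform plan -generate-config-out=generated.tf can draft one for review; treat the generated file as a starting point rather than finished code.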
Incremental Import Strategy
Don't try to import everything at once. A safer approach:
- Start with foundational resources (VPCs, subnets) that others depend on.
- Import one resource at a time, and run terraform plan after each to confirm no unexpected changes are planned.
- Address any drift (differences between the resource's actual settings and what your configuration declares) before moving on.
- Once imported, add lifecycle { prevent_destroy = true } to critical resources until you're confident in your configuration.
Terraformer: Auto-Generate Code
Terraformer is an open-source tool that reverse-engineers existing infrastructure into Terraform code. It's not perfect — generated code often needs manual cleanup — but it dramatically speeds up the import process for large existing environments.
# Generate Terraform code for all VPCs in us-east-1
terraformer import aws --resources=vpc --regions=us-east-1
# Generate for specific services
terraformer import aws --resources=sg,subnet,route_table --regions=us-east-1
Use Terraformer to generate a starting point, clean up the generated code, then use terraform import to officially bring resources under state management.
Conclusion
Terraform on AWS is one of the most marketable IaC skills you can develop. Many enterprises standardize on it, job postings frequently require it, and the HashiCorp certification ecosystem validates it.
The patterns in this guide — remote state, modular architecture, CI/CD integration, and environment management — form the foundation of every production Terraform codebase. Start with a simple configuration, add remote state immediately, extract modules as patterns emerge, and integrate with CI/CD before your team grows beyond one person.
For a hands-on walkthrough of building a complete three-tier AWS architecture with Terraform, check out our three-tier deployment guide. And if you're comparing IaC tools, our Terraform vs. CloudFormation vs. Pulumi comparison will help you make an informed choice.