
Terraform on AWS: The Complete Infrastructure as Code Guide

Master Terraform for AWS infrastructure management. Covers HCL fundamentals, state management, modules, workspaces, CI/CD integration, and production patterns for scalable IaC.

March 18, 2026 · 8 min read · By CloudaQube Team
[Figure: Terraform workflow managing AWS infrastructure with state and modules]

Why Terraform Dominates Infrastructure as Code

Terraform is the most widely adopted IaC tool in the industry. The HashiCorp Terraform Associate is among the fastest-growing certifications in the DevOps space, and nearly every major cloud provider, SaaS platform, and infrastructure vendor ships a Terraform provider. The reason is simple: Terraform solved the infrastructure management problem in a way that scales from a single developer to thousands of engineers managing hundreds of environments.

If you're managing AWS infrastructure manually through the console, or writing one-off scripts to create resources, you already know the pain. Configuration drift, inconsistent environments, no audit trail, and the anxiety of "who changed this and when?" Terraform eliminates all of it by treating infrastructure as code — versioned, reviewable, testable, and repeatable.

This guide covers Terraform on AWS from fundamentals to production patterns. Whether you're writing your first main.tf or refactoring a legacy Terraform codebase, you'll find actionable patterns here. For a comparison of how Terraform stacks up against CloudFormation and Pulumi, see our IaC tool comparison.

Terraform Fundamentals

How Terraform Works

Terraform follows a simple lifecycle:

  1. Write: Define infrastructure in HCL (HashiCorp Configuration Language) files.
  2. Plan: Terraform compares your code to the current state and shows what will change.
  3. Apply: Terraform makes the API calls to create, update, or destroy resources.
  4. State: Terraform records what it created in a state file, which it uses for future plans.
terraform init      # Download providers, initialize backend
terraform plan      # Preview changes
terraform apply     # Execute changes
terraform destroy   # Tear down everything

Your First AWS Resources

# Configure the AWS provider
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.7"
}

provider "aws" {
  region = "us-east-1"
}

# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = "production"
  }
}

# Create public and private subnets
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "public-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${count.index + 1}"
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

The cidrsubnet Function

cidrsubnet(prefix, newbits, netnum) calculates subnet CIDR blocks automatically. cidrsubnet("10.0.0.0/16", 8, 0) returns 10.0.0.0/24, cidrsubnet("10.0.0.0/16", 8, 1) returns 10.0.1.0/24, etc. This is cleaner than hardcoding CIDR blocks and scales when you add more subnets.
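The same pattern extends to computing all subnet ranges up front. A sketch using locals (names are illustrative; you can check the results interactively with `terraform console`):

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # /16 plus 8 new bits yields /24 subnets: netnums 0-1 for public, 10-11 for private
  public_cidrs  = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i)]      # ["10.0.0.0/24", "10.0.1.0/24"]
  private_cidrs = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i + 10)] # ["10.0.10.0/24", "10.0.11.0/24"]
}
```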

State Management: The Most Critical Decision

Terraform state is a JSON file that maps your HCL resources to real infrastructure. Whoever controls the state file controls the infrastructure. Getting state management right is the single most important architectural decision in your Terraform setup.

Remote State with S3

Never store state locally for shared infrastructure. Use S3 with DynamoDB locking:

terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Why this matters:

  • S3: Durable storage, versioning for state history, encryption at rest.
  • DynamoDB: Prevents concurrent terraform apply from corrupting state.
  • Key path: Organize state files by environment and component.
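The bucket and table themselves must exist before `terraform init` can use them. A minimal bootstrap sketch (names match the backend block above; adjust to your own):

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "mycompany-terraform-state"
}

# Versioning gives you state history and a recovery path
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend's lock table must use a string hash key named "LockID"
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```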

State File Organization

Monolithic state (everything in one file) doesn't scale. When a single terraform apply manages your VPC, databases, applications, and DNS, a change to one component plans against everything. Break state into logical components:

terraform/
├── networking/          # VPC, subnets, route tables, NAT gateways
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── data/                # RDS, ElastiCache, S3 buckets
│   ├── main.tf
│   └── variables.tf
├── compute/             # ECS, Lambda, Auto Scaling
│   ├── main.tf
│   └── variables.tf
└── dns/                 # Route 53 records
    ├── main.tf
    └── variables.tf

Each directory has its own state file. Changes to DNS don't require planning against the entire VPC. Use terraform_remote_state data sources or outputs to share information between components.
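Sharing values between components looks like this; a sketch assuming the networking component declares a `private_subnet_ids` output:

```hcl
# In compute/: read outputs published by the networking component
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Then reference them like any other value:
# subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
```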


Never Edit State Manually

The state file is not meant to be edited by hand. If you need to move resources between state files, rename resources, or import existing infrastructure, use terraform state mv, terraform state rm, and terraform import. Manual edits risk corrupting the state and losing track of infrastructure.
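For example (addresses and resource IDs are illustrative):

```
# Move a resource to a new address after refactoring into a module
terraform state mv aws_instance.app module.compute.aws_instance.app

# Stop tracking a resource without destroying it
terraform state rm aws_s3_bucket.legacy

# Bring an existing, manually created bucket under Terraform management
terraform import aws_s3_bucket.logs mycompany-logs
```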

Modules: Reusable Infrastructure Components

Modules are Terraform's mechanism for reuse. Instead of copy-pasting resource blocks, create a module once and instantiate it with different parameters.

Creating a Module

# modules/ecs-service/main.tf
resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.this.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = var.name
    container_port   = var.container_port
  }
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  container_definitions = jsonencode([{
    name      = var.name
    image     = var.image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/${var.name}"
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}
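The module also needs matching input declarations (and an `aws_security_group.this` resource, omitted above for brevity). A sketch of its `variables.tf`:

```hcl
# modules/ecs-service/variables.tf
variable "name" {
  type = string
}

variable "cluster_arn" {
  type = string
}

variable "image" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "target_group_arn" {
  type = string
}

variable "container_port" {
  type = number
}

variable "cpu" {
  type    = number
  default = 256
}

variable "memory" {
  type    = number
  default = 512
}

variable "desired_count" {
  type    = number
  default = 2
}

variable "execution_role_arn" {
  type = string
}

variable "task_role_arn" {
  type = string
}

variable "region" {
  type = string
}
```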

Using the Module

module "api_service" {
  source = "./modules/ecs-service"

  name               = "api"
  image              = "123456789.dkr.ecr.us-east-1.amazonaws.com/api:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.api.arn
  container_port     = 8000
  cpu                = 512
  memory             = 1024
  desired_count      = 3
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.api_task.arn
  region             = "us-east-1"
}

module "worker_service" {
  source = "./modules/ecs-service"

  name               = "worker"
  image              = "123456789.dkr.ecr.us-east-1.amazonaws.com/worker:v2.1"
  cluster_arn        = aws_ecs_cluster.main.arn
  subnet_ids         = module.networking.private_subnet_ids
  target_group_arn   = aws_lb_target_group.worker.arn
  container_port     = 9000
  cpu                = 256
  memory             = 512
  desired_count      = 2
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.worker_task.arn
  region             = "us-east-1"
}
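A well-formed module also publishes outputs so callers can wire it to other components; a sketch:

```hcl
# modules/ecs-service/outputs.tf
output "service_name" {
  value = aws_ecs_service.this.name
}

output "task_definition_arn" {
  value = aws_ecs_task_definition.this.arn
}

output "security_group_id" {
  value = aws_security_group.this.id
}
```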

Variables and Environments

Variable Definition Patterns

# variables.tf
variable "environment" {
  description = "Deployment environment"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_config" {
  description = "EC2 instance configuration"
  type = object({
    instance_type = string
    volume_size   = number
    min_count     = number
    max_count     = number
  })
  default = {
    instance_type = "t3.medium"
    volume_size   = 50
    min_count     = 2
    max_count     = 10
  }
}

Environment-Specific Configuration

Use .tfvars files for environment-specific values:

# environments/production.tfvars
environment = "production"
instance_config = {
  instance_type = "m6i.xlarge"
  volume_size   = 100
  min_count     = 3
  max_count     = 20
}

# environments/staging.tfvars
environment = "staging"
instance_config = {
  instance_type = "t3.medium"
  volume_size   = 50
  min_count     = 1
  max_count     = 3
}
terraform plan -var-file="environments/production.tfvars"
terraform apply -var-file="environments/production.tfvars"

Workspaces vs. Separate Directories

Terraform workspaces allow multiple state files with the same configuration. They work well for simple environment differences. For production systems with significantly different architectures per environment (e.g., production has multi-AZ RDS, staging has single-AZ), separate directories with shared modules are more maintainable. Most teams evolve from workspaces to separate directories as complexity grows.
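If you do use workspaces, the core commands are:

```
terraform workspace new staging      # create and switch to a new workspace
terraform workspace select staging   # switch to an existing workspace
terraform workspace list             # show all workspaces (current is starred)
terraform workspace show             # print the current workspace name
```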

CI/CD Integration

Terraform should never be applied from a developer's laptop in production. Integrate it into your CI/CD pipeline with proper controls.

GitHub Actions Workflow

name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    permissions:
      id-token: write        # OIDC token for AWS auth
      contents: read
      pull-requests: write   # post the plan as a PR comment
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci  # illustrative role
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/
      - run: terraform plan -no-color -out=tfplan
        working-directory: terraform/
      - uses: actions/github-script@v7
        with:
          script: |
            // Post plan output as PR comment

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/terraform-ci  # illustrative role
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: terraform/
      - run: terraform apply -auto-approve
        working-directory: terraform/

Key practices:

  • Plan on PRs: Every pull request shows the infrastructure diff before merging.
  • Apply on merge: Only the main branch triggers apply. No manual applies.
  • Environment protection: GitHub's environment protection rules require approval for production applies.
  • State locking: DynamoDB locking prevents concurrent applies.

Production Patterns

Tagging Strategy

Tag every resource consistently for cost allocation, ownership tracking, and automation:

locals {
  common_tags = {
    Environment = var.environment
    Project     = "myproject"
    ManagedBy   = "terraform"
    Team        = var.team
  }
}

resource "aws_instance" "app" {
  # ...
  tags = merge(local.common_tags, {
    Name = "app-server-${count.index + 1}"
    Role = "application"
  })
}
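The AWS provider can also inject baseline tags automatically via `default_tags`, so the `merge()` only has to carry resource-specific tags:

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}
```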

Lifecycle Rules

Prevent Terraform from destroying critical resources:

resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true
  }
}
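Other lifecycle arguments solve related problems. For example, `ignore_changes` stops Terraform from fighting with external processes over specific attributes (a sketch; the resource is illustrative):

```hcl
resource "aws_autoscaling_group" "app" {
  # ...

  lifecycle {
    # Auto scaling adjusts this at runtime; don't revert it on apply
    ignore_changes = [desired_capacity]
  }
}
```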

Data Sources for Existing Infrastructure

Reference resources not managed by your Terraform code:

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}
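The data source is then referenced like any resource attribute:

```hcl
resource "aws_instance" "app" {
  ami           = data.aws_ami.amazon_linux.id  # resolved at plan time
  instance_type = "t3.medium"
}
```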

Conclusion

Terraform on AWS is the most marketable IaC skill you can develop. It's the tool that enterprises standardize on, the skill that job postings require, and the technology that the HashiCorp certification ecosystem validates.

The patterns in this guide — remote state, modular architecture, CI/CD integration, and environment management — form the foundation of every production Terraform codebase. Start with a simple configuration, add remote state immediately, extract modules as patterns emerge, and integrate with CI/CD before your team grows beyond one person.

For a hands-on walkthrough of building a complete three-tier AWS architecture with Terraform, check out our three-tier deployment guide. And if you're comparing IaC tools, our Terraform vs. CloudFormation vs. Pulumi comparison will help you make an informed choice.

Want to practice this hands-on?

CloudaQube generates complete labs from a simple description. Try it free.

Get Started Free

CloudaQube Team

Infrastructure Engineering Team
