Infrastructure as Code (IaC)¶
Introduction and Evolution¶
Infrastructure as Code (IaC) represents a fundamental paradigm shift in how we provision, manage, and maintain computing infrastructure. Traditional manual configuration leads to "snowflake" environments: unique, hard-to-reproduce setups prone to errors and drift. IaC solves this by defining infrastructure in versioned, machine-readable code, enabling rapid provisioning (minutes vs. days), easy replication (e.g., duplicate environments for new branches), and quick recovery via rollbacks.
Historical Context¶
The evolution of infrastructure management has progressed through several distinct phases:
Phase 1: Manual Configuration (Pre-2000s)
System administrators manually configured each server through interactive sessions. This approach was:
- Time-consuming and error-prone
- Impossible to reproduce consistently
- Dependent on tribal knowledge and documentation that quickly became outdated
- Resulted in "snowflake servers" where each machine was unique
Phase 2: Script-Based Automation (2000s)
Shell scripts and batch files began automating repetitive tasks:
#!/bin/bash
# Early automation example
apt-get update
apt-get install -y nginx
cp /path/to/config /etc/nginx/nginx.conf
systemctl start nginx
systemctl enable nginx
Limitations included:
- Scripts were often not idempotent (running twice could cause issues)
- No state tracking—scripts didn't know what was already done
- Poor error handling and recovery
- Environment-specific hardcoding
Phase 3: Configuration Management Tools (2005-2010)
Tools like Puppet (2005), Chef (2009), and later Ansible (2012) introduced:
- Declarative or semi-declarative syntax
- Idempotent operations
- Centralized management
- Resource abstraction
Phase 4: Cloud-Native IaC (2010-Present)
The cloud era brought tools designed for provisioning entire infrastructures:
- AWS CloudFormation (2011): First major cloud-native IaC tool
- Terraform (2014): Multi-cloud, provider-agnostic approach
- Pulumi (2018): Real programming languages for infrastructure
- Crossplane (2018): Kubernetes-native infrastructure management
The Problem IaC Solves¶
Consider a typical pre-IaC scenario:
- Developer requests a new environment for testing
- Operations receives ticket, waits in queue (days)
- Manual setup through cloud console (hours, error-prone)
- Documentation updated (often incomplete or forgotten)
- Drift occurs as ad-hoc changes accumulate
- Environment becomes irreproducible—nobody knows exact state
- Disaster recovery requires heroic manual effort
With IaC:
- Developer clones infrastructure code
- Modifies parameters for new environment
- Runs `terraform apply` or equivalent
- Infrastructure provisions in minutes, identically to production
- Changes tracked in version control
- Recovery is simply re-running the code
Core Principles of IaC¶
IaC follows fundamental principles that distinguish it from ad-hoc automation:
1. Idempotence¶
Definition: An operation is idempotent if applying it multiple times produces the same result as applying it once.
# Idempotent: Running 10 times = running once
desired_state: server exists with 4GB RAM
# NOT idempotent: Running 10 times ≠ running once
action: create a server with 4GB RAM # Creates 10 servers!
Why It Matters:
- Safe to retry failed operations
- Convergence to desired state regardless of current state
- Enables automated remediation and drift correction
Implementation Strategies:
# Non-idempotent approach
def create_user(username):
    run_command(f"useradd {username}")  # Fails if user exists

# Idempotent approach
def ensure_user(username):
    if not user_exists(username):
        run_command(f"useradd {username}")
    # If user exists, do nothing - same end state
Most IaC tools achieve idempotence through:
- State comparison: Compare desired vs. current state
- Resource identification: Use unique identifiers to track resources
- Conditional execution: Only perform actions when needed
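The three strategies above can be sketched as a single convergence loop. This is an illustrative toy, not any real tool's engine: the `create`, `update`, and `delete` callbacks stand in for provider API calls, and resources are modeled as plain dicts keyed by ID.

```python
def converge(desired, current, create, update, delete):
    """Drive `current` (resource ID -> attributes) toward `desired` with the
    minimal set of changes. Re-running against an already-converged state does
    nothing: idempotence falls out of comparing states, not replaying steps."""
    actions = []
    for rid, spec in desired.items():
        if rid not in current:
            create(rid, spec)
            actions.append(("create", rid))
        elif current[rid] != spec:
            update(rid, spec)
            actions.append(("update", rid))
    for rid in list(current):
        if rid not in desired:
            delete(rid)
            actions.append(("delete", rid))
    return actions

# In-memory stand-in for a cloud API
cloud = {"web-0": {"ram_gb": 2}}
desired = {"web-0": {"ram_gb": 4}, "web-1": {"ram_gb": 4}}
put = lambda rid, spec: cloud.__setitem__(rid, dict(spec))
converge(desired, cloud, put, put, lambda rid: cloud.pop(rid))
# Second run: current already matches desired, so no actions are taken
converge(desired, cloud, put, put, lambda rid: cloud.pop(rid))
```

Running the loop twice performs work only once; the second pass finds nothing to create, update, or delete.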
2. Version Control¶
All infrastructure code belongs in version control (Git), enabling:
Change Tracking:
git log --oneline infrastructure/
a1b2c3d Add auto-scaling to web tier
d4e5f6g Increase RDS instance size for production
g7h8i9j Initial VPC and networking setup
Code Review for Infrastructure:
# Pull request shows exactly what changes
- instance_type: "t3.medium"
+ instance_type: "t3.large" # Reviewer can assess impact
Branching Strategies:
main (production) ──────────────────────────────────────►
│
└── feature/add-cache ──► PR ──► merge
│
└── hotfix/security-patch ──► emergency merge
Audit Trail: Every change has author, timestamp, and reason (commit message).
3. Declarative Over Imperative¶
Declarative ("what"): Define the desired end state; the tool figures out how.
# Terraform (declarative)
resource "aws_instance" "web" {
count = 3
instance_type = "t3.micro"
}
# "I want 3 t3.micro instances to exist"
# Terraform handles: create new, modify existing, or delete excess
Imperative ("how"): Specify step-by-step instructions.
# Ansible (imperative style)
- name: Create EC2 instances
  amazon.aws.ec2_instance:
    name: "web-{{ item }}"
    state: present
    instance_type: t3.micro
    image_id: ami-12345678
  loop: "{{ range(3) | list }}"
# "Execute these steps to create instances"
Why Declarative Dominates:
| Aspect | Declarative | Imperative |
|---|---|---|
| Complexity | Tool handles orchestration | You manage order/dependencies |
| Idempotence | Built-in | Must be coded |
| Drift correction | Automatic convergence | Manual scripting needed |
| Learning curve | Define "what", not "how" | Need procedural knowledge |
| Flexibility | Less (constrained by tool) | More (full control) |
4. Immutable Infrastructure¶
Traditional (mutable): Update servers in place.
Server v1 ──patch──► Server v1.1 ──config──► Server v1.1a ──hotfix──► ???
(drift accumulates, state unknown)
Immutable: Replace servers entirely.
Server v1 (discard) ──► Server v2 (fresh) ──► Server v3 (fresh)
(known state) (known state)
Benefits of Immutability:
- No configuration drift
- Consistent, tested images
- Easy rollback (switch to previous version)
- Better security (no accumulated patches)
Implementation: Build machine images (AMI, Docker) with all configurations baked in. Deploy new instances; destroy old ones.
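A minimal sketch of that replace-don't-patch cycle, using an in-memory list as a stand-in for a cloud fleet (the `deploy` helper and the instance shape are invented for illustration):

```python
def deploy(fleet, image_version, count=2):
    """Immutable rollout: launch `count` fresh instances from the new image,
    then terminate everything built from older images."""
    old = [inst for inst in fleet if inst["image"] != image_version]
    if fleet and not old:
        return fleet                      # already on this image: no-op
    fleet.extend({"id": f"{image_version}-{n}", "image": image_version}
                 for n in range(count))   # bring up replacements first
    for inst in old:                      # then retire the old ones
        fleet.remove(inst)
    return fleet

fleet = []
deploy(fleet, "ami-v1")
deploy(fleet, "ami-v2")   # v1 instances are discarded, never patched
deploy(fleet, "ami-v1")   # rollback is just deploying the previous image
```

Because every instance comes from a known image version, rollback and disaster recovery reduce to re-running the same deploy step.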
5. Modularity and Reusability¶
Break infrastructure into composable modules:
infrastructure/
├── modules/
│ ├── networking/ # VPC, subnets, gateways
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── compute/ # EC2, auto-scaling
│ ├── database/ # RDS, replicas
│ └── security/ # IAM, security groups
├── environments/
│ ├── dev/
│ │ └── main.tf # Uses modules with dev params
│ ├── staging/
│ └── production/
Module Contract:
# modules/networking/variables.tf (inputs)
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
# modules/networking/outputs.tf (outputs)
output "vpc_id" {
description = "ID of created VPC"
value = aws_vpc.main.id
}
output "subnet_ids" {
description = "IDs of created subnets"
value = aws_subnet.main[*].id
}
6. Self-Documenting Infrastructure¶
The code IS the documentation:
# This IS the production infrastructure specification
# Not a wiki page that might be outdated
resource "aws_db_instance" "production" {
identifier = "prod-primary-db"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.r6g.xlarge"
allocated_storage = 500
multi_az = true # High availability enabled
backup_retention_period = 30 # 30 days of backups
tags = {
Environment = "production"
Owner = "platform-team"
CostCenter = "infrastructure"
}
}
Declarative vs. Imperative: Deep Dive¶
Understanding this distinction is crucial for choosing the right tool.
Declarative Model¶
How It Works:
- User defines desired state in configuration
- Tool reads current state (from cloud API or state file)
- Tool computes difference (plan)
- Tool executes minimal changes to reach desired state
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Desired State │ │ Current State │ │ Plan │
│ (config file) │───►│ (API/state) │───►│ (diff) │
└─────────────────┘ └─────────────────┘ └────────┬────────┘
│
▼
┌─────────────────┐
│ Execute │
│ (apply diff) │
└─────────────────┘
Terraform Example:
# Desired: 3 instances in us-west-2
resource "aws_instance" "web" {
count = 3
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "web-${count.index}"
}
}
If current state has 2 instances → Terraform creates 1 more. If current state has 5 instances → Terraform destroys 2. If current state has 3 correct instances → Terraform does nothing.
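That reconciliation rule can be written down directly. A toy `plan` function for count-based resources (not Terraform's actual planner, just the arithmetic it performs):

```python
def plan(desired_count, current_ids):
    """Minimal diff for count-style resources: create the shortfall,
    destroy the excess, or do nothing when counts already match."""
    delta = desired_count - len(current_ids)
    return {
        "create": max(delta, 0),
        "destroy": max(-delta, 0),
    }

plan(3, ["i-a", "i-b"])                        # {'create': 1, 'destroy': 0}
plan(3, ["i-a", "i-b", "i-c", "i-d", "i-e"])   # {'create': 0, 'destroy': 2}
plan(3, ["i-a", "i-b", "i-c"])                 # {'create': 0, 'destroy': 0}
```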
Imperative Model¶
How It Works:
- User defines sequence of operations
- Tool executes operations in order
- User must handle conditionals and state checking
# Ansible: Procedural steps
- name: Install packages
  apt:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - certbot

- name: Copy configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: Reload nginx

- name: Ensure service running
  service:
    name: nginx
    state: started
    enabled: yes
Hybrid Approaches¶
Many tools blend both paradigms:
Ansible (primarily imperative with declarative modules):
# Declarative module usage within imperative playbook
- name: Ensure EC2 instance exists
  amazon.aws.ec2_instance:
    state: present  # Declarative: desired state
    name: "my-instance"
    instance_type: t3.micro
    image_id: ami-12345678
Pulumi (declarative intent with programming constructs):
// Declarative resource definition with imperative logic
const instanceCount = config.getNumber("instanceCount") ?? 3;
const instances = [];
for (let i = 0; i < instanceCount; i++) {
instances.push(new aws.ec2.Instance(`web-${i}`, {
instanceType: "t3.micro",
ami: ami.id,
}));
}
When to Use Each¶
| Scenario | Recommended Approach |
|---|---|
| Cloud infrastructure provisioning | Declarative (Terraform, CloudFormation) |
| Server configuration | Imperative (Ansible) or Declarative (Puppet) |
| Complex orchestration workflows | Imperative (Ansible) |
| Kubernetes applications | Declarative (Helm, Kustomize) |
| One-time migrations | Imperative scripts |
| Continuous state enforcement | Declarative |
IaC Tool Landscape¶
Categorization by Purpose¶
┌─────────────────────────────────────────────────────────────────────────┐
│ Infrastructure as Code Tools │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ PROVISIONING │ │ CONFIGURATION MANAGEMENT │ │
│ │ (Create infrastructure) │ │ (Configure systems) │ │
│ │ │ │ │ │
│ │ • Terraform / OpenTofu │ │ • Ansible │ │
│ │ • Pulumi │ │ • Puppet │ │
│ │ • AWS CloudFormation │ │ • Chef │ │
│ │ • Azure ARM / Bicep │ │ • SaltStack │ │
│ │ • Google Cloud DM │ │ │ │
│ │ • Crossplane │ │ │ │
│ └─────────────────────────────┘ └─────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ KUBERNETES-NATIVE │ │ POLICY & COMPLIANCE │ │
│ │ (K8s workloads) │ │ (Governance) │ │
│ │ │ │ │ │
│ │ • Helm │ │ • Open Policy Agent (OPA) │ │
│ │ • Kustomize │ │ • HashiCorp Sentinel │ │
│ │ • Crossplane │ │ • Checkov │ │
│ │ • ArgoCD / Flux │ │ • tfsec / Trivy │ │
│ └─────────────────────────────┘ └─────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Quick Comparison Matrix¶
| Tool | Type | Language | State | Multi-Cloud | Best For |
|---|---|---|---|---|---|
| Terraform | Declarative | HCL | External | Yes | General provisioning |
| OpenTofu | Declarative | HCL | External | Yes | Open-source Terraform |
| Pulumi | Declarative | TS/Python/Go | External | Yes | Developer-centric IaC |
| CloudFormation | Declarative | YAML/JSON | AWS-managed | AWS only | AWS-native shops |
| ARM/Bicep | Declarative | JSON/Bicep | Azure-managed | Azure only | Azure-native shops |
| Ansible | Imperative | YAML | Stateless | Yes | Configuration mgmt |
| Helm | Declarative | YAML+Go tmpl | K8s secrets | K8s only | K8s app packaging |
| Crossplane | Declarative | YAML (K8s) | K8s | Yes | K8s-native infra |
Terraform¶
Terraform, developed by HashiCorp (with OpenTofu as its open-source fork following licensing changes to BUSL), is the leading declarative IaC tool. It embodies IaC principles by defining infrastructure in HashiCorp Configuration Language (HCL), which is human-readable and versionable.
Architecture Overview¶
┌─────────────────────────────────────────────────────────────────────────┐
│ Terraform Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ │
│ │ Configuration │ .tf files (HCL) │
│ │ Files │ Define desired state │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Terraform Core │────►│ State File │ │
│ │ │ │ (.tfstate) │ │
│ │ • Parse config │ │ │ │
│ │ • Build graph │ │ Maps config to │ │
│ │ • Plan changes │ │ real resources │ │
│ │ • Apply changes │ │ │ │
│ └────────┬─────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Providers │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ AWS │ │ Azure │ │ GCP │ │ Custom │ ... │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ └───────┼────────────┼────────────┼────────────┼───────────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Cloud Provider APIs │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
HCL Language Deep Dive¶
HCL (HashiCorp Configuration Language) is designed specifically for infrastructure definition.
Basic Syntax:
# Block types: resource, data, variable, output, locals, module, provider
# Provider configuration
provider "aws" {
region = "us-west-2"
default_tags {
tags = {
ManagedBy = "Terraform"
}
}
}
# Resource definition
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.micro"
tags = {
Name = "WebServer"
}
}
Variables and Types:
# variables.tf - Input variables
variable "environment" {
description = "Deployment environment"
type = string
default = "development"
validation {
condition = contains(["development", "staging", "production"], var.environment)
error_message = "Environment must be development, staging, or production."
}
}
variable "instance_config" {
description = "Instance configuration"
type = object({
instance_type = string
volume_size = number
enable_monitoring = bool
})
default = {
instance_type = "t3.micro"
volume_size = 20
enable_monitoring = false
}
}
variable "allowed_cidrs" {
description = "List of allowed CIDR blocks"
type = list(string)
default = ["10.0.0.0/8"]
}
variable "tags" {
description = "Resource tags"
type = map(string)
default = {}
}
Local Values:
locals {
# Computed values used throughout configuration
name_prefix = "${var.project}-${var.environment}"
common_tags = merge(var.tags, {
Environment = var.environment
Project = var.project
ManagedBy = "Terraform"
})
# Conditional logic
instance_type = var.environment == "production" ? "t3.large" : "t3.micro"
}
Data Sources (read existing resources):
# Query existing resources
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
}
data "aws_vpc" "existing" {
tags = {
Name = "main-vpc"
}
}
# Look up subnets belonging to that VPC
data "aws_subnets" "existing" {
filter {
name = "vpc-id"
values = [data.aws_vpc.existing.id]
}
}
# Use in resources
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
subnet_id = data.aws_subnets.existing.ids[0]
instance_type = "t3.micro"
}
Outputs:
output "instance_ip" {
description = "Public IP of the instance"
value = aws_instance.web.public_ip
}
output "instance_details" {
description = "Full instance details"
value = {
id = aws_instance.web.id
public_ip = aws_instance.web.public_ip
private_ip = aws_instance.web.private_ip
}
sensitive = false
}
Control Flow and Expressions¶
Count (create multiple similar resources):
variable "instance_count" {
default = 3
}
resource "aws_instance" "web" {
count = var.instance_count
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
tags = {
Name = "web-${count.index}" # web-0, web-1, web-2
}
}
# Reference: aws_instance.web[0], aws_instance.web[1], etc.
# All instances: aws_instance.web[*].public_ip
For_each (create resources from a map/set):
variable "instances" {
default = {
web = {
instance_type = "t3.micro"
az = "us-west-2a"
}
api = {
instance_type = "t3.small"
az = "us-west-2b"
}
worker = {
instance_type = "t3.medium"
az = "us-west-2c"
}
}
}
resource "aws_instance" "servers" {
for_each = var.instances
ami = data.aws_ami.ubuntu.id
instance_type = each.value.instance_type
availability_zone = each.value.az
tags = {
Name = each.key # web, api, worker
}
}
# Reference: aws_instance.servers["web"], aws_instance.servers["api"]
Dynamic Blocks (generate nested blocks):
variable "ingress_rules" {
default = [
{ port = 80, cidr = "0.0.0.0/0", description = "HTTP" },
{ port = 443, cidr = "0.0.0.0/0", description = "HTTPS" },
{ port = 22, cidr = "10.0.0.0/8", description = "SSH internal" },
]
}
resource "aws_security_group" "web" {
name = "web-sg"
dynamic "ingress" {
for_each = var.ingress_rules
content {
from_port = ingress.value.port
to_port = ingress.value.port
protocol = "tcp"
cidr_blocks = [ingress.value.cidr]
description = ingress.value.description
}
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
Conditional Expressions:
# Ternary operator
resource "aws_instance" "web" {
instance_type = var.environment == "production" ? "t3.large" : "t3.micro"
# Conditional resource creation
count = var.create_instance ? 1 : 0
}
# Conditional in for_each
resource "aws_eip" "web" {
for_each = var.environment == "production" ? toset(["primary", "secondary"]) : toset([])
instance = aws_instance.web[0].id
}
State Management Deep Dive¶
State is Terraform's mechanism for mapping configuration to real-world resources.
State File Structure (.tfstate):
{
"version": 4,
"terraform_version": "1.6.0",
"serial": 42,
"lineage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"outputs": {
"instance_ip": {
"value": "54.123.45.67",
"type": "string"
}
},
"resources": [
{
"mode": "managed",
"type": "aws_instance",
"name": "web",
"provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
"instances": [
{
"schema_version": 1,
"attributes": {
"id": "i-0123456789abcdef0",
"ami": "ami-0c55b159cbfafe1f0",
"instance_type": "t3.micro",
"public_ip": "54.123.45.67",
"private_ip": "10.0.1.50",
"tags": {
"Name": "WebServer"
}
// ... many more attributes
}
}
]
}
]
}
Remote State Backends:
# S3 backend with DynamoDB locking (recommended for AWS)
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "prod/infrastructure.tfstate"
region = "us-west-2"
encrypt = true
dynamodb_table = "terraform-locks" # For state locking
}
}
# Azure Blob Storage
terraform {
backend "azurerm" {
resource_group_name = "terraform-state-rg"
storage_account_name = "tfstate12345"
container_name = "tfstate"
key = "prod.terraform.tfstate"
}
}
# Google Cloud Storage
terraform {
backend "gcs" {
bucket = "my-terraform-state"
prefix = "terraform/state"
}
}
# HCP Terraform (formerly Terraform Cloud)
terraform {
cloud {
organization = "my-org"
workspaces {
name = "my-workspace"
}
}
}
State Operations:
# List resources in state
terraform state list
# Show specific resource
terraform state show aws_instance.web
# Move resource (rename or move to module)
terraform state mv aws_instance.web aws_instance.webserver
# Remove resource from state (doesn't destroy actual resource)
terraform state rm aws_instance.web
# Import existing resource into state
terraform import aws_instance.web i-0123456789abcdef0
# Pull remote state locally
terraform state pull > backup.tfstate
# Push local state to remote
terraform state push backup.tfstate
# Force unlock state (use carefully)
terraform force-unlock LOCK_ID
State Locking:
┌─────────────┐ ┌─────────────────────┐ ┌─────────────────┐
│ User A │────►│ DynamoDB Lock │◄────│ User B │
│ terraform │ │ Table │ │ terraform │
│ apply │ │ │ │ apply │
└─────────────┘ │ Lock: User A │ └─────────────────┘
│ ID: abc123 │ │
│ Created: 10:00 │ │
└─────────────────────┘ │
▼
"Error: state locked"
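The locking handshake in the diagram amounts to a conditional write: the lock record is created only if absent, so exactly one of two concurrent applies can win. A toy in-process version (real backends use a DynamoDB conditional put or equivalent; the class and method names here are invented):

```python
import threading

class StateLock:
    """Toy state lock: acquire succeeds only when nobody holds the lock,
    mimicking a conditional put on a DynamoDB lock table."""
    def __init__(self):
        self._locks = {}                  # state key -> current holder
        self._mutex = threading.Lock()    # makes check-and-set atomic

    def acquire(self, key, holder):
        with self._mutex:
            if key in self._locks:        # lock row already exists
                raise RuntimeError(f"Error: state locked by {self._locks[key]}")
            self._locks[key] = holder

    def release(self, key, holder):
        with self._mutex:
            if self._locks.get(key) == holder:   # only the holder may unlock
                del self._locks[key]

lock = StateLock()
lock.acquire("prod/infrastructure.tfstate", "user-a")
# A second acquire for the same state now raises "Error: state locked by user-a"
```

`terraform force-unlock` corresponds to deleting the lock record regardless of holder, which is why it must be used carefully.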
Modules¶
Modules are reusable packages of Terraform configuration.
Module Structure:
modules/
└── vpc/
├── main.tf # Primary resources
├── variables.tf # Input variables
├── outputs.tf # Output values
├── versions.tf # Provider/Terraform version constraints
├── README.md # Documentation
└── examples/ # Usage examples
└── complete/
└── main.tf
Module Definition (modules/vpc/main.tf):
# modules/vpc/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = merge(var.tags, {
Name = "${var.name}-vpc"
})
}
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = merge(var.tags, {
Name = "${var.name}-public-${count.index + 1}"
Tier = "Public"
})
}
resource "aws_subnet" "private" {
count = length(var.private_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.private_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = merge(var.tags, {
Name = "${var.name}-private-${count.index + 1}"
Tier = "Private"
})
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(var.tags, {
Name = "${var.name}-igw"
})
}
resource "aws_nat_gateway" "main" {
count = var.enable_nat_gateway ? length(var.public_subnet_cidrs) : 0
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = merge(var.tags, {
Name = "${var.name}-nat-${count.index + 1}"
})
}
resource "aws_eip" "nat" {
count = var.enable_nat_gateway ? length(var.public_subnet_cidrs) : 0
domain = "vpc"
tags = merge(var.tags, {
Name = "${var.name}-nat-eip-${count.index + 1}"
})
}
Module Variables (modules/vpc/variables.tf):
variable "name" {
description = "Name prefix for resources"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "public_subnet_cidrs" {
description = "CIDR blocks for public subnets"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
variable "private_subnet_cidrs" {
description = "CIDR blocks for private subnets"
type = list(string)
default = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
}
variable "enable_nat_gateway" {
description = "Enable NAT Gateway for private subnets"
type = bool
default = true
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
Module Outputs (modules/vpc/outputs.tf):
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of private subnets"
value = aws_subnet.private[*].id
}
output "nat_gateway_ids" {
description = "IDs of NAT Gateways"
value = aws_nat_gateway.main[*].id
}
Using Modules:
# Local module
module "vpc" {
source = "./modules/vpc"
name = "production"
vpc_cidr = "10.0.0.0/16"
availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
enable_nat_gateway = true
tags = {
Environment = "production"
}
}
# Public registry module
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0"
name = "my-vpc"
cidr = "10.0.0.0/16"
azs = ["us-west-2a", "us-west-2b", "us-west-2c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
}
# Git repository module
module "vpc" {
source = "git::https://github.com/org/terraform-modules.git//vpc?ref=v1.2.0"
# ...
}
# Use module outputs
resource "aws_instance" "web" {
subnet_id = module.vpc.public_subnet_ids[0]
# ...
}
Terraform Workflow¶
Complete Workflow:
# 1. Initialize working directory
terraform init
# Downloads providers, modules, configures backend
# 2. Format code
terraform fmt -recursive
# Rewrites files to canonical format
# 3. Validate configuration
terraform validate
# Checks syntax and internal consistency
# 4. Plan changes
terraform plan -out=tfplan
# Shows what will change, saves plan file
# 5. Review plan output carefully!
# + create, - destroy, ~ update, -/+ replace
# 6. Apply changes
terraform apply tfplan
# Executes the saved plan
# Alternative: plan and apply in one (prompts for confirmation)
terraform apply
# 7. Destroy (when needed)
terraform destroy
# Removes all managed resources
Plan Output Interpretation:
Terraform will perform the following actions:
# aws_instance.web will be created
+ resource "aws_instance" "web" {
+ ami = "ami-0c55b159cbfafe1f0"
+ instance_type = "t3.micro"
+ id = (known after apply)
+ public_ip = (known after apply)
}
# aws_instance.api will be updated in-place
~ resource "aws_instance" "api" {
id = "i-0123456789abcdef0"
~ instance_type = "t3.micro" -> "t3.small"
}
# aws_instance.worker must be replaced
-/+ resource "aws_instance" "worker" {
~ ami = "ami-old123" -> "ami-new456" # forces replacement
~ id = "i-0987654321fedcba0" -> (known after apply)
}
# aws_instance.deprecated will be destroyed
- resource "aws_instance" "deprecated" {
- id = "i-todelete123"
- instance_type = "t2.micro"
}
Plan: 1 to add, 1 to change, 2 to destroy.
Lifecycle Management¶
Control how Terraform manages resources:
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
lifecycle {
# Create new before destroying old (zero-downtime updates)
create_before_destroy = true
# Prevent accidental destruction
prevent_destroy = true
# Ignore changes to specific attributes (avoid drift detection)
ignore_changes = [
tags["LastModified"],
user_data,
]
# Custom replacement triggers: must reference a managed resource,
# not a data source (here, a hypothetical terraform_data resource
# tracking the AMI revision)
replace_triggered_by = [
terraform_data.ami_revision
]
}
}
# Preconditions and postconditions
resource "aws_instance" "web" {
instance_type = var.instance_type
lifecycle {
precondition {
condition = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
error_message = "Instance type must be t3.micro, t3.small, or t3.medium."
}
postcondition {
condition = self.public_ip != ""
error_message = "Instance must have a public IP address."
}
}
}
Terraform Best Practices¶
Project Structure:
terraform-infrastructure/
├── modules/ # Reusable modules
│ ├── networking/
│ ├── compute/
│ ├── database/
│ └── security/
├── environments/ # Environment-specific configs
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── terraform.tfvars
│ │ └── backend.tf
│ ├── staging/
│ └── production/
├── .gitignore
├── .terraform-version # tfenv version file
└── README.md
.gitignore:
# Local .terraform directories
**/.terraform/*
# .tfstate files
*.tfstate
*.tfstate.*
# Crash log files
crash.log
crash.*.log
# Exclude all .tfvars files, which may contain sensitive data
*.tfvars
*.tfvars.json
# Ignore override files
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# Ignore CLI config files
.terraformrc
terraform.rc
# Ignore lock file for module development
# .terraform.lock.hcl # Usually commit this for consistent provider versions
Naming Conventions:
# Resources: lowercase with underscores
resource "aws_instance" "web_server" { }
resource "aws_security_group" "allow_https" { }
# Variables: lowercase with underscores
variable "instance_type" { }
variable "enable_monitoring" { }
# Outputs: lowercase with underscores
output "instance_public_ip" { }
# Modules: lowercase with hyphens (directory names)
module "web-cluster" {
source = "./modules/web-cluster"
}
Pulumi¶
Pulumi takes a different approach to IaC by allowing you to use general-purpose programming languages (TypeScript, Python, Go, C#, Java, YAML) instead of domain-specific languages.
Philosophy¶
Traditional IaC DSL (Terraform HCL):
┌─────────────────────────────────────┐
│ resource "aws_instance" "web" { │
│ ami = var.ami │
│ instance_type = "t3.micro" │
│ } │
└─────────────────────────────────────┘
Limited expressiveness
Pulumi (Real Programming Languages):
┌─────────────────────────────────────┐
│ const instance = new aws.ec2. │
│ Instance("web", { │
│ ami: ami.id, │
│ instanceType: "t3.micro", │
│ }); │
│ │
│ // Use loops, conditionals, │
│ // functions, classes, packages │
└─────────────────────────────────────┘
Full language power
Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│ Pulumi Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ │
│ │ Program │ TypeScript/Python/Go/C#/Java/YAML │
│ │ (Your Code) │ │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Pulumi Engine │────►│ State Backend │ │
│ │ │ │ │ │
│ │ • Deployment │ │ • Pulumi Cloud │ │
│ │ • Diff/Preview │ │ • S3/Azure/GCS │ │
│ │ • Resource Mgmt │ │ • Local file │ │
│ └────────┬─────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Resource Providers │ │
│ │ (Same providers as Terraform - bridged or native) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Examples by Language¶
TypeScript:
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
// Configuration
const config = new pulumi.Config();
const instanceCount = config.getNumber("instanceCount") || 3;
// Get latest Ubuntu AMI
const ami = aws.ec2.getAmi({
mostRecent: true,
owners: ["099720109477"],
filters: [{
name: "name",
values: ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"],
}],
});
// Create VPC
const vpc = new aws.ec2.Vpc("main", {
cidrBlock: "10.0.0.0/16",
enableDnsHostnames: true,
tags: { Name: "main-vpc" },
});
// Create instances using a loop
const instances: aws.ec2.Instance[] = [];
for (let i = 0; i < instanceCount; i++) {
instances.push(new aws.ec2.Instance(`web-${i}`, {
ami: ami.then(a => a.id),
instanceType: "t3.micro",
tags: { Name: `web-${i}` },
}));
}
// Export outputs
export const instanceIds = instances.map(i => i.id);
export const publicIps = instances.map(i => i.publicIp);
Python:
import pulumi
import pulumi_aws as aws
# Configuration
config = pulumi.Config()
instance_count = config.get_int("instanceCount") or 3
# Get latest Ubuntu AMI
ami = aws.ec2.get_ami(
most_recent=True,
owners=["099720109477"],
filters=[{
"name": "name",
"values": ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"],
}]
)
# Create VPC
vpc = aws.ec2.Vpc("main",
cidr_block="10.0.0.0/16",
enable_dns_hostnames=True,
tags={"Name": "main-vpc"}
)
# Create instances using list comprehension
instances = [
aws.ec2.Instance(f"web-{i}",
ami=ami.id,
instance_type="t3.micro",
tags={"Name": f"web-{i}"}
)
for i in range(instance_count)
]
# Export outputs
pulumi.export("instance_ids", [i.id for i in instances])
pulumi.export("public_ips", [i.public_ip for i in instances])
Go:
package main
import (
"fmt"
"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/ec2"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)
func main() {
pulumi.Run(func(ctx *pulumi.Context) error {
cfg := config.New(ctx, "")
instanceCount := cfg.GetInt("instanceCount")
if instanceCount == 0 {
instanceCount = 3
}
// Create VPC
vpc, err := ec2.NewVpc(ctx, "main", &ec2.VpcArgs{
CidrBlock: pulumi.String("10.0.0.0/16"),
EnableDnsHostnames: pulumi.Bool(true),
Tags: pulumi.StringMap{
"Name": pulumi.String("main-vpc"),
},
})
if err != nil {
return err
}
// Create instances
var instanceIds pulumi.StringArray
for i := 0; i < instanceCount; i++ {
instance, err := ec2.NewInstance(ctx, fmt.Sprintf("web-%d", i), &ec2.InstanceArgs{
Ami:          pulumi.String("ami-0c55b159cbfafe1f0"), // placeholder AMI; look it up with ec2.LookupAmi as in the TypeScript example
InstanceType: pulumi.String("t3.micro"),
})
if err != nil {
return err
}
instanceIds = append(instanceIds, instance.ID())
}
ctx.Export("vpcId", vpc.ID())
ctx.Export("instanceIds", instanceIds)
return nil
})
}
Advanced Pulumi Features¶
Component Resources (reusable abstractions):
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
interface WebClusterArgs {
instanceCount: number;
instanceType: string;
vpcId: pulumi.Input<string>;
subnetIds: pulumi.Input<string>[];
}
class WebCluster extends pulumi.ComponentResource {
public readonly instances: aws.ec2.Instance[];
public readonly loadBalancer: aws.lb.LoadBalancer;
public readonly url: pulumi.Output<string>;
constructor(name: string, args: WebClusterArgs, opts?: pulumi.ComponentResourceOptions) {
super("custom:infrastructure:WebCluster", name, {}, opts);
// Security group
const sg = new aws.ec2.SecurityGroup(`${name}-sg`, {
vpcId: args.vpcId,
ingress: [
{ protocol: "tcp", fromPort: 80, toPort: 80, cidrBlocks: ["0.0.0.0/0"] },
],
egress: [
{ protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["0.0.0.0/0"] },
],
}, { parent: this });
// Create instances
this.instances = [];
for (let i = 0; i < args.instanceCount; i++) {
this.instances.push(new aws.ec2.Instance(`${name}-instance-${i}`, {
instanceType: args.instanceType,
ami: "ami-0c55b159cbfafe1f0", // placeholder AMI; resolve via aws.ec2.getAmi in practice
subnetId: args.subnetIds[i % args.subnetIds.length],
vpcSecurityGroupIds: [sg.id],
}, { parent: this }));
}
// Load balancer
this.loadBalancer = new aws.lb.LoadBalancer(`${name}-lb`, {
loadBalancerType: "application",
securityGroups: [sg.id],
subnets: args.subnetIds,
}, { parent: this });
this.url = pulumi.interpolate`http://${this.loadBalancer.dnsName}`;
this.registerOutputs({
url: this.url,
});
}
}
// Usage
const cluster = new WebCluster("web", {
instanceCount: 3,
instanceType: "t3.micro",
vpcId: vpc.id,
subnetIds: publicSubnetIds,
});
export const clusterUrl = cluster.url;
Stack References (cross-stack dependencies):
// infrastructure/index.ts (Stack A)
export const vpcId = vpc.id;
export const subnetIds = subnets.map(s => s.id);
// application/index.ts (Stack B)
const infra = new pulumi.StackReference("org/infrastructure/prod");
const vpcId = infra.getOutput("vpcId");
const subnetIds = infra.getOutput("subnetIds");
Pulumi vs Terraform¶
| Aspect | Pulumi | Terraform |
|---|---|---|
| Language | General-purpose (TS, Python, Go, etc.) | HCL (domain-specific) |
| Learning curve | Lower for developers | Lower for ops |
| Testing | Standard language testing frameworks | Terratest, custom |
| IDE support | Full (autocomplete, refactoring) | Moderate (HCL language server) |
| Abstraction | Full OOP (classes, inheritance) | Modules only |
| State | Pulumi Cloud, S3, local | Local, S3, other remote backends |
| Provider ecosystem | Same as Terraform (bridged) | Native |
AWS CloudFormation¶
AWS CloudFormation is Amazon's native IaC service for provisioning AWS resources.
Template Structure¶
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Complete web application infrastructure'
# Input parameters
Parameters:
EnvironmentType:
Description: Environment type
Type: String
Default: development
AllowedValues:
- development
- staging
- production
ConstraintDescription: Must be development, staging, or production
InstanceType:
Description: EC2 instance type
Type: String
Default: t3.micro
AllowedValues:
- t3.micro
- t3.small
- t3.medium
# Conditional logic
Conditions:
IsProduction: !Equals [!Ref EnvironmentType, production]
CreateNATGateway: !Or
- !Equals [!Ref EnvironmentType, staging]
- !Equals [!Ref EnvironmentType, production]
# Mappings (lookup tables)
Mappings:
RegionAMI:
us-east-1:
HVM64: ami-0123456789abcdef0
us-west-2:
HVM64: ami-0fedcba9876543210
# Resources
Resources:
# VPC
VPC:
Type: AWS::EC2::VPC
Properties:
CidrBlock: 10.0.0.0/16
EnableDnsHostnames: true
EnableDnsSupport: true
Tags:
- Key: Name
Value: !Sub '${AWS::StackName}-vpc'
- Key: Environment
Value: !Ref EnvironmentType
# Public Subnet
PublicSubnet:
Type: AWS::EC2::Subnet
Properties:
VpcId: !Ref VPC
CidrBlock: 10.0.1.0/24
AvailabilityZone: !Select [0, !GetAZs '']
MapPublicIpOnLaunch: true
Tags:
- Key: Name
Value: !Sub '${AWS::StackName}-public-subnet'
# Internet Gateway
InternetGateway:
Type: AWS::EC2::InternetGateway
Properties:
Tags:
- Key: Name
Value: !Sub '${AWS::StackName}-igw'
AttachGateway:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
VpcId: !Ref VPC
InternetGatewayId: !Ref InternetGateway
# Security Group
WebSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Allow HTTP/HTTPS traffic
VpcId: !Ref VPC
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 80
ToPort: 80
CidrIp: 0.0.0.0/0
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: 0.0.0.0/0
Tags:
- Key: Name
Value: !Sub '${AWS::StackName}-web-sg'
# EC2 Instance
WebInstance:
Type: AWS::EC2::Instance
Properties:
InstanceType: !If [IsProduction, t3.large, !Ref InstanceType]
ImageId: !FindInMap [RegionAMI, !Ref 'AWS::Region', HVM64]
SubnetId: !Ref PublicSubnet
SecurityGroupIds:
- !Ref WebSecurityGroup
Tags:
- Key: Name
Value: !Sub '${AWS::StackName}-web'
DependsOn: AttachGateway
# Conditional NAT Gateway
NATGateway:
Type: AWS::EC2::NatGateway
Condition: CreateNATGateway
Properties:
AllocationId: !GetAtt NATElasticIP.AllocationId
SubnetId: !Ref PublicSubnet
NATElasticIP:
Type: AWS::EC2::EIP
Condition: CreateNATGateway
Properties:
Domain: vpc
# Outputs
Outputs:
VPCId:
Description: VPC ID
Value: !Ref VPC
Export:
Name: !Sub '${AWS::StackName}-VPCId'
InstancePublicIP:
Description: Public IP of web instance
Value: !GetAtt WebInstance.PublicIp
WebsiteURL:
Description: Website URL
Value: !Sub 'http://${WebInstance.PublicDnsName}'
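The two `Conditions` in this template resolve per `EnvironmentType`; restated as plain predicates (a sketch for intuition, not a CloudFormation API):

```python
# Restating the template's Conditions as predicates
# (the `conditions` function name is illustrative only):
def conditions(env: str) -> dict:
    return {
        "IsProduction": env == "production",
        "CreateNATGateway": env in ("staging", "production"),
    }

print(conditions("staging"))
# {'IsProduction': False, 'CreateNATGateway': True}
```

So staging gets a NAT gateway but keeps the parameterized instance type, while production gets both the NAT gateway and the forced `t3.large`.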
Intrinsic Functions¶
# !Ref - Reference parameter or resource
VpcId: !Ref VPC
# !GetAtt - Get resource attribute
PublicIp: !GetAtt WebInstance.PublicIp
# !Sub - String substitution
Name: !Sub '${AWS::StackName}-${EnvironmentType}-web'
# !Join - Join strings with a delimiter
Name: !Join ['-', [!Ref 'AWS::StackName', web]]
# !Select - Select from list
AZ: !Select [0, !GetAZs '']
# !Split - Split string into list
Subnets: !Split [',', !Ref SubnetList]
# !If - Conditional
InstanceType: !If [IsProduction, t3.large, t3.micro]
# !Equals, !And, !Or, !Not - Conditions
Condition: !Equals [!Ref Env, production]
# !FindInMap - Lookup in mappings
AMI: !FindInMap [RegionAMI, !Ref 'AWS::Region', HVM64]
# !ImportValue - Import from another stack
VpcId: !ImportValue SharedVPCId
# !Cidr - Generate CIDR blocks
Subnets: !Cidr [!GetAtt VPC.CidrBlock, 4, 8]
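Of these, `!Cidr` is the easiest to misread: its third argument is the number of *host* bits kept per subnet, not a prefix length. A Python restatement of `!Cidr [!GetAtt VPC.CidrBlock, 4, 8]` (the `cfn_cidr` name is ours, not an AWS API):

```python
import ipaddress

def cfn_cidr(block: str, count: int, cidr_bits: int) -> list:
    """Mimic !Cidr: carve `count` subnets out of `block`, each keeping
    `cidr_bits` host bits (a /16 with 8 host bits yields /24 subnets)."""
    network = ipaddress.ip_network(block)
    new_prefix = network.max_prefixlen - cidr_bits
    subnets = network.subnets(new_prefix=new_prefix)
    return [str(next(subnets)) for _ in range(count)]

print(cfn_cidr("10.0.0.0/16", 4, 8))
# ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24', '10.0.3.0/24']
```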
Nested Stacks and Cross-Stack References¶
# Parent stack using nested stacks
Resources:
NetworkStack:
Type: AWS::CloudFormation::Stack
Properties:
TemplateURL: https://s3.amazonaws.com/mybucket/network.yaml
Parameters:
Environment: !Ref Environment
ComputeStack:
Type: AWS::CloudFormation::Stack
Properties:
TemplateURL: https://s3.amazonaws.com/mybucket/compute.yaml
Parameters:
VpcId: !GetAtt NetworkStack.Outputs.VPCId
SubnetIds: !GetAtt NetworkStack.Outputs.SubnetIds
CloudFormation vs Terraform¶
| Aspect | CloudFormation | Terraform |
|---|---|---|
| Provider | AWS only | Multi-cloud |
| State | AWS-managed | Self-managed or remote |
| Syntax | JSON/YAML | HCL |
| Drift detection | Built-in | terraform plan |
| Rollback | Automatic on failure | Manual |
| Cost | Free | Free (HCP Terraform paid) |
| Ecosystem | AWS-native services | Large provider ecosystem |
Ansible¶
Ansible is an open-source automation tool maintained by Red Hat that excels at configuration management, application deployment, orchestration, and task automation. It is agentless (connecting over SSH or WinRM) and largely imperative: tasks run in a defined order, though individual modules are idempotent. Configuration management is its core strength, but it extends to provisioning as well.
Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│ Ansible Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ │
│ │ Control Node │ Where Ansible runs │
│ │ │ (your workstation, CI server) │
│ │ • Playbooks │ │
│ │ • Inventory │ │
│ │ • Modules │ │
│ └────────┬─────────┘ │
│ │ │
│ │ SSH / WinRM (agentless) │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Managed Nodes │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Server1 │ │ Server2 │ │ Server3 │ │ ... │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ No agents required - just Python and SSH access │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Inventory¶
# inventory/hosts.ini - Static inventory
[webservers]
web1.example.com ansible_host=10.0.1.10
web2.example.com ansible_host=10.0.1.11
web3.example.com ansible_host=10.0.1.12
[dbservers]
db1.example.com ansible_host=10.0.2.10
db2.example.com ansible_host=10.0.2.11
[loadbalancers]
lb1.example.com ansible_host=10.0.0.10
# Group of groups
[production:children]
webservers
dbservers
loadbalancers
# Group variables
[webservers:vars]
http_port=80
max_connections=1000
[dbservers:vars]
db_port=5432
# inventory/hosts.yml - YAML inventory
all:
children:
production:
children:
webservers:
hosts:
web1.example.com:
ansible_host: 10.0.1.10
http_port: 80
web2.example.com:
ansible_host: 10.0.1.11
dbservers:
hosts:
db1.example.com:
ansible_host: 10.0.2.10
db_port: 5432
vars:
backup_enabled: true
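The `[production:children]` grouping nests groups inside groups. A toy Python model of how such a group expands to hosts (the `hosts_in` function is ours, a simplified stand-in for Ansible's own inventory resolution):

```python
# groups/children mirror the INI inventory above
groups = {
    "webservers": ["web1.example.com", "web2.example.com", "web3.example.com"],
    "dbservers": ["db1.example.com", "db2.example.com"],
    "loadbalancers": ["lb1.example.com"],
}
children = {"production": ["webservers", "dbservers", "loadbalancers"]}

def hosts_in(group: str) -> list:
    """Expand a group, recursing through groups-of-groups."""
    if group in groups:
        return groups[group]
    return [h for child in children.get(group, []) for h in hosts_in(child)]

print(len(hosts_in("production")))  # 6
```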
Playbooks¶
# deploy-webapp.yml - Complete playbook example
---
- name: Deploy Web Application
hosts: webservers
become: true
gather_facts: true
vars:
app_name: myapp
app_version: "2.1.0"
app_port: 8080
app_user: webapp
deploy_dir: /opt/{{ app_name }}
vars_files:
- vars/secrets.yml # Encrypted with ansible-vault
pre_tasks:
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
when: ansible_os_family == "Debian"
tasks:
- name: Install required packages
apt:
name:
- nginx
- python3
- python3-pip
- python3-venv
state: present
- name: Create application user
user:
name: "{{ app_user }}"
system: yes
shell: /usr/sbin/nologin
home: "{{ deploy_dir }}"
create_home: yes
- name: Create deployment directory
file:
path: "{{ deploy_dir }}"
state: directory
owner: "{{ app_user }}"
group: "{{ app_user }}"
mode: '0755'
- name: Deploy application code
unarchive:
src: "https://releases.example.com/{{ app_name }}-{{ app_version }}.tar.gz"
dest: "{{ deploy_dir }}"
remote_src: yes
owner: "{{ app_user }}"
group: "{{ app_user }}"
notify: Restart application
- name: Install Python dependencies into a virtual environment
pip:
requirements: "{{ deploy_dir }}/requirements.txt"
virtualenv: "{{ deploy_dir }}/venv"
virtualenv_command: python3 -m venv
- name: Configure application
template:
src: templates/app-config.yml.j2
dest: "{{ deploy_dir }}/config.yml"
owner: "{{ app_user }}"
group: "{{ app_user }}"
mode: '0640'
notify: Restart application
- name: Deploy systemd service
template:
src: templates/app.service.j2
dest: /etc/systemd/system/{{ app_name }}.service
mode: '0644'
notify:
- Reload systemd
- Restart application
- name: Configure nginx reverse proxy
template:
src: templates/nginx-site.conf.j2
dest: /etc/nginx/sites-available/{{ app_name }}
mode: '0644'
notify: Reload nginx
- name: Enable nginx site
file:
src: /etc/nginx/sites-available/{{ app_name }}
dest: /etc/nginx/sites-enabled/{{ app_name }}
state: link
notify: Reload nginx
- name: Ensure services are running
service:
name: "{{ item }}"
state: started
enabled: yes
loop:
- "{{ app_name }}"
- nginx
handlers:
- name: Reload systemd
systemd:
daemon_reload: yes
- name: Restart application
service:
name: "{{ app_name }}"
state: restarted
- name: Reload nginx
service:
name: nginx
state: reloaded
post_tasks:
- name: Verify application is responding
  uri:
    url: "http://localhost:{{ app_port }}/health"
    status_code: 200
  register: health_check
  until: health_check.status == 200
  retries: 5
  delay: 3
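A detail worth internalizing from this playbook: a handler runs at most once, at the end of the play, no matter how many changed tasks notified it, and an unchanged task notifies nothing. A toy model of that contract (names are illustrative, not Ansible internals):

```python
notified = []  # handlers queued for the end of the play

def run_task(name: str, changed: bool, notify=()):
    # A task queues its handlers only when it reports "changed",
    # and each handler is queued at most once
    if changed:
        for handler in notify:
            if handler not in notified:
                notified.append(handler)

run_task("Deploy application code", changed=True, notify=("Restart application",))
run_task("Configure application", changed=True, notify=("Restart application",))
run_task("Enable nginx site", changed=False, notify=("Reload nginx",))
print(notified)  # ['Restart application'] - queued once; nginx task unchanged
```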
Roles¶
roles/
└── webserver/
├── defaults/ # Default variables (lowest priority)
│ └── main.yml
├── vars/ # Role variables (higher priority)
│ └── main.yml
├── tasks/ # Task files
│ ├── main.yml # Entry point
│ ├── install.yml
│ ├── configure.yml
│ └── service.yml
├── handlers/ # Handlers
│ └── main.yml
├── templates/ # Jinja2 templates
│ ├── nginx.conf.j2
│ └── vhost.conf.j2
├── files/ # Static files
│ └── ssl-params.conf
├── meta/ # Role metadata
│ └── main.yml
└── README.md
# roles/webserver/tasks/main.yml
---
- name: Include installation tasks
include_tasks: install.yml
- name: Include configuration tasks
include_tasks: configure.yml
- name: Include service tasks
include_tasks: service.yml
# roles/webserver/tasks/install.yml
---
- name: Install nginx
apt:
name: nginx
state: present
when: ansible_os_family == "Debian"
- name: Install nginx (RHEL)
yum:
name: nginx
state: present
when: ansible_os_family == "RedHat"
# roles/webserver/handlers/main.yml
---
- name: Restart nginx
service:
name: nginx
state: restarted
- name: Reload nginx
service:
name: nginx
state: reloaded
# Using roles in playbook
---
- name: Configure web servers
hosts: webservers
become: true
roles:
- role: common
- role: webserver
vars:
nginx_worker_processes: auto
nginx_worker_connections: 4096
- role: ssl-certificates
when: enable_ssl | default(false)
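Role `defaults` sit near the bottom of Ansible's variable precedence, role `vars` above them, and vars passed at the play level higher still. A sketch of that (deliberately simplified) ordering using `ChainMap`, with values mirroring the playbook above:

```python
from collections import ChainMap

# Simplified subset of the precedence order: play vars beat role vars,
# which beat role defaults
role_defaults = {"nginx_worker_processes": 1, "nginx_worker_connections": 1024}
role_vars = {"nginx_worker_connections": 2048}
play_vars = {"nginx_worker_processes": "auto", "nginx_worker_connections": 4096}

# ChainMap searches left to right, so highest precedence comes first
effective = ChainMap(play_vars, role_vars, role_defaults)
print(effective["nginx_worker_connections"])  # 4096
```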
Jinja2 Templates¶
# templates/nginx-site.conf.j2
upstream {{ app_name }} {
{% for host in groups['webservers'] %}
server {{ hostvars[host]['ansible_host'] }}:{{ app_port }};
{% endfor %}
}
server {
listen 80;
server_name {{ server_name }};
{% if enable_ssl | default(false) %}
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name {{ server_name }};
ssl_certificate /etc/ssl/certs/{{ app_name }}.crt;
ssl_certificate_key /etc/ssl/private/{{ app_name }}.key;
{% endif %}
location / {
proxy_pass http://{{ app_name }};
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
location /health {
access_log off;
return 200 "healthy\n";
}
}
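The `upstream` loop above emits one `server` line per host in the `webservers` group. Stripped of Jinja2, its output can be reproduced with a plain comprehension (sample data assumed):

```python
# Inputs mirror the inventory earlier in this section; the list
# comprehension plays the role of the {% for %} loop
groups = {"webservers": ["web1", "web2"]}
hostvars = {
    "web1": {"ansible_host": "10.0.1.10"},
    "web2": {"ansible_host": "10.0.1.11"},
}
app_port = 8080

lines = [
    f"server {hostvars[host]['ansible_host']}:{app_port};"
    for host in groups["webservers"]
]
print(lines)  # ['server 10.0.1.10:8080;', 'server 10.0.1.11:8080;']
```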
Ansible Vault¶
# Create encrypted file
ansible-vault create secrets.yml
# Edit encrypted file
ansible-vault edit secrets.yml
# Encrypt existing file
ansible-vault encrypt plain-secrets.yml
# Decrypt file
ansible-vault decrypt secrets.yml
# View encrypted file
ansible-vault view secrets.yml
# Run playbook with vault password
ansible-playbook playbook.yml --ask-vault-pass
ansible-playbook playbook.yml --vault-password-file ~/.vault_pass
# secrets.yml (plaintext contents; stored encrypted on disk by ansible-vault)
db_password: supersecretpassword
api_key: abc123xyz
ssl_private_key: |
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
Best Practices¶
# Use YAML anchors and aliases for DRY
# (the anchor must live inside the playbook list; a bare top-level
# mapping would make the file an invalid playbook)
- name: Configure webservers
  hosts: webservers
  <<: &defaults
    become: true
    gather_facts: true

- name: Configure dbservers
  hosts: dbservers
  <<: *defaults
# Use blocks for error handling
- name: Deploy with rollback
block:
- name: Deploy new version
# ... deployment tasks
- name: Run smoke tests
uri:
url: "http://localhost/health"
status_code: 200
rescue:
- name: Rollback to previous version
# ... rollback tasks
always:
- name: Send notification
# ... notification tasks
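The `block`/`rescue`/`always` trio maps directly onto try/except/finally semantics, which is a handy way to reason about execution order:

```python
log = []
try:
    log.append("deploy")                     # block: deployment tasks
    raise RuntimeError("smoke test failed")  # a failing task
except RuntimeError:
    log.append("rollback")                   # rescue: runs only on failure
finally:
    log.append("notify")                     # always: runs either way

print(log)  # ['deploy', 'rollback', 'notify']
```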
Helm¶
Helm is the package manager for Kubernetes, enabling you to define, install, and upgrade complex Kubernetes applications.
Chart Structure¶
mychart/
├── Chart.yaml # Chart metadata
├── Chart.lock # Dependency lock file
├── values.yaml # Default configuration values
├── values.schema.json # JSON Schema for values validation
├── templates/ # Template files
│ ├── NOTES.txt # Post-install notes
│ ├── _helpers.tpl # Template helpers
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── configmap.yaml
│ ├── secret.yaml
│ ├── hpa.yaml
│ └── serviceaccount.yaml
├── charts/ # Dependency charts
├── crds/ # Custom Resource Definitions
└── README.md
Chart.yaml¶
apiVersion: v2
name: myapp
description: A Helm chart for my application
type: application
version: 1.2.3 # Chart version
appVersion: "2.0.0" # Application version
keywords:
- web
- api
- microservice
home: https://example.com/myapp
sources:
- https://github.com/example/myapp
maintainers:
- name: John Doe
email: john@example.com
url: https://johndoe.dev
dependencies:
- name: postgresql
version: "12.x.x"
repository: "https://charts.bitnami.com/bitnami"
condition: postgresql.enabled
- name: redis
version: "17.x.x"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
Values.yaml¶
# values.yaml - Default values for myapp
# Number of replicas
replicaCount: 1
image:
repository: myregistry.io/myapp
pullPolicy: IfNotPresent
tag: "" # Defaults to appVersion
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations: {}
podSecurityContext:
fsGroup: 1000
securityContext:
runAsNonRoot: true
runAsUser: 1000
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
service:
type: ClusterIP
port: 80
ingress:
enabled: false
className: "nginx"
annotations: {}
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
affinity: {}
# Application-specific configuration
config:
logLevel: info
database:
host: localhost
port: 5432
name: myapp
cache:
enabled: true
ttl: 3600
# Feature flags
features:
newUI: false
betaAPI: false
# Dependencies
postgresql:
enabled: true
auth:
username: myapp
database: myapp
redis:
enabled: false
Templates¶
# templates/_helpers.tpl - Template helpers
{{/*
Expand the name of the chart.
*/}}
{{- define "myapp.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
*/}}
{{- define "myapp.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "myapp.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "myapp.labels" -}}
helm.sh/chart: {{ include "myapp.chart" . }}
{{ include "myapp.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "myapp.selectorLabels" -}}
app.kubernetes.io/name: {{ include "myapp.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "myapp.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "myapp.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "myapp.fullname" . }}
labels:
{{- include "myapp.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "myapp.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "myapp.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "myapp.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
protocol: TCP
envFrom:
- configMapRef:
name: {{ include "myapp.fullname" . }}-config
- secretRef:
name: {{ include "myapp.fullname" . }}-secrets
livenessProbe:
httpGet:
path: /health/live
port: http
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /health/ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
# templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: {{ include "myapp.fullname" . }}
labels:
{{- include "myapp.labels" . | nindent 4 }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
{{- include "myapp.selectorLabels" . | nindent 4 }}
# templates/ingress.yaml
{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "myapp.fullname" . }}
labels:
{{- include "myapp.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.className }}
ingressClassName: {{ .Values.ingress.className }}
{{- end }}
{{- if .Values.ingress.tls }}
tls:
{{- range .Values.ingress.tls }}
- hosts:
{{- range .hosts }}
- {{ . | quote }}
{{- end }}
secretName: {{ .secretName }}
{{- end }}
{{- end }}
rules:
{{- range .Values.ingress.hosts }}
- host: {{ .host | quote }}
http:
paths:
{{- range .paths }}
- path: {{ .path }}
pathType: {{ .pathType }}
backend:
service:
name: {{ include "myapp.fullname" $ }}
port:
number: {{ $.Values.service.port }}
{{- end }}
{{- end }}
{{- end }}
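The `trunc 63 | trimSuffix "-"` pipeline that recurs in `_helpers.tpl` exists because Kubernetes object names are capped at 63 characters and may not end in a dash. A close Python analogue (note `rstrip` removes repeated trailing dashes where `trimSuffix` removes only one):

```python
def k8s_name(s: str) -> str:
    """Truncate to 63 chars and strip trailing '-', as the helper's
    `trunc 63 | trimSuffix "-"` pipeline does (approximately)."""
    return s[:63].rstrip("-")

long_name = "myrelease-" + "x" * 60   # 70 chars
print(len(k8s_name(long_name)))       # 63
print(k8s_name("my-release-"))        # my-release
```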
Helm Commands¶
# Repository management
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo nginx
# Install a chart
helm install myrelease ./mychart
helm install myrelease ./mychart -f custom-values.yaml
helm install myrelease ./mychart --set replicaCount=3
helm install myrelease ./mychart --namespace mynamespace --create-namespace
# Upgrade a release
helm upgrade myrelease ./mychart
helm upgrade --install myrelease ./mychart # Install if not exists
# Rollback
helm rollback myrelease 1 # Rollback to revision 1
helm history myrelease # View release history
# Uninstall
helm uninstall myrelease
# Template rendering (dry-run)
helm template myrelease ./mychart
helm template myrelease ./mychart --debug # With debug info
# Validate chart
helm lint ./mychart
# Package chart
helm package ./mychart
helm package ./mychart --version 1.2.3 --app-version 2.0.0
# Pull chart
helm pull bitnami/nginx --untar
# Dependencies
helm dependency update ./mychart
helm dependency build ./mychart
Environment-Specific Values¶
# values-dev.yaml
replicaCount: 1
image:
tag: "latest"
ingress:
enabled: false
resources:
limits:
cpu: 200m
memory: 256Mi
# values-staging.yaml
replicaCount: 2
image:
tag: "staging"
ingress:
enabled: true
hosts:
- host: staging.example.com
paths:
- path: /
pathType: Prefix
# values-prod.yaml
replicaCount: 3
image:
tag: "v2.0.0"
ingress:
enabled: true
hosts:
- host: app.example.com
paths:
- path: /
pathType: Prefix
tls:
- hosts:
- app.example.com
secretName: app-tls
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
# Deploy to different environments
helm upgrade --install myapp ./mychart -f values-dev.yaml -n dev
helm upgrade --install myapp ./mychart -f values-staging.yaml -n staging
helm upgrade --install myapp ./mychart -f values-prod.yaml -n prod
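When several values files are passed with `-f`, Helm deep-merges maps left to right (later files win) while scalars and lists are replaced outright. A rough sketch of that merge behavior (the `deep_merge` helper is ours, not Helm's):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge nested maps; non-map values in `override` replace `base`."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

defaults = {"replicaCount": 1,
            "image": {"repository": "myregistry.io/myapp", "tag": ""}}
prod = {"replicaCount": 3, "image": {"tag": "v2.0.0"}}
merged = deep_merge(defaults, prod)
print(merged)
# {'replicaCount': 3, 'image': {'repository': 'myregistry.io/myapp', 'tag': 'v2.0.0'}}
```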
Crossplane¶
Crossplane extends Kubernetes to manage cloud infrastructure using Kubernetes-native APIs (Custom Resources).
Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│ Crossplane Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ │
│ │ kubectl / │ Standard K8s tooling │
│ │ GitOps (ArgoCD) │ │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Kubernetes API Server │ │
│ └────────┬─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Crossplane Core │ │
│ │ • Composition Engine • Package Manager │ │
│ │ • Resource Controllers • RBAC Integration │ │
│ └────────┬─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Providers │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ provider- │ │ provider- │ │ provider- │ │ │
│ │ │ aws │ │ azure │ │ gcp │ ... │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │
│ └─────────┼────────────────┼────────────────┼──────────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Cloud Provider APIs │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Basic Usage¶
# Install AWS provider
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
name: provider-aws
spec:
package: xpkg.upbound.io/upbound/provider-aws:v0.47.0
---
# Configure AWS credentials
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
name: default
spec:
credentials:
source: Secret
secretRef:
namespace: crossplane-system
name: aws-creds
key: credentials
# Create AWS resources using Kubernetes manifests
apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
name: production-vpc
spec:
forProvider:
region: us-west-2
cidrBlock: 10.0.0.0/16
enableDnsHostnames: true
enableDnsSupport: true
tags:
Name: production-vpc
Environment: production
---
apiVersion: ec2.aws.upbound.io/v1beta1
kind: Subnet
metadata:
name: production-public-1
spec:
forProvider:
region: us-west-2
vpcIdRef:
name: production-vpc
cidrBlock: 10.0.1.0/24
availabilityZone: us-west-2a
mapPublicIpOnLaunch: true
tags:
Name: production-public-1
---
apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
metadata:
name: production-db
spec:
forProvider:
region: us-west-2
instanceClass: db.t3.medium
engine: postgres
engineVersion: "15"
allocatedStorage: 100
dbName: myapp
username: admin
passwordSecretRef:
name: db-password
namespace: default
key: password
vpcSecurityGroupIdRefs:
- name: production-db-sg
dbSubnetGroupNameRef:
name: production-db-subnet-group
publiclyAccessible: false
writeConnectionSecretToRef:
name: production-db-connection
namespace: default
Compositions (Platform Abstractions)¶
# Define a reusable composition
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
name: xdatabases.example.org
spec:
group: example.org
names:
kind: XDatabase
plural: xdatabases
claimNames:
kind: Database
plural: databases
versions:
- name: v1alpha1
served: true
referenceable: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
size:
type: string
enum: [small, medium, large]
engine:
type: string
enum: [postgres, mysql]
required:
- size
- engine
---
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: aws-postgres
labels:
provider: aws
engine: postgres
spec:
compositeTypeRef:
apiVersion: example.org/v1alpha1
kind: XDatabase
resources:
- name: rds-instance
base:
apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
spec:
forProvider:
region: us-west-2
engine: postgres
engineVersion: "15"
publiclyAccessible: false
patches:
- type: FromCompositeFieldPath
fromFieldPath: spec.size
toFieldPath: spec.forProvider.instanceClass
transforms:
- type: map
map:
small: db.t3.small
medium: db.t3.medium
large: db.t3.large
- type: FromCompositeFieldPath
fromFieldPath: metadata.name
toFieldPath: spec.forProvider.dbName
# Claim a database (simple interface for developers)
apiVersion: example.org/v1alpha1
kind: Database
metadata:
name: myapp-db
namespace: myapp
spec:
size: medium
engine: postgres
compositionSelector:
matchLabels:
provider: aws
engine: postgres
writeConnectionSecretToRef:
name: myapp-db-connection
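The `map` transform in the Composition is a plain lookup from the claim's abstract `size` to a concrete instance class; in Python terms (illustrative names, not a Crossplane API):

```python
# SIZE_TO_CLASS mirrors the `map` transform in the Composition above
SIZE_TO_CLASS = {
    "small": "db.t3.small",
    "medium": "db.t3.medium",
    "large": "db.t3.large",
}

def patch_instance_class(claim_size: str) -> str:
    """FromCompositeFieldPath: spec.size -> spec.forProvider.instanceClass."""
    return SIZE_TO_CLASS[claim_size]

print(patch_instance_class("medium"))  # db.t3.medium
```

Developers only ever see `size: medium`; the platform team owns the mapping to cloud-specific machine classes.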
Crossplane vs Terraform¶
| Aspect | Crossplane | Terraform |
|---|---|---|
| Runtime | Kubernetes controller | CLI tool |
| State | Kubernetes etcd | External state file |
| Drift correction | Continuous reconciliation | On terraform apply |
| GitOps native | Yes (ArgoCD/Flux) | Via CI/CD pipelines |
| Platform abstractions | Compositions | Modules |
| Learning curve | Kubernetes knowledge required | Self-contained |
| Multi-tenancy | Kubernetes RBAC | HCP Terraform workspaces |
Policy as Code¶
Policy as Code ensures compliance and security by defining rules programmatically.
Open Policy Agent (OPA)¶
OPA is a general-purpose policy engine whose policies are written in the Rego language.
# terraform-policies/deny_public_s3.rego
package terraform.analysis
import input as tfplan
# Deny public S3 buckets
deny[msg] {
resource := tfplan.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.change.after.acl == "public-read"
msg := sprintf("S3 bucket '%s' must not be public", [resource.address])
}
# Require encryption on RDS instances
deny[msg] {
resource := tfplan.resource_changes[_]
resource.type == "aws_db_instance"
not resource.change.after.storage_encrypted
msg := sprintf("RDS instance '%s' must have storage encryption enabled", [resource.address])
}
# Enforce tagging
deny[msg] {
resource := tfplan.resource_changes[_]
required_tags := {"Environment", "Owner", "CostCenter"}
provided_tags := {tag | resource.change.after.tags[tag]}
missing := required_tags - provided_tags
count(missing) > 0
msg := sprintf("Resource '%s' is missing required tags: %v", [resource.address, missing])
}
# Restrict instance types
deny[msg] {
resource := tfplan.resource_changes[_]
resource.type == "aws_instance"
allowed_types := {"t3.micro", "t3.small", "t3.medium"}
not allowed_types[resource.change.after.instance_type]
msg := sprintf("Instance '%s' uses unauthorized type '%s'. Allowed: %v",
[resource.address, resource.change.after.instance_type, allowed_types])
}
# Use with Terraform
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
opa eval --data terraform-policies/ --input tfplan.json "data.terraform.analysis.deny"
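The tag-enforcement rule above can be restated in Python against a parsed `tfplan.json` entry, which makes the set arithmetic explicit (structure abbreviated to the fields the rule touches):

```python
REQUIRED = {"Environment", "Owner", "CostCenter"}

def missing_tags(resource: dict) -> set:
    """Return required tags absent from one resource_changes entry."""
    provided = set(resource["change"]["after"].get("tags") or {})
    return REQUIRED - provided

rc = {
    "address": "aws_instance.web",
    "change": {"after": {"tags": {"Environment": "prod"}}},
}
print(sorted(missing_tags(rc)))  # ['CostCenter', 'Owner']
```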
HashiCorp Sentinel¶
Sentinel is HashiCorp's policy-as-code framework for HCP Terraform.
# sentinel/require-tags.sentinel
import "tfplan/v2" as tfplan
required_tags = ["Environment", "Owner", "CostCenter"]
# Get all resources that support tags
taggable_resources = filter tfplan.resource_changes as _, rc {
rc.mode is "managed" and
rc.change.after is not null and
keys(rc.change.after) contains "tags"
}
# Check each resource for required tags
missing_tags = {}
for taggable_resources as address, rc {
tags = rc.change.after.tags else {}
missing = filter required_tags as tag {
    (tags[tag] else "") in ["", null]
}
if length(missing) > 0 {
missing_tags[address] = missing
}
}
# Main rule; print() always returns true, so it can be chained in
# to surface the offending resources when the rule is evaluated
main = rule {
    print("Resources missing required tags:", missing_tags) and
    length(missing_tags) is 0
}
# sentinel/restrict-regions.sentinel
import "tfconfig/v2" as tfconfig

allowed_regions = ["us-west-2", "us-east-1", "eu-west-1"]

# Provider blocks are exposed by the tfconfig import, not tfplan
aws_providers = filter tfconfig.providers as alias, p {
    p.name is "aws"
}

# Check statically configured regions; providers without a statically
# known region are treated as violations (fail closed)
violations = filter aws_providers as alias, p {
    (p.config.region.constant_value else "") not in allowed_regions
}

main = rule {
    length(violations) is 0
}
Checkov (Static Analysis)¶
# Scan Terraform files
checkov -d ./terraform --framework terraform
# Scan specific file
checkov -f main.tf
# Output formats
checkov -d . --output json
checkov -d . --output sarif # For GitHub Advanced Security
# Skip specific checks
checkov -d . --skip-check CKV_AWS_18,CKV_AWS_19
# Custom policies
checkov -d . --external-checks-dir ./custom-policies
# Custom Checkov policy (YAML)
# custom-policies/require_encryption.yaml
metadata:
  name: "Ensure S3 buckets have server-side encryption enabled"
  id: "CUSTOM_AWS_1"
  category: "encryption"
definition:
  and:
    - cond_type: "attribute"
      resource_types:
        - "aws_s3_bucket"
      attribute: "server_side_encryption_configuration"
      operator: "exists"
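The `exists` operator in the definition above simply asserts that the attribute is present on the resource. Its semantics can be illustrated in a few lines (hypothetical bucket dicts; this is not Checkov's actual implementation):

```python
def attribute_exists(resource: dict, attribute: str) -> bool:
    """Checkov-style 'exists' operator: attribute is present and non-null."""
    return resource.get(attribute) is not None

# Hypothetical parsed resource blocks
bucket_ok = {"bucket": "logs", "server_side_encryption_configuration": [{"rule": []}]}
bucket_bad = {"bucket": "tmp"}

print(attribute_exists(bucket_ok, "server_side_encryption_configuration"))   # → True
print(attribute_exists(bucket_bad, "server_side_encryption_configuration"))  # → False
```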
tfsec / Trivy¶
# tfsec scan
tfsec ./terraform
# With specific severity
tfsec ./terraform --minimum-severity HIGH
# Trivy (includes tfsec)
trivy config ./terraform
# Output as SARIF for CI/CD
trivy config ./terraform --format sarif --output results.sarif
GitOps and IaC¶
GitOps applies Git workflows to infrastructure management: the repository is the single source of truth, and an operator continuously reconciles the running system toward the state declared there.
GitOps Principles¶
┌────────────────────────────────────────────────────────────────┐
│                        GitOps Workflow                         │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌────────────┐    ┌────────────┐    ┌────────────┐            │
│  │ Developer  │───►│  Git Repo  │───►│   CI/CD    │            │
│  │ Push Code  │    │ (Source of │    │  Pipeline  │            │
│  └────────────┘    │   Truth)   │    └─────┬──────┘            │
│                    └────────────┘          │                   │
│                          ▲                 ▼                   │
│                    ┌─────┴──────┐    ┌────────────┐            │
│                    │ Reconcile  │◄───│   GitOps   │            │
│                    │    Loop    │    │  Operator  │            │
│                    └────────────┘    │  (ArgoCD/  │            │
│                                      │   Flux)    │            │
│                                      └─────┬──────┘            │
│                                            │                   │
│                                            ▼                   │
│                                      ┌────────────┐            │
│                                      │ Kubernetes │            │
│                                      │  Cluster   │            │
│                                      └────────────┘            │
│                                                                │
└────────────────────────────────────────────────────────────────┘
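Conceptually, the operator in the diagram runs a reconcile loop: diff the desired state in Git against the live cluster, then converge. A minimal sketch with hypothetical resource maps standing in for the Git repo and the cluster:

```python
def reconcile(desired: dict, live: dict) -> list:
    """Compute the actions needed to converge live state toward Git."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))      # selfHeal: undo manual edits
    for name in live:
        if name not in desired:
            actions.append(("delete", name))      # prune: remove what Git dropped
    return actions

desired = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
live    = {"deploy/web": {"replicas": 2}, "job/old": {}}
print(reconcile(desired, live))
# → [('update', 'deploy/web'), ('create', 'svc/web'), ('delete', 'job/old')]
```

The `prune` and `selfHeal` options in ArgoCD's `syncPolicy` (shown below) correspond directly to the delete and update branches of this loop.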
ArgoCD with Helm¶
# ArgoCD Application for Helm chart
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/helm-charts
    targetRevision: main
    path: charts/myapp
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml
      parameters:
        - name: image.tag
          value: "v2.0.0"
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Terraform with GitOps (Atlantis)¶
# atlantis.yaml - Repository configuration
version: 3
projects:
  - name: production
    dir: environments/production
    workspace: production
    terraform_version: v1.6.0
    autoplan:
      when_modified: ["*.tf", "../modules/**/*.tf"]
      enabled: true
    apply_requirements: [approved, mergeable]
  - name: staging
    dir: environments/staging
    workspace: staging
    terraform_version: v1.6.0
    autoplan:
      when_modified: ["*.tf", "../modules/**/*.tf"]
      enabled: true
# GitHub Actions for Terraform
name: Terraform
on:
  pull_request:
    paths:
      - 'terraform/**'
  push:
    branches:
      - main
    paths:
      - 'terraform/**'
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color -input=false
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
Testing IaC¶
Terratest (Go-based Testing)¶
// test/vpc_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	t.Parallel()

	terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"vpc_cidr":    "10.0.0.0/16",
			"environment": "test",
			"name":        "terratest-vpc",
		},
		EnvVars: map[string]string{
			"AWS_DEFAULT_REGION": "us-west-2",
		},
	})

	// Clean up resources after test
	defer terraform.Destroy(t, terraformOptions)

	// Deploy infrastructure
	terraform.InitAndApply(t, terraformOptions)

	// Get outputs
	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	publicSubnetIds := terraform.OutputList(t, terraformOptions, "public_subnet_ids")

	// Validate the VPC exists and carries the expected tags
	vpc := aws.GetVpcById(t, vpcId, "us-west-2")
	assert.Equal(t, vpcId, vpc.Id)
	assert.Equal(t, "test", vpc.Tags["Environment"])

	// Validate subnets (the CIDR itself could be asserted via a
	// module output such as vpc_cidr_block, if the module exposes one)
	assert.Equal(t, 3, len(publicSubnetIds))
}
terraform test (Native Testing)¶
# tests/vpc.tftest.hcl
run "create_vpc" {
  command = apply

  variables {
    vpc_cidr    = "10.0.0.0/16"
    environment = "test"
    name        = "test-vpc"
  }

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block is incorrect"
  }

  assert {
    condition     = aws_vpc.main.enable_dns_hostnames == true
    error_message = "DNS hostnames should be enabled"
  }

  assert {
    condition     = length(aws_subnet.public) == 3
    error_message = "Should create 3 public subnets"
  }
}

run "validate_tags" {
  command = plan

  variables {
    vpc_cidr    = "10.0.0.0/16"
    environment = "production"
    name        = "prod-vpc"
  }

  assert {
    condition     = aws_vpc.main.tags["Environment"] == "production"
    error_message = "Environment tag should be 'production'"
  }
}
Ansible Molecule Testing¶
# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: geerlingguy/docker-ubuntu2204-ansible
    pre_build_image: true
    privileged: true
    command: /lib/systemd/systemd
provisioner:
  name: ansible
  inventory:
    host_vars:
      instance:
        ansible_user: root
verifier:
  name: ansible
# molecule/default/converge.yml
---
- name: Converge
  hosts: all
  tasks:
    - name: Include role
      include_role:
        name: webserver
# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Check nginx is installed
      package:
        name: nginx
        state: present
      check_mode: true
      register: nginx_check
      failed_when: nginx_check.changed
    - name: Check nginx is running
      service:
        name: nginx
        state: started
      check_mode: true
      register: nginx_service
      failed_when: nginx_service.changed
    - name: Verify nginx responds
      uri:
        url: http://localhost
        status_code: 200
# Run molecule tests
molecule test
# Individual stages
molecule create # Create test instances
molecule converge # Run playbook
molecule verify # Run verification
molecule destroy # Clean up
IaC Best Practices Summary¶
Directory Structure¶
infrastructure/
├── .github/
│ └── workflows/
│ ├── terraform.yml
│ └── ansible.yml
├── terraform/
│ ├── modules/
│ │ ├── networking/
│ │ ├── compute/
│ │ ├── database/
│ │ └── security/
│ ├── environments/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── production/
│ └── tests/
├── ansible/
│ ├── inventory/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── production/
│ ├── roles/
│ ├── playbooks/
│ └── group_vars/
├── helm/
│ └── charts/
│ └── myapp/
├── policies/
│ ├── opa/
│ └── sentinel/
└── docs/
└── runbooks/
Security Checklist¶
- [ ] Never commit secrets to version control
- [ ] Use secret management (Vault, AWS Secrets Manager, etc.)
- [ ] Encrypt state files (S3 server-side encryption, etc.)
- [ ] Apply least privilege to IaC service accounts
- [ ] Enable state locking to prevent concurrent modifications
- [ ] Implement policy-as-code for compliance
- [ ] Scan IaC for security misconfigurations (Checkov, tfsec)
- [ ] Review infrastructure changes via pull requests
- [ ] Audit who made what changes and when
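A lightweight way to back the first two checklist items is a pre-commit scan for well-known credential shapes. A sketch matching the documented AWS access key ID prefix and PEM private-key headers (real projects should prefer dedicated scanners such as gitleaks or trufflehog):

```python
import re

# AWS access key IDs start with AKIA/ASIA followed by 16 uppercase alphanumerics
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_text(text: str) -> list:
    """Return (pattern_name, match) pairs for likely secrets in text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group(0)))
    return hits

# AWS's own documentation example key, safe to use in tests
sample = 'aws_access_key = "AKIAIOSFODNN7EXAMPLE"'
print(scan_text(sample))
# → [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

Wired into a pre-commit hook, a non-empty result would block the commit before a secret ever reaches version control.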
Common Anti-Patterns¶
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Hardcoded secrets | Security risk | Use secret management tools |
| No state locking | Race conditions | Enable DynamoDB/backend locking |
| Single monolithic state | Blast radius | Split into multiple states |
| No testing | Unreliable changes | Implement terratest/molecule |
| Manual changes | Configuration drift | Enforce IaC-only changes |
| Copy-paste code | Maintenance burden | Use modules/roles |
| No code review | Quality issues | Require PR approvals |
| Ignoring drift | Unknown state | Regular drift detection |
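For the drift row in particular, the JSON plan produced by `terraform show -json` can feed a small detector: any resource whose planned actions are not `no-op` has diverged from the code. A sketch over the plan's `resource_changes`/`actions` fields (sample plan inlined):

```python
def detect_drift(plan: dict) -> list:
    """List addresses whose planned actions are anything other than no-op."""
    drifted = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            drifted.append((rc["address"], actions))
    return drifted

# Inline sample standing in for `terraform show -json plan.out`
plan = {"resource_changes": [
    {"address": "aws_instance.web", "change": {"actions": ["no-op"]}},
    {"address": "aws_security_group.app", "change": {"actions": ["update"]}},
]}
print(detect_drift(plan))
# → [('aws_security_group.app', ['update'])]
```

Run on a schedule against a plan with no pending code changes, a non-empty result means someone changed the infrastructure outside of IaC.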
Migration Strategy¶
For existing infrastructure:
- Import: Use `terraform import` to bring existing resources under management
- Document: Build an accurate record of the current infrastructure
- Incremental: Migrate piece by piece, not all at once
- Validate: Compare imported state with actual infrastructure
- Test: Run plans to ensure no unexpected changes
# Import existing AWS resources
terraform import aws_vpc.main vpc-0123456789abcdef0
terraform import aws_subnet.public[0] subnet-0123456789abcdef0
terraform import aws_instance.web i-0123456789abcdef0
# Review the imported state (human-readable output, not valid HCL configuration)
terraform show -no-color > imported-state.txt
# Terraform 1.5+ can generate real configuration from import blocks
terraform plan -generate-config-out=generated.tf
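When many resources must be imported, generating the commands from an inventory keeps the process reviewable. A sketch using a hypothetical address-to-ID mapping (in practice this might come from a CSV export or a tagging audit):

```python
def import_commands(mapping: dict) -> list:
    """Render one `terraform import` command per (address, cloud ID) pair."""
    return [f"terraform import {addr} {res_id}" for addr, res_id in mapping.items()]

# Hypothetical inventory of existing resources
inventory = {
    "aws_vpc.main": "vpc-0123456789abcdef0",
    "aws_instance.web": "i-0123456789abcdef0",
}
print("\n".join(import_commands(inventory)))
```

Emitting the commands to a script rather than running them directly lets the batch be code-reviewed like any other infrastructure change.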
Conclusion¶
Infrastructure as Code has evolved from simple automation scripts to sophisticated, enterprise-grade tooling that enables organizations to manage complex, multi-cloud environments reliably and securely.
Key Takeaways:
- Choose the right tool: Terraform for provisioning, Ansible for configuration, Helm for Kubernetes
- Embrace declarative: Prefer declarative approaches for predictability
- Version everything: Git is the source of truth
- Test thoroughly: Unit tests, integration tests, policy checks
- Automate completely: CI/CD pipelines for all infrastructure changes
- Security first: Secrets management, least privilege, audit trails
The future of IaC points toward:
- Platform Engineering: Self-service infrastructure via internal developer platforms
- GitOps Maturity: Declarative, version-controlled, continuously reconciled
- AI-Assisted IaC: Automated code generation, drift detection, optimization
- Multi-Cloud Abstraction: Tools like Crossplane providing unified control planes