
Nstance OpenTofu/Terraform Modules

Nstance provides OpenTofu/Terraform modules with a unified, cloud-agnostic interface for deploying Nstance in Amazon Web Services (AWS) and/or Google Cloud (GCP). Each module has a consistent variable interface with cloud-specific implementations underneath:

  • cluster establishes the cluster ID and a baseline configuration, and provisions or links cluster-wide resources: an S3/GCS bucket and encrypted secrets.

  • account creates IAM roles/instance profiles per AWS account/GCP project.

  • network provisions the VPC/network setup per account/project and region.

  • shard deploys one Nstance zone “shard” (nstance-server instance + group instances).

The delineation into these modules lets cluster deployments scale from a single account/zone to multi-cloud, multi-account, and multi-zone with a unified configuration.

Deployed Resources Per Module

| Resource Type | cluster | account | network | shard |
| --- | --- | --- | --- | --- |
| Cluster ID / Root CA | ✓ | | | |
| S3/GCS Bucket | ✓ | | | |
| Encryption Key (Secrets Manager) | ✓ | | | |
| Server IAM Role | | ✓ | | |
| Agent IAM Role | | ✓ | | |
| Instance Profiles | | ✓ | | |
| VPC/VPC Network | | | ✓ | |
| Internet Gateway | | | ✓ | |
| NAT Gateway/Cloud NAT | | | ✓ | |
| Route Tables | | | ✓ | |
| VPC Endpoints (S3, SSM, etc.) | | | ✓ | |
| Group Subnets (for agents) | | | ✓ | |
| Server Subnets | | | ✓ | |
| Load Balancers (NLB / Regional LB) | | | ✓ | |
| Security Groups/Firewall Rules | | | | ✓ |
| Server Instances | | | | ✓ |
| Group Instances* | | | | ✓ |
| Shard Config (S3 object) | | | | ✓ |

*Group Instances are provisioned by nstance-server, not OpenTofu/Terraform.

Cloud-Specific Modules

Each cloud provider has its own published module repository:

  • AWS: nstance-dev/nstance/aws//modules/{module}
    • Source: github.com/nstance-dev/terraform-aws-nstance//{module}
  • GCP: nstance-dev/nstance/gcp//modules/{module}
    • Source: github.com/nstance-dev/terraform-gcp-nstance//{module}

Region and project are inferred from the provider configuration via data sources (data.aws_region.current or data.google_client_config.current), to minimise the number of required variables per module.
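
For example, a root configuration only needs a provider block per cloud, and the modules pick up region and project from it (the project ID below is illustrative):

```hcl
provider "aws" {
  region = "us-west-2" # read by the modules via data.aws_region.current
}

provider "google" {
  project = "my-project" # illustrative; read via data.google_client_config.current
  region  = "us-west1"
}
```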

Development Structure

Module source code lives in the main github.com/nstance-dev/nstance repository under deploy/tf/. Modules share a unified variable interface — common variable definitions live in deploy/tf/common/ and are symlinked into each cloud module. During release, the cloud-specific modules are synced to their respective repositories with symlinks replaced by actual files.

deploy/tf/
├── common/             # Shared variable definitions (symlinked into cloud modules)
│   ├── cluster/
│   │   └── variables.tf
│   ├── account/
│   │   └── variables.tf
│   ├── network/
│   │   └── variables.tf
│   └── shard/
│       └── variables.tf
│
├── aws/                # AWS-specific implementations → synced to terraform-aws-nstance
│   ├── cluster/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   └── variables.tf -> ../../common/cluster/variables.tf
│   ├── account/
│   ├── network/
│   └── shard/
│
├── gcp/                # GCP-specific implementations → synced to terraform-gcp-nstance
│   ├── cluster/
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   └── variables.tf -> ../../common/cluster/variables.tf
│   ├── account/
│   ├── network/
│   └── shard/
│
└── examples/           # Example configurations (synced to respective repos)
    ├── aws/
    │   ├── single-shard/
    │   └── multi-az/
    ├── gcp/
    │   ├── single-shard/
    │   └── multi-az/
    └── multi-cloud/

Prerequisites

  • OpenTofu >= 1.6.0 or Terraform >= 1.5.0
  • Cloud provider CLI (aws or gcloud) configured with appropriate credentials.
  • GitHub releases available for nstance-server and nstance-agent (or custom binary URLs).

GCP Note: The cluster module automatically enables required GCP APIs (Compute, Secret Manager, Storage, IAM, IAP) on first apply if they are not already enabled. Enabling all services adds roughly 30-60 seconds to the initial deployment, but eliminates manual gcloud services enable commands.

Security

  • All instances run in private subnets by default, with public/NAT egress options available.
  • Automatic VPC endpoints remove the need for internet egress when reaching cloud services.
  • Instance metadata service configuration uses secure defaults (e.g. IMDSv2 required on AWS).
  • Instance volumes are encrypted by default.
  • IAM roles are separated per use case, each with least-privilege permissions.
  • Encryption key stored in secrets manager (not in Terraform state).
  • S3 buckets are encrypted by default.
  • S3/GCS buckets have deletion protection by default.

IPv4+IPv6 Dual-Stack Support

Both AWS and GCP support IPv4+IPv6 dual-stack networking. IPv6 is enabled by default, but can be disabled by setting enable_ipv6 = false for the network module.

Provider Differences

| | AWS | GCP |
| --- | --- | --- |
| IPv6 type | Amazon-provided public /56 | Internal ULA /48 (private) |
| Address scope | Globally routable | VPC-internal only |
| Assignment | Auto-generated on VPC creation | Auto-generated on VPC creation |
| Subnet CIDRs | Specify ipv6_netnum (0-255) per subnet | Specify ipv6_netnum (0-65535) per subnet |
| Private egress | Egress-only Internet Gateway | Cloud NAT (same as IPv4) |

Using IPv6

When enable_ipv6 = true (the default), each subnet needs an IPv6 CIDR. You can specify this in two ways:

  • ipv6_netnum (recommended) - Subnet number that auto-computes a /64 from the VPC’s cloud-assigned block (AWS: 0-255 from /56, GCP: 0-65535 from /48)
  • ipv6_cidr - Explicit IPv6 CIDR block

module "network" {
  source  = "nstance-dev/nstance/aws//modules/network"
  version = "~> 1.0"

  vpc_cidr_ipv4 = "172.18.0.0/16"

  subnets = {
    "public" = {
      "us-west-2a" = [{
        ipv4_cidr   = "172.18.0.0/24"
        ipv6_netnum = 0  # Auto-computes /64 from VPC's /56
        public      = true
        nat_gateway = true
      }]
    }
    "nstance" = {
      "us-west-2a" = [{
        ipv4_cidr   = "172.18.1.0/28"
        ipv6_netnum = 1
        nat_subnet  = "public"
      }]
    }
  }
}

Disabling IPv6

To disable IPv6 and use IPv4-only networking:

module "network" {
  source  = "nstance-dev/nstance/aws//modules/network"
  version = "~> 1.0"

  vpc_cidr_ipv4 = "172.18.0.0/16"
  enable_ipv6   = false

  subnets = {
    "nstance" = {
      "us-west-2a" = [{
        ipv4_cidr  = "172.18.1.0/28"
        nat_subnet = "public"
      }]
    }
  }
}

Quick Start

Minimal Single-Shard Deployment (AWS)

provider "aws" {
  region = "us-west-2"
}

module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"
}

module "account" {
  source  = "nstance-dev/nstance/aws//modules/account"
  version = "~> 1.0"

  cluster = module.cluster
}

module "network" {
  source  = "nstance-dev/nstance/aws//modules/network"
  version = "~> 1.0"

  cluster       = module.cluster
  vpc_cidr_ipv4 = "172.18.0.0/16"

  # Define subnets by role and zone
  subnets = {
    # Public subnet with NAT gateway for outbound traffic
    "public" = {
      "us-west-2a" = [{
        ipv4_cidr   = "172.18.0.0/24"
        public      = true
        nat_gateway = true
      }]
    }
    # Nstance Server subnet routes through NAT
    "nstance" = {
      "us-west-2a" = [{
        ipv4_cidr  = "172.18.1.0/28"
        nat_subnet = "public"
      }]
    }
    # Worker subnet routes through NAT
    "workers" = {
      "us-west-2a" = [{
        ipv4_cidr  = "172.18.10.0/24"
        nat_subnet = "public"
      }]
    }
  }
}

module "shard" {
  source  = "nstance-dev/nstance/aws//modules/shard"
  version = "~> 1.0"

  cluster = module.cluster
  account = module.account
  network = module.network

  shard   = "us-west-2a"
  zone    = "us-west-2a"
  # server_subnet defaults to "nstance" - uses first subnet from that role in zone

  groups = {
    "default" = {
      "workers" = {
        size    = 1
        subnets = "workers" # References key from subnets map
      }
    }
  }
}

Production Multi-AZ Deployment (AWS)

This example demonstrates a production-ready multi-AZ deployment with:

  • Existing VPC
  • Public subnets with NAT gateways (one per AZ for HA)
  • Existing database subnets (referenced only)
  • Private subnets for control-plane, ingress, and workers
  • NLB routing to ingress subnets

provider "aws" {
  region = "us-east-1"
}

module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"
}

module "account" {
  source  = "nstance-dev/nstance/aws//modules/account"
  version = "~> 1.0"

  cluster = module.cluster
}

module "network" {
  source  = "nstance-dev/nstance/aws//modules/network"
  version = "~> 1.0"

  cluster = module.cluster

  # Use existing VPC
  vpc_id = "vpc-prod123"

  subnets = {
    # Public subnets with NAT gateways (one per AZ for high availability)
    "public" = {
      "us-east-1a" = [{ ipv4_cidr = "10.0.0.0/24", public = true, nat_gateway = true }]
      "us-east-1b" = [{ ipv4_cidr = "10.0.1.0/24", public = true, nat_gateway = true }]
      "us-east-1c" = [{ ipv4_cidr = "10.0.2.0/24", public = true, nat_gateway = true }]
    }

    # Reference existing database subnets (no routing changes needed)
    "database" = {
      "us-east-1a" = [{ existing = "subnet-db-1a" }]
      "us-east-1b" = [{ existing = "subnet-db-1b" }]
      "us-east-1c" = [{ existing = "subnet-db-1c" }]
    }

    # Server subnets for nstance-server instances
    "nstance" = {
      "us-east-1a" = [{ ipv4_cidr = "10.0.10.0/28", nat_subnet = "public" }]
      "us-east-1b" = [{ ipv4_cidr = "10.0.11.0/28", nat_subnet = "public" }]
      "us-east-1c" = [{ ipv4_cidr = "10.0.12.0/28", nat_subnet = "public" }]
    }

    # Control plane, ingress, and worker nodes
    "control-plane" = {
      "us-east-1a" = [{ ipv4_cidr = "10.0.20.0/24", nat_subnet = "public" }]
      "us-east-1b" = [{ ipv4_cidr = "10.0.21.0/24", nat_subnet = "public" }]
      "us-east-1c" = [{ ipv4_cidr = "10.0.22.0/24", nat_subnet = "public" }]
    }
    "ingress" = {
      "us-east-1a" = [{ ipv4_cidr = "10.0.30.0/24", nat_subnet = "public" }]
      "us-east-1b" = [{ ipv4_cidr = "10.0.31.0/24", nat_subnet = "public" }]
      "us-east-1c" = [{ ipv4_cidr = "10.0.32.0/24", nat_subnet = "public" }]
    }
    "workers" = {
      "us-east-1a" = [{ ipv4_cidr = "10.0.100.0/22", nat_subnet = "public" }]
      "us-east-1b" = [{ ipv4_cidr = "10.0.104.0/22", nat_subnet = "public" }]
      "us-east-1c" = [{ ipv4_cidr = "10.0.108.0/22", nat_subnet = "public" }]
    }
  }

  # Public load balancer on ports 80 and 443, placed in ingress subnets
  load_balancers = {
    www = { ports = [80, 443], subnets = "ingress", public = true }
  }
}

# Create shards for each AZ
module "shard_1a" {
  source  = "nstance-dev/nstance/aws//modules/shard"
  version = "~> 1.0"

  cluster = module.cluster
  account = module.account
  network = module.network

  shard = "us-east-1a"
  zone  = "us-east-1a"

  groups = {
    "default" = {
      "control-plane" = { size = 3, subnets = "control-plane" }
      "ingress"       = { size = 2, subnets = "ingress", load_balancers = ["www"] }
      "workers"       = { size = 10, subnets = "workers" }
    }
  }
}

module "shard_1b" {
  source  = "nstance-dev/nstance/aws//modules/shard"
  version = "~> 1.0"

  cluster = module.cluster
  account = module.account
  network = module.network

  shard   = "us-east-1b"
  zone    = "us-east-1b"

  groups = {
    "default" = {
      "control-plane" = { size = 3, subnets = "control-plane" }
      "ingress"       = { size = 2, subnets = "ingress", load_balancers = ["www"] }
      "workers"       = { size = 10, subnets = "workers" }
    }
  }
}

module "shard_1c" {
  source  = "nstance-dev/nstance/aws//modules/shard"
  version = "~> 1.0"

  cluster = module.cluster
  account = module.account
  network = module.network

  shard = "us-east-1c"
  zone  = "us-east-1c"

  groups = {
    "default" = {
      "control-plane" = { size = 3, subnets = "control-plane" }
      "ingress"       = { size = 2, subnets = "ingress", load_balancers = ["www"] }
      "workers"       = { size = 10, subnets = "workers" }
    }
  }
}

See the examples/ directory for additional configurations including GCP deployments and multi-cloud setups.

Module Documentation

Common Variables

All modules support these common variables for consistent naming and tagging:

| Variable | Description | Default |
| --- | --- | --- |
| name_prefix | Prefix for all resource names | "nstance" |
| tags | Resource tags/labels (map of strings) | {} |
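
A sketch of passing these common variables to a module (the cluster_id and tag values are illustrative):

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id  = "nstance-demo" # illustrative value
  name_prefix = "acme"         # resources are named acme-*
  tags = {
    environment = "production"
    team        = "platform"
  }
}
```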

Server Config Object

Each shard has a config file containing its server configuration. Cluster-wide default configuration can be created in the cluster module and passed down into each shard module invocation. Note that the shard module overwrites select provider/account/region/zone-specific fields.

server_config = {
  request_timeout        = "30s"   # Request timeout
  create_rate_limit      = "100ms" # Duration between instance creates
  health_check_interval  = "60s"   # Expected agent health report interval
  default_drain_timeout  = "5m"    # Drain timeout before force delete (set to "0s" to disable Kubernetes drain coordination)
  image_refresh_interval = "6h"    # Image resolution refresh interval

  # Nested objects matching ServerConfig structure
  garbage_collection = {
    interval                 = "2m"   # How often to run GC
    registration_timeout     = "5m"   # Wait for registration before terminating
    deleted_record_retention = "30m"  # Keep deleted records for
  }

  leader_election = {
    frequent_interval   = "5s"  # Polling during transitions
    infrequent_interval = "30s" # Polling during stable periods
    leader_timeout      = "15s" # Time before considering leader failed
  }

  expiry = {
    eligible_age = ""  # Age for opportunistic expiry (e.g., "168h")
    forced_age   = ""  # Age for forced expiry (e.g., "720h")
    ondemand_age = ""  # Max age for on-demand instances
  }

  error_exit_jitter = {
    min_delay = "10s"  # Min delay before exit on error
    max_delay = "40s"  # Max delay before exit on error
  }

  bind = {
    health_addr       = "0.0.0.0:8990"  # HTTP health endpoint bind address
    election_addr     = "0.0.0.0:8991"  # HTTPS leader election bind address
    registration_addr = "0.0.0.0:8992"  # gRPC registration service bind address
    operator_addr     = "0.0.0.0:8993"  # gRPC operator service bind address
    agent_addr        = "0.0.0.0:8994"  # gRPC agent service bind address
  }

  advertise = {
    health_addr       = ":8990"  # Advertised health address
    election_addr     = ":8991"  # Advertised election address
    registration_addr = ":8992"  # Advertised registration address
    operator_addr     = ":8993"  # Advertised operator address
    agent_addr        = ":8994"  # Advertised agent address
  }
}

Cluster Module

Generates shared cluster resources:

  • Cluster ID (user-provided; lowercase alphanumeric and hyphens, with hyphens not leading, trailing, or repeating; max 32 chars)
  • S3/GCS bucket for config and state
  • Encryption key in AWS/GCP Secrets Manager (only when secrets_provider="object-storage")
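
As a conceptual sketch (not the module's actual code), the cluster ID rules above could be expressed as a Terraform variable validation:

```hcl
variable "cluster_id" {
  type        = string
  description = "Cluster ID: lowercase alphanumeric with single interior hyphens, max 32 chars"

  validation {
    # ^[a-z0-9]+(-[a-z0-9]+)*$ forbids leading, trailing, and repeated hyphens
    condition = (
      length(var.cluster_id) <= 32 &&
      can(regex("^[a-z0-9]+(-[a-z0-9]+)*$", var.cluster_id))
    )
    error_message = "cluster_id must be at most 32 lowercase alphanumeric characters with non-leading/trailing/repeating hyphens."
  }
}
```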

Key Variables:

| Name | Description | Default |
| --- | --- | --- |
| name_prefix | Prefix for resource names | "nstance" |
| cluster_id | Cluster ID (required) | - |
| shards | Optional list of valid shard IDs for validation | [] |
| bucket | Existing S3/GCS bucket (if empty, a new bucket is created) | "" |
| versioning | Enable object versioning on the bucket (increases storage costs) | false |
| secrets_provider | Secrets storage provider: object-storage (encrypted in bucket), aws-secrets-manager, or gcp-secret-manager | "object-storage" |
| encryption_key | Existing encryption key secret (AWS: ARN, GCP: secret name). Only used when secrets_provider="object-storage". If empty, created. | "" |
| server_config | Server configuration (if specified, merged over defaults) | {} |
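
For example, a hypothetical configuration that reuses an existing bucket and keeps secrets in AWS Secrets Manager rather than the bucket:

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id       = "prod"                # illustrative value
  bucket           = "my-existing-bucket"  # skip bucket creation
  secrets_provider = "aws-secrets-manager" # encryption_key is not used here
}
```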

Outputs:

| Name | Description |
| --- | --- |
| id | Cluster ID |
| name_prefix | Name prefix for resources |
| shards | List of valid shard IDs |
| bucket | S3 bucket name (AWS) or GCS bucket name (GCP) |
| bucket_arn | S3 bucket ARN (AWS only) |
| secrets_provider | Secrets storage provider |
| encryption_key_source | Encryption key source identifier for the secrets store |
| server_config | Server configuration (defaults merged with user overrides) |

Account Module

Creates IAM roles/service accounts:

  • Server role with EC2, S3, Secrets Manager, ELB permissions
  • Agent role with minimal EC2 describe permissions
  • Instance profiles (AWS)

Key Variables:

| Name | Description | Default |
| --- | --- | --- |
| cluster | Cluster module output | - |
| name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
| enable_ssm | Enable SSM access (AWS) | true |

Outputs:

| Name | Description |
| --- | --- |
| server_iam_role_arn | Server IAM role ARN (AWS) or service account email (GCP) |
| agent_iam_role_arn | Agent IAM role ARN (AWS) or service account email (GCP) |
| server_instance_profile_arn | Server instance profile ARN (AWS only) |
| agent_instance_profile_arn | Agent instance profile ARN (AWS only) |

Network Module

Creates VPC/network infrastructure:

  • VPC with specified CIDR
  • Internet Gateway
  • NAT Gateway / Cloud NAT
  • Route tables
  • VPC Endpoints (S3, SSM) on AWS
  • Group subnets (optional, via subnets variable)

Key Variables:

| Name | Description | Default |
| --- | --- | --- |
| cluster | Cluster module output (required) | - |
| vpc_id | Existing VPC ID (if set, skips VPC/IGW creation) | "" |
| vpc_cidr_ipv4 | VPC IPv4 CIDR block (required when creating a new VPC, must be empty when using an existing one) | "" |
| name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
| enable_ipv6 | Enable IPv6 dual-stack support | true |
| enable_ssm | Create SSM VPC endpoints (AWS) | true |
| subnets | Subnet definitions by role key and zone (see below) | {} |
| load_balancers | Load balancer definitions (see below) | {} |

Shard validation uses cluster.shards - define valid shard IDs in the cluster module.
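
For example, the allowed shard IDs can be declared once on the cluster module and are then enforced by the network and shard modules (values are illustrative):

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id = "nstance-demo"
  shards     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
```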

Subnets Variable Structure:

Each subnet definition supports the following attributes:

| Attribute | Description |
| --- | --- |
| ipv4_cidr | IPv4 CIDR block to create a new subnet |
| ipv6_netnum | Subnet number for auto-computed IPv6 /64 (AWS: 0-255, GCP: 0-65535) |
| ipv6_cidr | Explicit IPv6 CIDR block (alternative to ipv6_netnum) |
| existing | Reference an existing subnet by ID (mutually exclusive with ipv4_cidr) |
| public | (bool) Route via Internet Gateway, assign public IPs |
| nat_gateway | (bool) Place a NAT gateway in this subnet |
| nat_subnet | (string) Route outbound traffic via the NAT gateway in this role (same AZ) |
| shards | (list) Restrict subnet to specific shard IDs |

Routing Behavior:

  • public = true → Routes via Internet Gateway, instances get public IPs
  • nat_subnet = "X" → Routes via NAT gateway placed in role X’s subnet (same AZ)
  • Neither → Isolated subnet with user-managed routing

Routing fields (public, nat_subnet) work on both new AND existing subnets.

subnets = {
  # Public subnet with NAT gateway
  "public" = {
    "us-west-2a" = [{
      ipv4_cidr   = "172.18.0.0/24"
      public      = true
      nat_gateway = true
    }]
  }
  # Private subnet routing through NAT
  "private" = {
    "us-west-2a" = [{
      ipv4_cidr  = "172.18.10.0/24"
      nat_subnet = "public"  # Routes via NAT in "public" role (same AZ)
    }]
  }
  # Existing subnet with NAT routing
  "existing-private" = {
    "us-west-2a" = [{
      existing   = "subnet-abc123"
      nat_subnet = "public"  # Can add routing to existing subnets
    }]
  }
  # Isolated subnet (no routing)
  "isolated" = {
    "us-west-2a" = [{ existing = "subnet-db123" }]
  }
  # Shard-specific subnet
  "workers" = {
    "us-west-2a" = [{
      ipv4_cidr  = "172.18.20.0/24"
      nat_subnet = "public"
      shards     = ["us-west-2a-1"]  # Only available to this shard
    }]
  }
}

When the shards variable is specified on the cluster module, the network module validates that every shard ID used in a subnet's shards filter is in the allowed list.

Load Balancers Variable Structure:

Each load balancer definition supports the following attributes:

| Attribute | Description |
| --- | --- |
| ports | (list of numbers) Ports to expose on the load balancer |
| subnets | (string) Subnet role key from the subnets variable |
| public | (bool, required) Whether the LB is internet-facing (true) or internal (false) |

On AWS, public load balancers require public subnets (with IGW routes). The module validates this at plan time.

load_balancers = {
  # Public load balancer on ports 80 and 443, placed in ingress subnets
  "www" = {
    ports   = [80, 443]
    subnets = "ingress"
    public  = true
  }
  # Internal load balancer for API traffic
  "api" = {
    ports   = [8080]
    subnets = "workers"
    public  = false
  }
}

Instances are registered with load balancers via the shard module’s load_balancers field on groups.

Provider Differences:

| Feature | AWS | GCP |
| --- | --- | --- |
| NAT Gateway | Per-AZ (one NAT gateway per AZ for HA) | Regional (Cloud NAT covers all subnets) |
| Route Tables | Per-AZ private route tables | Not applicable (Cloud Router handles routing) |
| Public Subnets | Route via IGW, public IPs assigned | Marked for reference (load balancer placement) |

Outputs:

| Name | Description |
| --- | --- |
| vpc_id | VPC ID (AWS) or network self_link (GCP) |
| vpc_cidr_ipv4 | VPC IPv4 CIDR block |
| vpc_cidr_ipv6 | VPC IPv6 CIDR block (null if disabled) |
| public_subnet_ids | Map of AZ/zone → subnet ID/name for public subnets |
| nat_gateway_ids | Map of AZ → NAT gateway ID (AWS) or {"regional": name} (GCP) |
| private_route_table_ids | Map of AZ → route table ID (AWS only, empty for GCP) |
| subnet_ids | Map of all managed subnet IDs by key (role key/zone/index) |
| subnets | Subnet metadata by role/zone with {id, shards, public} for each subnet |
| load_balancers | Map of LB name → {dns_name, arn, target_group_arns} (AWS) or {ip_address, instance_groups} (GCP) |

Shard Module

Deploys a single shard:

  • Security groups / firewall rules
  • Server instances
  • Shard config (S3/GCS object)
  • Load balancer (optional)

Note: All subnets (server and groups) are created by the network module and accessed via var.network.subnets. The shard module filters subnets internally based on shard and zone.

When cluster.shards is non-empty, the shard module validates that var.shard is in the list.

Key Variables:

| Name | Description | Default |
| --- | --- | --- |
| cluster | Cluster module output | - |
| account | Account module output | - |
| network | Network module output (includes subnets with metadata) | - |
| shard | Unique shard identifier (must be in cluster.shards if set) | - |
| zone | Availability zone | - |
| name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
| server_subnet | Subnet role key from network.subnets for server instances | "nstance" |
| dynamic_subnet_pools | List of subnet pools allowed for dynamic groups (empty = all) | [] |
| groups | Map of group configurations (each group references a role from network subnets) | - |
| templates | Instance templates (if empty, uses default; if specified, used as-is) | {} |

Subnet Filtering:

The shard module automatically filters var.network.subnets to include only subnets that:

  1. Are in the shard’s zone
  2. Either have no shards filter (shared) or include the shard’s shard (isolated)

If no subnets are found after filtering, a validation error is raised (catches shard/zone typos).
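
As a conceptual sketch (not the module's actual code), assuming network.subnets is shaped as role → zone → list of subnet objects, the filter is roughly:

```hcl
locals {
  # Keep only subnets in this shard's zone that are either shared
  # (no shards filter) or explicitly allow this shard.
  shard_subnets = {
    for role, zones in var.network.subnets : role => [
      for s in lookup(zones, var.zone, []) : s
      if s.shards == null || length(s.shards) == 0 || contains(s.shards, var.shard)
    ]
  }
}
```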

Outputs:

| Name | Description |
| --- | --- |
| shard | The shard ID |
| zone | The zone for this shard |
| server_ips | List of server private IPs |
| server_ids | List of server instance IDs |
| config_key | S3/GCS key for shard config |
| nlb_dns | Load balancer DNS name (if enabled) |
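
These can be surfaced from the root module in the usual way, e.g.:

```hcl
output "server_ips" {
  value = module.shard.server_ips # private IPs of the nstance-server instances
}

output "lb_dns" {
  value = module.shard.nlb_dns # set only when a load balancer is enabled
}
```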

Architecture

┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│                                   VPC (from network module)                                   │
│                                                                                               │
│                                      Internet Gateway                                         │
│                                             │                                                 │
│                    ┌────────────────────────┴────────────────────────┐                        │
│                    ▼                                                 ▼                        │
│  ┌────────────────────────────────────┐    ┌────────────────────────────────────┐             │
│  │   Public Subnet (AZ-A)             │    │   Public Subnet (AZ-B)             │             │
│  │  ┌──────────────────────────────┐  │    │  ┌──────────────────────────────┐  │             │
│  │  │   NAT Gateway (AZ-A)         │  │    │  │   NAT Gateway (AZ-B)         │  │             │
│  │  └──────────────────────────────┘  │    │  └──────────────────────────────┘  │             │
│  └────────────────────────────────────┘    └────────────────────────────────────┘             │
│                    │                                                 │                        │
│                    ▼                                                 ▼                        │
│  ┌────────────────────────────────────┐    ┌────────────────────────────────────┐             │
│  │   Shard Module A (AZ-A)            │    │   Shard Module B (AZ-B)            │             │
│  │  ┌──────────────────────────────┐  │    │  ┌──────────────────────────────┐  │             │
│  │  │   Server Subnet (Private)    │  │    │  │   Server Subnet (Private)    │  │             │
│  │  │  ┌────────────────────────┐  │  │    │  │  ┌────────────────────────┐  │  │             │
│  │  │  │ nstance-server         │  │  │    │  │  │ nstance-server         │  │  │             │
│  │  │  └────────────────────────┘  │  │    │  │  └────────────────────────┘  │  │             │
│  │  └──────────────────────────────┘  │    │  └──────────────────────────────┘  │             │
│  │                                    │    │                                    │             │
│  │  ┌──────────────────────────────┐  │    │  ┌──────────────────────────────┐  │             │
│  │  │   Group Subnets (Private)    │  │    │  │   Group Subnets (Private)    │  │             │
│  │  │  ┌────────────────────────┐  │  │    │  │  ┌────────────────────────┐  │  │             │
│  │  │  │ nstance-agent          │  │  │    │  │  │ nstance-agent          │  │  │             │
│  │  │  │ (provisioned by server)│  │  │    │  │  │ (provisioned by server)│  │  │             │
│  │  │  └────────────────────────┘  │  │    │  │  └────────────────────────┘  │  │             │
│  │  └──────────────────────────────┘  │    │  └──────────────────────────────┘  │             │
│  └────────────────────────────────────┘    └────────────────────────────────────┘             │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
                          ┌─────────────────────────────────────┐
                          │   Cluster Module (shared storage)   │
                          │  S3/GCS bucket, Secrets Manager     │
                          └─────────────────────────────────────┘

Tearing Down Infrastructure

S3/GCS buckets are protected from accidental deletion. With force_destroy unset (the default), tofu destroy will fail on non-empty buckets. If versioning is enabled, versioned objects must also be removed first.

Preserve state when deleting a cluster (recommended for reprovisioning):

# 1. Destroy the nstance-server instances to stop them from managing instances
# AWS:
tofu destroy -target=module.shard.aws_autoscaling_group.server
# GCP:
tofu destroy -target=module.shard.google_compute_instance_group_manager.server

# 2. Terminate any remaining Nstance-managed instances (nstance-server provisions these outside of OpenTofu)

# AWS:
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:nstance:managed,Values=true" "Name=tag:nstance:cluster-id,Values=<cluster-id>" "Name=instance-state-name,Values=running,stopped,pending" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
if [ -n "$INSTANCE_IDS" ]; then
  aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
  aws ec2 wait instance-terminated --instance-ids $INSTANCE_IDS
fi

# GCP:
gcloud compute instances list \
  --filter="labels.nstance-managed=true AND labels.nstance-cluster-id=<cluster-id>" \
  --format="value(name,zone)" | while read NAME ZONE; do
    gcloud compute instances delete "$NAME" --zone="$ZONE" --quiet
  done

# 3. Destroy remaining compute and networking (keeps bucket and secrets intact)
tofu destroy -target=module.account -target=module.network

Full teardown including deleting cluster state (bucket and secrets):

# 1. Destroy the nstance-server instances to stop them from managing instances
# AWS:
tofu destroy -target=module.shard.aws_autoscaling_group.server
# GCP:
tofu destroy -target=module.shard.google_compute_instance_group_manager.server

# 2. Terminate any remaining Nstance-managed instances

# AWS:
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:nstance:managed,Values=true" "Name=tag:nstance:cluster-id,Values=<cluster-id>" "Name=instance-state-name,Values=running,stopped,pending" \
  --query 'Reservations[].Instances[].InstanceId' --output text)
if [ -n "$INSTANCE_IDS" ]; then
  aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
  aws ec2 wait instance-terminated --instance-ids $INSTANCE_IDS
fi

# GCP:
gcloud compute instances list \
  --filter="labels.nstance-managed=true AND labels.nstance-cluster-id=<cluster-id>" \
  --format="value(name,zone)" | while read NAME ZONE; do
    gcloud compute instances delete "$NAME" --zone="$ZONE" --quiet
  done

# 3. Destroy remaining infrastructure except the bucket
tofu destroy -target=module.account -target=module.network

# 4. Force-delete the bucket (including all object versions and delete markers)

# AWS:
BUCKET_NAME=$(tofu state show 'module.cluster.aws_s3_bucket.nstance[0]' | awk -F'"' '/^[[:space:]]*bucket[[:space:]]*=/ { print $2 }')
aws s3 rb "s3://${BUCKET_NAME}" --force
tofu state rm 'module.cluster.aws_s3_bucket.nstance[0]'

# GCP:
BUCKET_NAME=$(tofu state show 'module.cluster.google_storage_bucket.nstance[0]' | awk -F'"' '/^[[:space:]]*name[[:space:]]*=/ { print $2 }')
gcloud storage rm -r "gs://${BUCKET_NAME}"
tofu state rm 'module.cluster.google_storage_bucket.nstance[0]'

# 5. Destroy remaining resources
tofu destroy