Nstance OpenTofu/Terraform Modules
Nstance provides OpenTofu/Terraform modules with a unified, cloud-agnostic interface for deploying Nstance in Amazon Web Services (AWS) and/or Google Cloud (GCP). Each module has a consistent variable interface with cloud-specific implementations underneath:
- `cluster` - generates a cluster ID and baseline configuration, and provisions or links cluster-wide resources: an S3/GCS bucket and encrypted secrets.
- `account` - creates IAM roles/instance profiles per AWS account/GCP project.
- `network` - VPC/network setup per account/project and region.
- `shard` - deploys one Nstance zone “shard” (nstance-server instance + group instances).
This delineation lets cluster deployments scale from a single account/zone to multi-cloud, multi-account, multi-zone setups with a unified configuration.
Deployed Resources Per Module
| Resource Type | cluster | account | network | shard |
|---|---|---|---|---|
| Cluster ID / Root CA | ✓ | | | |
| S3/GCS Bucket | ✓ | | | |
| Encryption Key (Secrets Manager) | ✓ | | | |
| Server IAM Role | | ✓ | | |
| Agent IAM Role | | ✓ | | |
| Instance Profiles | | ✓ | | |
| VPC/VPC Network | | | ✓ | |
| Internet Gateway | | | ✓ | |
| NAT Gateway/Cloud NAT | | | ✓ | |
| Route Tables | | | ✓ | |
| VPC Endpoints (S3, SSM, etc.) | | | ✓ | |
| Group Subnets (for agents) | | | ✓ | |
| Server Subnets | | | ✓ | |
| Load Balancers (NLB / Regional LB) | | | ✓ | |
| Security Groups/Firewall Rules | | | | ✓ |
| Server Instances | | | | ✓ |
| Group Instances | | | | * |
| Shard Config (S3 object) | | | | ✓ |
*Group Instances are provisioned by nstance-server, not OpenTofu/Terraform.
Cloud-Specific Modules
Each cloud provider has its own published module repository:
- AWS: `nstance-dev/nstance/aws//modules/{module}`
  - Source: `github.com/nstance-dev/terraform-aws-nstance//{module}`
- GCP: `nstance-dev/nstance/gcp//modules/{module}`
  - Source: `github.com/nstance-dev/terraform-gcp-nstance//{module}`
Region and project are inferred from the provider configuration via data sources (`data.aws_region.current` or `data.google_client_config.current`), to minimise the number of required variables per module.
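Because the region comes from the provider, a multi-region deployment is just one module instance per region, each bound to its own provider configuration. A sketch (the alias and CIDRs are illustrative):

```hcl
provider "aws" {
  alias  = "oregon"
  region = "us-west-2"
}

# The module reads its region from the provider it is given,
# so no region variable is needed.
module "network_oregon" {
  source    = "nstance-dev/nstance/aws//modules/network"
  version   = "~> 1.0"
  providers = { aws = aws.oregon }

  cluster       = module.cluster
  vpc_cidr_ipv4 = "172.19.0.0/16"
  subnets = {
    "nstance" = {
      "us-west-2a" = [{ ipv4_cidr = "172.19.1.0/28", public = true }]
    }
  }
}
```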
Development Structure
Module source code lives in the main github.com/nstance-dev/nstance repository under deploy/tf/. Modules share a unified variable interface — common variable definitions live in deploy/tf/common/ and are symlinked into each cloud module. During release, the cloud-specific modules are synced to their respective repositories with symlinks replaced by actual files.
deploy/tf/
├── common/ # Shared variable definitions (symlinked into cloud modules)
│ ├── cluster/
│ │ └── variables.tf
│ ├── account/
│ │ └── variables.tf
│ ├── network/
│ │ └── variables.tf
│ └── shard/
│ └── variables.tf
│
├── aws/ # AWS-specific implementations → synced to terraform-aws-nstance
│ ├── cluster/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── versions.tf
│ │ └── variables.tf -> ../../common/cluster/variables.tf
│ ├── account/
│ ├── network/
│ └── shard/
│
├── gcp/ # GCP-specific implementations → synced to terraform-gcp-nstance
│ ├── cluster/
│ │ ├── main.tf
│ │ ├── outputs.tf
│ │ ├── versions.tf
│ │ └── variables.tf -> ../../common/cluster/variables.tf
│ ├── account/
│ ├── network/
│ └── shard/
│
└── examples/ # Example configurations (synced to respective repos)
├── aws/
│ ├── single-shard/
│ └── multi-az/
├── gcp/
│ ├── single-shard/
│ └── multi-az/
    └── multi-cloud/
Prerequisites
- OpenTofu >= 1.6.0 or Terraform >= 1.5.0
- Cloud provider CLI (`aws` or `gcloud`) configured with appropriate credentials.
- GitHub releases available for nstance-server and nstance-agent (or custom binary URLs).
GCP Note: The cluster module automatically enables required GCP APIs (Compute, Secret Manager, Storage, IAM, IAP) on first apply, if not already enabled. Enabling all services adds ~30-60 seconds to the initial deployment but eliminates manual `gcloud services enable` commands.
Security
- All instances run in private subnets by default, with public/NAT egress options available.
- Automatic VPC endpoints eliminate the need for internet access to reach cloud services.
- Instance metadata service configuration uses secure defaults (i.e. IMDSv2 on AWS).
- Instance volumes are encrypted by default.
- IAM roles are separated per use case, each with least-privilege permissions.
- Encryption key stored in secrets manager (not in Terraform state).
- S3 buckets are encrypted by default.
- S3/GCS buckets have deletion protection by default.
IPv4+IPv6 Dual-Stack Support
Both AWS and GCP support IPv4+IPv6 dual-stack networking. IPv6 is enabled by default, but can be disabled by setting enable_ipv6 = false for the network module.
Provider Differences
| | AWS | GCP |
|---|---|---|
| IPv6 type | Amazon-provided public /56 | Internal ULA /48 (private) |
| Address scope | Globally routable | VPC-internal only |
| Assignment | Auto-generated on VPC creation | Auto-generated on VPC creation |
| Subnet CIDRs | Specify ipv6_netnum (0-255) per subnet | Specify ipv6_netnum (0-65535) per subnet |
| Private egress | Egress-only Internet Gateway | Cloud NAT (same as IPv4) |
Using IPv6
When `enable_ipv6 = true` (the default), each subnet needs an IPv6 CIDR. You can specify this in one of two ways:
- `ipv6_netnum` (recommended): subnet number that auto-computes a /64 from the VPC’s cloud-assigned block (AWS: 0-255 from the /56, GCP: 0-65535 from the /48)
- `ipv6_cidr`: an explicit IPv6 CIDR block
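The /64 derivation can be sketched with Terraform's built-in `cidrsubnet()` function; the Amazon-assigned /56 below is hypothetical:

```hcl
locals {
  vpc_ipv6 = "2600:1f14:abcd:ef00::/56" # hypothetical VPC-assigned block

  # 8 new bits turn the /56 into a /64; ipv6_netnum selects which one.
  subnet_0 = cidrsubnet(local.vpc_ipv6, 8, 0) # 2600:1f14:abcd:ef00::/64
  subnet_1 = cidrsubnet(local.vpc_ipv6, 8, 1) # 2600:1f14:abcd:ef01::/64
}
```

On GCP the same call would use 16 new bits against the ULA /48, matching the 0-65535 `ipv6_netnum` range.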
module "network" {
source = "nstance-dev/nstance/aws//modules/network"
version = "~> 1.0"
vpc_cidr_ipv4 = "172.18.0.0/16"
subnets = {
"public" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.0.0/24"
ipv6_netnum = 0 # Auto-computes /64 from VPC's /56
public = true
nat_gateway = true
}]
}
"nstance" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.1.0/28"
ipv6_netnum = 1
nat_subnet = "public"
}]
}
}
}
Disabling IPv6
To disable IPv6 and use IPv4-only networking:
module "network" {
source = "nstance-dev/nstance/aws//modules/network"
version = "~> 1.0"
vpc_cidr_ipv4 = "172.18.0.0/16"
enable_ipv6 = false
subnets = {
"nstance" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.1.0/28"
nat_subnet = "public"
}]
}
}
}
Quick Start
Minimal Single-Shard Deployment (AWS)
provider "aws" {
region = "us-west-2"
}
module "cluster" {
source = "nstance-dev/nstance/aws//modules/cluster"
version = "~> 1.0"
}
module "account" {
source = "nstance-dev/nstance/aws//modules/account"
version = "~> 1.0"
cluster = module.cluster
}
module "network" {
source = "nstance-dev/nstance/aws//modules/network"
version = "~> 1.0"
cluster = module.cluster
vpc_cidr_ipv4 = "172.18.0.0/16"
# Define subnets by role and zone
subnets = {
# Public subnet with NAT gateway for outbound traffic
"public" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.0.0/24"
public = true
nat_gateway = true
}]
}
# Nstance Server subnet routes through NAT
"nstance" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.1.0/28"
nat_subnet = "public"
}]
}
# Worker subnet routes through NAT
"workers" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.10.0/24"
nat_subnet = "public"
}]
}
}
}
module "shard" {
source = "nstance-dev/nstance/aws//modules/shard"
version = "~> 1.0"
cluster = module.cluster
account = module.account
network = module.network
shard = "us-west-2a"
zone = "us-west-2a"
# server_subnet defaults to "nstance" - uses first subnet from that role in zone
groups = {
"default" = {
"workers" = {
size = 1
subnet_pool = "workers" # References key from subnets map
}
}
}
}
Production Multi-AZ Deployment (AWS)
This example demonstrates a production-ready multi-AZ deployment with:
- Existing VPC
- Public subnets with NAT gateways (one per AZ for HA)
- Existing database subnets (referenced only)
- Private subnets for control-plane, ingress, and workers
- NLB routing to ingress subnets
provider "aws" {
region = "us-east-1"
}
module "cluster" {
source = "nstance-dev/nstance/aws//modules/cluster"
version = "~> 1.0"
}
module "account" {
source = "nstance-dev/nstance/aws//modules/account"
version = "~> 1.0"
cluster = module.cluster
}
module "network" {
source = "nstance-dev/nstance/aws//modules/network"
version = "~> 1.0"
cluster = module.cluster
# Use existing VPC
vpc_id = "vpc-prod123"
subnets = {
# Public subnets with NAT gateways (one per AZ for high availability)
"public" = {
"us-east-1a" = [{ ipv4_cidr = "10.0.0.0/24", public = true, nat_gateway = true }]
"us-east-1b" = [{ ipv4_cidr = "10.0.1.0/24", public = true, nat_gateway = true }]
"us-east-1c" = [{ ipv4_cidr = "10.0.2.0/24", public = true, nat_gateway = true }]
}
# Reference existing database subnets (no routing changes needed)
"database" = {
"us-east-1a" = [{ existing = "subnet-db-1a" }]
"us-east-1b" = [{ existing = "subnet-db-1b" }]
"us-east-1c" = [{ existing = "subnet-db-1c" }]
}
# Server subnets for nstance-server instances
"nstance" = {
"us-east-1a" = [{ ipv4_cidr = "10.0.10.0/28", nat_subnet = "public" }]
"us-east-1b" = [{ ipv4_cidr = "10.0.11.0/28", nat_subnet = "public" }]
"us-east-1c" = [{ ipv4_cidr = "10.0.12.0/28", nat_subnet = "public" }]
}
# Control plane, ingress, and worker nodes
"control-plane" = {
"us-east-1a" = [{ ipv4_cidr = "10.0.20.0/24", nat_subnet = "public" }]
"us-east-1b" = [{ ipv4_cidr = "10.0.21.0/24", nat_subnet = "public" }]
"us-east-1c" = [{ ipv4_cidr = "10.0.22.0/24", nat_subnet = "public" }]
}
"ingress" = {
"us-east-1a" = [{ ipv4_cidr = "10.0.30.0/24", nat_subnet = "public" }]
"us-east-1b" = [{ ipv4_cidr = "10.0.31.0/24", nat_subnet = "public" }]
"us-east-1c" = [{ ipv4_cidr = "10.0.32.0/24", nat_subnet = "public" }]
}
"workers" = {
"us-east-1a" = [{ ipv4_cidr = "10.0.100.0/22", nat_subnet = "public" }]
"us-east-1b" = [{ ipv4_cidr = "10.0.104.0/22", nat_subnet = "public" }]
"us-east-1c" = [{ ipv4_cidr = "10.0.108.0/22", nat_subnet = "public" }]
}
}
# Public load balancer on ports 80 and 443, placed in ingress subnets
load_balancers = {
www = { ports = [80, 443], subnets = "ingress", public = true }
}
}
# Create shards for each AZ
module "shard_1a" {
source = "nstance-dev/nstance/aws//modules/shard"
version = "~> 1.0"
cluster = module.cluster
account = module.account
network = module.network
shard = "us-east-1a"
zone = "us-east-1a"
groups = {
"default" = {
"control-plane" = { size = 3, subnets = "control-plane" }
"ingress" = { size = 2, subnets = "ingress", load_balancers = ["www"] }
"workers" = { size = 10, subnets = "workers" }
}
}
}
module "shard_1b" {
source = "nstance-dev/nstance/aws//modules/shard"
version = "~> 1.0"
cluster = module.cluster
account = module.account
network = module.network
shard = "us-east-1b"
zone = "us-east-1b"
groups = {
"default" = {
"control-plane" = { size = 3, subnets = "control-plane" }
"ingress" = { size = 2, subnets = "ingress", load_balancers = ["www"] }
"workers" = { size = 10, subnets = "workers" }
}
}
}
module "shard_1c" {
source = "nstance-dev/nstance/aws//modules/shard"
version = "~> 1.0"
cluster = module.cluster
account = module.account
network = module.network
shard = "us-east-1c"
zone = "us-east-1c"
groups = {
"default" = {
"control-plane" = { size = 3, subnets = "control-plane" }
"ingress" = { size = 2, subnets = "ingress", load_balancers = ["www"] }
"workers" = { size = 10, subnets = "workers" }
}
}
}
See the examples/ directory for additional configurations, including GCP deployments and multi-cloud setups.
Module Documentation
Common Variables
All modules support these common variables for consistent naming and tagging:
| Variable | Description | Default |
|---|---|---|
name_prefix | Prefix for all resource names | "nstance" |
tags | Resource tags/labels (map of strings) | {} |
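Both variables can be set on any module; for example, on the cluster module (values illustrative):

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id  = "prod-usw2"    # hypothetical cluster ID
  name_prefix = "acme-nstance" # resources are named acme-nstance-*
  tags = {
    team = "platform"
    env  = "prod"
  }
}
```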
Server Config Object
Each shard has a config object containing its server configuration. Cluster-wide defaults can be defined in the cluster module and passed down into each shard module invocation; the shard module then overwrites select provider/account/region/zone-specific fields.
server_config = {
request_timeout = "30s" # Request timeout
create_rate_limit = "100ms" # Duration between instance creates
health_check_interval = "60s" # Expected agent health report interval
default_drain_timeout = "5m" # Drain timeout before force delete (set to "0s" to disable Kubernetes drain coordination)
image_refresh_interval = "6h" # Image resolution refresh interval
# Nested objects matching ServerConfig structure
garbage_collection = {
interval = "2m" # How often to run GC
registration_timeout = "5m" # Wait for registration before terminating
deleted_record_retention = "30m" # Keep deleted records for
}
leader_election = {
frequent_interval = "5s" # Polling during transitions
infrequent_interval = "30s" # Polling during stable periods
leader_timeout = "15s" # Time before considering leader failed
}
expiry = {
eligible_age = "" # Age for opportunistic expiry (e.g., "168h")
forced_age = "" # Age for forced expiry (e.g., "720h")
ondemand_age = "" # Max age for on-demand instances
}
error_exit_jitter = {
min_delay = "10s" # Min delay before exit on error
max_delay = "40s" # Max delay before exit on error
}
bind = {
health_addr = "0.0.0.0:8990" # HTTP health endpoint bind address
election_addr = "0.0.0.0:8991" # HTTPS leader election bind address
registration_addr = "0.0.0.0:8992" # gRPC registration service bind address
operator_addr = "0.0.0.0:8993" # gRPC operator service bind address
agent_addr = "0.0.0.0:8994" # gRPC agent service bind address
}
advertise = {
health_addr = ":8990" # Advertised health address
election_addr = ":8991" # Advertised election address
registration_addr = ":8992" # Advertised registration address
operator_addr = ":8993" # Advertised operator address
agent_addr = ":8994" # Advertised agent address
}
}
Cluster Module
Generates shared cluster resources:
- Cluster ID (user-provided: lowercase alphanumeric plus hyphens, with no leading, trailing, or repeated hyphens; max 32 chars)
- S3/GCS bucket for config and state
- Encryption key in AWS/GCP Secrets Manager (only when `secrets_provider = "object-storage"`)
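The cluster ID rule can be expressed as a variable validation; this is an illustrative sketch, not necessarily the module's exact implementation:

```hcl
variable "cluster_id" {
  type        = string
  description = "Cluster ID used to namespace all cluster resources"

  validation {
    # Hyphens may only appear between alphanumeric runs, so they cannot
    # lead, trail, or repeat; total length is capped at 32 characters.
    condition = can(regex("^[a-z0-9]+(-[a-z0-9]+)*$", var.cluster_id)) && length(var.cluster_id) <= 32
    error_message = "cluster_id must be lowercase alphanumeric with non-leading/trailing/repeating hyphens, max 32 chars."
  }
}
```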
Key Variables:
| Name | Description | Default |
|---|---|---|
name_prefix | Prefix for resource names | "nstance" |
cluster_id | Cluster ID (required) | - |
shards | Optional list of valid shard IDs for validation | [] |
bucket | Existing S3/GCS bucket (if empty, a new bucket is created) | "" |
versioning | Enable object versioning on the bucket (increases storage costs) | false |
secrets_provider | Secrets storage provider: object-storage (encrypted in bucket), aws-secrets-manager, or gcp-secret-manager | "object-storage" |
encryption_key | Existing encryption key secret (AWS: ARN, GCP: secret name). Only used when secrets_provider="object-storage". If empty, created. | "" |
server_config | Server configuration (if specified, merged over defaults) | {} |
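A cluster using a cloud-native secrets backend instead of the default might look like this (values illustrative):

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id       = "prod-use1"                  # hypothetical
  shards           = ["us-east-1a", "us-east-1b"] # restrict valid shard IDs
  secrets_provider = "aws-secrets-manager"        # encryption_key not needed
  versioning       = true                         # keep object history
}
```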
Outputs:
| Name | Description |
|---|---|
id | Cluster ID |
name_prefix | Name prefix for resources |
shards | List of valid shard IDs |
bucket | S3 bucket name (AWS) or GCS bucket name (GCP) |
bucket_arn | S3 bucket ARN (AWS only) |
secrets_provider | Secrets storage provider |
encryption_key_source | Encryption key source identifier for the secrets store |
server_config | Server configuration (defaults merged with user overrides) |
Account Module
Creates IAM roles/service accounts:
- Server role with EC2, S3, Secrets Manager, ELB permissions
- Agent role with minimal EC2 describe permissions
- Instance profiles (AWS)
Key Variables:
| Name | Description | Default |
|---|---|---|
cluster | Cluster module output | - |
name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
enable_ssm | Enable SSM access (AWS) | true |
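A minimal invocation, shown here with SSM disabled (an assumption about a locked-down environment, not a recommendation):

```hcl
module "account" {
  source  = "nstance-dev/nstance/aws//modules/account"
  version = "~> 1.0"

  cluster    = module.cluster
  enable_ssm = false # omit SSM permissions if Session Manager is unused
}
```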
Outputs:
| Name | Description |
|---|---|
server_iam_role_arn | Server IAM role ARN (AWS) or service account email (GCP) |
agent_iam_role_arn | Agent IAM role ARN (AWS) or service account email (GCP) |
server_instance_profile_arn | Server instance profile ARN (AWS only) |
agent_instance_profile_arn | Agent instance profile ARN (AWS only) |
Network Module
Creates VPC/network infrastructure:
- VPC with specified CIDR
- Internet Gateway
- NAT Gateway / Cloud NAT
- Route tables
- VPC Endpoints (S3, SSM) on AWS
- Group subnets (optional, via the `subnets` variable)
Key Variables:
| Name | Description | Default |
|---|---|---|
cluster | Cluster module output (required) | - |
vpc_id | Existing VPC ID (if set, skips VPC/IGW creation) | "" |
vpc_cidr_ipv4 | VPC IPv4 CIDR block (required when creating new VPC, must be empty when using existing) | "" |
name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
enable_ipv6 | Enable IPv6 dual-stack support | true |
enable_ssm | Create SSM VPC endpoints (AWS) | true |
subnets | Subnet definitions by role key and zone (see below) | {} |
load_balancers | Load balancer definitions (see below) | {} |
Shard validation uses cluster.shards - define valid shard IDs in the cluster module.
Subnets Variable Structure:
Each subnet definition supports the following attributes:
| Attribute | Description |
|---|---|
ipv4_cidr | IPv4 CIDR block to create a new subnet |
ipv6_netnum | Subnet number for auto-computed IPv6 /64 (AWS: 0-255, GCP: 0-65535) |
ipv6_cidr | Explicit IPv6 CIDR block (alternative to ipv6_netnum) |
existing | Reference an existing subnet by ID (mutually exclusive with ipv4_cidr) |
public | (bool) Route via Internet Gateway, assign public IPs |
nat_gateway | (bool) Place a NAT gateway in this subnet |
nat_subnet | (string) Route outbound traffic via NAT gateway in this role (same AZ) |
shards | (list) Restrict subnet to specific shard IDs |
Routing Behavior:
- `public = true` → routes via the Internet Gateway; instances get public IPs
- `nat_subnet = "X"` → routes via the NAT gateway placed in role X’s subnet (same AZ)
- Neither → isolated subnet with user-managed routing
Routing fields (public, nat_subnet) work on both new AND existing subnets.
subnets = {
# Public subnet with NAT gateway
"public" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.0.0/24"
public = true
nat_gateway = true
}]
}
# Private subnet routing through NAT
"private" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.10.0/24"
nat_subnet = "public" # Routes via NAT in "public" role (same AZ)
}]
}
# Existing subnet with NAT routing
"existing-private" = {
"us-west-2a" = [{
existing = "subnet-abc123"
nat_subnet = "public" # Can add routing to existing subnets
}]
}
# Isolated subnet (no routing)
"isolated" = {
"us-west-2a" = [{ existing = "subnet-db123" }]
}
# Shard-specific subnet
"workers" = {
"us-west-2a" = [{
ipv4_cidr = "172.18.20.0/24"
nat_subnet = "public"
shards = ["us-west-2a-1"] # Only available to this shard
}]
}
}
When the `shards` variable is specified, the network module validates that all shard IDs used in subnet `shards` filters are in the allowed list.
Load Balancers Variable Structure:
Each load balancer definition supports the following attributes:
| Attribute | Description |
|---|---|
ports | (list of numbers) Ports to expose on the load balancer |
subnets | (string) Subnet role key from the subnets variable |
public | (bool, required) Whether the LB is internet-facing (true) or internal (false) |
On AWS, public load balancers require public subnets (with IGW routes). The module validates this at plan time.
load_balancers = {
# Public load balancer on ports 80 and 443, placed in ingress subnets
"www" = {
ports = [80, 443]
subnets = "ingress"
public = true
}
# Internal load balancer for API traffic
"api" = {
ports = [8080]
subnets = "workers"
public = false
}
}
Instances are registered with load balancers via the shard module’s `load_balancers` field on groups.
Provider Differences:
| Feature | AWS | GCP |
|---|---|---|
| NAT Gateway | Per-AZ (one NAT gateway per AZ for HA) | Regional (Cloud NAT covers all subnets) |
| Route Tables | Per-AZ private route tables | Not applicable (Cloud Router handles) |
| Public Subnets | Route via IGW, public IPs assigned | Marked for reference (load balancer placement) |
Outputs:
| Name | Description |
|---|---|
vpc_id | VPC ID (AWS) or network self_link (GCP) |
vpc_cidr_ipv4 | VPC IPv4 CIDR block |
vpc_cidr_ipv6 | VPC IPv6 CIDR block (null if disabled) |
public_subnet_ids | Map of AZ/zone → subnet ID/name for public subnets |
nat_gateway_ids | Map of AZ → NAT gateway ID (AWS) or {"regional": name} (GCP) |
private_route_table_ids | Map of AZ → route table ID (AWS only, empty for GCP) |
subnet_ids | Map of all managed subnet IDs by key (role key/zone/index) |
subnets | Subnet metadata by role/zone with {id, shards, public} for each subnet |
load_balancers | Map of LB name → {dns_name, arn, target_group_arns} (AWS) or {ip_address, instance_groups} (GCP) |
Shard Module
Deploys a single shard:
- Security groups / firewall rules
- Server instances
- Shard config (S3/GCS object)
- Load balancer (optional)
Note: All subnets (server and groups) are created by the network module and accessed via var.network.subnets.
The shard module filters subnets internally based on shard and zone.
When cluster.shards is non-empty, the shard module validates that var.shard is in the list.
Key Variables:
| Name | Description | Default |
|---|---|---|
cluster | Cluster module output | - |
account | Account module output | - |
network | Network module output (includes subnets with metadata) | - |
shard | Unique shard identifier (must be in cluster.shards if set) | - |
zone | Availability zone | - |
name_prefix | Prefix for resource names (defaults to cluster.name_prefix) | null |
server_subnet | Subnet role key from network.subnets for server instances | "nstance" |
dynamic_subnet_pools | List of subnet pools allowed for dynamic groups (empty = all) | [] |
groups | Map of group configurations (each group references a role from network subnets) | - |
templates | Instance templates (if empty, uses default; if specified, used as-is) | {} |
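A shard overriding the default server subnet role and restricting dynamic groups to a single pool might look like this (role keys hypothetical; the `subnet_pool` field name follows the quick-start example above):

```hcl
module "shard" {
  source  = "nstance-dev/nstance/aws//modules/shard"
  version = "~> 1.0"

  cluster = module.cluster
  account = module.account
  network = module.network

  shard = "us-west-2a-1"
  zone  = "us-west-2a"

  server_subnet        = "control"   # hypothetical role key, overriding "nstance"
  dynamic_subnet_pools = ["workers"] # dynamic groups may only use this pool

  groups = {
    "default" = {
      "workers" = { size = 2, subnet_pool = "workers" }
    }
  }
}
```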
Subnet Filtering:
The shard module automatically filters var.network.subnets to include only subnets that:
- Are in the shard’s zone
- Either have no `shards` filter (shared) or include this shard’s ID in their `shards` list (isolated)
If no subnets are found after filtering, a validation error is raised (catches shard/zone typos).
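The filtering described above can be sketched in HCL; this is a simplified, flattened sketch, and the local/attribute names are assumptions rather than the module's actual internals:

```hcl
locals {
  # Keep only subnets in this shard's zone that are either shared
  # (no shards filter) or explicitly list this shard.
  eligible_subnets = {
    for key, s in var.network.subnets : key => s
    if s.zone == var.zone &&
    (length(s.shards) == 0 || contains(s.shards, var.shard))
  }
}
```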
Outputs:
| Name | Description |
|---|---|
shard | The shard ID |
zone | The zone for this shard |
server_ips | List of server private IPs |
server_ids | List of server instance IDs |
config_key | S3/GCS key for shard config |
nlb_dns | Load balancer DNS name (if enabled) |
Architecture
┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│ VPC (from network module) │
│ │
│ Internet Gateway │
│ │ │
│ ┌────────────────────────┴────────────────────────┐ │
│ ▼ ▼ │
│ ┌────────────────────────────────────┐ ┌────────────────────────────────────┐ │
│ │ Public Subnet (AZ-A) │ │ Public Subnet (AZ-B) │ │
│ │ ┌──────────────────────────────┐ │ │ ┌──────────────────────────────┐ │ │
│ │ │ NAT Gateway (AZ-A) │ │ │ │ NAT Gateway (AZ-B) │ │ │
│ │ └──────────────────────────────┘ │ │ └──────────────────────────────┘ │ │
│ └────────────────────────────────────┘ └────────────────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌────────────────────────────────────┐ ┌────────────────────────────────────┐ │
│ │ Shard Module A (AZ-A) │ │ Shard Module B (AZ-B) │ │
│ │ ┌──────────────────────────────┐ │ │ ┌──────────────────────────────┐ │ │
│ │ │ Server Subnet (Private) │ │ │ │ Server Subnet (Private) │ │ │
│ │ │ ┌────────────────────────┐ │ │ │ │ ┌────────────────────────┐ │ │ │
│ │ │ │ nstance-server │ │ │ │ │ │ nstance-server │ │ │ │
│ │ │ └────────────────────────┘ │ │ │ │ └────────────────────────┘ │ │ │
│ │ └──────────────────────────────┘ │ │ └──────────────────────────────┘ │ │
│ │ │ │ │ │
│ │ ┌──────────────────────────────┐ │ │ ┌──────────────────────────────┐ │ │
│ │ │ Group Subnets (Private) │ │ │ │ Group Subnets (Private) │ │ │
│ │ │ ┌────────────────────────┐ │ │ │ │ ┌────────────────────────┐ │ │ │
│ │ │ │ nstance-agent │ │ │ │ │ │ nstance-agent │ │ │ │
│ │ │ │ (provisioned by server)│ │ │ │ │ │ (provisioned by server)│ │ │ │
│ │ │ └────────────────────────┘ │ │ │ │ └────────────────────────┘ │ │ │
│ │ └──────────────────────────────┘ │ │ └──────────────────────────────┘ │ │
│ └────────────────────────────────────┘ └────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Cluster Module (shared storage) │
│ S3/GCS bucket, Secrets Manager │
└─────────────────────────────────────┘
Tearing Down Infrastructure
S3/GCS buckets are protected from accidental deletion. With force_destroy unset (the default), tofu destroy will fail on non-empty buckets. If versioning is enabled, versioned objects must also be removed first.
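For disposable environments where bucket contents need not survive, the `force_destroy` toggle referenced above can be set at creation time instead; a sketch, assuming the cluster module exposes it directly:

```hcl
module "cluster" {
  source  = "nstance-dev/nstance/aws//modules/cluster"
  version = "~> 1.0"

  cluster_id    = "dev-sandbox" # hypothetical
  force_destroy = true          # let destroy remove a non-empty bucket
}
```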
Preserve state when deleting a cluster (recommended for reprovisioning):
# 1. Destroy the nstance-server instances to stop them from managing instances
# AWS:
tofu destroy -target=module.shard.aws_autoscaling_group.server
# GCP:
tofu destroy -target=module.shard.google_compute_instance_group_manager.server
# 2. Terminate any remaining Nstance-managed instances (nstance-server provisions these outside of OpenTofu)
# AWS:
INSTANCE_IDS=$(aws ec2 describe-instances \
--filters "Name=tag:nstance:managed,Values=true" "Name=tag:nstance:cluster-id,Values=<cluster-id>" "Name=instance-state-name,Values=running,stopped,pending" \
--query 'Reservations[].Instances[].InstanceId' --output text)
if [ -n "$INSTANCE_IDS" ]; then
aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
aws ec2 wait instance-terminated --instance-ids $INSTANCE_IDS
fi
# GCP:
gcloud compute instances list \
--filter="labels.nstance-managed=true AND labels.nstance-cluster-id=<cluster-id>" \
--format="value(name,zone)" | while read NAME ZONE; do
gcloud compute instances delete "$NAME" --zone="$ZONE" --quiet
done
# 3. Destroy remaining compute and networking (keeps bucket and secrets intact)
tofu destroy -target=module.account -target=module.network
Full teardown including deleting cluster state (bucket and secrets):
# 1. Destroy the nstance-server instances to stop them from managing instances
# AWS:
tofu destroy -target=module.shard.aws_autoscaling_group.server
# GCP:
tofu destroy -target=module.shard.google_compute_instance_group_manager.server
# 2. Terminate any remaining Nstance-managed instances
# AWS:
INSTANCE_IDS=$(aws ec2 describe-instances \
--filters "Name=tag:nstance:managed,Values=true" "Name=tag:nstance:cluster-id,Values=<cluster-id>" "Name=instance-state-name,Values=running,stopped,pending" \
--query 'Reservations[].Instances[].InstanceId' --output text)
if [ -n "$INSTANCE_IDS" ]; then
aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
aws ec2 wait instance-terminated --instance-ids $INSTANCE_IDS
fi
# GCP:
gcloud compute instances list \
--filter="labels.nstance-managed=true AND labels.nstance-cluster-id=<cluster-id>" \
--format="value(name,zone)" | while read NAME ZONE; do
gcloud compute instances delete "$NAME" --zone="$ZONE" --quiet
done
# 3. Destroy remaining infrastructure except the bucket
tofu destroy -target=module.account -target=module.network
# 4. Force-delete the bucket (including all object versions and delete markers)
# AWS:
BUCKET_NAME=$(tofu state show 'module.cluster.aws_s3_bucket.nstance[0]' | awk -F'"' '/^[[:space:]]*bucket[[:space:]]*=/ { print $2 }')
aws s3 rb "s3://${BUCKET_NAME}" --force
tofu state rm 'module.cluster.aws_s3_bucket.nstance[0]'
# GCP:
BUCKET_NAME=$(tofu state show 'module.cluster.google_storage_bucket.nstance[0]' | awk -F'"' '/^[[:space:]]*name[[:space:]]*=/ { print $2 }')
gcloud storage rm -r "gs://${BUCKET_NAME}"
tofu state rm 'module.cluster.google_storage_bucket.nstance[0]'
# 5. Destroy remaining resources
tofu destroy