
Google Cloud Integration Guide for Nstance

This document describes how Nstance integrates with Google Cloud. It covers the APIs Nstance calls, the IAM permissions it requires, and operational considerations.

Overview

Nstance Server uses several Google Cloud services for infrastructure management, configuration storage, secrets management, and load balancing. It calls them through the Google Cloud Go client libraries (cloud.google.com/go). For some operations it uses the older google.golang.org/api client instead.

Google Cloud Services Used

1. Compute Engine

Nstance uses Compute Engine for virtual machine lifecycle management across multiple operations:

  • Instance Provisioning: Creating new VM instances with custom configurations
  • Instance Termination: Deleting unhealthy or expired instances
  • Health Monitoring: Querying instance status for reconciliation decisions
  • Leader Network Management: Assigning/removing alias IPs for shard leadership
  • Capacity Planning: Checking subnet IP address availability
  • Load Balancing: Managing instance group membership
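The capacity-planning check needs to know how many addresses a subnet can actually hand out. The sketch below shows one ingredient of such a check. usableAddresses and hasCapacity are hypothetical helper names, not Nstance APIs. The helpers start from the ipCidrRange string returned by subnetworks.get and subtract the four primary IPv4 addresses that Google Cloud reserves in every subnet. This document does not describe how Nstance counts addresses already in use, so inUse is simply a parameter here.

```go
package main

import (
	"fmt"
	"net/netip"
)

// reservedPerSubnet is the number of primary IPv4 addresses Google Cloud
// reserves in every subnet: the network address, the default gateway,
// the second-to-last address, and the broadcast address.
const reservedPerSubnet = 4

// usableAddresses returns how many primary IPv4 addresses a subnet with the
// given CIDR (e.g. the ipCidrRange from subnetworks.get) can assign to VMs.
func usableAddresses(cidr string) (int, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return 0, err
	}
	if !prefix.Addr().Is4() {
		return 0, fmt.Errorf("%s is not an IPv4 range", cidr)
	}
	total := 1 << (32 - prefix.Bits())
	if total <= reservedPerSubnet {
		return 0, nil
	}
	return total - reservedPerSubnet, nil
}

// hasCapacity reports whether inUse addresses plus `requested` new
// instances still fit inside the subnet. inUse is a hypothetical input.
func hasCapacity(cidr string, inUse, requested int) (bool, error) {
	usable, err := usableAddresses(cidr)
	if err != nil {
		return false, err
	}
	return inUse+requested <= usable, nil
}

func main() {
	n, _ := usableAddresses("10.128.0.0/20")
	fmt.Println(n) // 4092
	ok, _ := hasCapacity("10.0.0.0/29", 3, 2)
	fmt.Println(ok) // false: a /29 has only 4 usable addresses
}
```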

Key Features:

  • Support for all Compute Engine machine types
  • Custom service accounts and network tags for firewall targeting
  • Startup-script injection via instance metadata for agent initialization
  • Label-based instance identification and filtering (see Instance Labels)
  • Preemptible and Spot VM support with termination detection

2. Cloud Storage (GCS)

GCS serves as the object storage backend for all persistent state:

  • Configuration Storage: Static and dynamic group configurations
  • Instance Metadata: Registration records and certificates
  • Leader Election: Distributed coordination using GCS lockfiles with generation-based optimistic locking
  • Certificate Logs: Audit trail of certificate issuances
  • (Optionally) Secrets Storage: Encrypted CA keys and custom secrets

See Data Storage for the full bucket layout.

GCS-Specific Behavior:

  • Uses object generation numbers, in place of S3 ETags, as the version token for optimistic locking
  • Precondition checks via GenerationMatch and DoesNotExist conditions
  • 412 Precondition Failed responses indicate concurrent modification

3. Secret Manager

Used as an optional secrets backend for secure storage of sensitive cryptographic material:

  • Encryption Key: For encrypting data stored in object storage
  • Certificate Authority Keys: Private keys for CA operations
  • Service Account Keys: Kubernetes service account signing keys
  • Custom Secrets: Distributed to instances via agent

Security Model:

  • Secrets are automatically versioned — each write creates a new version
  • The latest version alias is used for reads
  • Secrets are encrypted at rest and in transit
  • Secret names are prefixed (e.g., nstance/) for access scoping

Configuration: Set secrets.provider to gcp-secret-manager and provide the project_id in the secrets configuration. Alternatively, set the provider to object-storage and supply an encryption key. Secrets are then stored encrypted in GCS.

4. Instance Groups (Load Balancing)

Nstance manages the membership of unmanaged instance groups so that services can be exposed through load balancers:

  • Instance Registration: Adding healthy instances to instance groups
  • Instance Deregistration: Removing instances during termination/drain
  • Membership Listing: Querying current group membership for cache warming

Supported Load Balancers:

  • Network Load Balancers (TCP/UDP) via backend services with instance groups
  • Internal Load Balancers via instance groups
  • Automatic registration/deregistration based on instance lifecycle

Google Cloud SDK API Usage

Compute Engine APIs

SDK Method                        IAM Permission                            Purpose
instances.insert                  compute.instances.create                  Create new VM instances
instances.delete                  compute.instances.delete                  Delete instances
instances.get                     compute.instances.get                     Query instance status/metadata
instances.list                    compute.instances.list                    List instances with label filters
instances.updateNetworkInterface  compute.instances.updateNetworkInterface  Assign/remove alias IPs for leader network
subnetworks.get                   compute.subnetworks.get                   Check subnet IP availability
instanceGroups.addInstances       compute.instanceGroups.update             Register instances with instance groups
instanceGroups.removeInstances    compute.instanceGroups.update             Deregister instances from instance groups
instanceGroups.listInstances      compute.instanceGroups.list               List instance group membership

Cloud Storage APIs

SDK Method           IAM Permission          Purpose
objects.get          storage.objects.get     Retrieve stored data
objects.create       storage.objects.create  Store/update data
objects.delete       storage.objects.delete  Remove data
objects.get (attrs)  storage.objects.get     Check object existence/metadata
objects.list         storage.objects.list    Enumerate stored objects

Secret Manager APIs

SDK Method             IAM Permission                 Purpose
secretVersions.access  secretmanager.versions.access  Retrieve secret values
secrets.addVersion     secretmanager.versions.add     Add new secret versions
secrets.create         secretmanager.secrets.create   Create new secrets
secrets.get            secretmanager.secrets.get      Check if a secret exists
secrets.delete         secretmanager.secrets.delete   Remove secrets
secrets.list           secretmanager.secrets.list     List secrets by prefix

IAM Permissions

Nstance Server should run as a dedicated service account. The recommended approach is to grant that account a custom IAM role that contains only the permissions listed below.

Compute Engine Permissions

compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.list
compute.instances.setLabels
compute.instances.setMetadata
compute.instances.setServiceAccount
compute.instances.setTags
compute.instances.updateNetworkInterface
compute.subnetworks.get
compute.subnetworks.use
compute.instanceGroups.update
compute.instanceGroups.list
compute.disks.create
compute.zones.get

Cloud Storage Permissions

storage.objects.get
storage.objects.create
storage.objects.delete
storage.objects.list
storage.buckets.get

Secret Manager Permissions (if using gcp-secret-manager)

secretmanager.versions.access
secretmanager.versions.add
secretmanager.secrets.create
secretmanager.secrets.get
secretmanager.secrets.delete
secretmanager.secrets.list

Key Considerations

  • Resource Scoping: Storage permissions should be scoped to specific buckets using IAM conditions
  • Secret Prefixing: Secret Manager access can be limited using IAM conditions on the nstance/ resource name prefix
  • Project Scope: All permissions should be scoped to the deployment project
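  • Service Account: VMs created by Nstance can run as a different service account (configured via the ServiceAccount arg) from the one used by Nstance Server itself. In that case, the Nstance Server service account needs iam.serviceAccounts.actAs on the VM service account (for example through roles/iam.serviceAccountUser)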
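The IAM conditions used for resource scoping are CEL expressions attached to a role binding. The sketch below is a hypothetical illustration of two such expressions. The helper names and the bucket, project number, and prefix values are assumptions. The first expression limits Cloud Storage access to one bucket and its objects. It includes an exact match on the bucket itself because storage.objects.list is checked against the bucket rather than an object. The second limits Secret Manager access to secrets whose IDs share a prefix. Verify both against the IAM Conditions resource-attribute documentation before relying on them.

```go
package main

import "fmt"

// bucketCondition limits a binding to one bucket and the objects inside it.
// Cloud Storage resource names in IAM conditions have the form
// projects/_/buckets/BUCKET[/objects/OBJECT].
func bucketCondition(bucket string) string {
	base := "projects/_/buckets/" + bucket
	return fmt.Sprintf(`resource.name == %q || resource.name.startsWith(%q)`,
		base, base+"/objects/")
}

// secretPrefixCondition limits a binding to secrets (and their versions)
// whose ID starts with prefix. Secret Manager resource names in IAM
// conditions use the project number rather than the project ID.
func secretPrefixCondition(projectNumber, prefix string) string {
	return fmt.Sprintf(`resource.name.startsWith(%q)`,
		"projects/"+projectNumber+"/secrets/"+prefix)
}

func main() {
	fmt.Println(bucketCondition("nstance-state"))
	fmt.Println(secretPrefixCondition("123456789012", "nstance-"))
}
```

You can attach an expression like these to a binding with the --condition flag of gcloud projects add-iam-policy-binding, in the form expression=...,title=....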

Instance Labels

GCP identifies instances with labels rather than tags. Label keys and values are limited to 63 characters and may contain lowercase letters, digits, underscores, and hyphens. Uppercase letters, colons, and dots are not allowed. Nstance automatically manages the following labels on all instances:

Label Key              Example Value                  Purpose
nstance-managed        true                           Identifies Nstance-managed instances
nstance-instance-id    knc0000000001r010000000000000  Nstance instance identifier
nstance-cluster-id     my-cluster                     Cluster identifier
nstance-shard          us-central1-a                  Zone shard identifier
nstance-group          workers                        Group name
nstance-template       default                        Template name
nstance-instance-kind  machinepool                    Instance kind

Label values are automatically sanitized: converted to lowercase with underscores replaced by hyphens.

Additional custom labels can be added via the Labels arg in configuration.

Important: GCP network tags are used only for firewall rule targeting and are not used for instance identification or filtering. Use the NetworkTags arg to apply firewall tags to instances.
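Because identification relies on labels, selecting the instances of one shard comes down to a label filter on instances.list. The hypothetical labelFilter helper below assembles such a filter in the Compute Engine API's filter syntax, where parenthesized comparisons are combined with AND. This REST filter parameter differs from the gcloud --filter syntax. Check the result against the instances.list reference before relying on it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelFilter builds a Compute Engine API list filter (the filter parameter
// of instances.list) that matches instances carrying every given label.
// Each comparison is parenthesized and the expressions are joined with AND.
func labelFilter(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for logging and tests
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("(labels.%s = %q)", k, labels[k]))
	}
	return strings.Join(parts, " AND ")
}

func main() {
	fmt.Println(labelFilter(map[string]string{
		"nstance-managed": "true",
		"nstance-shard":   "us-central1-a",
	}))
	// (labels.nstance-managed = "true") AND (labels.nstance-shard = "us-central1-a")
}
```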

Agent Instance Metadata

Nstance Agent on GCP uses the Instance Metadata Service (IMDS) at http://metadata.google.internal/computeMetadata/v1/ to discover its own identity. All requests require the Metadata-Flavor: Google header.

  • instance/name — Retrieve provider instance ID
  • instance/scheduling/preemptible — Detect if running as a preemptible/Spot VM
  • instance/preempted — Detect preemption termination notice

The agent also supports reading instance ID from the cloud-init cache as a fast path.

Leader Network Management

On GCP, leader network assignment uses alias IP ranges instead of the ENIs used on AWS:

  • Assign: Adds a /32 alias IP range to the instance’s primary network interface (nic0)
  • Release: Removes the alias IP range from the network interface
  • Both operations use instances.updateNetworkInterface with the NIC fingerprint for safe concurrent updates

This allows the shard leader to be reachable at a stable internal IP address regardless of which instance currently holds leadership.

Troubleshooting

Common Issues

  1. IAM Permission Errors: Verify the service account has all required permissions listed above
  2. GCS Access Denied: Check bucket-level IAM policies and ensure the service account has storage.objects.* permissions
  3. Instance Creation Failures: Verify subnet capacity, quota limits, and that the specified machine type is available in the target zone
  4. Leader Network Issues: Ensure the leader’s /32 alias IP falls within a range of the subnet attached to nic0. This can be the subnet’s primary range or, if one is used, its secondary range
  5. Secret Manager Errors: Confirm the project_id is correct and the service account has Secret Manager permissions
  6. Instance Group Errors: Verify that the instance group exists in the expected zone and that the instance is in the same zone and VPC network as the group

Debugging Commands

# Check VM instance status
gcloud compute instances describe INSTANCE_NAME --zone=ZONE

# List Nstance-managed instances
gcloud compute instances list --filter="labels.nstance-managed=true"

# List instances in a specific shard
gcloud compute instances list --filter="labels.nstance-shard=us-central1-a"

# List GCS objects
gcloud storage ls gs://your-bucket/ --recursive

# Check a specific object
gcloud storage cat gs://your-bucket/shard/us-central1-a/config.jsonc

# Access a secret
gcloud secrets versions access latest --secret=nstance-ca-key

# List secrets with prefix
gcloud secrets list --filter="name:nstance"

# List instance group members
gcloud compute instance-groups unmanaged list-instances GROUP_NAME --zone=ZONE

# Check subnet details
gcloud compute networks subnets describe SUBNET_NAME --region=REGION

# View instance serial port output (useful for debugging startup-script)
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone=ZONE

Further Reading