Google Cloud Integration Guide for Nstance
This document provides a comprehensive overview of how Nstance integrates with Google Cloud services, including the specific APIs used, IAM permissions required, and operational considerations.
Overview
Nstance Server leverages multiple Google Cloud services for infrastructure management, configuration storage, secrets management, and load balancing. The server uses the Google Cloud Go client libraries (cloud.google.com/go) and the older google.golang.org/api client for some operations.
Google Cloud Services Used
1. Compute Engine
Nstance uses Compute Engine for virtual machine lifecycle management across multiple operations:
- Instance Provisioning: Creating new VM instances with custom configurations
- Instance Termination: Deleting unhealthy or expired instances
- Health Monitoring: Querying instance status for reconciliation decisions
- Leader Network Management: Assigning/removing alias IPs for shard leadership
- Capacity Planning: Checking subnet IP address availability
- Load Balancing: Managing instance group membership
Key Features:
- Support for all Compute Engine machine types
- Custom service accounts and network tags for firewall targeting
- Startup-script injection via instance metadata for agent initialization
- Label-based instance identification and filtering (see Instance Labels)
- Preemptible and Spot VM support with termination detection
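As a rough illustration of the capacity-planning check, the sketch below estimates how many addresses remain in a subnet's primary IPv4 range. It assumes the CIDR comes from the `ipCidrRange` field of a `subnetworks.get` response and relies on Google Cloud reserving four addresses in every primary IPv4 subnet range. The function name and the `inUse` count are hypothetical, and this is not necessarily how Nstance computes capacity.

```go
package main

import (
	"fmt"
	"net/netip"
)

// reservedPerSubnet is the number of addresses Google Cloud reserves in every
// primary IPv4 subnet range (network, default gateway, second-to-last, broadcast).
const reservedPerSubnet = 4

// availableIPs estimates how many addresses are still free in a subnet's
// primary IPv4 range. cidr would come from the subnet's ipCidrRange field;
// inUse is a hypothetical count of addresses already assigned, supplied by
// the caller.
func availableIPs(cidr string, inUse int) (int, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return 0, fmt.Errorf("parse subnet range %q: %w", cidr, err)
	}
	if !prefix.Addr().Is4() {
		return 0, fmt.Errorf("only IPv4 ranges are handled in this sketch: %q", cidr)
	}
	total := 1 << (32 - prefix.Bits())
	free := total - reservedPerSubnet - inUse
	if free < 0 {
		free = 0
	}
	return free, nil
}
```

For example, `availableIPs("10.0.0.0/24", 10)` yields 242: 256 addresses in the range, minus 4 reserved, minus 10 in use.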
2. Cloud Storage (GCS)
GCS serves as the object storage backend for all persistent state:
- Configuration Storage: Static and dynamic group configurations
- Instance Metadata: Registration records and certificates
- Leader Election: Distributed coordination using GCS lockfiles with generation-based optimistic locking
- Certificate Logs: Audit trail of certificate issuances
- (Optionally) Secrets Storage: Encrypted CA keys and custom secrets
See Data Storage for the full bucket layout.
GCS-Specific Behavior:
- Uses object generation numbers as ETags for optimistic locking (instead of S3 ETags)
- Precondition checks via `GenerationMatch` and `DoesNotExist` conditions
- 412 Precondition Failed responses indicate concurrent modification
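To make the precondition behavior concrete, here is a minimal sketch of the request shape at the JSON API level, using only the Go standard library. In the JSON API, a generation precondition is the `ifGenerationMatch` query parameter, and a value of `0` means "the object must not exist yet", which is what a `DoesNotExist` condition expresses. The function names, the sentinel error, and the object path in the usage note are assumptions for illustration; the Nstance server itself may go through the Go client library rather than raw HTTP.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// ErrConcurrentModification is a hypothetical sentinel for HTTP 412 responses.
var ErrConcurrentModification = errors.New("precondition failed: object was modified concurrently")

// conditionalUploadURL builds a JSON API simple-upload URL that only succeeds
// when the object's current generation equals expectedGeneration.
// An expectedGeneration of 0 means "the object must not exist yet".
func conditionalUploadURL(bucket, object string, expectedGeneration int64) string {
	q := url.Values{}
	q.Set("uploadType", "media")
	q.Set("name", object)
	q.Set("ifGenerationMatch", strconv.FormatInt(expectedGeneration, 10))
	return "https://storage.googleapis.com/upload/storage/v1/b/" +
		url.PathEscape(bucket) + "/o?" + q.Encode()
}

// classifyWriteStatus maps the HTTP status of a conditional write to an error.
// A 412 means another writer changed (or created) the object first.
func classifyWriteStatus(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusPreconditionFailed:
		return ErrConcurrentModification
	default:
		return fmt.Errorf("unexpected status %d", status)
	}
}
```

A lock acquisition would POST to `conditionalUploadURL("my-bucket", "locks/leader", 0)` (both names are placeholders) and treat `ErrConcurrentModification` as "someone else holds the lock", while a renewal would pass the generation returned by the previous write.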
3. Secret Manager
Used as an optional secrets backend for secure storage of sensitive cryptographic material:
- Encryption Key: For encrypting data stored in object storage
- Certificate Authority Keys: Private keys for CA operations
- Service Account Keys: Kubernetes service account signing keys
- Custom Secrets: Distributed to instances via agent
Security Model:
- Secrets are automatically versioned — each write creates a new version
- The `latest` version alias is used for reads
- Secrets are encrypted at rest and in transit
- Secret names are prefixed (e.g., `nstance/`) for access scoping
Configuration: Set secrets.provider to gcp-secret-manager and provide the project_id in the secrets configuration. Alternatively, use object-storage with an encryption key to store secrets in GCS instead.
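For reference, a read through the `latest` alias addresses a Secret Manager resource name of the form `projects/PROJECT/secrets/SECRET/versions/latest`. The helper below is a small sketch of building that name; the function name is hypothetical, and the secret ID in the usage note is borrowed from the debugging commands later in this guide.

```go
package main

import "fmt"

// latestVersionName returns the Secret Manager resource name that resolves to
// the most recently added version of a secret. projectID corresponds to the
// project_id value in the secrets configuration.
func latestVersionName(projectID, secretID string) string {
	return fmt.Sprintf("projects/%s/secrets/%s/versions/latest", projectID, secretID)
}
```

Calling `latestVersionName("my-project", "nstance-ca-key")` produces the name that would be passed to a `secretVersions.access` call.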
4. Instance Groups (Load Balancing)
Manages unmanaged instance groups for service exposure via load balancers:
- Instance Registration: Adding healthy instances to instance groups
- Instance Deregistration: Removing instances during termination/drain
- Membership Listing: Querying current group membership for cache warming
Supported Load Balancers:
- Network Load Balancers (TCP/UDP) via backend services with instance groups
- Internal Load Balancers via instance groups
- Automatic registration/deregistration based on instance lifecycle
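One way to picture lifecycle-driven registration is as a set difference between the instances that should be in a group and the members currently listed in it. The sketch below is an illustration under that assumption, not Nstance's actual reconciliation code; `diffMembership` and its parameters are hypothetical names.

```go
package main

import "sort"

// diffMembership compares the instances that should be in an instance group
// (desired) with the members currently listed in it (current). It returns the
// names to pass to an addInstances call and to a removeInstances call, sorted
// for deterministic output.
func diffMembership(desired, current []string) (toAdd, toRemove []string) {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	have := make(map[string]bool, len(current))
	for _, name := range current {
		have[name] = true
	}
	for name := range want {
		if !have[name] {
			toAdd = append(toAdd, name)
		}
	}
	for name := range have {
		if !want[name] {
			toRemove = append(toRemove, name)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toRemove)
	return toAdd, toRemove
}
```

Keeping the membership listing in a local cache means this diff can run without a `listInstances` call on every reconciliation pass.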
Google Cloud SDK API Usage
Compute Engine APIs
| SDK Method | IAM Permission | Purpose |
|---|---|---|
| instances.insert | compute.instances.create | Create new VM instances |
| instances.delete | compute.instances.delete | Delete instances |
| instances.get | compute.instances.get | Query instance status/metadata |
| instances.list | compute.instances.list | List instances with label filters |
| instances.updateNetworkInterface | compute.instances.updateNetworkInterface | Assign/remove alias IPs for leader network |
| subnetworks.get | compute.subnetworks.get | Check subnet IP availability |
| instanceGroups.addInstances | compute.instanceGroups.update | Register instances with instance groups |
| instanceGroups.removeInstances | compute.instanceGroups.update | Deregister instances from instance groups |
| instanceGroups.listInstances | compute.instanceGroups.list | List instance group membership |
Cloud Storage APIs
| SDK Method | IAM Permission | Purpose |
|---|---|---|
| objects.get | storage.objects.get | Retrieve stored data |
| objects.create | storage.objects.create | Store/update data |
| objects.delete | storage.objects.delete | Remove data |
| objects.get (attrs) | storage.objects.get | Check object existence/metadata |
| objects.list | storage.objects.list | Enumerate stored objects |
Secret Manager APIs
| SDK Method | IAM Permission | Purpose |
|---|---|---|
| secretVersions.access | secretmanager.versions.access | Retrieve secret values |
| secrets.addVersion | secretmanager.versions.add | Add new secret versions |
| secrets.create | secretmanager.secrets.create | Create new secrets |
| secrets.get | secretmanager.secrets.get | Check if a secret exists |
| secrets.delete | secretmanager.secrets.delete | Remove secrets |
| secrets.list | secretmanager.secrets.list | List secrets by prefix |
IAM Permissions
Nstance Server requires a dedicated service account with specific IAM roles. The recommended approach is to create a custom IAM role with the minimum required permissions.
Compute Engine Permissions
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.list
compute.instances.updateNetworkInterface
compute.subnetworks.get
compute.subnetworks.use
compute.instanceGroups.update
compute.instanceGroups.list
compute.disks.create
compute.zones.get
Cloud Storage Permissions
storage.objects.get
storage.objects.create
storage.objects.delete
storage.objects.list
storage.buckets.get
Secret Manager Permissions (if using gcp-secret-manager)
secretmanager.versions.access
secretmanager.versions.add
secretmanager.secrets.create
secretmanager.secrets.get
secretmanager.secrets.delete
secretmanager.secrets.list
Key Considerations
- Resource Scoping: Storage permissions should be scoped to specific buckets using IAM conditions
- Secret Prefixing: Secret Manager access can be limited using IAM conditions on the `nstance/` resource name prefix
- Project Scope: All permissions should be scoped to the deployment project
- Service Account: VMs created by Nstance can use a separate service account (configured via the `ServiceAccount` arg) from the Nstance Server itself
Instance Labels
GCP uses labels (not tags) for instance metadata and identification. GCP labels cannot contain uppercase letters or colons; GCP itself accepts underscores, but Nstance uses hyphens only and normalizes underscores away. Nstance automatically manages the following labels on all instances:
| Label Key | Example Value | Purpose |
|---|---|---|
| nstance-managed | true | Identifies Nstance-managed instances |
| nstance-instance-id | knc0000000001r010000000000000 | Nstance instance identifier |
| nstance-cluster-id | my-cluster | Cluster identifier |
| nstance-shard | us-central1-a | Zone shard identifier |
| nstance-group | workers | Group name |
| nstance-template | default | Template name |
| nstance-instance-kind | machinepool | Instance kind |
Label values are automatically sanitized: converted to lowercase with underscores replaced by hyphens.
Additional custom labels can be added via the Labels arg in configuration.
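The sanitization rule stated above (lowercase, underscores replaced by hyphens) can be sketched in a few lines. The function name is hypothetical, and the real implementation may apply further rules, such as length limits, that this guide does not describe.

```go
package main

import "strings"

// sanitizeLabelValue applies the two transformations described above:
// lowercase the value, then replace underscores with hyphens.
func sanitizeLabelValue(v string) string {
	return strings.ReplaceAll(strings.ToLower(v), "_", "-")
}
```

With this rule, a cluster ID written as `My_Cluster` would be stored in the `nstance-cluster-id` label as `my-cluster`.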
Important: GCP network tags are used only for firewall rule targeting and are not used for instance identification or filtering. Use the `NetworkTags` arg to apply firewall tags to instances.
Agent Instance Metadata
Nstance Agent on GCP uses the Instance Metadata Service (IMDS) at http://metadata.google.internal/computeMetadata/v1/ to discover its own identity. All requests require the Metadata-Flavor: Google header.
- `instance/name`: Retrieve provider instance ID
- `instance/scheduling/preemptible`: Detect if running as a preemptible/Spot VM
- `instance/preempted`: Detect preemption termination notice
The agent also supports reading instance ID from the cloud-init cache as a fast path.
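A metadata lookup of the kind described above can be sketched with the Go standard library. The endpoint and the `Metadata-Flavor: Google` header come from this section; the function names are hypothetical, and the actual agent may add retries, timeouts, or caching that are not shown here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

const metadataBase = "http://metadata.google.internal/computeMetadata/v1/"

// newMetadataRequest builds a GET request for a metadata path such as
// "instance/name", including the mandatory Metadata-Flavor header.
func newMetadataRequest(path string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, metadataBase+strings.TrimPrefix(path, "/"), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	return req, nil
}

// fetchMetadata performs the request and returns the trimmed response body.
func fetchMetadata(client *http.Client, path string) (string, error) {
	req, err := newMetadataRequest(path)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("metadata %s: unexpected status %s", path, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

// isTrue interprets the "TRUE"/"FALSE" strings returned by boolean metadata
// keys such as instance/preempted.
func isTrue(value string) bool {
	return strings.EqualFold(strings.TrimSpace(value), "true")
}
```

On a VM, `fetchMetadata(http.DefaultClient, "instance/preempted")` followed by `isTrue` gives a simple preemption check; outside Google Cloud the hostname does not resolve and the call returns an error.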
Leader Network Management
On GCP, leader network assignment uses alias IP ranges instead of ENIs (as on AWS):
- Assign: Adds a `/32` alias IP range to the instance’s primary network interface (nic0)
- Release: Removes the alias IP range from the network interface
- Both operations use `instances.updateNetworkInterface` with the NIC fingerprint for safe concurrent updates
This allows the shard leader to be reachable at a stable internal IP address regardless of which instance currently holds leadership.
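The `/32` range used for assignment can be derived from the leader address as in the sketch below, which also checks that the address sits inside the subnet range it is meant to come from. The function name and both parameters are hypothetical; the returned string is what would go in the `ipCidrRange` field of an alias IP range on the network interface.

```go
package main

import (
	"fmt"
	"net/netip"
)

// aliasRangeFor returns the /32 alias IP range for a leader address after
// checking that the address falls inside the given subnet range.
func aliasRangeFor(leaderIP, subnetCIDR string) (string, error) {
	addr, err := netip.ParseAddr(leaderIP)
	if err != nil {
		return "", fmt.Errorf("parse leader IP %q: %w", leaderIP, err)
	}
	if !addr.Is4() {
		return "", fmt.Errorf("only IPv4 addresses are handled in this sketch: %q", leaderIP)
	}
	subnet, err := netip.ParsePrefix(subnetCIDR)
	if err != nil {
		return "", fmt.Errorf("parse subnet range %q: %w", subnetCIDR, err)
	}
	if !subnet.Contains(addr) {
		return "", fmt.Errorf("leader IP %s is outside subnet range %s", addr, subnet)
	}
	return netip.PrefixFrom(addr, 32).String(), nil
}
```

For instance, `aliasRangeFor("10.0.0.5", "10.0.0.0/24")` returns `10.0.0.5/32`, while an address outside the range is rejected before any API call is made.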
Troubleshooting
Common Issues
- IAM Permission Errors: Verify the service account has all required permissions listed above
- GCS Access Denied: Check bucket-level IAM policies and ensure the service account has `storage.objects.*` permissions
- Instance Creation Failures: Verify subnet capacity, quota limits, and that the specified machine type is available in the target zone
- Leader Network Issues: Ensure the alias IP range is within the subnet’s secondary range (if applicable)
- Secret Manager Errors: Confirm the `project_id` is correct and the service account has Secret Manager permissions
- Instance Group Errors: Verify the instance group exists in the correct zone and the instance is in the same network
Debugging Commands
# Check VM instance status
gcloud compute instances describe INSTANCE_NAME --zone=ZONE
# List Nstance-managed instances
gcloud compute instances list --filter="labels.nstance-managed=true"
# List instances in a specific shard
gcloud compute instances list --filter="labels.nstance-shard=us-central1-a"
# List GCS objects
gcloud storage ls gs://your-bucket/ --recursive
# Check a specific object
gcloud storage cat gs://your-bucket/shard/us-central1-a/config.jsonc
# Access a secret
gcloud secrets versions access latest --secret=nstance-ca-key
# List secrets with prefix
gcloud secrets list --filter="name:nstance"
# List instance group members
gcloud compute instance-groups unmanaged list-instances GROUP_NAME --zone=ZONE
# Check subnet details
gcloud compute networks subnets describe SUBNET_NAME --region=REGION
# View instance serial port output (useful for debugging startup-script)
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone=ZONE
Further Reading
- Google Cloud Go Client Libraries — SDK reference
- Compute Engine Documentation — VM lifecycle and networking
- Cloud Storage Documentation — Object storage reference
- Secret Manager Documentation — Secrets management
- IAM Best Practices — Security recommendations