Auto-Scaling

The Nstance Server automatically reconciles instance groups to maintain desired capacity through an event-driven reconciliation system.

Group Reconciliation

When elected as shard leader, the Nstance Server continuously reconciles groups so that actual instance counts match desired group sizes. Reconciliation runs in response to the following events:

Reconciliation Triggers:

  • Initial Reconciliation: On server startup or when becoming shard leader, all groups are reconciled to desired state
  • Group Configuration Changes: When group size, instance type, or vars are updated (via the Operator API or config changes)
  • Health-Based Replacement: When instances become unhealthy (gRPC disconnect, missed health reports, provider status checks)
  • Instance Expiry: When instances exceed configured server-wide age limits (eligibleAge or forcedAge)
  • Instance Deletion: When instances are deleted, groups are backfilled to maintain desired size

Reconciliation Logic:

  • Instance Counting: Only counts managed instances created by the reconciler; on-demand instances created via the Operator API are excluded from reconciliation decisions
  • Scale Up: If actual < desired, create new instances (rate-limited, with subnet capacity checking)
  • Scale Down: If actual > desired, delete oldest managed instances (waits for unhealthy instances to be replaced first)
  • Unhealthy Replacement: Unhealthy managed instances are automatically replaced to maintain group health and size
  • Instance Expiry: Instances exceeding age limits are expired with replacement, following drain coordination (see Instance Expiry)
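
The counting and scaling rules above can be sketched in Python. This is a minimal illustration, not Nstance's implementation: the `Instance` type, its `managed` and `created_at` fields, and the `plan_scaling` function are hypothetical names, and rate limiting, subnet capacity checks, and health handling are left out.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    id: str
    managed: bool      # True if created by the reconciler (hypothetical field)
    created_at: float  # creation time, seconds since epoch (hypothetical field)


def plan_scaling(instances, desired):
    """Return (number_to_create, instances_to_delete) for one group."""
    # On-demand instances are excluded from the count.
    managed = [i for i in instances if i.managed]
    actual = len(managed)
    if actual < desired:
        return desired - actual, []
    if actual > desired:
        # Delete the oldest managed instances first.
        oldest_first = sorted(managed, key=lambda i: i.created_at)
        return 0, oldest_first[: actual - desired]
    return 0, []


group = [
    Instance("a", managed=True, created_at=100.0),
    Instance("b", managed=True, created_at=200.0),
    Instance("c", managed=False, created_at=50.0),  # on-demand, ignored
]
print(plan_scaling(group, desired=3))  # one instance to create, none to delete
print(plan_scaling(group, desired=1))  # nothing to create, oldest instance "a" deleted
```

Because the on-demand instance "c" is not counted, the group above is treated as having two instances, even though three exist.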

Priority Order:

Reconciliation operations are prioritized as follows:

  1. Scale Down (reduce group size)
  2. Forced Expiry (compliance requirements)
  3. Unhealthy Replacement (maintain health)
  4. Opportunistic Expiry (routine rotation)
  5. Scale Up (increase group size)
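
One way to picture this ordering is as a sort key on pending operations. The sketch below assumes that a lower number in the list means the operation is handled earlier; the `Operation` enum, the `order_operations` function, and the group names are hypothetical and only illustrate the idea.

```python
from enum import IntEnum


class Operation(IntEnum):
    # Lower value = handled earlier (assumption), mirroring the list above.
    SCALE_DOWN = 1
    FORCED_EXPIRY = 2
    UNHEALTHY_REPLACEMENT = 3
    OPPORTUNISTIC_EXPIRY = 4
    SCALE_UP = 5


def order_operations(pending):
    """Sort (operation, group) pairs by priority.

    sorted() is stable, so operations of equal priority keep arrival order.
    """
    return sorted(pending, key=lambda item: item[0])


pending = [
    (Operation.SCALE_UP, "group-a"),
    (Operation.UNHEALTHY_REPLACEMENT, "group-b"),
    (Operation.SCALE_DOWN, "group-c"),
]
for op, group_name in order_operations(pending):
    print(op.name, group_name)
```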

Dynamic Groups Storage

  • Static groups are defined in config/{shard}.jsonc; editing of these groups is restricted
  • Dynamic groups (created via the Operator API) are stored in groups/{shard}.jsonc; editing of these groups is unrestricted
  • A dynamic group overrides the static group with the same key, but only unrestricted fields (e.g. size, instance_type, vars) can be changed
  • Restricted fields (e.g. template, subnet_pool) from static groups cannot be overridden, preventing breaking changes to critical groups
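
The override rule can be sketched as a field-by-field merge. This is an illustration under assumptions: `RESTRICTED_FIELDS` contains only the two fields named as examples above, `effective_group` is a hypothetical helper, the field values are invented, and the sketch silently ignores restricted overrides, whereas the real server may reject them instead.

```python
# Fields named as examples in the text; the real restricted set may be larger.
RESTRICTED_FIELDS = {"template", "subnet_pool"}


def effective_group(static, dynamic):
    """Overlay a dynamic group on the static group with the same key."""
    if static is None:
        # No static definition: the dynamic group is used as-is.
        return dict(dynamic)
    merged = dict(static)
    for field, value in dynamic.items():
        if field in RESTRICTED_FIELDS:
            continue  # restricted fields keep the static value
        merged[field] = value
    return merged


static = {"size": 3, "instance_type": "small", "template": "base", "subnet_pool": "private"}
dynamic = {"size": 5, "template": "other"}
print(effective_group(static, dynamic))
# {'size': 5, 'instance_type': 'small', 'template': 'base', 'subnet_pool': 'private'}
```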

Group Deletion

When a group is deleted (removed from config or via the Operator API):

  • The reconciler gracefully scales down all managed instances to 0
  • Drain coordination is followed for each instance (if drain_timeout > 0)
  • This ensures a clean shutdown rather than immediate termination
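
The deletion flow above can be sketched as a list of per-instance steps. The `deletion_steps` function and the step names are hypothetical, and the sketch does not model whether instances are drained one at a time or in parallel, which this page does not specify.

```python
def deletion_steps(managed_ids, drain_timeout):
    """List the steps taken for each managed instance when its group is deleted."""
    steps = []
    for instance_id in managed_ids:
        if drain_timeout > 0:
            # Drain coordination only applies when a drain timeout is set.
            steps.append(("drain", instance_id))
        steps.append(("delete", instance_id))
    return steps


print(deletion_steps(["i-1", "i-2"], drain_timeout=30))
print(deletion_steps(["i-1", "i-2"], drain_timeout=0))
```

With a positive `drain_timeout`, each instance gets a drain step before its delete step; with `drain_timeout` set to 0, only delete steps are produced.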