Nstance Development Environment
This document describes the local development environment for Nstance, which simulates the full production architecture without requiring cloud infrastructure, real object storage, or a real Kubernetes cluster.
There’s also Development with Kind, which explains how to run a dev environment using Kind for testing nstance-operator against a real Kubernetes cluster instead of the mock dev-k8s server used in this document.
Architecture Overview
┌──────────────────────────────────────────────────────────────────────────────┐
│ Development Environment │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ dev-s3 │ │ server │ │ dev-k8s │ │ operator │ │
│ │ (gofakes3) │ │ (nstance- │ │ (fake k8s │ │ (nstance- │ │
│ │ :8989 │◄────│ server) │────►│ API) │◄────│ operator) │ │
│ │ │ │ │ │ :6443 │ │ │ │
│ └─────────────┘ └──────┬──────┘ └──────┬──────┘ └─────────────┘ │
│ │ │ │
│ │ │ │
│ ┌──────▼──────┐ ┌──────▼──────┐ │
│ │ tmux │ │ Nodes │ │
│ │ session: │ │ (fake) │ │
│ │ nstance- │────►│ created by │ │
│ │ dev-agents │ │tmux provider│ │
│ │ │ │ │ │
│ │ ┌─────────┐ │ └─────────────┘ │
│ │ │ agent 1 │ │ │
│ │ └─────────┘ │ │
│ │ ┌─────────┐ │ │
│ │ │ agent 2 │ │ │
│ │ └─────────┘ │ │
│ └─────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Components
1. dev-s3 (Fake S3 Server)
A fake S3 server using gofakes3 that stores files on the local filesystem.
- Port: 8989
- Storage: temp/dev-s3/
- Bucket: dev
Used by nstance-server to store:
- Configuration files
- Secrets and encryption keys
- CA certificates and keys
- Instance state
2. nstance-server
The main Nstance control plane server running with the tmux provider.
Ports:
| Port | Service | Description |
|---|---|---|
| 8990 | Health | HTTP health checks |
| 8991 | Leader | Leader election health |
| 8992 | Registration | gRPC - Agent/Operator registration (TLS, no client cert) |
| 8993 | Operator | gRPC - Operator sync/drain (mTLS) |
| 8994 | Agent | gRPC - Agent communication (mTLS) |
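As a quick sanity check that these ports are listening, a small shell probe can be used. This is an illustrative sketch, not part of the tooling; it assumes nothing about whether the stack is currently running and reports each port either way:

```shell
# Probe each nstance-server port on localhost; prints open/closed per port.
# Uses bash's /dev/tcp redirection, so run it with bash.
for port in 8990 8991 8992 8993 8994; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```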
Tmux Provider Behavior:
- Creates agents as tmux windows (running nstance-agent processes directly) instead of cloud VMs
- Runs agents via Air for hot-reload of agent code
- Creates fake Kubernetes Node resources as JSON files in dev-k8s temp dir when instances are created
- Cleans up Node resource JSON files when instances are deleted
3. dev-k8s (Fake Kubernetes API)
A minimal fake Kubernetes API server that stores/reads resources to/from JSON files.
- Port: 6443
- Storage:
temp/dev-k8s/
Supported Resources:
- Core API (/api/v1): Secrets, ConfigMaps, Namespaces, Nodes, Pods
- Nstance CRDs (infrastructure.cluster.x-k8s.io/v1beta1): NstanceCluster, NstanceMachine, NstanceMachinePool, NstanceMachineTemplate
- Cluster API (cluster.x-k8s.io/v1beta2): Cluster, Machine, MachinePool
- Coordination (coordination.k8s.io/v1): Lease
How it works:
- Resources are stored as JSON files in temp/dev-k8s/{resource}/{namespace}/{name}.json
- Cluster-scoped resources (Nodes) are stored in temp/dev-k8s/{resource}/{name}.json
- No schema validation - accepts any valid JSON
- Supports watch via file system notifications (fsnotify), so you can change resources simply by editing the respective JSON file
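Because dev-k8s is just JSON files on disk, resources can be created or edited by hand. A minimal sketch, following the path convention above (the Node name demo-node is illustrative):

```shell
# Create a cluster-scoped Node by writing its JSON file where dev-k8s
# expects it: temp/dev-k8s/{resource}/{name}.json
mkdir -p temp/dev-k8s/nodes
cat > temp/dev-k8s/nodes/demo-node.json << 'EOF'
{"apiVersion": "v1", "kind": "Node", "metadata": {"name": "demo-node"}}
EOF

# A running dev-k8s would observe this write via fsnotify; here we just
# confirm the file content round-trips.
cat temp/dev-k8s/nodes/demo-node.json
```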
4. nstance-operator
The Kubernetes operator that syncs instance groups between nstance-server and Kubernetes.
Configuration:
- Uses a generated kubeconfig pointing to dev-k8s
- Connects to nstance-server on separate ports for registration (8992) and operations (8993)
- Uses JSON content type instead of protobuf (dev-k8s limitation)
Startup Flow:
- Waits for nstance-server to be healthy
- Waits for dev-k8s to be healthy
- Creates CA ConfigMap from temp/dev-s3/cluster/ca.crt
- Generates registration nonce via the nstance-admin CLI tool and writes it to a Secret
- Creates operator config with shard endpoints
- Starts operator with Air for hot-reload
5. tmux Agent Session
Agents run in a dedicated tmux session for isolation from Overmind’s tmux session.
- Session name: nstance-dev-agents
- Window naming: nstance-agent-{instanceID}
Each agent window runs Air for hot-reload, enabling code changes to automatically restart agents.
Quick Start
Prerequisites
make check # Verifies: go, tmux, air, overmind
Starting the Environment
# Full stack: s3 + server + dev-k8s + operator (clean start recommended)
make clean-dev && make dev-tmux-k8s
# Server only: s3 + server (for admin CLI testing or running operator separately)
make clean-dev && make dev-tmux
This starts components via Overmind:
- s3: dev-s3 fake object storage server
- server: nstance-server with tmux dev provider (2 instances)
- k8s: dev-k8s fake Kubernetes API (dev-tmux-k8s only)
- operator: nstance-operator (dev-tmux-k8s only)
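For orientation, a hypothetical Procfile sketch for these four Overmind processes. The command lines here are illustrative assumptions, not the repo's actual definitions (which the make targets select); cmd/dev-s3 and cmd/nstance-server are guessed paths:

```
s3: go run ./cmd/dev-s3
server: go run ./cmd/nstance-server --config examples/config-tmux.jsonc
k8s: go run ./cmd/dev-k8s
operator: ./scripts/dev-operator.sh
```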
Viewing Logs
Overmind shows combined logs. To view specific component logs:
# In the Overmind terminal, press:
# Ctrl+C to stop all
# Or use overmind commands:
overmind connect server # Connect to server process
overmind connect operator # Connect to operator process
Viewing Agent Logs
Agents run in a separate tmux session:
tmux attach -t nstance-dev-agents
# Use Ctrl+B, N to switch between agent windows
# Use Ctrl+B, D to detach
Directory Structure
temp/
├── cache/ # nstance-server cache (SQLite, etc.)
│ └── db/
│ └── nstance.db
├── dev-k8s/ # dev-k8s resource storage
│ ├── configmaps/
│ │ └── default/
│ │ └── nstance-cluster-ca.json
│ ├── secrets/
│ │ └── default/
│ │ ├── nstance-operator-cert.json
│ │ ├── nstance-operator-key.json
│ │ └── nstance-operator-nonce.json
│ ├── nstancemachinepools/
│ │ └── default/
│ │ └── nstance-test.json
│ ├── machinepools/
│ │ └── default/
│ │ └── nstance-test.json
│ └── nodes/
│ └── {instanceID}.json # Created by tmux provider
├── dev-s3/ # dev-s3 file storage (object storage bucket contents)
│ ├── ca.crt # Cluster CA certificate
│ ├── ca.key # Cluster CA private key (encrypted)
│ ├── config/
│ │ └── config.jsonc # Server configuration
│ └── secret/
│ └── ... # Encrypted secrets
└── operator/ # Operator runtime config
├── config.yaml # Shard endpoints
└── kubeconfig # dev-k8s kubeconfig
Configuration
Server Configuration
The server reads configuration from examples/config-tmux.jsonc. Key settings for dev mode:
{
"server": {
"provider": {
"kind": "tmux", // Uses tmux provider (local agents in tmux)
"region": "dev",
"zone": "deva"
},
"bind": {
"health_addr": "0.0.0.0:8990",
"election_addr": "0.0.0.0:8991",
"registration_addr": "0.0.0.0:8992",
"operator_addr": "0.0.0.0:8993",
"agent_addr": "0.0.0.0:8994"
}
}
}
Operator Configuration
Generated automatically by scripts/dev-operator.sh:
cluster_id: example-cluster
tenant: default
shards:
dev:
registration_addr: "127.0.0.1:8992" # For initial mTLS registration
operator_addr: "127.0.0.1:8993" # For ongoing sync/drain operations
Note: Uses 127.0.0.1 instead of localhost to avoid IPv6 resolution issues on macOS.
How Registration Works
- Operator starts and loads the CA certificate from the nstance-cluster-ca ConfigMap
- Operator generates a keypair and stores it in the nstance-operator-key Secret
- Operator loads the nonce from the nstance-operator-nonce Secret
- Operator connects to the registration port (8992) with TLS (server auth only)
- Server issues a certificate signed by the cluster CA
- Operator stores the certificate in the nstance-operator-cert Secret
- Operator connects to the operator port (8993) with mTLS for sync/drain
How Instance Creation Works
- Reconciler decides to create an instance
- Tmux provider creates temp directory with identity files (nonce, CA cert)
- Tmux provider creates a tmux window running air -c scripts/air/agent.toml
- Tmux provider creates a fake Node JSON in temp/dev-k8s/nodes/
- Agent starts and registers with the server using the nonce
- Server issues client certificate to agent
- Agent connects to agent service (8994) with mTLS
- Operator can see the Node via dev-k8s and perform drain operations
Troubleshooting
Operator can’t connect to server
Check that you’re using 127.0.0.1 instead of localhost in shard endpoints. macOS resolves localhost to IPv6 first, but the server binds to 0.0.0.0 (IPv4 only).
Operator can’t read secrets after creating them
This is a controller-runtime cache issue. The code now builds TLS config directly from PEM data instead of re-reading from Kubernetes.
“unknown service” errors from operator
The operator might be connecting to the wrong port. Registration happens on 8992, but sync/drain operations use 8993.
Agents not appearing
Check the tmux session:
tmux attach -t nstance-dev-agents
If the session doesn’t exist, the tmux provider will create it on next instance creation.
dev-k8s doesn’t have a resource type
Add the resource to cmd/dev-k8s/handle_discovery.go. The CRUD handlers are generic and work with any resource.
Cleaning Up
# Clean all dev state
make clean-dev
# This removes:
# - temp/ directory (all dev state)
# - Kills tmux agent session
Using with a Real Kubernetes Cluster (kind)
This section covers how to run the Nstance operator locally and have it connect to a kind (Kubernetes in Docker) cluster, alongside a local Nstance server using the dev-tmux provider, for fast development iteration and debugging.
Start an Nstance Server
First, ensure your nstance-server is running with dev-s3, as this generates a fresh CA certificate that the operator will need a copy of.
1. Start the nstance-server (in one terminal):
make clean-dev && make dev-tmux
This starts s3 + server only (no dev-k8s or operator), which is what you want since the operator will run separately against kind.
Start a Kubernetes Cluster
2. Create a kind configuration file at temp/kind-config.yaml:
mkdir -p temp
cat > temp/kind-config.yaml << 'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
EOF
3. Create and start the kind cluster:
kind create cluster --name nstance-dev --config temp/kind-config.yaml
Check that your current kubectl context is set to kind-nstance-dev:
kubectl config current-context
Verify the cluster is running:
kubectl cluster-info
Prepare the Kubernetes Cluster
4. Install cert-manager (required by CAPI for webhook TLS certificates):
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=120s
5. Deploy the Cluster API (CAPI) components:
The Nstance operator is a CAPI infrastructure provider and requires the core CAPI CRDs, controllers, and validation webhooks:
CAPI_VERSION=$(curl -sL https://api.github.com/repos/kubernetes-sigs/cluster-api/releases/latest | jq -r .tag_name)
curl -sL "https://github.com/kubernetes-sigs/cluster-api/releases/download/$CAPI_VERSION/core-components.yaml" \
| sed -E 's/\$\{[A-Za-z_][A-Za-z0-9_]*:=([^}]*)\}/\1/g' \
| kubectl apply --server-side -f -
kubectl wait --for=condition=Available deployment --all -n capi-system --timeout=120s
6. Deploy the Nstance CRDs:
kubectl apply -k config/crd/
7. Deploy the Nstance Cluster CA certificate ConfigMap
The nstance-server you started in step 1 will generate a new CA certificate and upload it to the dev-s3 server, stored at ./temp/dev-s3/cluster/ca.crt.
Let’s create a new ConfigMap with it embedded, for the Nstance Operator to use:
kubectl create configmap nstance-cluster-ca \
--from-file=ca.crt=temp/dev-s3/cluster/ca.crt
8. Export the Kind kubeconfig for the Operator:
mkdir -p temp/operator
kind get kubeconfig --name nstance-dev > temp/operator/kubeconfig
9. Create the operator config file (read from the --config flag, not a ConfigMap):
The shard IDs and ports must match the dev-tmux server instances. By default, dev-tmux runs 2 server instances (server=2) with port scheme: base + (instance-1) * 10.
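The port arithmetic above can be sketched as a quick shell check (illustrative only):

```shell
# dev-tmux port scheme: port = base + (instance - 1) * 10.
# Instance 1 keeps the base ports; instance 2 is offset by 10.
for instance in 1 2; do
  offset=$(( (instance - 1) * 10 ))
  echo "dev-$instance: registration=$(( 8992 + offset )) operator=$(( 8993 + offset ))"
done
# Prints:
#   dev-1: registration=8992 operator=8993
#   dev-2: registration=9002 operator=9003
```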
cat > temp/operator/config.yaml << 'EOF'
cluster_id: example-cluster
tenant: default
shards:
dev-1:
registration_addr: "127.0.0.1:8992"
operator_addr: "127.0.0.1:8993"
dev-2:
registration_addr: "127.0.0.1:9002"
operator_addr: "127.0.0.1:9003"
EOF
10. Generate the registration nonce & store it in a Kubernetes Secret:
Compile the nstance-admin binary and run it against the dev-s3 service (started in step 1):
make nstance-admin
NONCE_JWT=$(AWS_ACCESS_KEY_ID=dev \
AWS_SECRET_ACCESS_KEY=dev \
AWS_ENDPOINT_URL=http://localhost:8989 \
AWS_S3_USE_PATH_STYLE=true \
NSTANCE_ENCRYPTION_KEY=thisisatest32bytekey123456789012 \
./bin/nstance-admin cluster nonce \
--cluster-id example-cluster \
--storage-bucket dev \
--key-provider env \
--output \
-)
kubectl create secret generic nstance-operator-nonce \
--from-literal=nonce.jwt="$NONCE_JWT"
This will generate a nonce valid for 3 hours by default (extend it e.g. with --expiry 24h passed to the nstance-admin cluster nonce command above).
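To sanity-check a nonce's expiry without extra tooling, the JWT payload can be base64-decoded. The token below is a fabricated, unsigned sample (payload {"exp":1700000000}), not a real nonce; substitute your $NONCE_JWT:

```shell
# Decode the middle (payload) segment of a JWT to inspect its claims.
# SAMPLE_JWT is a made-up example; real nonces are signed, and real
# payloads may need base64url translation ('-_' -> '+/') and '=' padding.
SAMPLE_JWT='eyJhbGciOiJub25lIn0.eyJleHAiOjE3MDAwMDAwMDB9.x'
printf '%s' "$SAMPLE_JWT" | cut -d. -f2 | base64 -d
echo
```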
Run the Operator Locally Against the Kubernetes Cluster
11. Run the operator:
make dev-operator
This runs the operator with Air for hot-reload, and configures the namespace, kubeconfig, and nstance-operator config file automatically. Note the differences in runtime dependencies when running locally vs in-cluster:
namespace (for Secrets & ConfigMaps):
- in-cluster: /var/run/secrets/kubernetes.io/serviceaccount/namespace
- locally: NSTANCE_NAMESPACE=default
kubeconfig:
- in-cluster: /var/run/secrets/kubernetes.io/serviceaccount/
- locally: temp/operator/kubeconfig
nstance-operator config:
- in-cluster: /etc/nstance/operator/config.yaml (Helm chart mounts from ConfigMap)
- locally: temp/operator/config.yaml (specified via --config argument)
What Happens
The operator will:
- Connect to the Kind cluster via the kubeconfig file
- Load the cluster CA from the nstance-cluster-ca ConfigMap
- Load or generate an Ed25519 keypair (stored in the nstance-operator-key Secret)
- Register with nstance-server using the nonce from the nstance-operator-nonce Secret
- Receive and store a client certificate in the nstance-operator-cert Secret
- Connect to the operator gRPC port (8993) with mTLS for sync/drain operations
- Reconcile NstanceMachinePool, NstanceMachine, and NstanceShardGroup CRDs
This setup allows for rapid development cycles without needing to rebuild container images for each change.
Cleaning Up
kind delete cluster --name nstance-dev