
Aether Operator Guide

Deploy, scale, and monitor Aether clusters.

Cluster Bootstrap

An Aether cluster starts with a TOML configuration file. The aether cluster bootstrap command provisions nodes, forms the cluster, and brings it to a ready state.

# cluster.toml
[cluster]
name = "production"
size = 5

[cloud]
provider = "hetzner"
region = "fsn1"
instance_type = "cx31"

[security]
cluster_secret = "${secrets:cluster/secret}"

# Bootstrap the cluster from the configuration file
aether cluster bootstrap --config cluster.toml

The bootstrap orchestrator runs 12 steps that cover cloud provisioning, node startup, peer discovery, consensus formation, quorum establishment, leader election, KV-Store initialization, artifact repository setup, and readiness verification.
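One way to picture the orchestrator is as an ordered pipeline that stops at the first failing phase. The sketch below is illustrative only: the phase names come from the list above, while `run_bootstrap` and the callable-per-phase shape are hypothetical and not part of Aether's API.

```python
from typing import Callable, Dict, List, Tuple

# Phase names taken from the guide; the real orchestrator has 12 steps,
# so this is a coarse grouping, not the actual step list.
PHASES: List[str] = [
    "cloud provisioning",
    "node startup",
    "peer discovery",
    "consensus formation",
    "quorum establishment",
    "leader election",
    "KV-Store initialization",
    "artifact repository setup",
    "readiness verification",
]


def run_bootstrap(actions: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run phases in order and stop at the first one that reports failure.

    `actions` maps a phase name to a hypothetical callable that returns True
    on success. Phases without an entry are treated as succeeding.
    Returns (ready, completed_phases).
    """
    completed: List[str] = []
    for phase in PHASES:
        action = actions.get(phase, lambda: True)
        if not action():
            return False, completed
        completed.append(phase)
    return True, completed
```

The list of completed phases makes it easy to see where a failed bootstrap stopped.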

Deploying Applications

# Upload artifact to cluster
aether artifact upload target/my-service-1.0.0.jar

# Deploy blueprint
aether deploy target/blueprint.toml

# Or use a strategy
aether deploy target/blueprint.toml --strategy canary

# Deployment lifecycle
aether deploy status          # Show deployment status
aether deploy promote         # Advance to next stage (canary/rolling)
aether deploy rollback        # Rollback deployment
aether deploy complete        # Finalize deployment

Deployment Strategies

| Strategy | Behavior | Rollback |
|---|---|---|
| Immediate | All instances at once | Manual redeploy of the previous version |
| Rolling | Weighted traffic shift with health thresholds | aether deploy rollback |
| Canary | Progressive traffic shift through configurable stages (e.g., 5% → 25% → 50% → 100%) with auto-evaluation | Automatic on health degradation |
| Blue-green | Atomic switchover via consensus (~5ms routing change, plus drain for in-flight requests) | Instant switch back |
| A/B testing | Deterministic split by request context | Remove variant |
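The canary behavior described in the table can be sketched as a loop over traffic stages that rolls back as soon as a health check fails. This is a simplified model, not Aether's implementation: `run_canary` and the `is_healthy` callback are hypothetical, and the default stages are the example percentages from the Canary row.

```python
from typing import Callable, List, Optional

# Example stage percentages from the Canary row; real stages are configurable.
DEFAULT_STAGES: List[int] = [5, 25, 50, 100]


def run_canary(is_healthy: Callable[[int], bool],
               stages: Optional[List[int]] = None) -> dict:
    """Walk through traffic stages, rolling back at the first unhealthy one.

    `is_healthy` is a hypothetical health check that receives the percentage
    of traffic currently routed to the new version.
    """
    stages = DEFAULT_STAGES if stages is None else stages
    if not stages:
        raise ValueError("at least one stage is required")
    reached: List[int] = []
    for percent in stages:
        reached.append(percent)
        if not is_healthy(percent):
            # Health degraded: route all traffic back to the previous version.
            return {"outcome": "rolled_back", "reached": reached, "traffic": 0}
    return {"outcome": "completed", "reached": reached, "traffic": stages[-1]}
```

For example, `run_canary(lambda percent: percent < 50)` models a release that degrades once it receives half of the traffic.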

Scaling

Three Dimensions

Auto-Scaling

Tier 1 — Decision Tree (1-second intervals): reactive scaling on CPU, latency, queue depth, error rate. Always active.

Tier 2 — TTM Predictor (60-second intervals): ONNX ML model with 2-hour sliding window. Predicts load spikes before they happen.

If the TTM predictor fails, the Decision Tree continues with its default thresholds, so scaling is not disrupted.
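The fallback relationship between the two tiers can be sketched as follows. This assumes the predictor's role is to supply adjusted thresholds to the reactive tier, which the guide implies but does not spell out; the function names, metric names, and default values are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical default thresholds; Aether's real defaults are not given here.
DEFAULT_THRESHOLDS: Dict[str, float] = {"cpu": 0.75, "error_rate": 0.05}

Predictor = Callable[[Dict[str, float]], Dict[str, float]]


def reactive_decision(metrics: Dict[str, float],
                      thresholds: Dict[str, float]) -> int:
    """Tier 1: return 1 (scale out) if any metric reaches its threshold, else 0."""
    for name, limit in thresholds.items():
        if metrics.get(name, 0.0) >= limit:
            return 1
    return 0


def scaling_decision(metrics: Dict[str, float],
                     predictor: Optional[Predictor] = None) -> int:
    """Combine both tiers, falling back to defaults if the predictor fails."""
    thresholds = DEFAULT_THRESHOLDS
    if predictor is not None:
        try:
            # Tier 2 (assumed role): predicted load tightens or relaxes thresholds.
            thresholds = predictor(metrics)
        except Exception:
            # Predictor failure must not interrupt reactive scaling.
            thresholds = DEFAULT_THRESHOLDS
    return reactive_decision(metrics, thresholds)
```

Because the reactive tier always runs, a predictor failure changes only which thresholds are used, never whether a decision is made.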

Manual Scaling

# Scale a specific slice
aether scale my-service --instances 5

# Scale the cluster
aether cluster scale --nodes 7
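When choosing a node count, it helps to check how many node failures the cluster can tolerate. The arithmetic below assumes a simple majority quorum, which is common in consensus-based systems but is not confirmed by this guide for Aether; `majority_quorum` and `tolerated_failures` are illustrative names.

```python
def majority_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority quorum remains available."""
    return nodes - majority_quorum(nodes)
```

Under that assumption, growing from 5 to 7 nodes raises the tolerated failures from 2 to 3, while a 6-node cluster tolerates no more failures than a 5-node one.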

Monitoring

Built-in Observability

Health & Alerts

# Check cluster health
aether health

# View active alerts
aether alerts

# Configure thresholds
aether thresholds set cpu_warning 0.75
aether thresholds set cpu_critical 0.90
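To make the effect of the two thresholds concrete, the sketch below classifies a CPU reading against them. It assumes the values are utilization fractions between 0 and 1 and that a reading equal to a threshold already triggers it; neither detail is stated in this guide, and `classify_cpu` is a hypothetical helper.

```python
def classify_cpu(utilization: float,
                 warning: float = 0.75,
                 critical: float = 0.90) -> str:
    """Map a CPU utilization fraction to 'ok', 'warning', or 'critical'."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if utilization >= critical:
        return "critical"
    if utilization >= warning:
        return "warning"
    return "ok"
```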

Cloud Integration

Aether integrates with four cloud providers without vendor SDKs. All API clients use SigV4/JWT/OAuth2 implemented from scratch.

| Provider | Compute | Secrets | Load Balancer | Discovery | Certificates |
|---|---|---|---|---|---|
| Hetzner | Yes | Yes (env vars) | Yes | Yes (labels) | Self-signed (HKDF) |
| AWS | EC2 | Secrets Manager | ELBv2 | Yes | ACM |
| GCP | Compute Engine | Secret Manager | NEGs | Yes | Certificate Manager |
| Azure | VMs | Key Vault | Load Balancers | Resource Graph | Key Vault |
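As an illustration of what implementing a scheme such as SigV4 without an SDK involves, the signing-key derivation step of AWS Signature Version 4 needs only HMAC-SHA256 from a standard library. The sketch below shows that single step in Python; it is not Aether's client code, and a complete signer also needs the canonical request and string-to-sign steps.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a UTF-8 message, returned as raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key: str, date: str,
                             region: str, service: str) -> bytes:
    """Derive the SigV4 signing key.

    `date` is the request date in YYYYMMDD form. Each HMAC output becomes
    the key for the next step, ending with the literal "aws4_request".
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

The derived key is scoped to one date, region, and service, so a leaked signing key is far less useful than the long-term secret.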

The cloud provider is selected in aether.toml. Adding a new provider means implementing an interface; there is no SDK to adopt.
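The shape of such an interface might resemble the sketch below. It is written in Python for illustration only; the class and method names are hypothetical and are not Aether's actual provider contract, which this guide does not show. The three methods loosely mirror the Compute, Discovery, and Secrets columns of the table.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CloudProvider(ABC):
    """Hypothetical provider contract; the method names are illustrative."""

    @abstractmethod
    def provision_node(self, region: str, instance_type: str) -> str:
        """Create a compute instance and return its identifier."""

    @abstractmethod
    def discover_peers(self) -> List[str]:
        """Return identifiers of the nodes that belong to the cluster."""

    @abstractmethod
    def read_secret(self, path: str) -> str:
        """Resolve a secret reference to its value."""


class InMemoryProvider(CloudProvider):
    """Toy provider that only shows what an implementation must supply."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = dict(secrets)
        self._nodes: List[str] = []

    def provision_node(self, region: str, instance_type: str) -> str:
        node_id = f"{region}-{instance_type}-{len(self._nodes) + 1}"
        self._nodes.append(node_id)
        return node_id

    def discover_peers(self) -> List[str]:
        return list(self._nodes)

    def read_secret(self, path: str) -> str:
        return self._secrets[path]
```

Code written against the abstract class works unchanged with any provider that fills in these methods.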

Node Lifecycle

| State | Description |
|---|---|
| JOINING | Node connecting to cluster, restoring state |
| ON_DUTY | Serving traffic, participating in consensus |
| DRAINING | Gracefully migrating workload, respecting disruption budgets |
| DECOMMISSIONED | No workload, awaiting shutdown |
| SHUTTING_DOWN | Final cleanup and exit |

# Graceful drain (respects disruption budget)
aether node drain node-3

# Force shutdown
aether node shutdown node-3
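The lifecycle can be modeled as a small state machine. The state names below come from the table; the allowed transitions are an assumption (states visited in table order, with a forced shutdown possible from any earlier state) and may not match Aether's actual rules.

```python
from enum import Enum
from typing import Dict, Set


class NodeState(Enum):
    JOINING = "JOINING"
    ON_DUTY = "ON_DUTY"
    DRAINING = "DRAINING"
    DECOMMISSIONED = "DECOMMISSIONED"
    SHUTTING_DOWN = "SHUTTING_DOWN"


# Assumed transitions: graceful path in table order, plus forced shutdown.
ALLOWED: Dict[NodeState, Set[NodeState]] = {
    NodeState.JOINING: {NodeState.ON_DUTY, NodeState.SHUTTING_DOWN},
    NodeState.ON_DUTY: {NodeState.DRAINING, NodeState.SHUTTING_DOWN},
    NodeState.DRAINING: {NodeState.DECOMMISSIONED, NodeState.SHUTTING_DOWN},
    NodeState.DECOMMISSIONED: {NodeState.SHUTTING_DOWN},
    NodeState.SHUTTING_DOWN: set(),
}


def transition(current: NodeState, target: NodeState) -> NodeState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

In this model a graceful drain moves a node from ON_DUTY through DRAINING to DECOMMISSIONED, while a forced shutdown skips directly to SHUTTING_DOWN.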

Security

Backup & Recovery

Cluster metadata is serialized to a local git repository whenever cluster state changes and at configurable intervals. Git provides versioning, history, diffs, and an optional push to a remote.

# Trigger manual backup
aether backup trigger

# List backups (shows git commit history)
aether backup list

# Restore from backup (by commit ID)
aether backup restore abc1234
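The two automatic backup triggers, a state change or an elapsed interval, can be expressed as a single decision function. The sketch below is a simplified model with hypothetical names; it detects state changes by comparing SHA-256 digests of the serialized metadata, which is an assumption about the mechanism rather than a description of Aether's code.

```python
import hashlib
from typing import Optional


def state_digest(serialized_state: bytes) -> str:
    """Fingerprint of the serialized cluster metadata."""
    return hashlib.sha256(serialized_state).hexdigest()


def should_backup(current_digest: str,
                  last_digest: Optional[str],
                  now: float,
                  last_backup_at: Optional[float],
                  interval_seconds: float) -> bool:
    """Decide whether a new backup commit is due.

    Timestamps are seconds on any monotonic scale chosen by the caller.
    """
    if last_digest is None or last_backup_at is None:
        return True  # no backup has been taken yet
    if current_digest != last_digest:
        return True  # cluster state changed since the last backup
    return now - last_backup_at >= interval_seconds
```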

Next Steps