
What It Does

Monk provides intelligent scaling for your entire system, covering both workloads and infrastructure. An algorithmic autoscaler handles workload scaling automatically, while you can ask Monk to scale infrastructure (VMs, service settings) with simple chat commands.

How It Works

Algorithmic Workload Autoscaling

Monk includes an algorithmic autoscaler that manages your containerized workloads automatically.

What the autoscaler handles:
  • Horizontal scaling - Adds or removes container replicas based on load
  • Resource-based scaling - Scales based on CPU and memory utilization
  • Automatic load balancing - Distributes traffic across scaled replicas
Example: your API server starts with 2 replicas. As traffic increases during peak hours, the autoscaler automatically spins up additional replicas (3, 4, 5…). When traffic subsides, it scales back down to save resources.

The autoscaler runs continuously in the background as part of the orchestration process; no manual intervention is required as long as autoscaling rules are present in the Monk configuration.
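Monk's exact algorithm isn't documented here, but resource-based horizontal autoscalers typically follow a proportional rule like the sketch below. The target utilization, bounds, and clamping behavior are illustrative assumptions, not Monk's actual parameters:

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule common to resource-based autoscalers:
    scale the replica count by the ratio of observed to target
    utilization, round up, then clamp to configured bounds."""
    raw = current_replicas * (current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))
```

For example, 2 replicas running at 150% of a 60% CPU target would grow to 5 replicas, and a lightly loaded pool shrinks back toward the floor.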

Infrastructure Scaling (Manual Trigger)

Beyond workload scaling, Monk can scale the underlying infrastructure itself.

What Monk can scale:
  • Virtual machines - Add or remove VMs from your deployment
  • Instance sizing - Change VM sizes (e.g., upgrade from 2GB to 4GB RAM)
  • Service settings - Adjust database connection pools, cache sizes, worker counts
  • Storage - Increase disk size for databases and persistent volumes
How to trigger: Currently, you ask Monk to scale infrastructure via chat in your IDE:
You: Add 2 more machines to the API cluster
You: Scale the database instance up to 8GB RAM
You: Increase the worker count to 5
You: Remove the extra VMs, traffic is back to normal
Monk provisions or deprovisions resources accordingly, using your cloud provider accounts.

Intelligent Scaling Decisions

When you request infrastructure changes, Monk makes intelligent decisions.

Instance sizing:
  • Recommends appropriate VM sizes based on current usage
  • Suggests cost-effective alternatives
  • Warns about over-provisioning
Placement:
  • Places new VMs in optimal regions
  • Co-locates with related services for low latency
  • Balances across availability zones when needed
Cost awareness:
  • Estimates cost impact of scaling changes
  • Suggests cheaper alternatives when possible
  • See Cost Tracking for real-time cost monitoring
Confirmation before changes:
You: Add more machines to handle this traffic spike

Monk: Current setup: 2x t3.medium instances (4 vCPU, 8GB RAM)
      Recommendation: Add 2x t3.medium instances
      New total: 4 instances
      Cost increase: ~$50/month

      Proceed?

You: Yes
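The cost figure in a recommendation like the one above reduces to simple arithmetic over instance pricing. A minimal sketch, assuming an on-demand hourly rate and a ~730-hour month; the rate used in the example is illustrative, not a quote from Monk or any cloud provider:

```python
def monthly_cost_delta(added_instances: int,
                       hourly_rate: float,
                       hours_per_month: float = 730) -> float:
    """Rough monthly cost impact of adding `added_instances` VMs
    at a given on-demand hourly rate (730 ≈ hours in a month)."""
    return added_instances * hourly_rate * hours_per_month

# Illustrative: 2 extra instances at a hypothetical $0.0416/hour
estimate = monthly_cost_delta(2, 0.0416)
```

Real estimates also need to account for storage, data transfer, and any reserved or spot discounts, which is why Monk presents the figure as approximate ("~$50/month").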

Zero-Downtime Scaling

Whether scaling workloads or infrastructure, Monk ensures zero downtime.

Workload scaling:
  • New replicas added before old ones removed (scale-up-then-down)
  • Health checks before traffic routing
  • Graceful shutdown of scaled-down replicas
Infrastructure scaling:
  • New VMs provisioned and containers deployed before traffic shifts
  • Load balancers updated automatically
  • Old VMs drained before shutdown
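The scale-up-then-down sequence described above can be sketched as follows. The `VM` and `LoadBalancer` types and the health-check callback are hypothetical stand-ins for illustration, not Monk's API:

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    running: bool = False
    def start(self): self.running = True
    def drain(self): pass          # stub: wait for in-flight requests
    def stop(self):  self.running = False

@dataclass
class LoadBalancer:
    backends: list = field(default_factory=list)
    def add(self, vm):    self.backends.append(vm)
    def remove(self, vm): self.backends.remove(vm)

def scale_up_then_down(lb, old_vms, new_vms, is_healthy):
    """Zero-downtime replacement: new capacity is started,
    health-checked, and attached to the load balancer before
    any old VM is drained or stopped."""
    for vm in new_vms:
        vm.start()
        if not is_healthy(vm):
            vm.stop()              # roll back; old capacity untouched
            raise RuntimeError(f"{vm.name} failed health check")
        lb.add(vm)                 # traffic shifts only after the check
    for vm in old_vms:
        lb.remove(vm)              # stop routing new requests
        vm.drain()                 # let in-flight requests finish
        vm.stop()
```

The key ordering property: at every point during the operation, the load balancer has at least the original healthy capacity attached.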

Coming Soon

Proactive AI-Driven Scaling

The next evolution of Monk’s scaling capabilities: autonomous, proactive scaling driven by AI.

What’s coming:
  • 24/7 monitoring - Monk watches your infrastructure continuously
  • Traffic spike response - Automatically scales up when traffic increases
  • Cost optimization - Scales down during quiet periods to save money
  • Predictive scaling - Learns traffic patterns and scales preemptively
  • Autonomous decisions - Monk acts on its own; no manual trigger needed
  • Both layers - Scales workloads (containers) and infrastructure (VMs) together
How it will work:
[Late evening, traffic spike detected]

Monk: Traffic increased 300% on API server
      Current: 2 replicas at 85% CPU
      Action: Scaling to 6 replicas
      ✓ Scaled up

[3 AM, traffic back to normal]

Monk: Traffic returned to baseline
      Current: 6 replicas at 15% CPU
      Action: Scaling down to 2 replicas
      ✓ Scaled down (saved $12 tonight)
No human intervention required - Monk handles it autonomously while you sleep.
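Since this feature is still upcoming, any implementation detail is speculative, but the predictive part can be sketched with a deliberately simple model: average past traffic by hour of day, then provision for the expected load before it arrives. The capacity figures (`rps_per_replica`, the replica floor) are illustrative assumptions:

```python
import math
from collections import defaultdict
from statistics import mean

def predicted_rps(history: list[tuple[int, float]], hour: int) -> float:
    """Predict requests/sec for a given hour of day by averaging
    past observations, each recorded as (hour, rps)."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for h, rps in history:
        by_hour[h].append(rps)
    return mean(by_hour[hour]) if by_hour[hour] else 0.0

def replicas_for(rps: float,
                 rps_per_replica: float = 100.0,
                 min_replicas: int = 2) -> int:
    """Capacity needed for the predicted load, with a floor so the
    service never scales to zero."""
    return max(min_replicas, math.ceil(rps / rps_per_replica))
```

A real predictive scaler would use a richer model (seasonality, trend, confidence bounds), but the shape is the same: forecast, convert the forecast to capacity, and act ahead of the spike rather than after it.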

What Makes This Different

Traditional scaling requires:
  • Manually configuring autoscaling rules and thresholds
  • Learning Kubernetes HPA, AWS Auto Scaling Groups, etc.
  • Writing infrastructure-as-code for scaling policies
  • Setting up CloudWatch alarms and scaling triggers
  • Manually provisioning VMs when autoscaling isn’t enough
  • 24/7 on-call to respond to traffic spikes
  • Capacity planning and forecasting
With Monk:
  • Today: Workloads autoscale automatically. Ask Monk to scale infrastructure.
  • Soon: Monk handles everything autonomously, 24/7.

Key Capabilities

Current:
  • Algorithmic workload autoscaling - Containers scale automatically based on load
  • Horizontal scaling - Add/remove replicas dynamically
  • Infrastructure scaling - Add/remove VMs, change instance sizes
  • Service configuration - Adjust database, cache, worker settings
  • Natural language commands - “Add more machines”, “Scale the API up”
  • Intelligent recommendations - Cost-aware, placement-optimized decisions
  • Zero-downtime scaling - No interruption during scale operations
  • Automatic load balancing - Traffic distributed across scaled instances
Coming Soon:
  • 🔜 Proactive AI-driven scaling - Autonomous 24/7 scaling based on traffic
  • 🔜 Predictive scaling - Learns patterns and scales preemptively
  • 🔜 Cost optimization mode - Minimize costs while maintaining performance
  • 🔜 No manual trigger needed - Fully autonomous operation

Impact

Today: workloads scale automatically, and you scale infrastructure with a chat message. No autoscaling rule configuration, no manual VM provisioning.

Soon: sleep soundly knowing Monk scales your system autonomously, reacting to traffic spikes and quiet periods 24/7 while optimizing both performance and cost.