
What It Does

Monk provides intelligent scaling for your entire system, covering both workloads and infrastructure. An algorithmic autoscaler handles workload scaling automatically, while you can ask Monk to scale infrastructure (VMs, service settings) with simple chat commands.

How It Works

Algorithmic Workload Autoscaling

Monk includes an algorithmic autoscaler that manages your containerized workloads automatically.

What the autoscaler handles:
  • Horizontal scaling - Adds or removes container replicas based on load
  • Resource-based scaling - Scales based on CPU and memory utilization
  • Automatic load balancing - Distributes traffic across scaled replicas
Example: your API server starts with 2 replicas. As traffic increases during peak hours, the autoscaler automatically spins up additional replicas (3, 4, 5…). When traffic subsides, it scales back down to save resources.

The autoscaler runs continuously in the background as part of the orchestration process; no manual intervention is required as long as autoscaling rules are present in the Monk configuration.
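Monk's exact algorithm isn't documented here, but resource-based horizontal autoscalers typically follow a proportional rule like the sketch below. The target utilization, bounds, and clamping behavior are illustrative assumptions, not Monk's actual parameters:

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule common to resource-based autoscalers:
    scale the replica count by the ratio of observed to target
    utilization, round up, then clamp to configured bounds."""
    raw = current_replicas * (current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))
```

For example, 2 replicas running at 150% of a 60% CPU target would grow to 5 replicas, and a lightly loaded pool shrinks back toward the floor.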

Infrastructure Scaling (Manual Trigger)

Beyond workload scaling, Monk can scale the underlying infrastructure itself.

What Monk can scale:
  • Virtual machines - Add or remove VMs from your deployment
  • Instance sizing - Change VM sizes (e.g., upgrade from 2GB to 4GB RAM)
  • Service settings - Adjust database connection pools, cache sizes, worker counts
  • Storage - Increase disk size for databases and persistent volumes
How to trigger: Currently, you ask Monk to scale infrastructure via chat in your IDE:
You: Add 2 more machines to the API cluster
You: Scale the database instance up to 8GB RAM
You: Increase the worker count to 5
You: Remove the extra VMs, traffic is back to normal
Monk provisions or deprovisions resources accordingly, using your cloud provider accounts.

Intelligent Scaling Decisions

When you request infrastructure changes, Monk makes intelligent decisions.

Instance sizing:
  • Recommends appropriate VM sizes based on current usage
  • Suggests cost-effective alternatives
  • Warns about over-provisioning
Placement:
  • Places new VMs in optimal regions
  • Co-locates with related services for low latency
  • Balances across availability zones when needed
Cost awareness:
  • Estimates cost impact of scaling changes
  • Suggests cheaper alternatives when possible
  • See Cost Tracking for real-time cost monitoring
Confirmation before changes:
You: Add more machines to handle this traffic spike

Monk: Current setup: 2x t3.medium instances (4 vCPU, 8GB RAM)
      Recommendation: Add 2x t3.medium instances
      New total: 4 instances
      Cost increase: ~$50/month

      Proceed?

You: Yes
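The cost figure in a recommendation like the one above reduces to simple arithmetic over instance pricing. A minimal sketch, assuming an on-demand hourly rate and a ~730-hour month; the rate used in the example is illustrative, not a quote from Monk or any cloud provider:

```python
def monthly_cost_delta(added_instances: int,
                       hourly_rate: float,
                       hours_per_month: float = 730) -> float:
    """Rough monthly cost impact of adding `added_instances` VMs
    at a given on-demand hourly rate (730 ≈ hours in a month)."""
    return added_instances * hourly_rate * hours_per_month

# Illustrative: 2 extra instances at a hypothetical $0.0416/hour
estimate = monthly_cost_delta(2, 0.0416)
```

Real estimates also need to account for storage, data transfer, and any reserved or spot discounts, which is why Monk presents the figure as approximate ("~$50/month").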

Zero-Downtime Scaling

Whether scaling workloads or infrastructure, Monk ensures zero downtime.

Workload scaling:
  • New replicas added before old ones removed (scale-up-then-down)
  • Health checks before traffic routing
  • Graceful shutdown of scaled-down replicas
Infrastructure scaling:
  • New VMs provisioned and containers deployed before traffic shifts
  • Load balancers updated automatically
  • Old VMs drained before shutdown
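The scale-up-then-down sequence described above can be sketched as follows. The `VM` and `LoadBalancer` types and the health-check callback are hypothetical stand-ins for illustration, not Monk's API:

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    running: bool = False
    def start(self): self.running = True
    def drain(self): pass          # stub: wait for in-flight requests
    def stop(self):  self.running = False

@dataclass
class LoadBalancer:
    backends: list = field(default_factory=list)
    def add(self, vm):    self.backends.append(vm)
    def remove(self, vm): self.backends.remove(vm)

def scale_up_then_down(lb, old_vms, new_vms, is_healthy):
    """Zero-downtime replacement: new capacity is started,
    health-checked, and attached to the load balancer before
    any old VM is drained or stopped."""
    for vm in new_vms:
        vm.start()
        if not is_healthy(vm):
            vm.stop()              # roll back; old capacity untouched
            raise RuntimeError(f"{vm.name} failed health check")
        lb.add(vm)                 # traffic shifts only after the check
    for vm in old_vms:
        lb.remove(vm)              # stop routing new requests
        vm.drain()                 # let in-flight requests finish
        vm.stop()
```

The key ordering property: at every point during the operation, the load balancer has at least the original healthy capacity attached.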

Coming Soon

Proactive AI-Driven Scaling

The next evolution of Monk’s scaling capabilities: autonomous, proactive scaling driven by AI.

What’s coming:
  • 24/7 monitoring - Monk watches your infrastructure continuously
  • Traffic spike response - Automatically scales up when traffic increases
  • Cost optimization - Scales down during quiet periods to save money
  • Predictive scaling - Learns traffic patterns and scales preemptively
  • Autonomous decisions - Monk acts on its own; no manual trigger needed
  • Both layers - Scales workloads (containers) and infrastructure (VMs) together
How it will work:
[Late evening, traffic spike detected]

Monk: Traffic increased 300% on API server
      Current: 2 replicas at 85% CPU
      Action: Scaling to 6 replicas
      ✓ Scaled up

[3 AM, traffic back to normal]

Monk: Traffic returned to baseline
      Current: 6 replicas at 15% CPU
      Action: Scaling down to 2 replicas
      ✓ Scaled down (saved $12 tonight)
No human intervention required - Monk handles it autonomously while you sleep.
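Since this feature is still upcoming, any implementation detail is speculative, but the predictive part can be sketched with a deliberately simple model: average past traffic by hour of day, then provision for the expected load before it arrives. The capacity figures (`rps_per_replica`, the replica floor) are illustrative assumptions:

```python
import math
from collections import defaultdict
from statistics import mean

def predicted_rps(history: list[tuple[int, float]], hour: int) -> float:
    """Predict requests/sec for a given hour of day by averaging
    past observations, each recorded as (hour, rps)."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for h, rps in history:
        by_hour[h].append(rps)
    return mean(by_hour[hour]) if by_hour[hour] else 0.0

def replicas_for(rps: float,
                 rps_per_replica: float = 100.0,
                 min_replicas: int = 2) -> int:
    """Capacity needed for the predicted load, with a floor so the
    service never scales to zero."""
    return max(min_replicas, math.ceil(rps / rps_per_replica))
```

A real predictive scaler would use a richer model (seasonality, trend, confidence bounds), but the shape is the same: forecast, convert the forecast to capacity, and act ahead of the spike rather than after it.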

What Makes This Different

Traditional scaling requires:
  • Manually configuring autoscaling rules and thresholds
  • Learning Kubernetes HPA, AWS Auto Scaling Groups, etc.
  • Writing infrastructure-as-code for scaling policies
  • Setting up CloudWatch alarms and scaling triggers
  • Manually provisioning VMs when autoscaling isn’t enough
  • 24/7 on-call to respond to traffic spikes
  • Capacity planning and forecasting
With Monk:
  • Today: Workloads autoscale automatically. Ask Monk to scale infrastructure.
  • Soon: Monk handles everything autonomously, 24/7.

Key Capabilities

Current:
  • Algorithmic workload autoscaling - Containers scale automatically based on load
  • Horizontal scaling - Add/remove replicas dynamically
  • Infrastructure scaling - Add/remove VMs, change instance sizes
  • Service configuration - Adjust database, cache, worker settings
  • Natural language commands - “Add more machines”, “Scale the API up”
  • Intelligent recommendations - Cost-aware, placement-optimized decisions
  • Zero-downtime scaling - No interruption during scale operations
  • Automatic load balancing - Traffic distributed across scaled instances
Coming Soon:
  • 🔜 Proactive AI-driven scaling - Autonomous 24/7 scaling based on traffic
  • 🔜 Predictive scaling - Learns patterns and scales preemptively
  • 🔜 Cost optimization mode - Minimize costs while maintaining performance
  • 🔜 No manual trigger needed - Fully autonomous operation

Impact

Today: workloads scale automatically, and you scale infrastructure with a chat message. No autoscaling rule configuration, no manual VM provisioning.

Soon: sleep soundly knowing Monk scales your system autonomously, reacting to traffic spikes and quiet periods 24/7 while optimizing both performance and cost.