What It Does
Watcher is Monk’s built-in cluster monitoring and alerting system. It runs on your cluster 24/7, detects crashes, resource pressure, and health check failures, and sends AI-analyzed alerts with actionable recommendations to Slack. Watcher is available on Pro and Team plans.
How to Set Up
In chat, ask Monk to set up Watcher:
- “configure watcher”
- “enable watcher”
- “set up cluster monitoring”
- “configure Slack alerts for my cluster”
Requirements:
- An active cluster with at least one non-local node
- A Slack webhook URL (optional, needed only for Slack alerts)
Configuration Options
The setup form has four sections.
Crash Detection
- Crash Threshold: Number of restarts within the window to trigger an alert (default: 3)
- Crash Window: Time window for counting restarts (default: 5 minutes)
- Health Check Failures: Consecutive liveness failures before alerting (default: 3)
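The Crash Detection settings combine a sliding-window restart count with a consecutive-failure counter. The sketch below illustrates those semantics under a few assumptions: a restart exactly at the window edge still counts, reaching the threshold (rather than exceeding it) triggers an alert, and a single passing liveness probe resets the failure streak. The class names are hypothetical; this is not Watcher's implementation.

```python
from collections import deque
from typing import Deque


class CrashDetector:
    """Hypothetical model of Crash Threshold + Crash Window (defaults: 3 restarts in 5 minutes)."""

    def __init__(self, threshold: int = 3, window_s: float = 300.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.restarts: Deque[float] = deque()

    def record_restart(self, ts: float) -> bool:
        """Record a restart at time ts (seconds); return True if an alert should fire."""
        self.restarts.append(ts)
        # Drop restarts that have fallen out of the sliding window.
        while self.restarts and ts - self.restarts[0] > self.window_s:
            self.restarts.popleft()
        return len(self.restarts) >= self.threshold


class HealthCheckTracker:
    """Hypothetical model of Health Check Failures (default: 3 consecutive liveness failures)."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive = 0

    def record(self, healthy: bool) -> bool:
        """Record one liveness probe result; return True if an alert should fire."""
        # Any successful probe resets the streak.
        self.consecutive = 0 if healthy else self.consecutive + 1
        return self.consecutive >= self.max_failures
```

With the defaults, restarts at 0 s, 60 s, and 120 s trigger an alert on the third restart, while restarts spaced 200 s apart never put three inside the 5-minute window.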
Peer Thresholds (Cluster Nodes)
- CPU %: CPU usage threshold (default: 80%)
- CPU Duration: Sustained time before alerting (default: 5 minutes)
- Memory %: Memory usage threshold (default: 80%)
- Memory Duration: Sustained time before alerting (default: 5 minutes)
- Disk %: Disk usage threshold (default: 85%)
- Disk Breaches: Consecutive polls above threshold before alerting (default: 2)
Workload Thresholds (Running Services)
- CPU %: CPU usage threshold (default: 80%)
- CPU Duration: Sustained time before alerting (default: 5 minutes)
- Memory %: Memory usage threshold (default: 80%)
- Memory Duration: Sustained time before alerting (default: 5 minutes)
- Disk %: Disk usage threshold (default: 90%)
- Disk Breaches: Consecutive polls above threshold before alerting (default: 3)
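The Duration and Breaches settings in both threshold lists debounce alerts in two different ways: CPU and memory must stay above the threshold for a period of time, while disk usage must exceed it on a number of consecutive polls. A minimal sketch of both checks, assuming "above" means strictly greater than the percentage and that any reading at or below the threshold resets the timer or counter. The class names are hypothetical and the logic is an illustration of the settings, not Watcher's code.

```python
from typing import Optional


class SustainedThreshold:
    """Hypothetical CPU/Memory check: alert once usage stays above the threshold for the duration."""

    def __init__(self, threshold_pct: float, duration_s: float) -> None:
        self.threshold_pct = threshold_pct
        self.duration_s = duration_s
        self.breach_started: Optional[float] = None

    def observe(self, ts: float, value_pct: float) -> bool:
        """Feed one poll (timestamp in seconds, usage in %); return True if an alert should fire."""
        if value_pct <= self.threshold_pct:
            self.breach_started = None  # back under the threshold: restart the clock
            return False
        if self.breach_started is None:
            self.breach_started = ts  # first poll above the threshold
        return ts - self.breach_started >= self.duration_s


class ConsecutiveBreaches:
    """Hypothetical Disk check: alert after N consecutive polls above the threshold."""

    def __init__(self, threshold_pct: float, breaches: int) -> None:
        self.threshold_pct = threshold_pct
        self.breaches = breaches
        self.count = 0

    def observe(self, value_pct: float) -> bool:
        """Feed one poll (usage in %); return True if an alert should fire."""
        # Extend the streak on a breach; any normal reading resets it.
        self.count = self.count + 1 if value_pct > self.threshold_pct else 0
        return self.count >= self.breaches
```

For example, with the peer CPU defaults (80%, 5 minutes), a node polled at 90% from t = 0 s would alert at t = 300 s, and a single reading back under 80% would restart the clock. With the peer disk defaults (85%, 2 breaches), the second consecutive poll above 85% would alert.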
Advanced Settings
Toggle “Show Advanced Options” to access:
- Poll Interval: How often to check cluster health (default: 15 seconds)
- AI Only Slack: Send only AI-analyzed alerts to Slack, which reduces noise (default: on)
- Enable Fix with Monk: Include debugging links in Slack alerts (default: on)
- Ignore Local Peer: Skip checks on the local node and focus on remote peers (default: on)
- Context TTL: How long to keep alert context for debugging links (default: 24 hours)
- Reassess Interval: How often to re-evaluate ongoing issues (default: 15 minutes)
- Log Lines: Number of log lines to analyze per workload (default: 100)
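Because Disk Breaches is counted in polls rather than seconds, the Poll Interval determines how long a disk alert takes to fire. The helper below works through that arithmetic for the default thresholds. The function names are hypothetical, and the worst-case figure assumes usage crosses the threshold just after a poll, so up to one extra interval passes before the first breaching poll.

```python
POLL_INTERVAL_S = 15  # default Poll Interval


def breach_alert_delay_s(breaches: int, poll_interval_s: int = POLL_INTERVAL_S) -> int:
    """Seconds from the first poll above the threshold to the poll that completes the streak."""
    return (breaches - 1) * poll_interval_s


def worst_case_breach_latency_s(breaches: int, poll_interval_s: int = POLL_INTERVAL_S) -> int:
    """Approximate upper bound: add one interval for a crossing just after a poll."""
    return breaches * poll_interval_s


# Default Disk Breaches values from the Peer and Workload threshold sections.
DEFAULT_DISK_BREACHES = {"peer disk (2 breaches)": 2, "workload disk (3 breaches)": 3}

for name, n in DEFAULT_DISK_BREACHES.items():
    print(f"{name}: {breach_alert_delay_s(n)}-{worst_case_breach_latency_s(n)} s")
```

With the defaults this prints 15-30 s for peer disks and 30-45 s for workload disks. Raising the Poll Interval to 60 s would stretch the workload figure to 120-180 s.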
Slack Integration
When you set up Watcher, Monk asks whether you want to configure Slack alerts. If you do, Monk prompts for your Slack webhook URL, which is collected securely and never shown in chat. To create a Slack webhook:
- Go to Slack Incoming Webhooks
- Create a new webhook for your workspace
- Copy the webhook URL
- Paste it when Monk asks during Watcher setup
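Before pasting the webhook URL into Monk, you can confirm that it works by posting a test message. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field and reply with `ok` on success. The sketch below uses only the Python standard library; the function names and message text are illustrative. Treat the webhook URL as a secret, since anyone who has it can post to your channel.

```python
import json
import urllib.request


def build_test_payload(cluster_name: str) -> bytes:
    """Encode a minimal Slack message: incoming webhooks accept a JSON body with a "text" field."""
    message = {"text": f"Test message for Watcher alerts on {cluster_name}"}
    return json.dumps(message).encode("utf-8")


def post_to_webhook(webhook_url: str, payload: bytes, timeout_s: float = 10.0) -> str:
    """POST the payload to the webhook and return Slack's response body ("ok" on success)."""
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

Call `post_to_webhook(url, build_test_payload("my-cluster"))` with your webhook URL. An invalid or revoked URL raises `urllib.error.HTTPError` instead of returning `ok`.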
How It Works
Watcher deploys two components to your cluster:
- watcher-agent: Monitors cluster health, collects metrics, detects issues
- watcher-ai: Analyzes issues with AI, generates recommendations, sends Slack alerts
An alert moves through these stages:
- Continuous polling of all nodes and workloads
- Threshold breach or crash detected
- AI analyzes logs, metrics, and context
- Alert sent to Slack with diagnosis and recommendations
- Recovery notification when issue resolves
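This flow can be pictured as a small state machine per issue: alert once, re-evaluate while the issue persists, and notify on recovery. The sketch below models that lifecycle with the default Reassess Interval of 15 minutes. It is a hypothetical illustration, not Watcher's code: the `notify` callback stands in for the AI analysis and Slack delivery, and whether a reassessment posts a new Slack message is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

REASSESS_INTERVAL_S = 15 * 60  # default Reassess Interval (15 minutes)


@dataclass
class AlertLifecycle:
    """Hypothetical per-issue lifecycle: alert once, reassess periodically, then recover."""

    notify: Callable[[str], None]  # stands in for AI analysis + Slack delivery
    reassess_interval_s: float = REASSESS_INTERVAL_S
    last_assessed: Dict[str, float] = field(default_factory=dict)

    def on_poll(self, ts: float, active_issues: Set[str]) -> None:
        """Process one poll; active_issues holds the issues detected at time ts."""
        for issue in sorted(active_issues):
            last = self.last_assessed.get(issue)
            if last is None:
                # Newly detected: send the initial alert.
                self.notify(f"ALERT {issue}")
                self.last_assessed[issue] = ts
            elif ts - last >= self.reassess_interval_s:
                # Still ongoing after a full interval: re-evaluate it.
                self.notify(f"REASSESS {issue}")
                self.last_assessed[issue] = ts
        # Open issues that were not detected on this poll have recovered.
        for issue in sorted(set(self.last_assessed) - active_issues):
            self.notify(f"RECOVERED {issue}")
            del self.last_assessed[issue]
```

Feeding it one issue at t = 0 s, 15 s, and 900 s, then an empty poll at 915 s, yields an alert, one reassessment, and a recovery notice.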
Slack Alert Format
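When an issue is detected, the Slack alert contains the AI diagnosis and recommendations for the affected node or workload. A recovery notification follows when the issue resolves.
Fix with Monk Button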
Each Slack alert includes a Fix with Monk button. Clicking it:
- Opens VS Code with the Monk extension
- Loads the Monk chat panel
- Prefills context about the issue (affected workload, logs, metrics, AI diagnosis)
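The prefilled context has to be kept somewhere until someone clicks the button, and the Context TTL setting (default: 24 hours) bounds how long it stays available. A minimal sketch of a TTL-bounded store, assuming each link carries an opaque ID and that expired entries are discarded when read. The class and ID scheme are hypothetical, not Watcher's implementation.

```python
import time
import uuid
from typing import Dict, Optional, Tuple

CONTEXT_TTL_S = 24 * 60 * 60  # default Context TTL (24 hours)


class AlertContextStore:
    """Hypothetical store that keeps alert context so a debugging link can be resolved later."""

    def __init__(self, ttl_s: float = CONTEXT_TTL_S) -> None:
        self.ttl_s = ttl_s
        self._items: Dict[str, Tuple[float, dict]] = {}

    def put(self, context: dict, now: Optional[float] = None) -> str:
        """Save context and return the opaque ID a link would carry."""
        now = time.time() if now is None else now
        key = uuid.uuid4().hex
        self._items[key] = (now, context)
        return key

    def get(self, key: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the stored context, or None if it is unknown or older than the TTL."""
        now = time.time() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        created, context = entry
        if now - created > self.ttl_s:
            del self._items[key]  # expired: the link no longer resolves
            return None
        return context
```

Under these assumptions, clicking Fix with Monk on an alert older than the TTL would find no stored context.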
Managing Watcher
Check status:
Coming Soon
Autonomous Auto-Fixes
Future Watcher capabilities will include automatic remediation:
- Automatic restart of crashed services
- Auto-scaling of resources when sustained pressure is detected
- Applying known fixes without human intervention
- Smart rollback on failed deployments
Related Features
- Monitoring & Observability - Log streaming and metrics
- Scaling - Metrics that trigger scaling
- Security - How Watcher access is secured

