What It Does
Watcher is Monk’s built-in cluster monitoring and alerting system. It runs on your cluster 24/7, detects crashes, resource pressure, and health check failures, then sends AI-analyzed alerts to Slack with actionable recommendations. Available on Pro and Team plans.
How to Set Up
In chat, ask Monk to set up Watcher:
- “configure watcher”
- “enable watcher”
- “set up cluster monitoring”
- “configure Slack alerts for my cluster”
Requirements:
- Active cluster with at least one non-local node
- Slack webhook URL (optional, for alerts)
Configuration Options
The setup form has four sections:
Crash Detection
- Crash Threshold: Number of restarts within the window to trigger an alert (default: 3)
- Crash Window: Time window for counting restarts (default: 5 minutes)
- Health Check Failures: Consecutive liveness failures before alerting (default: 3)
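The crash settings above describe a sliding-window count. The following is a minimal sketch of that idea, not Watcher's actual implementation: the class name `CrashWindow` and its methods are hypothetical, and treating the threshold as "at least N restarts" is an assumption.

```python
from collections import deque


class CrashWindow:
    """Count restarts inside a sliding time window (illustrative only)."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0):
        self.threshold = threshold            # Crash Threshold default: 3
        self.window_seconds = window_seconds  # Crash Window default: 5 minutes
        self._restarts = deque()              # timestamps of recent restarts

    def record_restart(self, timestamp: float) -> bool:
        """Record one restart; return True if the alert threshold is reached."""
        self._restarts.append(timestamp)
        # Drop restarts that have fallen out of the window.
        while self._restarts and timestamp - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        return len(self._restarts) >= self.threshold


window = CrashWindow()
# Restart times in seconds: two early restarts, then three within 30 seconds.
results = [window.record_restart(t) for t in (0, 60, 400, 420, 430)]
```

Only the last restart triggers an alert here: the first two have aged out of the 5-minute window by the time the later three arrive.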
Peer Thresholds (Cluster Nodes)
- CPU %: CPU usage threshold (default: 80%)
- CPU Duration: Sustained time before alerting (default: 5 minutes)
- Memory %: Memory usage threshold (default: 80%)
- Memory Duration: Sustained time before alerting (default: 5 minutes)
- Disk %: Disk usage threshold (default: 85%)
- Disk Breaches: Consecutive polls above threshold before alerting (default: 2)
Workload Thresholds (Running Services)
- CPU %: CPU usage threshold (default: 80%)
- CPU Duration: Sustained time before alerting (default: 5 minutes)
- Memory %: Memory usage threshold (default: 80%)
- Memory Duration: Sustained time before alerting (default: 5 minutes)
- Disk %: Disk usage threshold (default: 90%)
- Disk Breaches: Consecutive polls above threshold before alerting (default: 3)
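The thresholds above use two different triggers: CPU and memory alert after a sustained duration, while disk alerts after a number of consecutive polls. A rough sketch of both, under stated assumptions: the class names are hypothetical, "above threshold" is taken as strictly greater than, and any reading at or below the limit is assumed to reset the check.

```python
from typing import Optional


class SustainedThreshold:
    """Fire once usage has stayed above a limit for a minimum duration."""

    def __init__(self, limit_percent: float, duration_seconds: float):
        self.limit_percent = limit_percent
        self.duration_seconds = duration_seconds
        self._breach_start: Optional[float] = None  # when the current breach began

    def observe(self, timestamp: float, usage_percent: float) -> bool:
        if usage_percent <= self.limit_percent:
            self._breach_start = None  # a dip below the limit resets the timer
            return False
        if self._breach_start is None:
            self._breach_start = timestamp
        return timestamp - self._breach_start >= self.duration_seconds


class ConsecutiveBreaches:
    """Fire after a number of back-to-back polls above a limit."""

    def __init__(self, limit_percent: float, required_polls: int):
        self.limit_percent = limit_percent
        self.required_polls = required_polls
        self._streak = 0

    def observe(self, usage_percent: float) -> bool:
        self._streak = self._streak + 1 if usage_percent > self.limit_percent else 0
        return self._streak >= self.required_polls


# Peer defaults: CPU above 80% for 5 minutes, disk above 85% for 2 polls.
cpu = SustainedThreshold(limit_percent=80, duration_seconds=300)
cpu_alerts = [cpu.observe(t, u) for t, u in [(0, 90), (150, 95), (300, 91), (315, 70), (330, 99)]]

disk = ConsecutiveBreaches(limit_percent=85, required_polls=2)
disk_alerts = [disk.observe(u) for u in (86, 80, 88, 90)]
```

In this sketch a short spike never alerts on its own, which is the point of pairing each percentage with a duration or breach count.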
Advanced Settings
Toggle “Show Advanced Options” to access:
- Poll Interval: How often to check cluster health (default: 15 seconds)
- AI Only Slack: Send only AI-analyzed alerts to Slack, which reduces noise (default: on)
- Enable Fix with Monk: Include debugging links in Slack alerts (default: on)
- Ignore Local Peer: Skip local node checks, focus on remote peers (default: on)
- Context TTL: How long to keep alert context for debugging links (default: 24 hours)
- Reassess Interval: How often to re-evaluate ongoing issues (default: 15 minutes)
- Log Lines: Number of log lines to analyze per workload (default: 100)
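To see how these defaults relate to each other, the sketch below records them in a small data structure and works out how many polls fit into a given duration. The field names and the `polls_in` helper are hypothetical; only the default values come from the list above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AdvancedSettings:
    # Field names are hypothetical; values are the documented defaults.
    poll_interval: timedelta = timedelta(seconds=15)
    ai_only_slack: bool = True
    enable_fix_with_monk: bool = True
    ignore_local_peer: bool = True
    context_ttl: timedelta = timedelta(hours=24)
    reassess_interval: timedelta = timedelta(minutes=15)
    log_lines: int = 100

    def polls_in(self, duration: timedelta) -> int:
        """Number of whole poll intervals that fit in a duration."""
        return duration // self.poll_interval


settings = AdvancedSettings()
polls_per_cpu_window = settings.polls_in(timedelta(minutes=5))      # 20
polls_per_reassess = settings.polls_in(settings.reassess_interval)  # 60
```

With a 15-second poll interval, a 5-minute CPU or memory duration spans 20 polls, and an ongoing issue is re-evaluated roughly every 60 polls.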

Slack Integration
When you set up Watcher, Monk asks if you want to configure Slack alerts. If you choose yes, Monk prompts for your Slack webhook URL (collected securely, never shown in chat).
To create a Slack webhook:
- Go to Slack Incoming Webhooks
- Create a new webhook for your workspace
- Copy the webhook URL
- Paste it when Monk asks during Watcher setup
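Before pasting the webhook URL into Monk, it can help to confirm that the webhook delivers messages. The sketch below posts a plain-text message using the standard Slack incoming-webhook payload (`{"text": ...}`). It is independent of Watcher: the function names and the `SLACK_WEBHOOK_URL` environment variable are assumptions made for this example.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a minimal Slack message payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(text: str = "Watcher webhook test") -> int:
    """Send the message and return the HTTP status code."""
    # SLACK_WEBHOOK_URL is a hypothetical variable name; keep the URL out of source control.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Placeholder URL for illustration only; it is not a working webhook.
example = build_slack_request("https://hooks.slack.com/services/T000/B000/XXXX", "hello")
```

Calling `send_test_message()` with a valid webhook should make the message appear in the channel the webhook is bound to.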
How It Works
Watcher deploys two components to your cluster:
- watcher-agent: Monitors cluster health, collects metrics, detects issues
- watcher-ai: Analyzes issues with AI, generates recommendations, sends Slack alerts
Monitoring then proceeds as follows:
- Continuous polling of all nodes and workloads
- Threshold breach or crash detected
- AI analyzes logs, metrics, and context
- Alert sent to Slack with diagnosis and recommendations
- Recovery notification when issue resolves
When an issue is detected:
- An alert is sent to the configured notification endpoint (e.g., Slack).
- The notification includes a summary of the issue and a “Fix with Monk” button.
- When the user clicks the button, Monk opens a contextual chat session.
- Monk explains the root cause and transparently performs remediation steps.
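The alert and recovery notifications described above amount to reacting to state changes between polls. A minimal sketch of that bookkeeping follows. `IssueTracker`, its `update` method, and the workload name `"api"` are hypothetical; the AI analysis and periodic reassessment steps are left out.

```python
from typing import Dict, Optional


class IssueTracker:
    """Turn per-poll health results into alert and recovery events."""

    def __init__(self) -> None:
        self._open: Dict[str, bool] = {}  # target name -> currently in breach?

    def update(self, target: str, breached: bool) -> Optional[str]:
        was_breached = self._open.get(target, False)
        self._open[target] = breached
        if breached and not was_breached:
            return "alert"      # new issue: analyze and notify
        if was_breached and not breached:
            return "recovery"   # issue cleared: send recovery notification
        return None             # no state change, nothing to send


tracker = IssueTracker()
# Five consecutive polls of one workload: healthy, breached twice, then healthy again.
events = [tracker.update("api", b) for b in (False, True, True, False, False)]
```

Emitting events only on transitions keeps an ongoing breach from producing a new notification on every poll.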
What Watcher Detects
Watcher can identify a wide range of infrastructure and application-level issues, including:
- High CPU usage exceeding defined thresholds
- Crash loops
- Noisy neighbor resource contention
- Excessive log output
- Infrastructure instability
Alert Notification Configuration
Watcher supports flexible alert routing options:
- Slack Webhook Notifications
Incident Flow
A typical incident resolution process follows these steps:
- An alert is triggered in an external application (e.g., Slack).
- The root cause of the issue is analyzed.
- The user clicks Fix with Monk.
- Monk opens a contextual chat session and begins remediation.
- Monk displays each action taken in real time.
Slack Alert Format
Issue detected:
Fix with Monk Button
Each Slack alert includes a Fix with Monk button. Clicking it:
- Opens VS Code with the Monk extension
- Loads the Monk chat panel
- Prefills context about the issue (affected workload, logs, metrics, AI diagnosis)

Managing Watcher
Check status:
Coming Soon
Autonomous Auto-Fixes
Future Watcher capabilities will include automatic remediation:
- Automatic restart of crashed services
- Auto-scaling resources when sustained pressure is detected
- Applying known fixes without human intervention
- Smart rollback on failed deployments
Related Features
- Monitoring & Observability - Log streaming and metrics
- Scaling - Metrics that trigger scaling
- Security - How Watcher access is secured

