# Prometheus + Grafana

> Ready-to-run Prometheus and Grafana stack template for comprehensive monitoring, metrics collection, and visualization.

## Overview

This template provides a production‑ready Prometheus + Grafana monitoring stack as Monk runnables. You can:

* Run it directly to get a complete monitoring solution with metrics collection and visualization
* Inherit it in your own infrastructure to add observability to your applications

Prometheus is a monitoring system with a built-in time-series database that collects metrics by scraping HTTP endpoints (a pull model). Grafana is a visualization platform for building dashboards on top of Prometheus and other data sources. Together, they form an open-source monitoring stack.

## What this template manages

* Prometheus server for metrics collection
* Grafana for visualization and dashboards
* Alertmanager for alerting (optional)
* Service discovery and scrape configuration
* Persistent storage for metrics and dashboards
* Pre-configured data sources

## Quick start (run directly)

1. Load templates

```bash theme={null}
monk load MANIFEST
```

2. Run the monitoring stack

```bash theme={null}
monk run prometheus-grafana/stack
```

3. Customize credentials (recommended via inheritance)

Running directly uses the defaults defined in this template's `variables`. Secrets added with `monk secrets add` will not affect this runnable unless you inherit it and reference those secrets.

* Preferred: inherit and replace variables with `secret("...")` as shown below.
* Alternative: fork/clone and edit the `variables` in the template, then `monk load MANIFEST` and run.

Once started:

* Prometheus UI: `http://localhost:9090`
* Grafana UI: `http://localhost:3000` (default: admin/admin)

## Configuration

Key variables you can customize in this template:

```yaml theme={null}
variables:
  # Prometheus
  prometheus-image-tag: "latest"      # Prometheus image tag
  prometheus-port: "9090"             # Prometheus UI/API port
  scrape-interval: "15s"              # metrics scrape interval
  retention-time: "15d"               # metrics retention period
  
  # Grafana
  grafana-image-tag: "latest"         # Grafana image tag
  grafana-port: "3000"                # Grafana UI port
  grafana-admin-user: "admin"         # admin username
  grafana-admin-password: "..."       # admin password
```

Data is persisted under `${monk-volume-path}/prometheus` and `${monk-volume-path}/grafana` on the host.
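As a rough sizing aid, the default `scrape-interval` and `retention-time` above determine how many samples each time series accumulates before old data is dropped. The sketch below is a back-of-the-envelope calculation and not part of the template; `parse_duration` and `samples_per_series` are hypothetical helpers that only handle single-unit durations such as `15s` or `15d`.

```python
# Hypothetical helpers for rough sizing; not part of the template.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as '15s' or '15d' to seconds."""
    value, unit = text[:-1], text[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * UNIT_SECONDS[unit]


def samples_per_series(scrape_interval: str, retention_time: str) -> int:
    """Approximate number of samples one series holds at full retention."""
    return parse_duration(retention_time) // parse_duration(scrape_interval)


# Defaults from the variables block: 15s interval, 15d retention
print(samples_per_series("15s", "15d"))  # 86400
```

Halving the scrape interval doubles this figure, so it is worth adjusting both variables together when disk space is limited.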

## Use by inheritance (recommended for monitoring)

Inherit the stack to monitor your applications. Example:

```yaml theme={null}
namespace: myapp
monitoring:
  defines: runnable
  inherits: prometheus-grafana/stack
  variables:
    grafana-admin-password: <- secret("grafana-password")
api:
  defines: runnable
  containers:
    api:
      image: myorg/api
      labels:
        prometheus.scrape: "true"
        prometheus.port: "8080"
        prometheus.path: "/metrics"
  connections:
    monitor:
      runnable: monitoring
      service: prometheus
```

Then add the secret once and run your application:

```bash theme={null}
monk secrets add -g grafana-password="STRONG_PASSWORD"
monk run myapp/api
```

## Ports and connectivity

* Service: `prometheus` on TCP port `9090`
* Service: `grafana` on TCP port `3000`
* Service: `alertmanager` on TCP port `9093` (if enabled)
* From other runnables in the same process group, use `connection-hostname("<connection-name>")` to resolve service hosts.
* Prometheus scrapes metrics from monitored services over HTTP, so each target's metrics port must be reachable from the Prometheus container.

## Persistence and configuration

* Prometheus data: `${monk-volume-path}/prometheus:/prometheus`
* Grafana data: `${monk-volume-path}/grafana:/var/lib/grafana`
* Prometheus config: `${monk-volume-path}/prometheus/config`
* You can customize Prometheus scrape configs and Grafana dashboards via the mounted volumes.

## Features

### Prometheus

* Time-series metrics database
* Powerful PromQL query language
* Service discovery (Kubernetes, Docker, Consul, etc.)
* Pull-based metrics collection
* Alerting with Alertmanager
* High availability and federation

### Grafana

* Beautiful, customizable dashboards
* Multiple data source support
* Templating and variables
* Alerting and notifications
* User management and RBAC
* Dashboard sharing and versioning

## Metrics Exposition

Expose metrics from your applications:

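```python theme={null}
# Python example with prometheus_client (assumes a Flask app)
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

requests = Counter('http_requests_total', 'Total HTTP requests')

@app.route('/metrics')
def metrics():
    # Serve all registered metrics in the Prometheus text format
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```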

Configure Prometheus to scrape:

```yaml theme={null}
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['api:8080']
```
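To make the scrape concrete, the body Prometheus receives from a `/metrics` endpoint is plain text in the Prometheus exposition format. The stdlib-only sketch below renders one counter family by hand to show the shape of that text. `render_counter` is a hypothetical helper, not a `prometheus_client` API, and it does not escape special characters in label values; in practice the client library handles this for you.

```python
# Stdlib-only sketch of the Prometheus text exposition format.
# `render_counter` is a hypothetical helper, not a prometheus_client API.

def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family.

    `samples` maps a tuple of (label, value) pairs to the counter value;
    use an empty tuple for a sample without labels.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


body = render_counter(
    "http_requests_total",
    "Total HTTP requests",
    {(("method", "GET"),): 42, (("method", "POST"),): 7},
)
print(body, end="")
# # HELP http_requests_total Total HTTP requests
# # TYPE http_requests_total counter
# http_requests_total{method="GET"} 42
# http_requests_total{method="POST"} 7
```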

## Alerting

Configure alerts in Prometheus:

```yaml theme={null}
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) > 0.05
        for: 10m
        annotations:
          summary: "High error rate detected"
```
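To read the rule above: `rate(http_errors_total[5m])` is the per-second increase of the counter averaged over five minutes, so a threshold of `0.05` corresponds to about 3 errors per minute, and `for: 10m` requires the condition to hold for ten minutes before the alert fires. The sketch below approximates that logic in plain Python. `simple_rate` and `should_fire` are hypothetical helpers; unlike PromQL's `rate()`, `simple_rate` ignores counter resets and extrapolation.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def should_fire(rates, threshold=0.05, for_seconds=600):
    """Return True once the rate has stayed above `threshold` for `for_seconds`.

    `rates` is a time-ordered list of (timestamp, rate) rule evaluations.
    """
    pending_since = None
    for ts, rate in rates:
        if rate > threshold:
            if pending_since is None:
                pending_since = ts  # condition became true: alert is pending
            if ts - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # condition cleared: reset the timer
    return False


# 30 errors in 300 seconds is 0.1 errors per second, above the 0.05 threshold
print(simple_rate([(0, 100), (300, 130)]))
```

A single evaluation below the threshold resets the pending timer, which is why a brief spike does not trigger the alert.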

## Use cases

This stack excels at:

* Application performance monitoring
* Infrastructure monitoring
* Real-time alerting
* Capacity planning
* SLA monitoring
* DevOps observability

## Related templates

* Use `alertmanager/` for advanced alerting and notification routing
* Integrate with `node-exporter/` for system metrics collection
* Combine with `loki/` for log aggregation and correlation

## Troubleshooting

* Access Prometheus targets at `http://localhost:9090/targets` to verify scrape status
* Check Grafana data sources in Settings → Data Sources
* Verify metrics are being scraped:

```bash theme={null}
# Query Prometheus API
curl 'http://localhost:9090/api/v1/query?query=up'
```

* Check logs:

```bash theme={null}
monk logs -l 500 -f prometheus-grafana/prometheus
monk logs -l 500 -f prometheus-grafana/grafana
```

* For missing metrics, verify:
  * Service is exposing metrics on the configured port
  * Prometheus can reach the target (check firewalls)
  * Scrape configuration is correct
* For Grafana dashboard issues, check data source configuration and time ranges
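
The `up` query shown above can also be checked from a script. The following is a minimal sketch using only the standard library, assuming Prometheus is reachable at the base URL you pass in; `build_query_url`, `down_targets`, and `fetch_down_targets` are hypothetical helper names. It relies on the instant-query response shape of the Prometheus HTTP API, where each result carries a `metric` label set and a `[timestamp, "value"]` pair.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def down_targets(payload: dict) -> list:
    """Return the `instance` label of every series whose `up` value is 0."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        series["metric"].get("instance", "<unknown>")
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


def fetch_down_targets(base_url: str) -> list:
    """Query a live Prometheus server (network access required)."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return down_targets(json.load(resp))
```

For example, `fetch_down_targets("http://localhost:9090")` returns an empty list when every scrape target is healthy.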
