Overview

This template provides a production‑ready Trino instance as a Monk runnable. You can:
  • Run it directly to get a managed distributed SQL query engine
  • Inherit it in your own data infrastructure to query data across multiple sources
Trino (formerly PrestoSQL) is a fast distributed SQL query engine for big data analytics. It lets you query data where it lives, whether in Hadoop, S3, Cassandra, MySQL, or dozens of other data sources, using standard SQL.

What this template manages

  • Trino coordinator
  • Trino workers (optional, for a distributed setup)
  • Catalog configuration for data sources
  • Query execution engine
  • Web UI and query interface on port 8080

Quick start (run directly)

  1. Load the templates:
monk load MANIFEST
  2. Run Trino with defaults:
monk run trino/trino
  3. Customize configuration (recommended via inheritance).
Running directly uses the defaults defined in this template’s variables. Secrets added with monk secrets add will not affect this runnable unless you inherit it and reference those secrets.
  • Preferred: inherit and replace variables with secret("...") as shown below.
  • Alternative: fork/clone and edit the variables in trino/trino.yml, then monk load MANIFEST and run again.
Once started:
  • Web UI: http://localhost:8080
  • Connect with Trino CLI or JDBC: jdbc:trino://localhost:8080/catalog/schema
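To verify the server is up, you can query Trino’s REST API; the /v1/info endpoint reports version and status:
curl http://localhost:8080/v1/info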

Configuration

Key variables you can customize in this template:
variables:
  # Trino
  trino-image-tag: "latest"              # container image tag
  trino-port: "8080"                     # HTTP port (env: TRINO_PORT)
  discovery-uri: "http://localhost:8080" # discovery service URI
  
  # Performance
  query-max-memory: "5GB"                # max memory per query
  query-max-memory-per-node: "1GB"       # max memory per node
  heap-size: "2G"                        # JVM heap size
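To change these values, inherit the runnable and override the variables. A minimal sketch (the mydata namespace and values are illustrative; a fuller example with catalog files follows below):
namespace: mydata
query-engine:
  defines: runnable
  inherits: trino/trino
  variables:
    query-max-memory: "10GB"
    query-max-memory-per-node: "2GB"
    heap-size: "4G"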
Data and configuration are persisted under ${monk-volume-path}/trino on the host, and custom catalog configurations are mounted from ${monk-volume-path}/trino/catalog into /etc/trino/catalog in the container. To add data sources and wire in secrets, inherit the Trino runnable in your data platform and declare connections. Example:
namespace: mydata
query-engine:
  defines: runnable
  inherits: trino/trino
  files:
    postgres-catalog:
      container: trino
      path: /etc/trino/catalog/postgresql.properties
      contents: |
        connector.name=postgresql
        connection-url=jdbc:postgresql://postgres:5432/mydb
        connection-user=<- secret("db-user")
        connection-password=<- secret("db-password")
api:
  defines: runnable
  containers:
    api:
      image: myorg/analytics-api
  connections:
    analytics:
      runnable: query-engine
      service: trino
  variables:
    trino-host:
      value: <- connection-hostname("analytics")
    trino-port:
      value: "8080"
Then set the secrets once and run your data platform:
monk secrets add -g db-user="trino"
monk secrets add -g db-password="STRONG_PASSWORD"
monk run mydata/api

Ports and connectivity

  • Service: trino on TCP port 8080
  • Web UI: http://localhost:8080
  • JDBC: jdbc:trino://localhost:8080/catalog/schema
  • From other runnables in the same process group, use connection-hostname("<connection-name>") to resolve the Trino host.

Persistence and configuration

  • Data path: ${monk-volume-path}/trino:/etc/trino
  • Catalog path: ${monk-volume-path}/trino/catalog:/etc/trino/catalog
  • You can drop additional catalog .properties files into the catalog path to configure data source connectors.
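For example, assuming you have already prepared a mysql.properties file (see Catalog Configuration below for its contents):
cp mysql.properties ${monk-volume-path}/trino/catalog/   # substitute the resolved host path for ${monk-volume-path}
Trino loads catalog files at startup, so restart the runnable for a new catalog to take effect.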

Features

  • Fast Queries: In-memory distributed execution
  • Federated Queries: Query across multiple data sources in one SQL
  • Standard SQL: ANSI SQL support
  • 40+ Connectors: PostgreSQL, MySQL, S3, Hive, Kafka, Elasticsearch, etc.
  • Scalable: Add workers for horizontal scaling
  • No ETL: Query data where it lives, no data movement
  • BI Tool Integration: Tableau, Looker, Metabase, Superset

Supported Connectors

  • RDBMS: PostgreSQL, MySQL, Oracle, SQL Server
  • NoSQL: MongoDB, Cassandra, Redis
  • Cloud Storage: S3, GCS, Azure Blob
  • Data Lakes: Hive, Iceberg, Delta Lake
  • Streaming: Kafka, Kinesis
  • Search: Elasticsearch, OpenSearch
  • And dozens more…

Catalog Configuration

Example PostgreSQL catalog (/etc/trino/catalog/postgresql.properties):
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/mydb
connection-user=trino
connection-password=secret
For production use, prefer injecting credentials via a files block with secret("..."), as shown in the inheritance example above, rather than hardcoding them.
Example S3/Hive catalog:
connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.aws-access-key=AKIAIOSFODNN7EXAMPLE
hive.s3.aws-secret-key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
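The federated query example in the next section joins against a mysql catalog; a minimal mysql.properties for it might look like this (connection details are illustrative):
connector.name=mysql
connection-url=jdbc:mysql://mysql:3306
connection-user=trino
connection-password=secret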

Querying with Trino

Connect with Trino CLI:
trino --server localhost:8080 --catalog postgresql --schema public
Run queries:
-- Query PostgreSQL
SELECT * FROM postgresql.public.users LIMIT 10;

-- Federated query across multiple sources
SELECT u.name, o.total
FROM postgresql.public.users u
JOIN mysql.sales.orders o ON u.id = o.user_id
WHERE o.created_at > DATE '2024-01-01';

-- Query S3 data lake
SELECT * FROM hive.default.events
WHERE dt = '2024-01-01';

Use cases

Trino excels at:
  • Data lake analytics
  • Federated queries across silos
  • Interactive analytics on big data
  • Ad-hoc SQL queries
  • BI and reporting on distributed data
  • ETL and data pipeline queries
  • Real-time analytics
Combine Trino with other templates:
  • Data warehouses: postgresql/, clickhouse/ for connecting to RDBMS sources
  • ETL tools: airflow/, dagster/ for data pipeline orchestration
  • BI platforms: use Trino as a data source for Tableau, Looker, Metabase, or Superset

Troubleshooting

  • Access Web UI: Navigate to http://localhost:8080 to view running queries and statistics
  • Check catalog connections:
SHOW CATALOGS;
SHOW SCHEMAS FROM catalog_name;
  • Check logs:
monk logs -l 500 -f trino/trino
  • Slow queries: Inspect the query plan with EXPLAIN, for example:
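EXPLAIN SELECT * FROM postgresql.public.users WHERE id = 42;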
  • Connector issues: Verify catalog configuration files in ${monk-volume-path}/trino/catalog
  • Memory errors: Increase query-max-memory or query-max-memory-per-node variables
  • Resource monitoring: Use Web UI to monitor CPU, memory, and query performance
  • Query history: Review failed queries in Web UI for troubleshooting