Grafana Loki provides cost-efficient log aggregation by indexing only log labels, not log content, which keeps storage costs low while still enabling fast label-based filtering. LogQL combines label matchers with filter expressions and metric queries over log streams, and Grafana visualizes both Loki logs and Prometheus metrics in unified dashboards. Because a dashboard is a JSON model, dashboards can live in version control as code, and alerting rules defined in YAML fire on both log patterns and metrics. Promtail agents on each node tail log files and push them to Loki. Claude Code generates structured logging configurations, LogQL queries, Grafana dashboard JSON, Promtail pipeline stages, and alerting rule YAML for production observability stacks.
## CLAUDE.md for Loki/Grafana Stack
## Observability Stack
- Loki >= 3.2, Grafana >= 11, Promtail >= 3.2 (or Alloy/Vector as shipper)
- Log format: structured JSON with consistent field names across services
- Labels: cluster, namespace, app, pod — keep cardinality LOW (<100 values per label)
- LogQL: use label matchers first (fast), then regex (slow) — same principle as SQL indexes
- Dashboards: commit as JSON to git, use Grafonnet or raw JSON, never click-to-create in prod
- Alerts: use Grafana alerting with Loki data source for log-based alerts
- Retention: configure per-tenant limits in Loki ruler config
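The cardinality budget above can be enforced in CI or checked ad hoc. A minimal sketch, assuming you have already fetched each label's values (in practice from Loki's `GET /loki/api/v1/label/<name>/values` endpoint); the function name and the 100-value budget mirror the guideline but are otherwise illustrative:

```python
# check_cardinality.py — flag Loki labels whose distinct value count
# exceeds a budget. Takes an already-fetched mapping of label -> values.

MAX_VALUES_PER_LABEL = 100  # the budget from the guideline above

def high_cardinality_labels(label_values: dict[str, list[str]],
                            budget: int = MAX_VALUES_PER_LABEL) -> list[str]:
    """Return labels whose distinct value count exceeds the budget."""
    return sorted(
        label for label, values in label_values.items()
        if len(set(values)) > budget
    )
```

Run this against your label list periodically; a label like `request_id` showing up here means it should stay as a structured field inside the log line, not a label.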
## Structured Logging
# logging/structured_logger.py — produce Loki-friendly JSON logs
import logging
import json
import sys
import traceback
from datetime import datetime, timezone
from contextvars import ContextVar
# Context vars: automatically included in all log lines
request_id: ContextVar[str] = ContextVar('request_id', default='')
user_id: ContextVar[str] = ContextVar('user_id', default='')
class StructuredFormatter(logging.Formatter):
"""Emit JSON log lines with consistent fields for Loki parsing."""
SERVICE_NAME = "order-service"
VERSION = "1.0.0"
def format(self, record: logging.LogRecord) -> str:
log_entry = {
"timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"message": record.getMessage(),
"service": self.SERVICE_NAME,
"version": self.VERSION,
}
# Add context vars
if req_id := request_id.get():
log_entry["request_id"] = req_id
if uid := user_id.get():
log_entry["user_id"] = uid
# Add structured fields from extra={}
for key, value in record.__dict__.items():
if key not in {
"args", "asctime", "created", "exc_info", "exc_text",
"filename", "funcName", "id", "levelname", "levelno",
"lineno", "message", "module", "msecs", "msg", "name",
"pathname", "process", "processName", "relativeCreated",
"stack_info", "thread", "threadName", "taskName",
}:
log_entry[key] = value
# Include exception info
if record.exc_info:
log_entry["exception"] = {
"type": record.exc_info[0].__name__ if record.exc_info[0] else None,
"message": str(record.exc_info[1]),
"traceback": traceback.format_exception(*record.exc_info),
}
return json.dumps(log_entry, default=str)
def setup_logging(level: str = "INFO") -> None:
"""Configure structured logging for the application."""
root_logger = logging.getLogger()
root_logger.setLevel(getattr(logging, level))
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(StructuredFormatter())
root_logger.handlers = [handler]
# Usage in application code
logger = logging.getLogger("orders")
def create_order(customer_id: str, amount: int) -> dict:
logger.info(
"Creating order",
extra={
"customer_id": customer_id,
"amount_cents": amount,
"event": "order.create.start",
}
)
    try:
        start = datetime.now(timezone.utc)
        order = _do_create(customer_id, amount)
        duration_ms = (datetime.now(timezone.utc) - start).total_seconds() * 1000
        logger.info(
            "Order created",
            extra={
                "order_id": order["id"],
                "customer_id": customer_id,
                "amount_cents": amount,
                "event": "order.create.success",
                "duration_ms": round(duration_ms, 1),
            }
        )
return order
except Exception as e:
logger.error(
"Order creation failed",
extra={
"customer_id": customer_id,
"error_type": type(e).__name__,
"event": "order.create.failure",
},
exc_info=True,
)
raise
## LogQL Queries
# Common LogQL patterns for application debugging
# 1. Filter logs by label (fast — index scan)
{app="order-service", namespace="production"}
# 2. Filter by level within a stream
{app="order-service"} | json | level="ERROR"
# 3. Filter by specific field value
{app="order-service"} | json | event="order.create.failure"
# 4. Regex filter on message
{app="order-service"} |~ "payment.*failed"
# 5. Error rate per second over a 5-minute window (metric query)
rate({app="order-service"} | json | level="ERROR" [5m])
# 6. 95th percentile order creation latency
quantile_over_time(0.95,
{app="order-service"}
| json
| event="order.create.success"
| unwrap duration_ms [5m]
) by (pod)
# 7. Top error types in last hour
topk(10,
sum by (error_type) (
count_over_time(
{app="order-service"} | json | level="ERROR" [1h]
)
)
)
# 8. Request rate by endpoint with status codes
sum by (path, status_code) (
rate(
{app="api-gateway"} | json | event="request.complete" [1m]
)
)
## Promtail Configuration
# promtail-config.yaml — tail logs and push to Loki
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
tenant_id: default
batchwait: 1s
batchsize: 1048576
scrape_configs:
# Kubernetes pod logs
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
pipeline_stages:
# Parse JSON logs
      - json:
          expressions:
            timestamp: timestamp   # extracted so the timestamp stage below can use it
            level: level
            message: message
            event: event
            request_id: request_id
            duration_ms: duration_ms
# Promote parsed fields to labels (KEEP CARDINALITY LOW)
- labels:
level:
event:
# Set log timestamp from parsed field
- timestamp:
source: timestamp
format: RFC3339Nano
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
target_label: pod
- source_labels: [__meta_kubernetes_pod_label_app]
target_label: app
- source_labels: [__meta_kubernetes_pod_container_name]
target_label: container
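To cut ingest volume at the source, Promtail can discard noisy lines before they reach Loki. A sketch of a `drop` stage placed after the `json` stage above (dropping DEBUG is an example policy; adjust per environment):

```yaml
# Drop DEBUG-level lines after JSON parsing; they never reach Loki.
- drop:
    source: level
    value: DEBUG
```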
## Grafana Dashboard JSON
{
"title": "Order Service Dashboard",
"uid": "order-service-v1",
"refresh": "30s",
"time": {"from": "now-1h", "to": "now"},
"templating": {
"list": [
{
"name": "namespace",
"type": "query",
"datasource": {"type": "loki", "uid": "loki"},
"query": "label_values(namespace)",
"label": "Namespace"
}
]
},
"panels": [
{
"title": "Error Rate",
"type": "timeseries",
"gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
"targets": [
{
"datasource": {"type": "loki", "uid": "loki"},
"expr": "sum(rate({app=\"order-service\", namespace=\"$namespace\"} | json | level=\"ERROR\" [5m]))",
"legendFormat": "Errors/sec"
}
]
},
{
"title": "Order Creation Latency (p95)",
"type": "gauge",
"gridPos": {"x": 12, "y": 0, "w": 6, "h": 8},
"targets": [
{
"datasource": {"type": "loki", "uid": "loki"},
"expr": "quantile_over_time(0.95, {app=\"order-service\"} | json | unwrap duration_ms [5m])",
"legendFormat": "p95 ms"
}
],
"fieldConfig": {
"defaults": {
"thresholds": {
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 500},
{"color": "red", "value": 2000}
]
}
}
}
},
{
"title": "Recent Errors",
"type": "logs",
"gridPos": {"x": 0, "y": 8, "w": 24, "h": 12},
"targets": [
{
"datasource": {"type": "loki", "uid": "loki"},
"expr": "{app=\"order-service\", namespace=\"$namespace\"} | json | level=\"ERROR\"",
"maxLines": 100
}
]
}
]
}
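Because dashboards live in git, a CI step can sanity-check the JSON model before provisioning. A minimal sketch; the required-key list is an assumption based on the fields used above, not an exhaustive schema check:

```python
# validate_dashboard.py — sanity-check committed dashboard JSON in CI.
import json

def validate_dashboard(raw: str) -> list[str]:
    """Return a list of problems found in a dashboard JSON document."""
    errors = []
    dash = json.loads(raw)
    for key in ("title", "uid", "panels"):
        if key not in dash:
            errors.append(f"missing top-level key: {key}")
    for i, panel in enumerate(dash.get("panels", [])):
        if "gridPos" not in panel:
            errors.append(f"panel {i} ({panel.get('title', '?')}) missing gridPos")
        if not panel.get("targets"):
            errors.append(f"panel {i} ({panel.get('title', '?')}) has no targets")
    return errors
```

Failing the build on a non-empty error list catches hand-edited JSON before it reaches the provisioned Grafana instance.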
## Alert Rules
# alerts/order-service.yaml — Grafana alerting rules
apiVersion: 1
groups:
- name: order-service
folder: Application Alerts
interval: 1m
rules:
- uid: high-error-rate
title: High Error Rate
condition: A
data:
- refId: A
queryType: range
relativeTimeRange:
from: 300
to: 0
datasourceUid: loki
model:
expr: |
sum(rate({app="order-service"} | json | level="ERROR" [5m])) > 0.1
noDataState: OK
execErrState: Error
for: 2m
annotations:
summary: "Order service error rate above 0.1/s"
runbook: "https://runbooks.internal.com/order-service-errors"
labels:
severity: warning
team: platform
- uid: order-creation-failures
title: Order Creation Failures Spike
condition: A
      data:
        - refId: A
          queryType: range
          relativeTimeRange:
            from: 300
            to: 0
          datasourceUid: loki
          model:
            expr: |
              sum(count_over_time({app="order-service"} | json | event="order.create.failure" [5m])) > 10
for: 1m
annotations:
summary: "More than 10 order creation failures in 5 minutes"
labels:
severity: critical
For the Prometheus metrics and alerting stack that pairs with Loki in a complete metrics-plus-logs platform, see the OpenTelemetry guide, which covers tracing, metrics, and log collection with the OTel Collector. The observability guide covers distributed tracing and the correlation IDs that feed structured logs into Loki. The Claude Skills 360 bundle includes Loki/Grafana skill sets covering LogQL queries, dashboard JSON, and Promtail configuration; start with the free tier to try Grafana dashboard generation.