
Claude Code for Service Mesh: Istio, Traffic Management, and Zero-Trust Security

Published: September 8, 2026
Read time: 9 min
By: Claude Skills 360

A service mesh moves cross-cutting concerns — mutual TLS, retries, circuit breaking, distributed tracing — out of application code and into the infrastructure layer. Every service gets these behaviors for free, and security and traffic policies are applied consistently across the entire cluster without code changes.

Claude Code generates Istio configuration correctly — it understands the difference between a VirtualService and a DestinationRule, when to route by request matching versus weighted traffic splitting, and how to debug mesh connectivity issues systematically.

Installing Istio and Initial Setup

CLAUDE.md for Service Mesh Projects

## Istio Configuration
- Istio 1.21+ with Helm install (istio-base, istiod, istio-ingressgateway)
- Ambient mode NOT enabled — using sidecar injection
- Sidecar injection: namespace label `istio-injection: enabled` on all app namespaces
- mTLS mode: STRICT on all namespaces (no unencrypted service-to-service traffic)
- Observability stack: Prometheus + Grafana + Jaeger (standard Istio addons)

## Key Resources
- Gateway: controls ingress at cluster edge
- VirtualService: HTTP routing rules — retries, timeouts, fault injection, traffic splitting
- DestinationRule: load balancing, circuit breaking, connection pool settings, mTLS mode
- PeerAuthentication: mTLS policy (STRICT/PERMISSIVE/DISABLE per workload)
- AuthorizationPolicy: L7 RBAC — which services can call which endpoints

# Install Istio via Helm (istio-base, istiod, istio-ingressgateway), matching CLAUDE.md
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
helm install istio-base istio/base -n istio-system --create-namespace
helm install istiod istio/istiod -n istio-system --wait
helm install istio-ingressgateway istio/gateway -n istio-system

# Enable sidecar injection for application namespaces
kubectl label namespace orders istio-injection=enabled
kubectl label namespace users istio-injection=enabled
kubectl label namespace products istio-injection=enabled

# Deploy standard observability addons
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/grafana.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/jaeger.yaml
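
The CLAUDE.md above lists Gateway as a key resource but the examples never show one. A minimal sketch for HTTPS ingress at the cluster edge — the hostname `api.example.com` and the secret name `api-example-com-cert` are illustrative assumptions:

```yaml
# gateway.yaml — hypothetical edge gateway; a VirtualService binds to it
# by listing "public-gateway" in its gateways field
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
  namespace: orders
spec:
  selector:
    istio: ingressgateway   # matches the default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-example-com-cert  # TLS secret in istio-system
      hosts:
        - api.example.com
```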

Traffic Management

VirtualService with Retries and Timeouts

The payment service is slow and sometimes returns 503.
Add retries for 503s and a 5-second timeout so users don't wait forever.
# virtualservice-payment.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: orders
spec:
  hosts:
    - payment-service
  http:
    - name: payment-route
      match:
        - uri:
            prefix: /api/payments
      timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        # Only retry on these conditions (don't retry on 4xx — those are client errors)
        retryOn: gateway-error,connect-failure,retriable-4xx,503
        # Allow retries to be scheduled to endpoints in other localities
        retryRemoteLocalities: true
      route:
        - destination:
            host: payment-service
            port:
              number: 8080
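
Before trusting the retry policy, it is worth validating the manifest and confirming Envoy actually programmed it. A sketch using `istioctl analyze` and `proxy-config`, assuming an `order-service` Deployment is the caller:

```shell
# Lint the manifest against the live mesh for common misconfigurations
istioctl analyze virtualservice-payment.yaml -n orders

# Apply, then confirm the retry policy is in the client-side Envoy's route table
kubectl apply -f virtualservice-payment.yaml
istioctl proxy-config routes deploy/order-service -n orders --name 8080 -o json | grep -A3 retryPolicy
```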

Canary Deployment with Traffic Splitting

Deploy payment-service v2 to 10% of traffic.
If error rate is acceptable, shift to 50% then 100%.
# Step 1: Label both versions in your Deployment
# payment-service-v1 Deployment has labels: app: payment, version: v1
# payment-service-v2 Deployment has labels: app: payment, version: v2

# Step 2: DestinationRule defines subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
  namespace: orders
spec:
  host: payment-service
  trafficPolicy:
    # Connection pool limits (circuit breaker inputs)
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        idleTimeout: 10m
        http1MaxPendingRequests: 1024
        http2MaxRequests: 1024
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
# Step 3: VirtualService splits traffic
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: orders
spec:
  hosts:
    - payment-service
  http:
    - route:
        - destination:
            host: payment-service
            subset: v1
          weight: 90
        - destination:
            host: payment-service
            subset: v2
          weight: 10

Monitor during canary:

# Check error rates per version
kubectl exec -it $(kubectl get pod -l app=prometheus -n istio-system -o jsonpath='{.items[0].metadata.name}') \
  -n istio-system -- curl -s 'http://localhost:9090/api/v1/query?query=sum(rate(istio_requests_total{destination_service="payment-service.orders.svc.cluster.local",response_code=~"5.."}[5m]))by(destination_version)/sum(rate(istio_requests_total{destination_service="payment-service.orders.svc.cluster.local"}[5m]))by(destination_version)'
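
Once the canary's error rate matches v1's, the shift to 50% and then 100% only changes the route weights, so a JSON patch is enough — no need to re-apply the full manifest. The field paths below assume the single-route VirtualService defined above (route 0 = v1, route 1 = v2):

```shell
# Shift the canary to 50% once the 10% stage looks healthy
kubectl -n orders patch virtualservice payment-service --type=json -p='[
  {"op": "replace", "path": "/spec/http/0/route/0/weight", "value": 50},
  {"op": "replace", "path": "/spec/http/0/route/1/weight", "value": 50}
]'

# Promote to 100%: send all traffic to v2
kubectl -n orders patch virtualservice payment-service --type=json -p='[
  {"op": "replace", "path": "/spec/http/0/route/0/weight", "value": 0},
  {"op": "replace", "path": "/spec/http/0/route/1/weight", "value": 100}
]'
```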

Circuit Breaking

The inventory service sometimes gets overwhelmed.
Add circuit breaking so a slow inventory service doesn't
cascade failures to the order service.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: inventory-service
  namespace: orders
spec:
  host: inventory-service
  trafficPolicy:
    outlierDetection:
      # Eject a host after 5 consecutive gateway errors (502/503/504) or 5xx responses
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 10s
      # Eject for at least 30 seconds
      baseEjectionTime: 30s
      # Maximum 50% of hosts can be ejected simultaneously
      maxEjectionPercent: 50
      # If fewer than 30% of hosts are healthy, stop ejecting and send traffic to all hosts
      minHealthPercent: 30
    connectionPool:
      tcp:
        maxConnections: 50
        connectTimeout: 5s
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 200
        # Recycle each connection after 10 requests to spread load across hosts
        maxRequestsPerConnection: 10

With this configuration, Envoy ejects unhealthy inventory-service hosts from the load-balancing pool, and once the connection pool limits overflow, it fails requests fast with 503 rather than queueing them against an overloaded service.
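
To confirm the breaker is actually tripping, the sidecar's Envoy admin stats expose outlier-detection and overflow counters. A sketch assuming an `order-service` Deployment is the caller:

```shell
# Ejection counters rise when inventory-service hosts are removed from the pool
kubectl exec -n orders deploy/order-service -c istio-proxy -- \
  pilot-agent request GET stats | grep -E 'inventory-service.*outlier'

# Connection-pool overflows (requests rejected immediately with 503)
kubectl exec -n orders deploy/order-service -c istio-proxy -- \
  pilot-agent request GET stats | grep -E 'inventory-service.*pending_overflow'
```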

Zero-Trust mTLS and Authorization

Enforce strict mTLS across all services.
The order service should only be callable by the API gateway.

Enforce STRICT mTLS per namespace:

# peer-authentication.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: orders
spec:
  mtls:
    mode: STRICT  # Reject all non-mTLS traffic
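
A quick way to verify STRICT mode is enforced: run a plaintext request from a pod that has no sidecar. The `curl-test` pod and the `default` namespace here are assumptions for illustration:

```shell
# From a pod WITHOUT a sidecar (default namespace is not labeled for injection),
# plaintext requests into the orders namespace should now fail
kubectl run curl-test -n default --image=curlimages/curl --rm -it --restart=Never -- \
  curl -sv --max-time 5 http://payment-service.orders:8080/api/payments
# A reset/refused connection here confirms STRICT mode rejects non-mTLS traffic
```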

Authorization Policy — service-to-service access control:

# Deny all by default, then explicitly allow
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: orders
spec:
  {} # Empty spec = deny all

---
# Allow order-service to call payment-service:8080/api/payments
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-service-allow-orders
  namespace: orders
spec:
  selector:
    matchLabels:
      app: payment-service
  action: ALLOW
  rules:
    - from:
        - source:
            # Only allow from order-service's service account
            principals:
              - "cluster.local/ns/orders/sa/order-service"
      to:
        - operation:
            ports: ["8080"]
            methods: ["POST"]
            paths: ["/api/payments*"]

---
# Allow ingress gateway to reach order-service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-allow-ingress
  namespace: orders
spec:
  selector:
    matchLabels:
      app: order-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"
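
AuthorizationPolicy denials are enforced by the server-side sidecar and return HTTP 403 with an "RBAC: access denied" body, so they are easy to verify from any other in-mesh workload. A sketch assuming a `product-service` Deployment (with curl available in its image) exists in the products namespace:

```shell
# From a workload that is NOT order-service, the call should be denied
kubectl exec -n products deploy/product-service -- \
  curl -s -o /dev/null -w '%{http_code}\n' -X POST http://payment-service.orders:8080/api/payments
# A 403 confirms the ALLOW rule is scoped to order-service's service account
```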

Fault Injection for Testing

Test how the order service behaves when payment service is slow or failing.
Inject a 3-second delay for 50% of requests to payment-service, and abort 10% with a 503.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service-fault-test
  namespace: orders
spec:
  hosts:
    - payment-service
  http:
    - fault:
        delay:
          percentage:
            value: 50
          fixedDelay: 3s
        abort:
          percentage:
            value: 10
          httpStatus: 503
      route:
        - destination:
            host: payment-service

Run your load tests against the order service with this fault injection active to verify timeouts and circuit breakers work correctly. Note that Istio's behavior when multiple VirtualServices declare the same host is order-dependent, so consider adding the fault block to the existing payment-service VirtualService instead of creating a second one. Remove the fault configuration when done.
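
A rough sanity check from inside the mesh: with the fault active, about half the requests should take over three seconds and about one in ten should return 503. The loop size is arbitrary:

```shell
# Sample 10 requests from an in-mesh workload and print status code + latency
for i in $(seq 1 10); do
  kubectl exec -n orders deploy/order-service -- \
    curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' http://payment-service:8080/api/payments
done
```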

Observability and Debugging

The order service is getting intermittent 503s from payment-service.
Help me debug what's happening at the mesh layer.

Check Envoy proxy config:

# Is the sidecar injected and healthy?
kubectl get pod -n orders -l app=order-service -o jsonpath='{.items[*].spec.containers[*].name}'
# Should show: order-service istio-proxy

# Check Envoy's view of payment-service routes
istioctl proxy-config routes deploy/order-service -n orders | grep payment

# Check Envoy cluster endpoints
istioctl proxy-config endpoints deploy/order-service -n orders | grep payment

# Are any hosts ejected (circuit open)?
istioctl proxy-config clusters deploy/order-service -n orders --fqdn payment-service.orders.svc.cluster.local -o json | \
  jq '.[].outlierDetection'

# Check Envoy access logs for the proxy sidecar
kubectl logs -n orders deploy/order-service -c istio-proxy --tail=50 | grep 503

Distributed tracing — follow a request through all services:

# Open Jaeger UI
istioctl dashboard jaeger

# In Jaeger: search for service "order-service.orders", filter to status code 500
# The trace tree shows exactly which downstream call is failing

Common issues Claude Code can diagnose:

I'm getting: upstream connect error or disconnect/reset before headers. reset reason: connection termination
What does this mean in Istio and how do I fix it?

This error from Envoy means the upstream service reset the TCP connection before sending a response. Common causes in Istio:

  1. mTLS mismatch — one side enforces STRICT while the other sends plaintext → align PeerAuthentication and any DestinationRule `tls` settings
  2. Connection pool exhausted → raise connectionPool.http.http1MaxPendingRequests (overflows surface as immediate 503s)
  3. Service didn't start or crashed mid-request → check pod health and container logs, not mesh configuration
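
For the mTLS-mismatch case specifically, `istioctl experimental describe` summarizes the effective VirtualService, DestinationRule, and mTLS settings for a workload in one report:

```shell
# Show effective routing and mTLS configuration for the payment-service pod
istioctl experimental describe pod $(kubectl get pod -n orders -l app=payment-service \
  -o jsonpath='{.items[0].metadata.name}') -n orders
```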

For deploying Istio-enabled services on Kubernetes with Helm, see the Kubernetes guide and Helm charts guide. For mapping service dependencies before adding a mesh, the microservices guide covers dependency visualization. The Claude Skills 360 bundle includes service mesh skill sets for Istio configuration, zero-trust policies, and traffic management patterns. Start with the free tier to try mesh configuration generation.
