Skip to main content

Observability Implementation

Folder: .github/skills/tsh-implementing-observability/ Used by: DevOps Engineer

Provides patterns for logging, monitoring, alerting, and distributed tracing across services.

Three Pillars

PillarPurposeTools
MetricsQuantitative system health measurementPrometheus, CloudWatch, Datadog
LogsStructured event records for debuggingELK Stack, CloudWatch Logs, Loki
TracesRequest flow across service boundariesJaeger, Zipkin, AWS X-Ray, OpenTelemetry

Alerting Guidelines

SeverityResponse TimeExample
CriticalImmediate (page)Service down, data loss risk
WarningWithin hoursDisk >80%, elevated error rate
InfoNext business dayDeployment completed, scaling event

Structured Logging

  • Use JSON format for machine-parseable logs.
  • Include correlation IDs for request tracing.
  • Log at appropriate levels (ERROR, WARN, INFO, DEBUG).
  • Never log sensitive data (credentials, PII).

Dashboard Design

  • Start with the RED method: Rate, Errors, Duration.
  • Add business-specific KPIs and SLO tracking.
  • Use consistent layouts across services for familiarity.

Connected Skills

  • tsh-implementing-kubernetes — Monitoring Kubernetes workloads.
  • tsh-technical-context-discovering — Discover existing monitoring patterns.