Lambda

SRE - Observability (Senior)

Overview

This role focuses on deploying and operating observability platforms for AI systems.

The ideal candidate has 8+ years of software engineering experience, including strong Site Reliability Engineering experience.

267k USD / year, hybrid, senior, permanent, full-time, English, Kubernetes, Prometheus, Ansible, Terraform, OpenTelemetry

Locations

  • United States, California, San Francisco

Requirements

  • 8+ years of software engineering experience
  • 3+ years in Go
  • 5+ years in Site Reliability Engineering
  • Experience with Kubernetes
  • Experience building CI/CD pipelines

Responsibilities

  • Deploy and operate observability platforms
  • Automate observability systems
  • Set up monitoring for AI/HPC clusters
  • Develop platform software for observability
  • Lead engineering teams for monitoring solutions

Benefits

  • Health, dental, and vision coverage
  • Wellness and commuter stipends
  • 401(k) plan with 2% company match
  • Flexible paid time off plan