SRE - Observability (Senior)
Lambda
Overview
This role focuses on deploying and operating observability platforms for AI systems.
The ideal candidate has 8+ years in software engineering with strong Site Reliability Engineering experience.
267k USD / year, hybrid, senior, permanent, full-time, English
Kubernetes, Prometheus, Ansible, Terraform, OpenTelemetry
Locations
United States, California, San Francisco
Requirements
5+ years in Site Reliability Engineering
Experience with Kubernetes
Experience building CI/CD pipelines
Responsibilities
Deploy and operate observability platforms
Automate observability systems
Set up monitoring for AI/HPC clusters
Develop platform software for observability
Lead engineering teams for monitoring solutions
Benefits
Health, dental, and vision coverage
Wellness and commuter stipends
401(k) plan with 2% company match
Flexible paid time off plan