Overview
This is a key role in the transition to a Site Reliability Engineering (SRE) model, focused on reliability and efficiency.
The ideal candidate has a solid understanding of SRE principles and hands-on experience with cloud platforms.
Remote, Prometheus, Grafana, ELK Stack, Terraform, Ansible, AWS, GCP, Azure, Docker, Kubernetes
Locations
United States, California
United States, Washington
Requirements
DevOps, Cloud Operations, or SRE expertise
Advanced Linux internals expertise
Proficiency in programming languages like Go or Python
Strong scripting skills in Python, Bash, or Go
Extensive experience with cloud platforms like AWS, GCP, and Azure
Hands-on experience with Docker and Kubernetes is a nice to have
Strong problem-solving skills and collaboration abilities
Responsibilities
Enhance system monitoring
Automate deployments and workflows
Manage cloud infrastructure
Support incident response and post-mortem
Collaborate with cross-functional teams
Drive technical innovation
Benefits
Professional development budget
Remote work travel program