Principal Site Reliability Engineer

Groupon

Overview

This role focuses on ensuring the performance, availability, and resilience of platforms.

The ideal candidate has 10+ years of experience in systems engineering, with expertise in cloud platforms and SRE.

Remote, Senior, Terraform, Ansible, Kubernetes, Docker, Python, Go, Bash, Prometheus, Grafana, ELK Stack

Locations

  • Ecuador

Requirements

  • 10+ years of experience in systems engineering
  • 5+ years of experience in SRE or DevOps
  • Expertise in cloud platforms
  • Proficiency in programming languages (e.g., Python, Go, Bash)
  • Advanced knowledge of IaC tools (e.g., Terraform, Ansible)
  • Deep understanding of networking principles
  • Proven track record in high-availability systems
  • Exceptional analytical skills

Responsibilities

  • Architect and maintain fault-tolerant systems
  • Drive automation in infrastructure management
  • Create and optimize CI/CD pipelines
  • Build observability solutions
  • Collaborate on SLIs, SLOs, and error budgets
  • Lead incident response
  • Design performance testing and scalability strategies
  • Mentor junior engineers

Benefits

  • Cutting-edge technologies
  • Collaborative work culture
  • Professional growth opportunities
  • Impactful work