Overview
The role focuses on ensuring the reliability of model training and scaling processes.
The ideal candidate has extensive experience with cloud platforms and SRE tools.
Hybrid, English, Google Cloud Platform, Docker, Kubernetes, Terraform, CI/CD, Grafana, Prometheus, Python
Locations
United Kingdom, England, London
Requirements
Experience with Google Cloud Platform
Experience with monitoring & alerting systems
Experience with Docker, Kubernetes, Terraform
Experience with CI/CD and DevOps technologies
Experience with Grafana/Prometheus/Splunk
Comfortable with Shell Scripting and Python
Experience building secure/scalable platforms
Practical experience with modern SRE lifecycle
Responsibilities
Contribute to technical strategy for reliability
Build and operate software development processes
Manage research and production software
Handle cloud infrastructure and systems
Collaborate with diverse teams
Contribute to core technical decisions