Overview
The role involves advising customers and managing their AI workloads throughout the lifecycle.
The ideal candidate has 2+ years of relevant experience and strong communication skills.
remote, mid, permanent, full-time, English, Kubernetes, Go, Python, Bash, Ansible, Terraform
Locations
United States, California, San Francisco
United States, California, London
Requirements
2+ years of SWE, SRE, DevOps, Sysadmin, or HPC experience
Experience deploying and operating Kubernetes and/or SLURM clusters
Strong engineering background in relevant fields
Responsibilities
Deploy clusters of 1,000+ GPUs
Validate compute, storage, and networking infrastructure
Migrate petabytes of data
Debug issues across the stack
Build internal tooling for deployment efficiency
Support customers in on-call rotation
Benefits
Competitive compensation package
Health, dental, and vision insurance
Remote work with WeWork access