FluidStack

Site Reliability Engineer

Overview

The role involves ensuring the reliability and performance of GPU cloud infrastructure.

The ideal candidate has 2+ years of relevant experience and strong communication skills.

Remote, Mid-level, Permanent, Full-time, English, Kubernetes, Ansible, Terraform, Go, Python, Bash

Locations

  • United States, California, San Francisco
  • United Kingdom, London

Requirements

  • 2+ years of SRE, DevOps, Sysadmin, or HPC experience
  • Experience deploying and operating Kubernetes and/or SLURM clusters
  • Strong engineering background in relevant fields

Responsibilities

  • Ensure the reliability and performance of the GPU cloud
  • Deploy GPU clusters
  • Debug production issues
  • Build internal tooling for deployment
  • Participate in on-call rotation

Benefits

  • Competitive compensation package
  • Retirement plan
  • Health, dental, and vision insurance
  • Generous PTO policy
  • Access to WeWork for remote locations