Platform Engineer, MLOps
Writer
Overview
The role involves deploying and managing infrastructure for AI/ML operations and collaborating with engineers to develop CI/CD pipelines.
The ideal candidate has professional experience with model training and large-scale ML systems and is skilled at troubleshooting complex systems.
Hybrid, full-time, English
Docker, Kubernetes, PyTorch, Terraform, Python, bash, Google Cloud, AWS, Azure, git, GitHub, Prometheus, Grafana
Locations
United States, California, San Francisco
Requirements
5+ years building core infrastructure
Experience running inference clusters at scale
Experience operating orchestration systems such as Kubernetes at scale
Responsibilities
Design and deploy CI/CD pipelines
Manage monitoring, logging, and alerting systems
Ensure training environments are available
Develop containerization and orchestration systems
Operate large Kubernetes clusters
Improve the reliability of software solutions
Measure and optimize system performance
Provide operational support for software applications
Benefits
Medical, dental, and vision coverage
Fertility and family planning support
Flexible spending account
Annual work-life stipends