Jobgether

Platform Engineer, MLOps

Overview

This role focuses on building and maintaining infrastructure for AI/ML development and production.

The ideal candidate has 5+ years of experience managing core infrastructure for large-scale systems.

hybrid, senior, full-time, Kubernetes, Docker, GCP, AWS, Azure, Terraform, Python, bash, git, Prometheus, Grafana, PyTorch

Locations

  • United States, California, San Francisco

Requirements

  • 5+ years of experience in infrastructure
  • Deep experience with Kubernetes and Docker
  • Expertise in cloud platforms (GCP, AWS, Azure)
  • Proficiency in Python and Bash scripting
  • Familiarity with ML frameworks like PyTorch
  • Experience with monitoring tools like Prometheus

Responsibilities

  • Develop and manage CI/CD pipelines
  • Set up and monitor logging and observability systems
  • Operate and optimize Kubernetes clusters
  • Manage containerization using Docker
  • Ensure high availability of training environments
  • Support MLOps infrastructure performance
  • Troubleshoot complex systems

Benefits

  • Generous paid time off
  • Comprehensive medical, dental, and vision insurance
  • 12 weeks paid parental leave
  • Fertility and family planning support
  • Flexible spending accounts
  • Annual stipends for home office setup
  • Competitive salary and stock options
  • 401(k) plan