GPU Cloud Platform Engineer

Yotta Labs

Overview

The role involves designing, deploying, and operating large-scale GPU infrastructure for AI workloads.

The ideal candidate has 5+ years of experience in cloud-native development and strong Kubernetes experience.

Tags: remote, mid-level, full-time, English, Kubernetes, Docker, Prometheus, Grafana, AWS, GCP, Azure, Helm

Locations

  • Canada
  • Argentina
  • United States
  • Brazil
  • Mexico

Requirements

  • Bachelor's degree in a relevant field
  • 3+ years of experience in systems engineering or DevOps
  • 5+ years of experience in cloud-native development or AI engineering
  • 2+ years of experience in Kubernetes multi-cluster management
  • Familiarity with the Kubernetes ecosystem
  • Proficiency with Docker and containerization
  • Experience with monitoring tools such as Prometheus and Grafana
  • Hands-on experience with cloud platforms such as AWS, GCP, or Azure

Responsibilities

  • Build and operate large-scale GPU clusters
  • Conduct performance testing of GPU clusters
  • Deploy large models across multi-cluster environments
  • Participate in GPU cluster scheduling and optimization
  • Build a unified multi-cluster management system
  • Coordinate with IDC (Internet Data Center) providers for GPU clusters

Benefits

  • Flexible remote work environment
  • Opportunity to work on cutting-edge technologies
  • Collaboration with experts from leading institutions
  • Visionary team aiming to redefine AI infrastructure