Previous Work Experience: 5 yrs
Key Skills & Requirements
Scale Kubernetes & cloud infra, GPU workloads, CI/CD, and observability.
Partner closely with ML teams to optimize inference and vector DB deployments.
Role Responsibilities
• Maintain Kubernetes clusters on AWS/Azure, GPU infra, and CI/CD pipelines.
• Implement monitoring (Prometheus/Grafana) and logging systems.
• Support vector DB scaling and model deployment.
• Collaborate with ML teams for optimized inference.
Requirements
• 5+ years of DevOps or SRE experience.
• Strong knowledge of Docker, Terraform, CI/CD, and cloud platforms.
• Familiarity with ML workloads is a plus.