Infra Ops Tech Lead

Lead Infrastructure & Operations Engineer – AI/GPU Cloud (US, Remote)

We're hiring on behalf of a fast-scaling, AI-native GPU Cloud provider for a hands-on technical leader to build and run their US infrastructure operation.
This is a player-manager role for someone who wants to stay technical while stepping into leadership. You'll lead a team of three infrastructure engineers (with room to grow), working US afternoons with your team and US mornings in close collaboration with the UK Infrastructure Operations Manager and broader global team — true follow-the-sun coverage.

What you'll be doing:

Acting as the senior technical voice for US infrastructure — designing, deploying, and troubleshooting HPC/AI/GPU compute, storage, and networking
Leading incident response and driving SRE practices (SLOs/SLIs, observability, automation)
Owning capacity planning, deployment velocity, and 99.9% uptime for US-region services
Mentoring and growing your team — 1:1s, KPIs, career development
Coordinating 24/7 on-call coverage and seamless handoffs with the UK team

What you'll bring:

6 years of experience in infrastructure/platform operations and 2 years leading a team with direct reports
Deep hands-on expertise in high-scale infrastructure with high-performance networking (InfiniBand, RoCE, low-latency design)
Strong SRE/DevOps chops: Terraform, Ansible, Kubernetes, Prometheus, Grafana, Python/Bash
Cloud experience (AWS, GCP, or Azure)
The ability to lead through ambiguity, communicate across regions, and translate complex tech for non-technical stakeholders

If you're excited by cutting-edge AI infrastructure, want real ownership of a growing team, and thrive working across time zones with a high-caliber global ops group — we'd love to talk.

NOTE: Candidates must hold a Green Card (US permanent residency) or US citizenship to be eligible for this role.
