On behalf of a fast-scaling, AI-native GPU cloud provider, we're hiring a hands-on technical leader to build and run its US infrastructure operation.
This is a player-manager role for someone who wants to stay technical while stepping into leadership. You'll lead a team of three infrastructure engineers (with room to grow), working US afternoons with your team and US mornings in close collaboration with the UK Infrastructure Operations Manager and broader global team — true follow-the-sun coverage.
What you'll be doing:
- Acting as the senior technical voice for US infrastructure — designing, deploying, and troubleshooting HPC/AI/GPU compute, storage, and networking
- Leading incident response and driving SRE practices (SLOs/SLIs, observability, automation)
- Owning capacity planning, deployment velocity, and 99.9% uptime for US-region services
- Mentoring and growing your team — 1:1s, KPIs, career development
- Coordinating 24/7 on-call coverage and seamless handoffs with the UK team
What you'll bring:
- 6 years in infrastructure/platform operations, 2 years leading a team with direct reports
- Deep hands-on expertise in high-scale infrastructure with high-performance networking (InfiniBand, RoCE, low-latency design)
- Strong SRE/DevOps chops: Terraform, Ansible, Kubernetes, Prometheus, Grafana, Python/Bash
- Cloud experience (AWS, GCP, or Azure)
- The ability to lead through ambiguity, communicate across regions, and translate complex tech for non-technical stakeholders
NOTE: Candidates must be US citizens or Green Card holders to be eligible for this role.
