Site Reliability Engineer: Future-Proof Your Career in HPC

Location: Remote (Global team, global scale)
Type: Full-time
Visa Sponsorship: No

Do you get a buzz out of making complex systems hum at scale? Do you secretly enjoy shaving milliseconds off performance bottlenecks? Are you the type who believes “if it’s not automated, it’s broken”? If so, read on…

We’re on the hunt for a Site Reliability Engineer to join a high-growth team running mission-critical compute environments across the globe. This isn’t your average SRE gig: you’ll be working at the bleeding edge of infrastructure, where uptime, throughput, and ultra-low latency aren’t just buzzwords; they’re survival tactics.

Why You’ll Love This Role
- Impact at scale: your work underpins globally distributed, performance-intensive environments used by some of the most demanding workloads on earth.
- Hands-on depth: from tuning kernels and optimizing I/O to running Kubernetes clusters and automating everything in sight.
- Upskill into HPC: get cross-trained in High Performance Computing, supported by world-class partners (yes, that includes the big names like NVIDIA).
- Mentorship & growth: you’ll both learn and lead, helping junior engineers level up while expanding your own mastery in HPC and GPU-powered infrastructure.

What You’ll Be Doing
- Designing, running, and evolving resilient, scalable infrastructure.
- Optimizing Linux systems down to the BIOS, kernel, and hardware subsystems.
- Automating infrastructure lifecycles and simplifying troubleshooting through code.
- Keeping things observable with Prometheus, Grafana, and custom monitoring wizardry.
- Working shoulder-to-shoulder with HPC engineers to bring next-gen performance into production.
- Playing a key role in 24/7 operations, backed by a team that’s big on collaboration and smart rotations.

What You Bring
- Deep Linux wizardry (Ubuntu is your second home).
- Strong scripting/automation chops (Python, Bash, Ansible).
- Experience running orchestration platforms (Kubernetes, MAAS, etc.).
- Networking fluency: TCP/IP, DNS, VLANs, routing, switching.
- A knack for explaining complex technical decisions to both engineers and non-techies.
- Bonus points if you’ve dabbled in HPC, GPU infra, or InfiniBand (but don’t worry if not; we’ll help you get there).

Perks & Culture
- Remote-first and flexible, because great engineers live everywhere.
- A team that values do, document, automate.
- An environment that celebrates diversity, curiosity, and real talk.
- The chance to shape the future of infrastructure where AI and HPC collide.

Francis Alexander