Overview

Bristol, Cambridge or London (central locations), UK

We’re hiring customer-focused SREs and Systems Engineers/Developers to apply infrastructure support and site reliability engineering approaches to significant projects embracing emerging ML compute technology.

As a platform vendor and MLaaS provider, we offer you the opportunity to work across all sectors – including research organisations, universities, technology vendors and enterprises – encountering a diversity of ecosystems and best practices.

This team helps customers extend their data center and cloud provisioning ecosystems to incorporate our ML compute products, helps define and build pipelines for migration, refines production operations, and provides site reliability engineering expertise, automation and technical support throughout. Becoming a hands-on subject matter expert, you’ll empower others to develop new capabilities and accomplish things that were not previously possible, embracing emerging advances in machine intelligence.

Of particular interest are your skills applied to any of the following domains: site reliability engineering at scale; grid or cloud computing; HPC/scientific computing; OpenStack admin or development; data center orchestration; SDN/NFV; and/or developing Linux-based systems for novel IP-based protocols – or similar.

We’re hiring an all-new team, including a lead engineer, and we’ll be pleased to explore the possibilities with you.

We’re looking for:

  • Someone customer-focused and solution-oriented
  • A solid understanding of Computing, Maths or Engineering – accrued through formal education or equivalent applied practice
  • Linux configuration and management with shell scripting, Python or similar
  • Optionally, strong Python and/or C++ applied to Linux systems, infrastructure, or back-end development
  • Experience of configuring and managing hardware platforms, and infrastructure for clusters
  • Knowledge of Ethernet and IP networking standards
  • Production admin skills with two or more of: Kubernetes, Docker, Grid Engine, Slurm, OpenStack, public/private cloud, etc.
  • Comfortable debugging across multi-layer solutions
  • Familiarity with modern CI/CD and orchestration methods
  • An aptitude for troubleshooting and a pragmatic application of engineering rigour: from the basic symptoms through to analysis and resolution with code fixes, workarounds, improved documentation, tutorials, and collaboration with other teams

Source: Python.org Jobs Feed