Senior HPC Technical Manager

Posted 1 day 3 hours ago by Durham University

£50,000 - £70,000 Annual
Permanent
Full Time
I.T. & Communications Jobs
Tyne And Wear, Gateshead, United Kingdom, NE9 7YE
Job Description

We welcome applications from candidates with disabilities, neurodiversity and long-term health conditions, and we are committed to ensuring fair treatment throughout the recruitment process.

We will make adjustments to support the recruitment and interview process wherever it is reasonable to do so, and, where successful, reasonable adjustments will be made to support people within their role.

If you are unable to complete your application via our recruitment system or would like to discuss any reasonable adjustments to support you in the application process, please get in touch with us at .

Senior HPC Technical Manager (Job Number: )

Department of Physics - Institute of Computational Cosmology (ICC)

Contract: Fixed Term, Full Time - 24 months

Working Arrangements: Full time; flexible working, job shares or part-time working will be considered.

Closing Date: 04 Jan 2026

Role Overview

This role is primarily responsible for the operation and ongoing development of the COSMA High Performance Computing (HPC) system, supporting innovative research and development projects. You will provide expert user support and ensure smooth day-to-day operations, while leading both software and hardware initiatives that will shape the future of UK HPC. Hands-on experience with OpenStack, Ansible, containerisation platforms such as Docker and Kubernetes, traditional HPC server administration, and Bash and Python scripting is essential. The position offers training, professional development, conference participation, and collaboration with other UK HPC facilities.

Key Responsibilities
  • Operate and develop the COSMA HPC system, maintaining system efficiency and reliability.
  • Provide expert user support, troubleshooting, and workflow automation.
  • Collaborate with COSMA staff and DiRAC technical support teams across the UK for routine and emergency maintenance.
  • Engage with DiRAC researchers to optimise codes for efficient execution on supercomputing facilities.
  • Lead software and hardware initiatives that shape the future of UK HPC.
  • Ensure compliance with regulatory and organisational policies and guidelines.
  • Participate in training, professional development and HPC events; liaise with other UK HPC facilities.
  • Contribute to strategic planning, risk assessment, safety procedures and health & safety management.
  • Maintain positive working relationships with internal and external stakeholders.
  • Develop, audit and implement specialist risk assessments and safety procedures.
Qualifications / Experience (Essential)
  • Educated to degree level (or equivalent experience).
  • Significant Linux and/or HPC expertise, including command-line operation.
  • Experience operating in large HPC environments such as DiRAC or COSMA.
  • Expertise in OpenStack, Ansible, Docker, Kubernetes, SLURM, Lustre, Linux system management, and software installation from source.
  • Experience implementing policies & procedures and supporting service improvements.
  • Strong Bash & Python scripting skills.
  • Experience providing specialist advice to users and colleagues.
  • Resident, or intending to reside, within 30 minutes of the Physics department.
  • Physical ability to access and manipulate computer and data centre hardware.
Skills / Abilities / Knowledge (Essential)
  • Excellent written and spoken communication skills.
  • Competence across digital devices & applications: Linux administration, code compilation, job scheduling (SLURM), scripting, HPC tools, networking & storage.
  • Problem solving, decision making and strategic planning.
  • Knowledge of regulatory compliance and organisational guidelines.
  • Ability to teach or train others in technical skills.
  • Experience in managing ongoing professional development.
Desirable Criteria
  • End user engagement & user support experience.
  • Experience with HPC - computing, storage, networking.
  • OpenStack, Docker, Ansible, Kubernetes expertise.
  • Security knowledge - securing systems against attacks.
  • Adaptability to deadlines & unexpected issues.
  • Web development frameworks experience.
  • Virtualisation & container experience.
  • Python & Bash fluency.
  • Project management and sub-project implementation experience.
  • Data interpretation & information processing.
  • Server maintenance & safe hardware manipulation.
  • Long term strategic planning experience.
  • Database experience.
  • Data curation techniques.
  • Developing solutions to HPC related problems.
  • Management & development experience across a large technical service team.
  • SLURM scheduler experience.
  • Lustre storage solutions experience.
How to Apply

To progress to the assessment stage, candidates must provide evidence of each essential criterion in the person specification. The panel may also consider desirable criteria.

We prefer online applications. We will update candidates throughout the selection process via automated emails, so please check your spam/junk folder.

What You Need to Submit
  • A CV.
  • A cover letter with a supporting statement outlining how you meet the criteria with examples.

For a chat about the role or any further information, please contact Alastair Basden, .

Equal Opportunity Statement

Durham University is committed to equality, diversity and inclusion. We welcome applications from groups under-represented in our workforce, including people with disabilities, women, and Black, Asian and minority ethnic communities. We actively work toward a supportive and inclusive environment.