Principal SRE

Posted 3 hours 11 minutes ago by Barclays

£90,000 - £120,000 Annual
Permanent
Full Time
Other
London, United Kingdom
Job Description
Role Overview

The Principal Site Reliability Engineer will be a senior technical expert responsible for driving end-to-end resilience, reliability, and scalability across our mission-critical payments platform. This role focuses on front-to-back payment flows, ensuring systems are designed for fault tolerance, observability, and operational excellence.

You will perform deep technical reviews, troubleshoot complex issues, and define patterns for resiliency by design. As a hands-on engineer, you will collaborate with development and production support teams, advocate chaos engineering, and build a culture of designing for failure. This position requires strong technical breadth across infrastructure, applications, networks, databases, and integrations, combined with expertise in modern reliability engineering practices.

Key Responsibilities
  • Reliability Engineering Leadership
    • Drive strategies to improve reliability, maintainability, and scalability across payment flows and platform components.
  • Architecture & Design Reviews
    • Conduct deep technical assessments of system architectures, identifying risks and recommending improvements for fault tolerance and disaster recovery.
  • Incident Management & Root Cause Analysis
    • Act as a senior escalation point for production incidents, lead RCA, and implement permanent fixes to prevent recurrence.
  • Resiliency by Design
    • Define and enforce reliability patterns, frameworks, and best practices; ensure adoption across engineering teams.
  • Chaos Engineering & Failure Testing
    • Advocate and implement chaos engineering principles to validate system resilience under real-world failure scenarios.
  • Observability & Monitoring
    • Design and implement full-stack observability solutions, including metrics, logging, distributed tracing, and alerting.
  • Automation & Tooling
    • Develop automation for failover, capacity management, and self-healing mechanisms to reduce operational risk.
  • Collaboration
    • Partner with development, infrastructure, and production support teams to embed reliability into the SDLC.
  • Continuous Improvement
    • Analyze service risk assessments and production incidents to identify systemic issues and drive long-term improvements.
  • Culture Building
    • Promote operational excellence and a mindset of designing for failure across all engineering teams.
Required Skills & Experience
  • Technical Expertise
    • 12+ years in software engineering or infrastructure roles, with at least 5 years focused on reliability engineering or SRE.
    • Proven experience building and operating fault-tolerant, highly available systems at scale.
  • Architecture & Design
    • Strong knowledge of distributed systems, resiliency patterns (circuit breakers, retries, failover), and disaster recovery strategies.
  • Technical Breadth
    • Expertise across infrastructure (compute, storage, networking), application architecture, databases, and integration patterns.
  • Problem Solving
    • Ability to troubleshoot complex technical issues across distributed systems and perform deep root cause analysis.
  • Collaboration & Influence
    • Skilled at working with development, operations, and architecture teams to embed reliability into design and delivery.
Purpose of the Role

To drive technical excellence and innovation by leading the design and implementation of robust software solutions, providing mentorship to engineering teams, fostering cross-functional collaboration, and contributing to strategic planning to ensure the delivery of high-quality solutions aligned with business objectives.

Accountabilities
  • Provision of guidance and expertise to engineering teams to ensure alignment with best practices and foster a culture of technical excellence.
  • Contribution to strategic planning by aligning technical decisions with business goals, anticipating future technology trends, and providing insights to optimize product roadmaps.
  • Design and implementation of complex, scalable, and maintainable software solutions, considering long-term viability and business objectives.
  • Mentoring and coaching of junior and mid-level engineers to foster professional growth and knowledge sharing, elevating the overall skillset and capabilities of the organization.
  • Collaboration with business partners, product managers, designers, and other stakeholders to translate business requirements into technical solutions and ensure a cohesive approach to product development.
  • Innovation within the organization by identifying and incorporating new technologies, methodologies, and industry practices into the engineering process.