Site Reliability Engineer
Posted 3 days 21 hours ago by Orgvue Limited
Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills they have. Our platform connects strategy to structure, providing clarity of vision, so you can build a more adaptable, better performing organisation that thrives in a constantly changing world of work.
The world's largest and best-known enterprises and consulting firms use Orgvue to visualise and model current and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.
Role: Principal Site Reliability Engineer
You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient - even at scale.
This role combines hands-on technical skills with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We seek someone with technical expertise, excellent communication skills, and a collaborative spirit.
Responsibilities:
- Define and enforce SLOs, SLIs, and error budgets across critical services
- Develop and implement cloud infrastructure and tooling strategies
- Enhance SRE practices across the organization
- Implement robust observability (metrics, logs, and traces) using our observability tools
- Guide the team in building automated, self-healing systems
- Own and evolve incident response processes, including on-call practices and post-mortem culture
- Mentor engineers on reliability, operational readiness, and scalable infrastructure best practices
- Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps practices
- Collaborate with security, DevOps, and software teams to ensure compliance and operational excellence
- Evaluate and adopt tools and practices to improve platform performance and reliability
Requirements:
- Experience leading SRE transformations
- Hands-on expertise with Kubernetes (EKS preferred) in production
- Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
- Proficiency in Infrastructure as Code using Terraform and knowledge of GitOps workflows
- Strong background in observability: metrics, visualization, logging, tracing
- Understanding of automation, CI/CD pipelines, deployment automation, and release strategies
- Experience with incident management, disaster recovery, root cause analysis, and post-incident reviews
Benefits:
- Hybrid working: 1+ days a week in London office
- Wellbeing initiatives: coaching, fitness sessions, webinars, Wellbeing day
- Subsidised gym membership
- Private medical insurance, dental, vision, and life assurance
- 25 days holiday (increasing to 30)
- Summer Fridays (half-days in July and August)
- Employer pension contribution of 5% (if you contribute at least 3%)
- Season ticket loan
- Cycle to Work Scheme
- Annual discretionary bonus
Here at Orgvue, we promote individualism and a diverse workforce to build our future success.