Senior Site Reliability Engineer

Posted 7 days 18 hours ago by Embarcaderomediagroup

Permanent

Full Time

Other

Lancashire, Manchester, United Kingdom, M21 0

Job Description

Senior Site Reliability & Platform Engineer

Manchester | Hybrid/Flexible Working | Full-Time

Drive better infrastructure and developer experience at scale

At Sorted, we're building robust, scalable systems to support modern digital services - and we're looking for a Site Reliability & Platform Engineer to help lead the way.

You'll sit at the heart of our engineering operations, combining SRE principles (service-level reliability, observability, incident response) with platform engineering practices such as GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, more safely, and more cost-efficiently.

What you'll be doing:

Designing and operating highly reliable, scalable, and secure Azure-based platforms
Applying SRE principles like SLOs, observability, and incident management to drive service reliability
Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows
Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure
Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security scanning and progressive delivery
Supporting AKS clusters and Azure services (SQL, Cosmos DB, ADF, Functions, Logic Apps, etc.)
Improving monitoring and alerting with Datadog, Grafana, ELK, and proactive failure detection
Participating in the on-call rota and leading incident response workflows and blameless postmortems
Coaching engineers, upskilling teams, and contributing to a culture of continuous improvement
Driving cost awareness through FinOps practices and automated budget controls

What we're looking for:

We're seeking someone with strong platform and cloud engineering experience who can collaborate across teams and incorporate reliability thinking into all aspects of their work. Ideally, you have:

In-depth Azure knowledge (AKS, Functions, SQL, Cosmos DB, etc.)
Strong Infrastructure as Code skills with Terraform (v1.7+)
Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash)
Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring
Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing)
Good knowledge of DevSecOps practices - including security scanning, IAM, and RBAC
Experience with FinOps - tagging, budgeting, cost optimisation
Experience with Windows and Linux Operating Systems
Understanding of progressive delivery methods (canary, blue/green)
Familiarity with security scanning tools (Trivy, tfsec) integrated into pipelines
A proactive approach to problem-solving, documentation, and coaching

Additional bonus skills include experience with Azure governance tools, advanced Datadog capabilities, Kubernetes autoscaling solutions, GitOps workflows, automated cost dashboards, compliance frameworks, and internal platform development.

What You Can Expect:

Competitive salary: £70,000 - £80,000 depending on experience
25 days holiday plus bank holidays
Flexible remote/hybrid working with office collaboration as needed
Health Shield from day one
Annual £200 personal development budget
Enhanced family leave policy
Life Assurance coverage
Pension contributions via salary sacrifice
35-hour workweek (plus 1-hour unpaid lunch)

Who this role suits:

This is a great opportunity for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background - what matters most is your curiosity, systems thinking, and drive to improve operational efficiency.

At Sorted, we are committed to fostering an inclusive environment where people from all backgrounds can thrive. If you need any accommodations during the interview process, please let us know - we're happy to assist.