Site Reliability Engineer
Posted 21 hours 19 minutes ago by Wickes
Permanent
Not Specified
Other
Hertfordshire, Watford, United Kingdom, WD171
Job Description
We are looking for a proactive and detail-oriented Site Reliability Engineer (SRE) to join our Platform Engineering & Technology team. Why not join us and shape the future of platform engineering?

Wickes is a digitally-led, service-enabled home improvement retailer, focused on helping the nation feel house-proud. We are 'big on people' and committed to a culture where everyone feels proud to be part of the Wickes family. Join our agile, cross-functional teams in a supportive, collaborative environment as we become a 'seriously digital' business, leveraging modern, event-driven architectures (MACH) to deliver best-in-class customer and colleague experiences.

The Role:
As a Site Reliability Engineer, you will be a guardian of our production systems, ensuring the reliability, scalability, and performance of our 'seriously digital' platform. You'll blend software and systems engineering to build and run large-scale, fault-tolerant systems that are crucial to our customer-facing and internal services. You'll embrace our "platform as a product" mindset and deliver reliability as a key feature.

Key Responsibilities:
- You'll define and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs), driving initiatives to enhance reliability, performance, and scalability.
- You'll design, implement, and manage observability solutions, including monitoring, logging, and tracing, using Datadog to build proactive dashboards and alerts.
- You'll automate manual operational tasks to reduce toil and improve system resilience.
- You'll collaborate closely with our Platform Engineers to manage and improve cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible, and Packer.
- You'll integrate reliability and performance considerations into CI/CD pipelines.
- You'll anticipate future infrastructure needs by monitoring system performance and usage trends.
- You'll act with good judgement and a sense of urgency, demonstrating a commitment to high standards of ethics, regulatory compliance, customer service and business integrity.

Skills and Experience:
- Proven experience in a Site Reliability Engineering, DevOps, or Production Engineering role.
- Expertise in the AWS ecosystem, with a deep understanding of its services and best practices for building resilient architectures.
- Strong experience with Infrastructure as Code (IaC), particularly with Terraform and Ansible. Experience with Packer is also required.
- Proven experience with modern observability stacks, with specific expertise in Datadog.
- Proficient in using JIRA and Confluence.
- Solid understanding of CI/CD pipelines and their role in maintaining a stable production environment.

What We Offer:
- Competitive package including an annual bonus
- 25 days' holiday plus bank holidays
- Contributory Pension and Life Assurance
- Flexible Hybrid working (2-3 days in Watford)
- Save-as-you-earn scheme
- Colleague discount
- Discount platform including savings and cashback at numerous retailers, savings on gym membership, and a cycle-to-work scheme