Site Reliability Engineer, Dublin

Posted 10 hours 8 minutes ago by Omaze

Permanent

Full Time

Other

Dublin, Dublin, Ireland

Job Description

Summary

People at Apple don't just build products - they craft the kind of experience that has revolutionised entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.

Description

Apple Service Engineering (ASE)'s Compute team is seeking a highly motivated software engineer with strong technical and communication skills to join our SRE team on our quest to build and enhance massive clusters hosting Virtual Machines, Containers and associated infrastructure that can scale to meet the demands of Apple's Services offerings. You will work with world-class engineers on core components of Virtualization and Containerization technologies, customise them to fit Apple's diverse needs, and engage with the upstream community to drive Apple's requirements. Ultimately, you will help build the platform that delivers our applications at scale to our end users. As a Compute Site Reliability Engineer, you will be part of the team responsible for providing the platform that lets mission-critical cloud systems maintain constant uptime, scale seamlessly, and allow new applications and services to flourish.

Responsibilities

Design and develop tooling, frameworks, and automation in Go and Java to improve reliability, scalability, and operational efficiency of compute infrastructure (VMs, containers, orchestration).
Define and implement SLOs/SLIs for compute services and build the observability pipelines (metrics, logging, tracing) to measure and enforce them.
Lead incident response for compute infrastructure, driving triage, root cause analysis, and postmortem corrective actions.
Develop and maintain infrastructure as code and CI/CD pipelines, ensuring reproducibility, automated testing, and staged rollouts across the fleet.
Contribute to compute platform architecture through design reviews, technical design documents, production readiness reviews, capacity planning, and disaster recovery exercises.
Partner cross-functionally with engineering, QA, and program management to embed reliability into the development lifecycle, upholding best practices in code review, testing, and documentation.

Minimum Qualifications

Expert-level, in-depth professional experience with cloud operations, with a focus on "infrastructure as a service" (compute, storage, and network virtualization).
Strong software development skills in Go and Java, with experience building production services, tools or automation frameworks.
Experience with software development lifecycle practices including version control, code review, CI/CD, and automated testing.
Experience operating and engineering large-scale, multi-tenant infrastructure as a managed service.
Ability to articulate complex technical concepts to both technical and non-technical stakeholders.

Preferred Qualifications

Experience with infrastructure-as-a-service orchestration tools (OpenStack, CloudStack, etc.).
Experience with Linux system virtualization (Libvirt, QEMU, KVM, etc.), along with their APIs.
Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus.
Experience building internal platforms or developer tooling and familiarity with distributed systems concepts.
