Site Reliability Engineer (iCloud)
Posted 21 hours 4 minutes ago by Apple Inc.
London, England, United Kingdom Software and Services
Description
The services that Apple and iCloud run are massive; Edge and Messaging comprise a set of platforms and products that are foundational for both users and other Apple services. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents unique challenges. As an SRE, you'll need to solve these problems using data, teamwork, and your own expertise. We own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are both broad and deep. Systems are run both directly on Linux and in the cloud. We run a mix of open source and internally developed tools for system and configuration management, provisioning, software deployment, and monitoring. You'll learn these tools and have opportunities to improve them.
Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.
Responsibilities:
- Deploy, support and monitor new and existing services, platforms, and application stacks.
- Use scale testing to measure, tune and optimize system performance.
- Enhance, architect, author, and deliver software to improve the availability, scalability and security of Apple's internet services.
- Build and manage systems, infrastructure and applications through automation.
- Participate in periodic on-call duties.
Minimum Qualifications
- Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
- Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
- Excellent troubleshooting and problem solving skills
- Passion for eliminating repetitive manual processes through automation and for improving them through repeated iteration
- Experience with scale testing, disaster recovery, and capacity planning
- Proclivity towards efficient programming, with an emphasis on improvement via complexity analysis
- Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies
- BS in Computer Science or related field, or equivalent employment