Senior Site Reliability Engineer
Posted 3 days 6 hours ago by Caspian One Ltd
Senior Systems Reliability Engineer (SRE)
Employment Type: Full-time, Permanent
Location: Ireland (Remote)
Salary: 125,000-175,000 Euros
About the company:
A global fintech organisation operating mission-critical trading technology is expanding its engineering presence in Ireland. The team builds and supports high-performance, low-latency systems used across financial markets, with a strong engineering culture and focus on reliability, fairness and technical excellence.
They are now hiring their next Senior Systems Reliability Engineer to support their mission-critical trading systems.
The Role
As a Senior SRE, you'll join a highly technical, engineering-driven environment responsible for the reliability, performance, and operational excellence of a large-scale, bare-metal trading platform. This is a hybrid role combining systems engineering, observability, automation, and real-time operational support.
You'll work across the full stack (Linux, networking, applications, hardware) and play a key role in building a follow-the-sun support model with teams in the U.S. and Europe.
What You'll Do
- Own the technical operations of trading systems running on bare-metal infrastructure
- Monitor, troubleshoot, and resolve issues across OS, network, hardware, and application layers
- Build and improve automation, tooling, and configuration management (Ansible or similar)
- Develop and maintain observability dashboards, alerts, and telemetry pipelines
- Participate in deployments, start-up/shutdown procedures, and change management
- Contribute to engineering projects such as OS tuning, kernel-level optimisation, and performance improvements
- Collaborate with platform, development, and market operations teams
- Participate in the on-call rotation (1 week on; occasional Saturdays for industry-wide testing)
- Document processes, mentor teammates, and promote operational best practices
What You Will Bring
Must-Have Technical Skills
- Strong Linux experience (comfortable with system processes, logs, services, troubleshooting)
- Hands-on scripting with Python or Bash
- Experience with Ansible or similar configuration management tools
- Solid understanding of networking fundamentals: TCP/IP, routing, multicast
- Experience supporting large, distributed, or high-availability systems
- Observability experience: Prometheus, Grafana, Splunk, Graylog, telemetry, alerting systems (e.g. Alertmanager), log pipelines
Nice-to-Have
- Experience with bare-metal deployments
- Kernel tuning and kernel-bypass techniques
- KDB experience
- Familiarity with Arista/Cisco switches, Corvil, Solarflare/Mellanox NICs
- Understanding of trading systems
Who You Are
This matters as much as the tech.
You are someone who:
- Works well independently and in distributed teams
- Communicates clearly and calmly
- Is collaborative, low-ego, and easy to work with
- Can follow processes while still thinking critically
- Learns quickly and enjoys understanding complex systems
- Thrives in a high-trust, engineering-focused culture
Why Join?
- Work on mission-critical, low-latency trading systems
- Highly technical environment with deep engineering challenges
- Exposure to kernel tuning, networking, automation, and performance optimisation
- Flexible working arrangements with opportunity for travel