Lead Site Reliability Engineer - Algo-Trading (f/m/d)

Posted by Uniper

€50,000 - €65,000 per year
Permanent
Full Time
Other
Nordrhein-Westfalen, Düsseldorf, Germany, 40221
Job Description

Salary: €50,000 - €65,000 per year

Requirements:
  • A degree in Computer Science, Mathematics, Engineering, or a related discipline
  • 10+ years in SRE/Platform/Infrastructure roles
  • Hands-on experience running complex, low-latency algo-trading or market-facing systems in production
  • 3+ years of experience as a DevOps engineer/SRE with a clear observability focus
  • 3+ years of experience as a Software Developer
  • Expert in Kubernetes (AKS preferred), including cluster lifecycle, networking (CNI, Ingress, eBPF), HPA/VPA, node autoscaling, PodDisruptionBudgets, and surge/zero-downtime upgrades
  • Deep Azure experience: VNet design, Private Link/Endpoints, peering, routing, Managed Identity/Entra ID, Key Vault, Storage, Azure Monitor/Log Analytics, Front Door/Traffic Manager, Load Balancers, App Gateway, API Management
  • Terraform (expert): modular design, state management, workspaces, policies (OPA/Sentinel), and pipeline integration
  • Containers & supply chain: Docker/OCI, image scanning/signing, SBOMs, and build reproducibility
  • Observability: Prometheus, Grafana, alerting design; OpenTelemetry tracing; log pipelines and retention strategies
  • Deltix (required): hands-on experience operating and tuning Deltix components (e.g., TimeBase/QuantOffice/Ember) in containerized, HA contexts
  • Strong knowledge of networking (L4/L7, TLS/mTLS, DNS, BGP basics), Linux internals, and performance tuning for low-latency services
  • Proven track record designing geo-redundant architectures and planning/testing disaster recovery (DR)
  • Experience with market data distribution (multicast/unicast), FIX/OUCH/ITCH, and exchange connectivity
  • Fluency in GitHub Actions or a similar CI/CD system, and in at least one programming language (e.g., Python or C#) for tooling and diagnostics
  • Excellent communication skills; ability to lead through influence
  • Fluent in English; German advantageous
Responsibilities:
As the technical lead for site reliability across our algorithmic trading and other key platforms, you own reliability, performance, and operational excellence end-to-end. You set the technical direction, drive standards, mentor engineers, and partner with quants, traders, development, and other teams to deliver geo-redundant, containerized, and compliant trading systems with near-zero downtime. Your responsibilities include:
  • Defining and driving SLOs/SLIs, error budgets, and golden signals for latency-sensitive algo-trading services; leading incident response and postmortems with a blameless culture.
  • Designing and evolving geo-redundant, active-active/active-passive topologies across regions and availability zones, including failover, data replication, and disaster recovery (RTO/RPO).
  • Architecting, hardening, and operating AKS-based multi-cluster environments (multi-tenant, multi-region), including networking, security, autoscaling, node pools, and upgrade strategies.
  • Owning Terraform blueprints and Ansible automations for everything from base images to cluster add-ons, ensuring idempotent, policy-guarded, and auditable changes.
  • Building progressive delivery (blue/green, canary) pipelines with gated rollouts and automated rollback for trading microservices, adapters, market data, and execution gateways.
  • Implementing end-to-end tracing (OpenTelemetry), metrics, logs, and synthetic probes; leading capacity planning, performance tests, and p99/p999 latency optimization.
  • Enforcing runtime security, secrets management, image hygiene, and compliance controls, shifted left into build and deploy workflows.
  • Operating and optimizing Deltix-based components (TimeBase, Ember, Strategy Server) in containerized, high-availability setups, and owning the corresponding Helm charts.
  • Mentoring SREs/DevOps/Developers, guiding design reviews, and aligning with Platform, Security, and Trading stakeholders on priorities and roadmaps.
  • Promoting a culture of innovation by staying up to date with new technologies and bringing useful advances into the commercial business.
Technologies:
  • API
  • Ansible
  • Azure
  • C#
  • CI/CD
  • DevOps
  • Docker
  • Ember
  • GitHub
  • Grafana
  • Helm
  • Support
  • Kubernetes
  • Linux
  • Prometheus
  • Python
  • Security
  • Terraform
  • Microservices
  • ASP.NET
  • Node.js
  • Cloud
  • Architect

More:

At Uniper, we are actively transforming the world of energy while ensuring the security of energy supply. Our corporate culture is characterized by equal opportunities, mutual appreciation, and respect. We offer attractive salaries, an excellent company pension, and health-related benefits. We offer a variety of flexible working arrangements and support with home-office equipment. We also invest in training and workshops to help our employees map out their career paths and achieve their personal goals. We promote work-life balance, provide modern and ergonomic workplace equipment, and offer support for a range of personal and professional situations. We support employee health with benefits such as flu vaccinations and preventive health services. Our commitment to diversity and equal opportunities means we welcome applications from qualified individuals irrespective of gender, origin, disability, age, religion, ideology, sexual identity, or marital status.

Last updated: week 46 of 2025