Senior Platform Engineer

Posted 2 hours 2 minutes ago by Deepstreamtech

Permanent

Full Time

Other

Dublin, Dublin, Ireland

Job Description

Requirements

5+ years of hands-on experience in platform, cloud, or infrastructure engineering roles
Deep expertise in Azure cloud services and architecture
Strong production experience with Kubernetes (deployment, operators, Helm, networking, RBAC, autoscaling)
Proficiency with Terraform (writing modules, remote state, testing, and large-scale usage)
Solid Linux administration and troubleshooting skills (kernel tuning, systemd, networking, security)
Hands-on experience building observability stacks with Prometheus and Grafana (or equivalent)
Strong scripting skills (Bash/Python) and a GitOps mindset
Excellent problem-solving, communication, and collaboration skills
Additional skills that could set you apart:
(Desirable) Exposure to AWS (any services, e.g. EC2, EKS, IAM)
(Desirable) Experience with additional observability tools (Loki, Tempo, OpenTelemetry, ELK)
(Desirable) Familiarity with GitOps (ArgoCD, Flux), service mesh (Istio/Linkerd), or policy-as-code (OPA/Gatekeeper)
(Desirable) Prior mentoring or technical leadership experience

What the job involves

We're looking for a Senior Platform Engineer to strengthen our cloud infrastructure and developer platform capabilities. You'll own the design, build, and operation of scalable, observable, and reliable platforms that power our applications.
This is a hands-on senior role with significant ownership over Azure-based cloud infrastructure, Kubernetes orchestration, infrastructure as code, and full-stack observability.
Design, provision, and maintain cloud infrastructure primarily on Microsoft Azure (AKS, ACR, networking, storage, policies, etc.)
Build and operate production-grade Kubernetes clusters, including day-2 operations, scaling, security, and cost optimization
Develop and evolve Infrastructure as Code (IaC) pipelines using Terraform (modules, state management, multi-environment strategies)
Implement and continuously improve observability across the platform using Prometheus (metrics, alerting, recording rules) and Grafana (dashboards, alerting, Loki integration where applicable)
Manage and harden Linux environments (Ubuntu/RHEL): OS tuning, security hardening, logging, and troubleshooting at scale
Collaborate with development and SRE teams to deliver self-service platforms, golden paths, and internal developer experience improvements
Participate in the on-call rotation, incident response, and post-incident reviews to drive reliability and performance
Mentor junior engineers and contribute to platform standards, best practices, and documentation