Senior Platform Engineer

Posted 2 hours 2 minutes ago by Deepstreamtech

Permanent
Full Time
Other
Dublin, Dublin, Ireland
Job Description
Requirements
  • 5+ years of hands-on experience in platform, cloud, or infrastructure engineering roles
  • Deep expertise in Azure cloud services and architecture
  • Strong production experience with Kubernetes (deployment, operators, Helm, networking, RBAC, autoscaling)
  • Proficiency with Terraform (writing modules, remote state, testing, and large-scale usage)
  • Solid Linux administration and troubleshooting skills (kernel tuning, systemd, networking, security)
  • Hands-on experience building observability stacks with Prometheus and Grafana (or equivalent)
  • Strong scripting skills (Bash/Python) and a GitOps mindset
  • Excellent problem-solving, communication, and collaboration skills
Additional skills that could set you apart
  • (Desirable) Exposure to AWS (any services, e.g. EC2, EKS, IAM)
  • (Desirable) Experience with additional observability tools (Loki, Tempo, OpenTelemetry, ELK)
  • (Desirable) Familiarity with GitOps (ArgoCD, Flux), service mesh (Istio/Linkerd), or policy as code (OPA/Gatekeeper)
  • (Desirable) Prior mentoring or technical leadership experience
What the job involves
  • We're looking for a Senior Platform Engineer to strengthen our cloud infrastructure and developer platform capabilities. You'll own the design, build, and operation of scalable, observable, and reliable platforms that power our applications.
  • This is a hands-on senior role with significant ownership of Azure-based cloud infrastructure, Kubernetes orchestration, infrastructure as code, and full-stack observability.
  • Design, provision, and maintain cloud infrastructure primarily on Microsoft Azure (AKS, ACR, networking, storage, policies, etc.)
  • Build and operate production-grade Kubernetes clusters, including day-2 operations, scaling, security, and cost optimization
  • Develop and evolve Infrastructure as Code (IaC) pipelines using Terraform (modules, state management, multi-environment strategies)
  • Implement and continuously improve observability across the platform using Prometheus (metrics, alerting, recording rules) and Grafana (dashboards, alerting, Loki integration where applicable)
  • Manage and harden Linux environments (Ubuntu/RHEL): OS tuning, security hardening, logging, and troubleshooting at scale
  • Collaborate with development and SRE teams to deliver self-service platforms, golden paths, and internal developer experience improvements
  • Participate in the on-call rotation, incident response, and post-incident reviews to drive reliability and performance
  • Mentor junior engineers and contribute to platform standards, best practices, and documentation