ML Infrastructure Lead
Posted 16 hours 59 minutes ago by iProov
iProov provides science-based biometric solutions that enable the world's most security-conscious organizations to streamline secure remote onboarding and authentication for digital and physical access. Our award-winning liveness technology and iSOC offer resilience against deepfakes and generative AI threats while ensuring scalable user experiences. We are trusted by governments and enterprises, including the U.S. Department of Homeland Security, U.K. Home Office, GovTech Singapore, ING, and UBS.
This global trust is built on both our technology and the strength of our people. We value diversity, equality and inclusion, and aim to foster a culture where individuals of all backgrounds feel confident to bring their whole selves to work, feel included, and have their talents nurtured.
The Role
Reports to: Chief Scientific Officer
Location: WeWork Waterloo - Hybrid
Comp: Negotiable (Base) + Company Performance Bonus (20%) + Share Options + iProov Benefits
We are looking for a highly capable and hands-on Senior ML Infrastructure Lead to build and scale the technical foundations that enable machine learning to operate effectively in production.
This hybrid leadership role sits across machine learning infrastructure, platform engineering and MLOps. You will be responsible for designing and evolving the systems, tooling, processes and standards that allow ML teams to train, deploy, monitor and improve models reliably, securely and at scale.
You will work at the intersection of machine learning, software engineering, data, cloud infrastructure and platform reliability, helping bridge the gap between research and production. This role is ideal for someone who can think strategically about long-term platform capability, while still being technically hands-on enough to solve complex engineering and operational challenges.
How you can make an impact
- Lead the design and evolution of our ML platform, infrastructure and MLOps capability
- Build and maintain scalable, reliable and secure systems for model training, testing, deployment, monitoring and lifecycle management
- Develop the infrastructure and tooling that enable ML Engineers, Data Scientists and Researchers to work efficiently and ship models with confidence
- Design robust workflows for CI/CD, model versioning, reproducibility, experimentation, feature management and release management
- Own and improve the production environment for machine learning systems, ensuring strong standards for availability, performance, observability and resilience
- Define and implement monitoring across model and platform layers, including system health, data quality, drift, latency, throughput and cost efficiency
- Build or optimise internal self-service tooling and platform capabilities to reduce friction for teams working on ML use cases
- Partner closely with ML, Data, Software and Platform Engineering teams to productionise models and improve the end-to-end ML development lifecycle
- Support the scaling of infrastructure for both training and inference workloads, including high-throughput, real-time or compute-intensive use cases where relevant
- Drive best practice in governance, security, compliance, auditability and operational rigour across the ML lifecycle
- Improve the efficiency and cost-effectiveness of ML systems, including cloud resource usage, compute environments and deployment patterns
- Mentor engineers and act as a technical leader across ML platform and operations topics
- Help define the roadmap for ML enablement, ensuring the platform can support current needs while scaling for future growth
You will have experience working in high-growth, fast-paced, tech-first environments. You are passionate about building and launching quality products that have a positive impact.
You're an experienced product leader with a background in security, identity (IAM), or enterprise SaaS. You combine strategic vision with operational rigour, and you're motivated by delivering usable, secure, and elegant solutions to complex technical problems.
- Proven experience in a senior MLOps, ML Platform, ML Infrastructure, Platform Engineering or Machine Learning Systems role
- Strong hands-on background in software engineering and cloud infrastructure, ideally with direct experience supporting production machine learning environments
- Experience building and operating systems that support the full ML lifecycle, from experimentation and training through to deployment and monitoring
- Strong knowledge of Python and sound engineering principles, including testing, automation and code quality
- Strong experience with cloud platforms such as GCP
- Experience with Docker, Kubernetes and modern containerised deployment patterns
- Strong experience with CI/CD pipelines, infrastructure-as-code and workflow orchestration
- Experience with tools such as Airflow or similar platform and orchestration technologies
- Good understanding of model observability, data quality, feature pipelines, lineage and reproducibility
- Experience designing scalable infrastructure for ML workloads, including training, batch inference and real-time serving
- Strong appreciation of reliability, security, governance and operational excellence in customer-facing or production-critical systems
- Ability to operate across both strategic and hands-on technical work
- Strong communication skills and the ability to work effectively across engineering, product and data teams
- Experience supporting computer vision, deep learning, LLM or other compute-intensive ML workloads
- Experience with GPU infrastructure, distributed training or high-performance compute environments
- Familiarity with feature stores, model registries and automated retraining pipelines
- Experience building internal developer platforms or self-service ML tooling
- Experience in regulated, high-security or high-availability environments
- Experience leading or mentoring engineers in a scale-up or high-growth technology business
- Familiarity with responsible AI, model governance or risk controls in production ML settings
Our Culture & Recruitment Process
At iProov, we value psychological safety, diversity and inclusion. We are an equal opportunities employer and encourage applications from people of all backgrounds. Our recruitment process focuses on qualifications, competence and suitability for the role. If you need an adjustment for a disability or any other reason during the hiring process, please send a request to