Data Engineer - AI SaaS

Posted 1 day 10 hours ago by Vortexa Ltd

Permanent
London, United Kingdom
Job Description

Vortexa is a fast-growing international technology business founded to close the immense information gap in the energy industry. By combining massive amounts of new satellite data with pioneering work in artificial intelligence, Vortexa creates an unprecedented real-time view of global seaborne energy flows, bringing transparency and efficiency to the energy markets and to society as a whole.

The Role:

Processing thousands of rich data points per second from many vastly different external sources, moving terabytes of data while processing it in real time, running complex prediction and forecasting AI models, and coupling their output with a hybrid human-machine data refinement process, all delivered through a nimble, low-latency SaaS solution used by customers around the globe, is a significant challenge of science and engineering. It requires models that can withstand scrutiny from industry experts, data analysts, and traders, with the performance, stability, latency, and agility needed by a fast-moving startup whose data influences multi-million-dollar transactions.

The Data Production Team is responsible for all of Vortexa's data, from mixing raw satellite data from 600,000 vessels with rich but incomplete text data, to generating high-value forecasts such as vessel destination, cargo on board, ship-to-ship transfer detection, dark vessel detection, congestion, and future prices.

The team has built procedural, statistical, and machine learning models that provide the most accurate and comprehensive view of energy flows. We pride ourselves on applying cutting-edge research to real-world problems in a robust, maintainable way. Our data quality is continuously benchmarked and assessed by experienced in-house market and data analysts to ensure the accuracy of our predictions.

You will be instrumental in designing and building the infrastructure and applications that support the deployment and benchmarking of existing and new pipelines and ML models. Working with software and data engineers, data scientists, and market analysts, you will help bridge the gap between scientific experiments and commercial products, ensuring 100% uptime and fault tolerance across all data pipeline components.

Requirements

You Are:

  • Experienced in building and deploying distributed scalable backend data processing pipelines that handle terabytes of data daily using AWS, Kubernetes, and Airflow.
  • Grounded in solid software engineering fundamentals and fluent in Java and Python (Rust is a plus).
  • Knowledgeable about data lake systems such as Athena and big data storage formats such as Parquet, HDF5, and ORC, with a focus on data ingestion.
  • Driven by working in an intellectually engaging environment with top industry minds, where constructive debates are encouraged.
  • Excited about working in a start-up environment: not afraid of challenges, eager to bring new ideas to production, with a positive, proactive attitude.
  • Passionate about coaching developers, helping them improve their skills and grow their careers.
  • Deeply experienced with the full software development lifecycle (SDLC), including design, coding, review, source control, testing, deployment, and operations.

Nice to Have:

  • Experience with Apache Kafka and streaming frameworks such as Flink.
  • Familiarity with observability principles such as logging, monitoring, and tracing.
  • Experience with web scraping and information extraction technologies.