Research Engineer

Posted 4 hours 57 minutes ago by Storm3

Permanent
Not Specified
Academic Jobs
Paris, France
Job Description

LLM Engineer with a particular focus on speech processing and integration

Hybrid in Paris

Competitive base


The Mission

To tackle the fundamental challenges of world modeling and establish a new paradigm for next-generation machine reasoning. They are looking for passionate individuals who share their vision and are eager to push the boundaries of AI together.


Key Responsibilities: Data Infrastructure & Pipelines

  • Design, implement, and maintain scalable video data pipelines to support large-scale training.
  • Develop data preprocessing, transformation, and synthesis workflows to support world model training.
  • Contribute to building high-quality data annotation pipelines to ensure accurate and consistent labels across large-scale datasets.


Key Responsibilities: Training & Inference Systems

  • Support the training of multimodal foundation models (e.g., video diffusion models, world models) by developing and optimizing distributed training systems.
  • Improve inference and serving efficiency for real-time interaction through model optimization and system tuning.
  • Monitor system health and performance, and contribute to debugging and optimization at scale.


Key Responsibilities: Collaboration & Integration

  • Work closely with research teams to understand experimental goals and translate ideas into reliable and maintainable infrastructure and tools.
  • Integrate novel research prototypes into production-ready systems and ensure reproducibility at scale.
  • Participate in design and code reviews, ensuring code quality, efficiency, and compliance with best practices.


Key Responsibilities: Benchmarking & Evaluation

  • Contribute to the development of tools and infrastructure to evaluate model performance using rigorous quantitative benchmarks, including metrics for physical accuracy and controllability.


Key Responsibilities: Codebase & Documentation

  • Maintain and extend shared codebases, contribute to internal documentation, and support onboarding of new team members or collaborators.
  • Write clean, efficient, and well-tested code for components across the model development lifecycle.


Key Responsibilities

  • Support contributions to research papers and demos when engineering work plays a significant role.
  • Help represent the team's engineering excellence in internal and external forums when appropriate.


Academic Qualifications

  • MSc or PhD in Machine Learning or Computer Science, or equivalent industry experience.


Professional Experience Required

  • Proficient in data collection, cleaning, and transformation at scale, including designing robust pipelines for multimodal datasets (e.g., video, audio, text).
  • Practical experience with web scraping and crawling frameworks (e.g., Scrapy, Selenium, Playwright, BeautifulSoup) to collect and curate high-quality web-scale datasets.
  • Experience in large-scale model training (LLMs or Diffusion Models) on large clusters.
  • Hands-on experience with state-of-the-art video generative models (e.g., Sora, Veo2, MovieGen, CogVideoX).
  • Experience in building and optimizing large-scale video data pipelines.
  • Experience in accelerating diffusion model inference for improved efficiency.
  • Exceptional problem-solving and troubleshooting skills to tackle complex technical challenges.
  • Strong systems and engineering expertise in deep learning frameworks such as PyTorch.
  • Strong communication and collaboration skills for effective cross-functional teamwork.
  • Demonstrated ability to solve complex system-level challenges and debug failures across the training/inference stack (e.g., memory issues, deadlocks, I/O bottlenecks).
