Machine Learning Researcher
We are seeking a Machine Learning Researcher to join our team and help advance the state of the art in human-centric generative video models. Your work will focus on improving expression control, lip synchronisation, and overall realism in models such as WAN and Hunyuan. You'll collaborate with a world-class team of researchers and engineers to build systems that can generate lifelike talking-head videos from text, audio, or motion signals, pushing the boundaries of neural rendering and avatar animation.
Key Responsibilities
- Research and develop cutting-edge generative video models, with a focus on controllable facial expression, head motion, and audio-driven lip synchronisation.
- Fine-tune and extend video diffusion models such as WAN and Hunyuan for better visual realism and audio-visual alignment.
- Design robust training pipelines and large-scale video/audio datasets tailored for talking-head synthesis.
- Explore techniques for controllable expression editing, multi-view consistency, and high-fidelity lip sync from speech or text prompts.
- Work closely with product and creative teams to ensure models meet quality and production constraints.
- Stay current with the latest research in video generation, speech-driven animation, and 3D-aware neural rendering.
Requirements
- Strong background in machine learning and deep learning, especially in generative models for video, vision, or speech.
- Hands-on experience with video synthesis tasks such as face reenactment, lip sync, audio-to-video generation, or avatar animation.
- Proficient in Python and PyTorch; familiar with libraries like MMPose, MediaPipe, dlib, or image/video generation frameworks.
- Experience training large models and working with high-resolution audio/video datasets.
- Deep understanding of architectures such as transformers, diffusion models, and GANs, as well as motion-representation techniques.
- Proven ability to work independently and drive research from idea to implementation.
- Strong problem-solving skills and the ability to work autonomously in a remote-first environment.
- PhD in Computer Vision, Machine Learning, or a related field, with publications in top-tier conferences (CVPR, ICCV, ICLR, NeurIPS, etc.).
- Familiarity with or contributions to open-source projects in lip sync, video generation, or 3D face modelling.
- Experience with real-time inference, model optimisation, or deployment for production applications.
- Knowledge of adjacent areas like emotion modelling, multimodal learning, or audio-driven animation.
- Experience working with or adapting models such as WAN, Hunyuan, or similar.
What We Offer
- Be part of a global media-tech leader shaping the future of content operations.
- Work on cutting-edge cloud and AI technologies in a high-impact role.
- Collaborate with world-class teams across engineering, media, and AI.
- Competitive compensation, benefits, and career advancement opportunities.
BRAHMA AI is the next generation of enterprise media technology, formed through the integration of Prime Focus Technologies and Metaphysic. By combining CLEAR, CLEAR AI, Atman, and Vaani into one ecosystem, BRAHMA AI enables enterprises to manage, create, and distribute content with intelligence, security, and efficiency.
Proven, scalable, and enterprise-tested, BRAHMA AI is helping global organisations accelerate growth, efficiency, and creative impact in the AI-powered era.