UX Engineer Intern (On-Device Computer Vision)
Posted 2 hours 46 minutes ago by Captur Limited
Captur helps software understand real-world scenes in real time with an SDK for flexible, on-demand visual recognition. We're a small, rapidly scaling team backed by top-tier investors; we recently closed a $6M seed round to accelerate product and go-to-market growth. We are global leaders in edge ML and have validated M images on device for enterprise customers such as Lime. Next, we're expanding into a horizontal platform across use cases that require real-time speed, high volume, and coverage across a wide range of mobile devices.
About the role
The role focuses on the camera flow inside our clients' mobile apps. We don't do single-photo verification: our models run over the live camera stream, every frame, in real time, on the user's phone. Whatever the user sees and feels while framing the shot is the problem you'll be working on.
The interesting problem: model output is probabilistic and noisy, the scene is different every time, and the user is operating in the real world - on foot, in the rain, gloves on, one hand free. Visual UI alone usually isn't enough. Haptics, symbolic cues (think game HUD), and visual feedback all need to play together, and at any given moment you have to decide what to surface, on which channel, in what order.
What you'll do
- Real-time feedback design. A scooter rider photographs a finished trip in the rain. The camera streams at 30 fps; the on-device model gives a confidence-shaped output per frame. The job is to turn that stream into something the rider can act on inside their visual loop, usually under 300 ms.
- Parallel feedback channels. Visual, haptic, symbolic. Each has different bandwidth, different attentional cost, and different latency. Mapping the right signal to the right channel, and prioritising across them when several things are wrong at once, is the headline challenge.
- Generalising across scenes. A courier at a doorstep, a rider at trip end: same SDK, different scenes, different priorities. We want a feedback system that generalises rather than one hand-tuned per client.
- End to end ownership. You'd scope, prototype, ship, and measure one piece of work over the internship. A 1:1 mentor helps you scope it in week one and reviews throughout.
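To give a flavour of the channel-prioritisation problem above, here's a minimal TypeScript sketch; the issue names and the severity scheme are invented for illustration, not part of Captur's SDK. It picks at most one issue per feedback channel, highest severity first, so channels don't compete for the user's attention.

```typescript
// Hedged sketch: hypothetical issue names and severities.
// One way to prioritise when several things are wrong at once.

type Channel = "haptic" | "visual" | "symbolic";

interface Issue {
  id: string;
  severity: number;   // higher = more urgent to surface
  channel: Channel;   // best-fit channel for this signal
}

// Select at most one issue per channel, highest severity first,
// so each channel carries a single unambiguous message.
function selectFeedback(issues: Issue[]): Issue[] {
  const byChannel = new Map<Channel, Issue>();
  for (const issue of [...issues].sort((a, b) => b.severity - a.severity)) {
    if (!byChannel.has(issue.channel)) byChannel.set(issue.channel, issue);
  }
  return [...byChannel.values()];
}

const surfaced = selectFeedback([
  { id: "too-dark", severity: 1, channel: "visual" },
  { id: "wrong-object", severity: 3, channel: "haptic" },
  { id: "off-centre", severity: 2, channel: "visual" },
]);
// "too-dark" is suppressed: "off-centre" already owns the visual channel.
```

Whether "one message per channel" is the right policy, and whether priority should change after the user corrects an issue, is exactly the kind of question the internship explores.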
- The camera runs at 30 fps. Each frame is an input. The on-device model emits probabilistic output per frame. The feedback layer aggregates that stream and decides what to surface, and when (real-time systems, signal processing).
- Sub-300 ms end-to-end budget. Anything you display or vibrate has to land inside the user's visual loop, not after they've moved on (HCI, perception of latency).
- Multi-frame smoothing: confidence over the last N frames, thresholds for triggering feedback, asymmetric thresholds (different for "shot is good" vs "shot is bad") (signal processing, applied statistics).
- Visual UI is one channel. Haptics and symbolic / game-HUD cues are others. Each has different bandwidth, different attentional cost, and different latency to perception (multi-modal interfaces, HCI).
- Mapping signal to channel: haptics are good for "now, wrong" pulses; visual for fine-grained framing guidance. Why? When does that mapping break? (interaction design, adjacent to sensory psychology).
- Priority ordering: if three things are wrong with the shot at once, which do you tell the user about first? Why? Does the answer change once they've corrected one? (information design, game UI / HUD design).
- Same SDK, different scenes: a courier at a doorstep, a rider at trip end, a driver photographing a damaged parcel. Different lighting, different priorities, different answers to "what does a good shot look like?" (HCI, generalisation).
- The model output is one signal. Time of day, IMU motion, ambient light, and the user's prior attempts in this session are others. How do you combine them into a single coherent feedback layer? (sensor fusion, applied ML).
- How do you build a system that generalises rather than being hand-tuned per client? (software architecture, declarative configuration).
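To make the multi-frame smoothing and asymmetric-threshold idea concrete, here's a minimal TypeScript sketch; the class and threshold values are hypothetical, not Captur's implementation. It averages per-frame confidence over a short window, then uses a strict threshold to declare the shot good but a lenient one to warn early, so feedback doesn't flicker frame to frame.

```typescript
// Hedged sketch: hypothetical names and thresholds.
// Smooths noisy per-frame model confidence and applies asymmetric
// thresholds: quick to flag a bad shot, slow to declare a good one.

type FeedbackState = "good" | "bad" | "unknown";

class ConfidenceSmoother {
  private history: number[] = [];

  constructor(
    private windowSize = 10,      // ~330 ms of frames at 30 fps
    private goodThreshold = 0.8,  // strict: require sustained confidence
    private badThreshold = 0.3    // lenient: warn the user early
  ) {}

  // Feed one frame's confidence; get the current feedback state.
  push(confidence: number): FeedbackState {
    this.history.push(confidence);
    if (this.history.length > this.windowSize) this.history.shift();
    const mean =
      this.history.reduce((sum, c) => sum + c, 0) / this.history.length;
    if (mean >= this.goodThreshold) return "good";
    if (mean <= this.badThreshold) return "bad";
    return "unknown";
  }
}
```

A moving average is only one choice here; exponential smoothing or hysteresis (a separate threshold to leave a state, not just enter it) are common alternatives when the raw confidence is especially jittery.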
- Some interaction design, motion design, or HCI work: coursework, a side project, or self-directed study. Send us one example in your application.
- Coursework or projects in computer vision, ML, or signal processing.
- Front-end experience (TypeScript / React): useful for our internal debug tools and visualisations.
- Prototyping tools (Figma, Origami, ProtoPie, or hand-rolled HTML / SwiftUI).
- Game UI design, aviation HUD design, accessibility / multi-modal interface work, or anything else where you've thought about feedback beyond visual UI.
- User research methods: think-alouds, contextual inquiry, watching your flatmate use your project.
- An 8–12 week internship, starting 22nd or 29th June depending on applicant availability.
- Base salary of £35,000 per annum, pro-rated for the duration of the internship.
- Taxable housing stipend of £125 per week for interns whose permanent address is outside London and who need to pay for accommodation during the internship.
- Based three days a week in our Liverpool Street office, with work from home on the remaining days.
- 25 days' holiday plus public holidays, pro-rated for the duration of the internship.
- Dedicated company MacBook Pro for use during the internship.
- Dedicated company Apple or Android device for UX testing during the internship.