
Locomotion / Motion Capture
Whole-body human motion from monocular and multi-view video. Robot joint angles, SMPL-X parameters, time-aligned 3D skeletons.
We capture the full human signal - sight, motion, touch, intent - across egocentric, wrist, body, and tactile sensors. The most exhaustive multi-modal stack feeding the most advanced VLA datasets in production.
A full-body sensor stack on every operator. Egocentric fisheye RGB, wrist-mounted cameras, 4-camera SLAM array, 9-axis IMU at 500Hz, motion-capture markers, and hand-mounted tactile sensors - all hardware-synchronized.
We run our own VLA models internally. We know what training pipelines actually need, and we shape multi-modal datasets to that signal - aligned vision, proprioception, and contact at frame-accurate timing.
The most advanced VLA-ready datasets shipping today. Structured, labelled, multi-sensor, and verified by humans. Drop straight into partner training loops.
Six-plus synchronized sensor streams on a single operator. Egocentric and wrist cameras, SLAM array, 9-axis IMU, motion-capture markers, and hand tactile sensors. Sub-millisecond timing. Every frame tagged with pose, gaze, and contact - ready to train the next generation of vision-language-action models.
Operator wears the rig and performs a real-world task.
Egocentric, wrist, SLAM, IMU, mocap, and tactile - all fused.
Human-in-the-loop labelling of intent, contact, and trajectory.
Our internal VLA loop scores and selects what actually trains.
Drop-in datasets for partner foundation-model pipelines.
Raw egocentric capture, fused with annotation and 3D world reconstruction. Rendered in Rerun to show the depth of signal we hand to partner training pipelines.
Six product lines covering the full surface of embodied learning, from human locomotion to humanoid whole-body manipulation. Standardized schema, cross- embodiment compatible, validated on partner training loops.

Whole-body human motion from monocular and multi-view video. Robot joint angles, SMPL-X parameters, time-aligned 3D skeletons.

Bi-manual demonstration data from human operators and teleop rigs. End-effector trajectories with frame-accurate sync.

Upper-limb and torso coordination for full humanoid embodiments. Captured with motion-capture markers and multi-DoF rigs.

First-person streams from our head-mounted rigs. 180° fisheye RGB + wrist cameras + SLAM, temporally aligned with action.

Structured contact, hand-object 6-DoF pose, and object state transitions. The signal models actually need to learn manipulation.

Sim and generative augmentation for long-tail and high-risk scenarios. Same schema as real capture, drop-in compatible.
Cross-embodiment, multi-modal, temporally aligned. We standardize the schema so partners can drop our data straight into the training pipelines they already run - no glue code, no format conversions, no surprises.
“The robots of today are blind. They move without understanding, act without perception, and fail the moment the world stops cooperating. We are here to change that.”