Vision & Perception
From simple cameras to neuromorphic sensors that see and understand the world.
3 years ago (2023)
Standard RGB and depth cameras (Intel RealSense, Microsoft Kinect). LiDAR was expensive and bulky. SLAM algorithms required significant compute. Object recognition was slow and brittle. No unified perception model.
Now (2026)
Neuromorphic event-based sensors with microsecond-scale latency. On-device neural networks for real-time semantic segmentation. Multi-modal fusion (LiDAR + stereo + thermal). 360-degree perception systems in consumer robots. World models enable task understanding beyond object detection.
Next 3 years
Gaze tracking for human-robot communication. Predictive vision (anticipating human movement). Self-supervised learning from egocentric video (KAI Halo approach). Near-human visual reasoning in unstructured environments. Universal 3D scene understanding from any angle.
Related Robots
Full-sized humanoid with 115 degrees of freedom (nearly triple that of rivals) and tactile skin with 18,000 sensors detecting forces as light as 0.1 N. Powered by the KAI World Model, trained on 100,000+ hours of egocentric video via the KAI Halo headset. Semi-solid-state 1.7 kWh battery for home safety. Founded by XPeng Robotics alumni.
Agile quadruped robot for industrial inspection, data collection, and hazardous environment exploration.
High-performance humanoid robot with 360-degree perception, capable of running, jumping, and navigating complex terrain.