In the perception system of humanoid robots, visual sensors undoubtedly play the crucial role of “eyes”. Acting as a keen observer, they can capture image information in the environment, enabling robots to recognize objects, determine distances, track targets, and even perform face recognition and emotional understanding. For instance, in home service scenarios, robots equipped with visual sensors can easily identify furniture and electrical appliances, thereby better accomplishing tasks such as cleaning and organizing. On industrial production lines, visual sensors can accurately detect the shape, size, and defects of products, ensuring production quality.
However, robots that rely solely on visual sensors, this pair of “eyes”, still face numerous limitations when “seeing” the world. In complex environments, the performance of visual sensors is severely challenged. In dimly lit environments, just as humans struggle to see clearly in the dark, the images captured by visual sensors become noisy and blurred. This prevents robots from accurately recognizing objects and scenes, potentially leading to problems like collisions and operational errors. In special environments such as direct strong light, rain, fog, and dust, the performance of visual sensors also deteriorates significantly, often resulting in blind spots or inaccurate recognition. Additionally, visual sensors struggle to accurately perceive and recognize transparent objects and those with highly reflective surfaces.
Just as humans cannot fully understand the surrounding world with their eyes alone, robots cannot perceive the environment accurately and comprehensively by relying solely on visual sensors. This gives rise to multi-sensor fusion, which builds a more acute and comprehensive perception system for robots. It allows robots to combine the advantages of various sensors, make up for the shortcomings of a single visual sensor, and thus perform tasks more reliably.
Multi-Sensor Fusion: Giving a “Boost” to Robot Vision

To address the limitations of a single visual sensor, multi-sensor fusion solutions have emerged. In a multi-sensor fusion system, different types of sensors perform their respective roles: depth cameras, cameras, LiDAR, IMU, and millimeter-wave radar collaborate with each other, complementing strengths to jointly provide comprehensive and accurate environmental information for robots. This enables robots to complete various tasks with higher precision and intelligence. Below is an in-depth look at the role of these sensors in humanoid robots.
I Depth Camera + Camera: The Perfect Partnership
The combination of a depth camera and a conventional color (RGB) camera is known as the “perfect partnership” for humanoid robots’ visual perception. A depth camera acts as a “spatial perception expert”: it directly acquires 3D spatial information about objects in the surrounding environment through technologies such as structured light and Time of Flight (ToF). This allows the robot to perceive the distance, depth, and spatial position of objects, and to construct a three-dimensional model of the environment. In contrast, the color camera focuses on image capture: with its rich color and texture information, it provides clear visual frames for target recognition, object classification, and scene understanding.
In practical applications, this “perfect partnership” works in close synergy. When a humanoid robot navigates an indoor environment, the depth camera perceives the distance and position of obstacles ahead in real time, giving the robot the information it needs to plan a safe walking path. At the same time, the camera recognizes the surrounding environment to help the robot confirm its location and direction. During object-grasping tasks, the depth camera measures the 3D coordinates of the target object, while the camera identifies the object’s shape and features. Working together, they allow the robot to grasp the target object precisely and complete the task.
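The grasping example can be sketched in a few lines. This is only an illustration under stated assumptions: the depth image is assumed to be already aligned pixel-for-pixel with the color image, the camera is assumed to follow a pinhole model with intrinsics fx, fy, cx, cy, and the bounding box is assumed to come from some hypothetical color-image detector. None of these names appear in the original text.

```python
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into the camera frame.

    Assumes a pinhole camera model; fx, fy, cx, cy are hypothetical intrinsics.
    """
    if depth_m <= 0:
        return None  # invalid or missing depth reading
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def locate_detection(bbox, depth_image, fx, fy, cx, cy):
    """Return the 3D position of the center of a detected bounding box.

    bbox = (u_min, v_min, u_max, v_max), assumed to come from an RGB detector.
    depth_image[v][u] is the depth in meters at that pixel (assumed aligned).
    """
    u = (bbox[0] + bbox[2]) // 2
    v = (bbox[1] + bbox[3]) // 2
    return pixel_to_camera_xyz(u, v, depth_image[v][u], fx, fy, cx, cy)
```

In this sketch the color camera answers "what and where in the image", and the depth camera turns that pixel into a position the arm can reach for. A real system would usually take a robust statistic over the depth values inside the box rather than a single center pixel.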
II LiDAR: The Distance Perception Expert
LiDAR (Light Detection and Ranging) is a sensor that uses laser beams to detect the surrounding environment. Known as the “distance perception expert,” it can accurately measure the distance between the robot and surrounding objects, and generate high-precision 3D environmental maps. LiDAR’s working principle is straightforward: it emits laser beams, receives the reflected laser signals, and calculates the distance to the target object based on the laser’s flight time. Due to the laser’s strong directionality and concentrated energy, LiDAR enables long-distance, high-precision detection, providing reliable environmental perception information for robots in complex environments.
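The time-of-flight principle described above reduces to one formula: the pulse travels to the target and back, so the distance is half the round-trip time multiplied by the speed of light. A minimal sketch follows; the function name is illustrative and real devices also correct for internal delays and other effects not modeled here.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_s):
    """Distance to the target from the measured round-trip time of a pulse."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse covers the distance twice (out and back), hence the division by 2.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0
```

A round trip of 100 nanoseconds corresponds to a target roughly 15 meters away, which shows why LiDAR timing electronics must resolve very short intervals.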
In specific scenarios, LiDAR plays a crucial role in environmental perception and navigation for humanoid robots. On complex outdoor terrain, LiDAR helps robots quickly identify terrain undulations and obstacle positions so that they can plan reasonable walking routes and avoid getting stuck. In industrial manufacturing, LiDAR can help robots accurately locate and grasp objects on production lines, improving production efficiency and quality. However, LiDAR also has limitations, such as large size, high power consumption, and relatively high cost, which restrict its widespread application in humanoid robots to a certain extent. With continuous technological advancements, LiDAR is developing toward miniaturization, low power consumption, and low cost, and is expected to play a more significant role in the humanoid robot field in the future.
III IMU: The Unsung Hero of Stable Movement
IMU (Inertial Measurement Unit), mainly composed of accelerometers and gyroscopes and, in many units, magnetometers, is known as the “unsung hero” behind humanoid robots’ stable movement. The accelerometer measures the robot’s acceleration along three coordinate axes, allowing the robot to perceive its own acceleration and deceleration as well as the direction of gravity; the gyroscope measures the robot’s angular velocity, helping it track its rotation and changes in direction; the magnetometer measures the Earth’s magnetic field, helping the robot determine its heading. Working together, these sensors let the IMU measure the robot’s motion and posture data (including acceleration, angular velocity, and heading) in real time, providing key information for the robot’s motion control.
During the movement of a humanoid robot, the IMU plays an irreplaceable role. When the robot walks, the IMU monitors changes in the robot’s body posture in real time. If it detects that the robot is starting to lose balance, it immediately feeds data back to the control system. The control system then quickly adjusts the robot’s joint angles and motor output force to help the robot regain balance and maintain a stable walking state. When the robot performs complex movements (e.g., jumping, turning), the IMU accurately perceives changes in the robot’s motion state, helping the robot complete the movement along the predetermined trajectory and posture and avoid movement errors. Additionally, in scenarios requiring precise navigation, when visual sensors are blocked or lighting is poor, IMU data can be combined with data from other sensors to provide short-term inertial navigation, so that the robot’s movement is not disrupted.
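One common way to turn raw accelerometer and gyroscope readings into a stable tilt estimate is a complementary filter. The original text does not name a specific algorithm, so the sketch below is an assumption chosen for illustration: the gyroscope is trusted over short intervals, and the accelerometer's gravity direction slowly corrects drift. The axis convention and the blending factor alpha are also assumptions.

```python
import math


def accel_pitch_rad(ax, ay, az):
    """Pitch angle implied by the gravity direction (assumed axis convention)."""
    return math.atan2(-ax, math.sqrt(ay * ay + az * az))


def complementary_filter(pitch_prev, gyro_rate, accel_pitch, dt, alpha=0.98):
    """Blend integrated gyro rate (fast, drifts) with accel pitch (noisy, stable).

    alpha is a hypothetical tuning value; closer to 1 trusts the gyro more.
    """
    gyro_pitch = pitch_prev + gyro_rate * dt
    return alpha * gyro_pitch + (1.0 - alpha) * accel_pitch


def is_losing_balance(pitch_rad, limit_rad=0.35):
    """Flag a tilt beyond an assumed safety limit (about 20 degrees)."""
    return abs(pitch_rad) > limit_rad
```

A controller would call complementary_filter once per IMU sample and react when is_losing_balance returns True. Production systems often use a Kalman-type filter instead, but the division of labor between the two sensors is the same.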
IV Millimeter-Wave Radar: A Powerful Perception Tool for Complex Environments
Millimeter-wave radar is a radar sensor operating in the millimeter-wave frequency band, and is regarded as a “powerful perception tool” for humanoid robots in complex environments. It uses electromagnetic waves in the millimeter-wave band to detect target objects, and has several distinctive advantages: it keeps working in harsh weather conditions (e.g., heavy rain, heavy fog, sandstorms) and difficult lighting (e.g., dim light, direct strong light) with far less degradation than optical sensors, stably acquiring information such as the distance, speed, and angle of surrounding objects. Moreover, millimeter-wave radar measures distance and speed accurately and responds quickly, enabling real-time monitoring of changes in the motion state of target objects.
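The text does not say how the radar obtains speed. One standard mechanism is the Doppler effect: a target moving toward or away from the radar shifts the frequency of the reflected wave in proportion to its radial speed. The sketch below assumes a 77 GHz carrier purely as an example value; the function name is illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def radial_speed_m_s(doppler_shift_hz, carrier_hz=77e9):
    """Radial speed of a target from the measured Doppler shift.

    carrier_hz defaults to 77 GHz, an assumed example frequency.
    The factor 2 accounts for the wave traveling to the target and back.
    """
    return doppler_shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)
```

With these assumptions, a 1 kHz shift corresponds to a radial speed of a little under 2 meters per second, roughly a brisk walking pace.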
In the navigation and environmental perception of humanoid robots, the integrated application of millimeter-wave radar with other sensors plays an important role. When a humanoid robot walks in an outdoor environment, millimeter-wave radar can work in collaboration with LiDAR and visual sensors to jointly complete environmental perception and modeling. In harsh weather, the performance of visual sensors and LiDAR may be severely affected—at this time, millimeter-wave radar can leverage its advantages to provide reliable environmental information for the robot, ensuring it safely avoids obstacles and continues to perform tasks. In indoor environments, millimeter-wave radar can also be used to detect the presence and movement of humans, and cooperate with other sensors of the humanoid robot to achieve more intelligent human-robot interaction.
Fusion Strategies
A multi-sensor fusion solution is not just a simple combination of multiple sensors; it requires scientific and reasonable fusion strategies to organically integrate data from different sensors and maximize their advantages. Currently, common multi-sensor fusion strategies mainly include data-level fusion, feature-level fusion, and decision-level fusion.
I Data-Level Fusion: Early-Stage Information Integration
Data-level fusion integrates information at the raw data layer, similar to mixing and processing ingredients immediately after purchasing them. In a robot’s visual system, when visual sensors and LiDAR work simultaneously, data-level fusion directly processes and fuses the image pixel data from visual sensors and the point cloud data from LiDAR. This fusion method fully utilizes raw data, retains detailed information, improves information utilization, and enables the robot to perceive the environment more comprehensively and accurately.
However, data-level fusion has high requirements for data synchronization and sensor consistency. If data deviations or inconsistencies occur, the fusion effect will be affected. Additionally, since it directly processes large amounts of raw data, it requires high computing power and has relatively slow processing speed. This is analogous to a chef handling multiple ingredients at once—they need superb cooking skills and strong control abilities; otherwise, they may become flustered and compromise the quality of the dish.
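As a rough sketch of data-level fusion between a camera and LiDAR, the code below projects each LiDAR point into the image and attaches the pixel value found there, after first checking that the two measurements are close enough in time. The extrinsic rotation R and translation t, the pinhole intrinsics, and the 20 ms skew limit are all assumed values that a real system would obtain from calibration and its own timing requirements.

```python
def is_synchronized(t_image_s, t_scan_s, max_skew_s=0.02):
    """Accept a camera/LiDAR pair only if their timestamps are close enough."""
    return abs(t_image_s - t_scan_s) <= max_skew_s


def project_point(p_lidar, R, t, fx, fy, cx, cy):
    """Transform a LiDAR point into the camera frame and project it to pixels."""
    xc, yc, zc = (
        sum(R[row][i] * p_lidar[i] for i in range(3)) + t[row] for row in range(3)
    )
    if zc <= 0:
        return None  # point is behind the camera
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def colorize_points(points, image, R, t, fx, fy, cx, cy):
    """Pair each LiDAR point that lands inside the image with its pixel value."""
    height, width = len(image), len(image[0])
    fused = []
    for p in points:
        uv = project_point(p, R, t, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            fused.append((p, image[v][u]))
    return fused
```

The sketch makes both requirements from the paragraph above concrete: poorly synchronized frames are rejected, and any error in R or t shifts every projected point, which is why sensor consistency matters so much at this level.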
II Feature-Level Fusion: Extracting Essence Before Integration
Feature-level fusion first extracts features from each sensor’s data, then fuses these features—much like processing different ingredients into semi-finished products before combining them into a dish. In robot vision, features such as the shape, color, and texture of objects are first extracted from visual images, while features such as the distance and position of objects are extracted from LiDAR data; these features are then fused. This fusion method reduces data volume, improves processing efficiency, and has relatively low dependence on sensors, offering high flexibility. Different types of sensors can be fused as long as they can extract valid features.
Nevertheless, feature-level fusion has high requirements for feature extraction algorithms. If feature extraction is inaccurate or incomplete, the fusion effect will be affected. For example, if a chef fails to master proper cooking techniques when processing semi-finished products, resulting in poor taste or quality of the semi-finished ingredients, the final dish made by combining them will also be compromised.
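A minimal sketch of feature-level fusion is shown below. Each sensor is assumed to have already produced a small numeric feature vector (for instance color or texture scores from the image, distance and height from LiDAR); the vectors are concatenated and compared against stored class prototypes. The feature values, prototype names, and nearest-prototype rule are illustrative assumptions, not the method of any particular system.

```python
def fuse_features(visual_features, lidar_features):
    """Concatenate per-sensor feature vectors into one fused vector."""
    return list(visual_features) + list(lidar_features)


def nearest_prototype(fused, prototypes):
    """Return the label whose prototype is closest in squared Euclidean distance.

    prototypes maps a label to a vector with the same layout as the fused vector.
    """
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(fused, prototypes[label]))

    return min(prototypes, key=sq_dist)
```

For example, with hypothetical prototypes for "chair" and "table", a fused vector is assigned to whichever prototype it most resembles. In practice the features would need to be scaled to comparable ranges first, and the simple distance rule would typically be replaced by a trained classifier.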
III Decision-Level Fusion: Independent Decision-Making Followed by Summarization
Decision-level fusion involves each sensor independently processing data and making decisions, with the final decision results fused together. This is like multiple chefs each making a dish based on their own experience and judgment; everyone then tastes the dishes and comprehensively evaluates the final result. In a robot’s visual system, a visual sensor may determine that an object ahead is an obstacle based on image data, while LiDAR may also judge the same object as an obstacle based on its own data. By fusing these two decision results, the robot can confirm the presence of an obstacle ahead and take corresponding actions. This fusion method offers high flexibility and fault tolerance—even if one sensor malfunctions or makes a wrong decision, the decision results of other sensors can still provide references for the robot, ensuring its normal operation.
However, decision-level fusion may lose some information from the raw data, as it primarily relies on the decision results of each sensor rather than the raw data itself. Moreover, since each sensor makes independent decisions, decision conflicts may occur, requiring a reasonable conflict resolution mechanism for coordination. Just as dishes made by different chefs may have distinct styles, a clear set of standards and methods is needed to weigh and select during comprehensive evaluation to reach the most reasonable conclusion.
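One simple conflict resolution mechanism is a confidence-weighted vote, sketched below. Each sensor is assumed to report a yes/no obstacle decision together with a confidence between 0 and 1; the per-sensor weights and the 0.5 threshold are hypothetical tuning values, and other schemes (for example Bayesian or Dempster-Shafer combination) are equally possible.

```python
def fuse_decisions(decisions, weights=None, threshold=0.5):
    """Combine independent per-sensor obstacle decisions by weighted vote.

    decisions: dict mapping sensor name -> (is_obstacle, confidence in [0, 1]).
    weights:   optional dict of per-sensor trust factors (default 1.0 each).
    Returns True when the weighted share of "obstacle" votes reaches threshold.
    """
    weights = weights or {}
    total = 0.0
    obstacle_score = 0.0
    for sensor, (is_obstacle, confidence) in decisions.items():
        w = weights.get(sensor, 1.0) * confidence
        total += w
        if is_obstacle:
            obstacle_score += w
    if total == 0.0:
        raise ValueError("no usable sensor decisions")
    return obstacle_score / total >= threshold
```

A sensor that has failed can be given a weight of zero, or simply left out of the dictionary, and the remaining sensors still produce a decision, which is the fault tolerance described above. Lowering the threshold makes the robot more cautious, at the cost of more false stops.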
Practical Application Scenarios
The application of multi-sensor fusion solutions has opened up broad prospects for the development of humanoid robots in various fields. Below is a look at the outstanding performance of humanoid robots in scenarios such as industrial manufacturing, medical care, and home services.
I Industrial Manufacturing: A Capable Assistant for Precision Operations
In the field of industrial manufacturing, humanoid robots equipped with multi-sensor fusion act as capable assistants for precision operations. During the manufacturing of electronic devices, robots need to accurately assemble tiny electronic components. For example, chips on mobile phone motherboards are extremely small and demand very high assembly precision. With multi-sensor fusion technology, humanoid robots can use visual sensors to accurately identify the shape and pin positions of chips, while depth cameras provide precise distance information between the chips and the motherboard. Together, these sensors allow the robot to place the chips accurately at the designated positions on the motherboard, significantly improving assembly efficiency and quality, and reducing errors that may occur in manual operations.
In automobile manufacturing plants, humanoid robots use multi-sensor fusion for quality inspection of components. Visual sensors can quickly scan the surface of automotive components to identify minor scratches, dents, deformations, and other defects; LiDAR can perform high-precision measurements of component dimensions to check if they meet design standards. Once a problem is detected, the robot immediately issues an alert and feeds defect information back to the production system, facilitating timely adjustments to the production process and preventing substandard products from entering the next production step—effectively ensuring the production quality of automobiles.
II Medical Care: A Considerate Health Partner
In the medical care field, humanoid robots, empowered by multi-sensor fusion technology, have become considerate health partners, providing significant assistance to medical staff and patients. In rehabilitation training, robots can use force sensors to perceive the strength and motion state of patients’ limbs, and use visual sensors to recognize patients’ movements, in order to develop personalized rehabilitation training programs. When patients perform limb rehabilitation training, the robot can monitor in real time whether their movements are performed correctly. Based on changes in patients’ strength and movement, it automatically adjusts the difficulty and intensity of training, providing appropriate assistance and guidance to help patients better restore limb function.
In daily nursing work in hospitals, humanoid robots can also assist medical staff with tasks such as drug delivery and material handling. Through the fusion of LiDAR and visual sensors, robots can autonomously navigate in the complex hospital environment, accurately locate various wards and departments, and deliver drugs and materials to destinations in a timely manner. During interactions with patients, the robot’s voice sensors can receive patients’ needs, providing simple medical consultations and daily assistance (e.g., reminding patients to take medication on time, helping patients adjust hospital beds). This reduces the workload of medical staff and improves the efficiency and quality of medical services.
III Home Services: An Intelligent Housekeeping Helper
In home scenarios, humanoid robots, supported by multi-sensor fusion technology, act as intelligent housekeeping helpers, bringing great convenience to our lives. Imagine returning home after a busy day to find the house cleaned thoroughly by a humanoid robot. The robot uses visual sensors to recognize furniture and floor conditions, and LiDAR to construct a room map and plan a reasonable cleaning path. It can then avoid obstacles such as tables, chairs, and carpets, and efficiently complete housekeeping tasks such as floor cleaning and furniture wiping. During cleaning, if the robot encounters stains on the floor, it can use tactile sensors to perceive the location and severity of the stains, and automatically adjust the cleaning strength and method to ensure the stains are completely removed.
Humanoid robots can also serve as family companions. They can engage in natural conversations with family members via voice sensors, answer questions, play music or stories, chat with the elderly to alleviate boredom, and accompany children in learning and playing. When guests visit, the robot can recognize guests’ identities through face recognition technology, proactively greet them, and provide guidance services, making family life warmer and more convenient.
Future Outlook: Challenges and Opportunities Coexist
Multi-sensor fusion solutions have brought enormous opportunities for the development of humanoid robots, but they also face certain challenges in practical application and promotion.
In terms of cost, the use of multiple high-precision sensors (e.g., LiDAR) significantly increases the hardware cost of humanoid robots, which to some extent restricts their large-scale popularization and application. This is similar to a luxury car equipped with high-end sensors—its price is often prohibitive. To reduce costs, it is necessary to develop more cost-effective sensors, optimize sensor configuration and fusion algorithms, and reduce the use of unnecessary sensors while ensuring performance, thereby lowering hardware costs.
In terms of algorithm optimization, as the number of sensors and data volume continue to increase, the complexity of multi-sensor fusion algorithms rises sharply, placing higher demands on computing resources and processing speed. This is analogous to a busy transportation hub with numerous vehicles and pedestrians—without an efficient traffic management system, congestion will occur. Current algorithms still have issues such as slow response speed and insufficient decision accuracy when processing multi-source data in complex environments. In the future, it will be necessary to further research and develop more efficient and intelligent fusion algorithms, making full use of technologies such as artificial intelligence and deep learning to improve the real-time performance and accuracy of algorithms, enabling robots to respond quickly and accurately to complex environments.
Sensor miniaturization and integration is another important challenge. Humanoid robots need to integrate multiple sensors in a limited space while ensuring compatibility and collaborative work between sensors. However, some current sensors are relatively large, making it difficult to meet the compact structure requirements of humanoid robots. The history of computers offers an analogy: early machines were bulky, while today’s computers are increasingly thin and portable. In the future, breakthroughs in sensor design and manufacturing processes are needed to develop smaller, higher-performance sensors and achieve a high degree of sensor integration, making the structure of humanoid robots more compact and flexible.

