The fields of manufacturing, logistics, and even restaurants are increasingly moving toward automation, with robots being employed for a wide range of tasks. One of the most critical applications of robots is material handling, where grippers are used to move objects, such as automotive parts, logistics packages, food ingredients, and restaurant dishes. This reduces the burden on human workers while lowering the risk of accidents, thereby improving workplace safety.
For robots to handle objects autonomously, they must first accurately measure the three-dimensional (3D) shape of the scene using cameras and then plan how to grasp and move each object. However, certain objects pose major challenges for conventional 3D measurement systems. While opaque objects are relatively easy to identify, transparent objects, such as glass and clear plastics, are much more difficult, with measurement accuracy decreasing as transparency increases. Highly reflective or specular objects present similar challenges. These difficulties create bottlenecks, necessitating human intervention, slowing material handling processes, and limiting the wider application of robots.
To address these issues, Associate Professor Shogo Arai and Mr. Ginga Kennis (who completed his Master's course in 2025), both from the Department of Mechanical and Aerospace Engineering at Tokyo University of Science, Japan, developed an innovative method called HEAPGrasp: Hand-Eye Active Perception to Grasp objects with diverse optical properties. Their study was published online in Volume 11, Issue 3 of the tier-1 journal IEEE Robotics and Automation Letters on January 12, 2026. The findings are also scheduled to be presented at the 2026 IEEE International Conference on Robotics and Automation (ICRA), one of the top conferences in robotics.
"Traditionally, transparent or mirrored (glossy) objects such as reflective metal parts, transparent trays have been unstable to detect when using depth sensors or conventional 3D measurement techniques, making automatic grasping by robots difficult and ultimately leading to human intervention," explains Dr. Arai. "Our approach is based on the idea that even when depth information is unreliable, object shape estimation and grasping are still possible as long as the object's contours or silhouettes can be captured reliably in images."
HEAPGrasp relies on analyzing objects from red, green, and blue (RGB) images captured from multiple viewpoints. First, the system identifies and separates objects from the background using a computer vision technique called semantic segmentation, which assigns each pixel in an image to categories such as "object" or "background." Using a single hand-eye RGB camera, the researchers captured images from different viewpoints and applied semantic segmentation to extract object silhouettes. For this task, the researchers utilized DeepLabv3+ with ResNet-50, a convolutional neural network architecture.
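To make the segmentation step concrete, here is a minimal sketch of what a pixel-wise labeling produces. The actual system uses a learned DeepLabv3+ model with a ResNet-50 backbone; the color-distance threshold below is only a hypothetical stand-in to show the input and output format (an RGB image in, a binary silhouette mask out):

```python
import numpy as np

def segment_silhouette(image, bg_color, tol=30.0):
    """Toy pixel-wise segmentation: label each pixel 'object' (1) or
    'background' (0) by its color distance to a known background color.
    The paper's system replaces this rule with DeepLabv3+ (ResNet-50);
    only the output format (a binary silhouette mask) is the same."""
    dist = np.linalg.norm(image.astype(float) - np.asarray(bg_color, float), axis=-1)
    return (dist > tol).astype(np.uint8)

# Usage: a 4x4 RGB image with a gray background and a red "object".
img = np.full((4, 4, 3), 128, dtype=np.uint8)
img[1:3, 1:3] = [200, 30, 30]          # 2x2 object region
mask = segment_silhouette(img, bg_color=(128, 128, 128))
# mask holds 1s over the object and 0s over the background
```

In the real pipeline, one such mask is extracted per camera viewpoint and passed on to the 3D reconstruction stage.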
The extracted silhouettes are then used in a 3D reconstruction technique known as Shape from Silhouette (SfS). This technique estimates the 3D shape of the object by analyzing its silhouettes from multi-view images. The idea is that each silhouette defines a possible 3D volume where the object could exist. By intersecting these volumes, SfS estimates the object's shape and position in space. Because this process relies only on object silhouettes, it is unaffected by optical properties such as transparency or reflectivity.
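The volume-intersection idea behind SfS can be sketched with a simple voxel-carving loop: a voxel survives only if it projects inside the silhouette in every view. This is a generic, simplified illustration of the technique, not the paper's implementation; the projection functions below are orthographic stand-ins for real camera models:

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """Shape-from-Silhouette by voxel carving (simplified sketch).
    Each view contributes a binary silhouette and a function mapping a
    3D point to pixel coordinates (u, v) in that image. A grid point is
    kept only if it lands inside the silhouette in every view."""
    keep = np.ones(len(grid), dtype=bool)
    for sil, proj in zip(silhouettes, projections):
        for i, p in enumerate(grid):
            if keep[i]:
                u, v = proj(p)
                inside = (0 <= u < sil.shape[1] and 0 <= v < sil.shape[0]
                          and sil[v, u])
                if not inside:
                    keep[i] = False   # carve away this voxel
    return grid[keep]

# Usage: a 3x3x3 grid carved by two orthographic views (top and front),
# each seeing the object only at the center pixel.
grid = np.array([(x, y, z) for x in range(3)
                 for y in range(3) for z in range(3)])
sil_top = np.zeros((3, 3), dtype=bool);   sil_top[1, 1] = True
sil_front = np.zeros((3, 3), dtype=bool); sil_front[1, 1] = True
proj_top = lambda p: (int(p[0]), int(p[1]))    # top view: (u, v) = (x, y)
proj_front = lambda p: (int(p[0]), int(p[2]))  # front view: (u, v) = (x, z)
hull = visual_hull([sil_top, sil_front], [proj_top, proj_front], grid)
# only the center voxel (1, 1, 1) survives the intersection
```

Because the carving decision depends only on whether a pixel is inside a silhouette, the result is unchanged whether the object is opaque, transparent, or specular.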
In the SfS method, increasing the number of unique viewpoints improves accuracy and therefore can lead to a higher success rate for grasping. However, this also means that the camera must be moved to multiple viewpoints, increasing the computational cost and time burden. To balance this trade-off, the researchers introduced a deep learning-based next pose planning system that determines the most efficient camera movement trajectory, maximizing measurement accuracy while minimizing unnecessary motion.
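The trade-off the planner optimizes can be illustrated with a deliberately simplified greedy score: expected accuracy gain minus a penalty on camera travel. The paper's planner is a learned deep network; the hand-written scoring function below, including the `alpha` weight, is purely a hypothetical sketch of the objective:

```python
import numpy as np

def next_pose(candidates, current, gain_fn, alpha=1.0):
    """Greedy next-pose selection sketch: score each candidate camera
    pose by (expected accuracy gain) - alpha * (travel distance from
    the current pose) and return the best one. Stand-in for the
    learned next-pose planner described in the article."""
    best, best_score = None, -np.inf
    for c in candidates:
        travel = np.linalg.norm(np.asarray(c) - np.asarray(current))
        score = gain_fn(c) - alpha * travel
        if score > best_score:
            best, best_score = c, score
    return best

# Usage: a nearby low-gain view beats a distant high-gain one.
candidates = [(1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
gains = {candidates[0]: 1.0, candidates[1]: 3.0}
chosen = next_pose(candidates, current=(0.0, 0.0, 0.0), gain_fn=gains.get)
```

A learned planner replaces both the hand-tuned weight and the gain estimate, predicting directly which camera motion will improve the reconstruction most per unit of movement.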
The team evaluated HEAPGrasp using a real robotic system across 20 different scenes, each containing five objects. The scenes included transparent-only objects, opaque-only objects, specular-only objects, and mixed scenes containing all three types of objects. The researchers also compared HEAPGrasp's performance with existing grasping methods.
Using HEAPGrasp, the robot achieved a 96% success rate for grasping objects with diverse optical properties using a single camera, significantly outperforming existing methods. In addition, it reduced the hand-eye RGB camera's trajectory length by 52% and the execution time by 19% compared to a baseline method that circles around the scene for 3D measurement.
"Our approach achieves accurate 3D measurement of objects while minimizing camera movement and execution time," remarks Mr. Kennis. "By reducing the amount of pre-adjustment required, HEAPGrasp simplifies on-site implementation and operation, especially since it can be retrofitted to existing robotic systems."
Overall, HEAPGrasp represents a novel and practical 3D measurement approach that enables robots to grasp objects reliably despite challenging optical properties, benefiting numerous fields.