As e-commerce orders stream in, a warehouse robot picks mugs off a shelf and places them into boxes for shipment. Everything hums along, until the warehouse processes a change and the robot must now grasp taller, narrower mugs that are stored upside down.
Reprogramming the robot involves hand-labeling thousands of images that show it how to grasp these new mugs, then retraining the system.
But a new technique developed by MIT researchers requires only a handful of human demonstrations to reprogram the robot. This machine-learning method enables a robot to pick up and place never-before-seen objects that are in random poses. Within 10 to 15 minutes, the robot is ready to perform a new pick-and-place task.
The technique uses a neural network specially designed to reconstruct the shapes of 3D objects. With just a few demonstrations, the system uses what the network has learned about 3D geometry to grasp new objects that are similar to those in the demos.
In simulations and using a real robotic arm, the researchers showed that their system can effectively manipulate never-before-seen mugs, bowls, and bottles, arranged in random poses, using only 10 demonstrations to teach the robot.
“Our major contribution is the general ability to much more efficiently provide new skills to robots that need to operate in more unstructured environments where there could be a lot of variability. The concept of generalization by construction is a fascinating capability because this problem is typically so much harder,” says Anthony Simeonov, a graduate student in electrical engineering and computer science (EECS) and co-lead author of the paper.
Simeonov wrote the paper with co-lead author Yilun Du, an EECS graduate student. They are joined by Andrea Tagliasacchi, a research scientist at Google Brain; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Alberto Rodriguez, the Class of 1957 Associate Professor in the Department of Mechanical Engineering; and senior authors Pulkit Agrawal, a professor in CSAIL, and Vincent Sitzmann, an incoming assistant professor in EECS. The research will be presented at the International Conference on Robotics and Automation.
A robot may be trained to pick up a specific object, but if that object is lying on its side (perhaps it fell over), the robot sees this as a completely new scenario. This is one reason it is so hard for machine-learning systems to generalize to new object orientations.
To overcome this challenge, the researchers created a new type of neural network model, a Neural Descriptor Field (NDF), which learns the 3D geometry of a class of objects. The model computes the geometric representation of a specific object using a 3D point cloud, which is a set of data points or coordinates in three dimensions. The data points can be obtained from a depth camera that provides information on the distance between an object and a viewpoint. While the network is trained in simulation on a large dataset of synthetic 3D shapes, it can be applied directly to real-world objects.
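As noted above, the point cloud the model consumes comes from a depth camera. The standard way to turn a depth image into a point cloud is pinhole back-projection; the sketch below is illustrative (the function name and camera intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for the example, not part of the paper):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W, in metres) into an N x 3 point
    cloud using pinhole camera intrinsics (fx, fy, cx, cy)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pixel column -> camera-frame x
    y = (v - cy) * z / fy   # pixel row    -> camera-frame y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# toy 2x2 depth image, 1 metre everywhere
cloud = depth_to_point_cloud(np.ones((2, 2)), fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud.shape)  # (4, 3)
```

Each valid pixel becomes one 3D coordinate; real depth cameras report zero or NaN for missing readings, which is why invalid points are filtered out.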
The team designed the NDF with a property known as equivariance. With this property, if the model is shown an image of an upright mug, and then shown an image of the same mug on its side, it understands that the second mug is the same object, just rotated.
“This equivariance is what allows us to much more effectively handle cases where the object you observe is in some arbitrary orientation,” Simeonov says.
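A toy illustration of why this kind of property helps: the stand-in descriptor below (sorted distances from a query point to the cloud, far simpler than the learned NDF features) returns the same value for an object and for a rotated copy of it, so a recognizer built on it would treat the upright mug and the tipped-over mug alike. All names here are illustrative, not from the paper:

```python
import numpy as np

def toy_descriptor(query, cloud):
    # sorted distances from the query point to every cloud point:
    # unchanged when the same rigid rotation is applied to both
    return np.sort(np.linalg.norm(cloud - query, axis=1))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))        # point cloud of some object
query = np.array([0.1, 0.2, 0.3])       # a point of interest on it

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix

d1 = toy_descriptor(query, cloud)             # original pose
d2 = toy_descriptor(query @ R.T, cloud @ R.T) # same object, rotated
print(np.allclose(d1, d2))  # True
```

The learned NDF features are richer than this (they describe where a point sits relative to the object's parts), but the rotation-robustness idea is the same.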
As the NDF learns to reconstruct the shapes of similar objects, it also learns to associate related parts of those objects. For instance, it learns that the handles of mugs are similar, even if some mugs are taller or wider than others, or have smaller or longer handles.
“If you wanted to do this another way, you’d have to hand-label all the parts. Instead, our approach automatically discovers these parts from the shape reconstruction,” says Du.
The researchers use this trained NDF model to teach a robot a new skill with only a few physical demonstrations. They move the hand of the robot onto the part of an object they want it to grip, like the rim of a bowl or the handle of a mug, and record the locations of the fingertips.
Because the NDF has learned so much about 3D geometry, and how to reconstruct shapes, it can infer the structure of a new shape, which enables the system to transfer the demonstrations to new objects in arbitrary poses.
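That transfer step can be sketched with the same toy, rotation-invariant stand-in descriptor: search the new object for the point whose descriptor best matches the one recorded at the demonstrated fingertip location. This is a simplified illustration under assumed names, not the paper's actual optimization:

```python
import numpy as np

def toy_descriptor(query, cloud):
    # sorted distances from the query point to every cloud point
    return np.sort(np.linalg.norm(cloud - query, axis=1))

def transfer_grasp(demo_cloud, demo_grasp, new_cloud, candidates):
    """Return the candidate point on the new object whose descriptor best
    matches the descriptor recorded at the demonstrated grasp point."""
    target = toy_descriptor(demo_grasp, demo_cloud)
    errors = [np.linalg.norm(toy_descriptor(c, new_cloud) - target)
              for c in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(1)
demo_cloud = rng.normal(size=(40, 3))         # point cloud of the demo object
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random rotation
new_cloud = demo_cloud @ R.T                  # "new" object: same shape, new pose
demo_grasp = demo_cloud[5]                    # fingertip location from the demo

found = transfer_grasp(demo_cloud, demo_grasp, new_cloud, new_cloud)
print(np.allclose(found, demo_grasp @ R.T))  # True: the grasp follows the object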
Choose the winner
They tested their model in simulations and on a real robotic arm, using cups, bowls and bottles as objects. Their method had an 85% success rate in picking and putting tasks with new things in new directions, while the best baseline could only have a 45% success rate. Success means grabbing something new and putting it in a target place, like hanging mugs on a shelf.
Many baselines use 2D image information rather than 3D geometry, making integrating contrast in these ways more difficult. This is one of the reasons why NDF technology works better.
While the researchers were pleased with its performance, their method only works for the specific class of objects it is trained on. A robot that has learned how to pick up cups will not be able to pick up boxes or headphones, because these objects have very different geometric properties than those on which the network was trained.
“In the future, it would be ideal to expand it to many categories or completely abandon the idea of classification,” Simonov says.
They also plan to adapt the system to non-solid objects and, in the long run, to allow the system to perform pick-and-place tasks when the target area changes.
This work is supported in part by the Defense Advanced Research Projects Agency, the Defense Science and Technology Agency of Singapore, and the National Science Foundation.