
MIT unveils new robot training approach

Robot Training

MIT researchers have developed a new approach to training robots that draws inspiration from large language models (LLMs). The technique uses vast amounts of diverse data, much as LLMs are trained on massive text corpora, to teach robots new skills. Traditional methods such as imitation learning, in which a robot learns by observing a human, often fall short when conditions change, for example a shift in lighting or an unexpected obstacle, because the robot lacks enough data to adapt.

To address this, the MIT team created an architecture called Heterogeneous Pretrained Transformers (HPT), which integrates data from a variety of sensors and environments. A transformer then unifies this data into a shared representation for training, and output quality improves as the transformer grows. Users can input the robot’s design, configuration, and desired task into the system.
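At a high level, the idea is that each sensor modality is projected into tokens of one shared width, so a single transformer trunk can consume them all. The NumPy sketch below illustrates that idea only; the matrix shapes, random weights, and one-token-per-modality setup are assumptions for the example, not the researchers' actual learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding width (assumed for illustration)

# Hypothetical per-modality projections: each maps its raw input into
# tokens of the shared width D so one trunk can process any of them.
W_vision = rng.standard_normal((2048, D)) * 0.01   # e.g. flattened image features
W_proprio = rng.standard_normal((7, D)) * 0.01     # e.g. 7 joint angles

def tokenize(obs, W):
    """Project a raw observation into shared-width tokens."""
    return np.atleast_2d(obs) @ W

vision_tokens = tokenize(rng.standard_normal(2048), W_vision)
proprio_tokens = tokenize(rng.standard_normal(7), W_proprio)

# Concatenated into one token sequence: the unified input the shared
# trunk consumes, regardless of which robot or sensor produced it.
sequence = np.concatenate([vision_tokens, proprio_tokens], axis=0)
print(sequence.shape)  # (2, 64): one vision token plus one proprioception token
```

In the real system these projections would be learned networks rather than fixed random matrices; the point of the sketch is only the shared token width that lets heterogeneous inputs share one model.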

David Held, an associate professor at Carnegie Mellon University, said, “Our dream is to have a universal robot brain that you could download and use for your robot without any training at all. While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models.”

The researchers’ versatile technique combines diverse data into a single system to train general-purpose robots, saving time and reducing cost compared with traditional training methods. The approach aligns data from simulations, real robots, vision sensors, and robotic-arm position encoders into a unified “language” that a generative AI model can understand.
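One way to picture this unified “language” is a single trajectory schema that every data source is converted into before training. The sketch below is a minimal illustration of that idea; the field names (`vision`, `proprio`, `action`) and the simulator and robot-log record keys are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Step:
    """One timestep in an assumed unified schema."""
    vision: np.ndarray    # camera features, from any source (sim or real)
    proprio: np.ndarray   # joint positions, e.g. from arm encoders
    action: np.ndarray    # commanded action

def from_sim(rec):
    """Convert a hypothetical simulator record into the shared schema."""
    return Step(vision=np.asarray(rec["rgb"], dtype=np.float32).ravel(),
                proprio=np.asarray(rec["qpos"], dtype=np.float32),
                action=np.asarray(rec["ctrl"], dtype=np.float32))

def from_real(rec):
    """Convert a hypothetical real-robot log record into the same schema."""
    return Step(vision=np.asarray(rec["camera"], dtype=np.float32).ravel(),
                proprio=np.asarray(rec["encoders"], dtype=np.float32),
                action=np.asarray(rec["command"], dtype=np.float32))

sim_step = from_sim({"rgb": np.zeros((4, 4, 3)), "qpos": np.zeros(7), "ctrl": np.zeros(7)})
real_step = from_real({"camera": np.zeros((4, 4, 3)), "encoders": np.zeros(6), "command": np.zeros(6)})
# Both sources now share one schema, so a single model can consume either.
```

The design point is that once simulated and real trajectories land in one schema, a single training pipeline can mix them freely instead of maintaining a separate pipeline per data source.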

New robot training integrates diverse data

By using this large amount of data, robots can be trained for various tasks without restarting the training process each time. Lirui Wang, an MIT graduate student and lead author of the study, says, “In robotics, we often grapple with insufficient training data. The crux of our approach is addressing the diversity in data domains, modalities, and robot hardware.”


The research will be presented at the Conference on Neural Information Processing Systems.

Wang’s co-authors include fellow graduate student Jialiang Zhao, Meta research scientist Xinlei Chen, and MIT professor Kaiming He. Their HPT architecture integrates data from different modalities and domains: a transformer model unifies vision and proprioception inputs into a single, consistent format.

The model learns from large datasets and becomes more effective as it scales. The researchers pretrained on 52 datasets comprising more than 200,000 robot trajectories, including human demonstration videos and simulations, and devised methods to convert raw proprioception signals into a form the transformer can process.
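As a rough illustration of turning raw proprioception into transformer-ready input, the sketch below normalizes a variable-length joint vector and pads it to a fixed-shape token block, so arms with different joint counts produce inputs of the same shape. The normalization, one-token-per-joint layout, and shape parameters are assumptions for the example, not the researchers' actual method.

```python
import numpy as np

def proprio_to_tokens(joints, max_joints=8, token_dim=16):
    """Turn a raw joint-angle vector (length varies across robots)
    into a fixed-shape token block a transformer can consume."""
    joints = np.asarray(joints, dtype=np.float32)
    # Normalize to zero mean / unit scale so robots with different
    # joint ranges land in a comparable numeric range.
    joints = (joints - joints.mean()) / (joints.std() + 1e-6)
    # One token per joint: tile each scalar across the token dimension,
    # then zero-pad up to max_joints tokens.
    tokens = np.tile(joints[:, None], (1, token_dim))
    pad = np.zeros((max_joints - len(joints), token_dim), dtype=np.float32)
    return np.concatenate([tokens, pad], axis=0)

tokens_7dof = proprio_to_tokens(np.linspace(-1.0, 1.0, 7))  # 7-joint arm
tokens_6dof = proprio_to_tokens(np.linspace(0.0, 2.0, 6))   # 6-joint arm
print(tokens_7dof.shape, tokens_6dof.shape)  # both (8, 16)
```

Padding to a fixed token count is one simple way to let one model ingest data from arms with different degrees of freedom; a learned tokenizer would replace the tiling step.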

Treating proprioception as equally important as vision enables the precise, dexterous movements crucial for complex tasks. In both simulations and real-world tests, HPT boosted robot performance by more than 20 percent, even on tasks unlike those in the pretraining data. Held also observes, “This approach enables training across diverse datasets, allowing robot learning methods to scale significantly. The model can quickly adapt to new robot designs, which is crucial as new robots are continuously developed.”

The MIT team plans to explore how data diversity can further enhance HPT’s performance and enable it to process unlabeled data, much as large language models do. The long-term goal remains a universal robot brain that can be downloaded and used without any additional training.

The Amazon Greater Boston Tech Initiative and the Toyota Research Institute partially funded this work.
