Can Robots Make Decisions Based on Common Sense?
When it comes to decision-making, humans often rely on their common sense. But what about robots? Can they make decisions based on common sense? Researchers at the Department of Automation and Beijing National Research Centre for Information Science and Technology have developed a Task Planning Agent (TaPA) that enables embodied agents, like robots, to make decisions based on common sense.
Understanding the Need for Common Sense
For robots to successfully carry out human instructions, they need a degree of common sense. However, current large language models (LLMs) fall short at generating feasible action sequences because they lack detailed information about the real-world scene in front of the robot. To address this issue, the researchers proposed TaPA, which aligns LLMs with visual perception models so that generated plans are grounded in the objects that actually exist in a scene.
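The core idea of grounding can be sketched in a few lines: the planner LLM is only shown objects the perception model actually detected, so its plan cannot casually reference things that are not in the room. The function and prompt wording below are hypothetical illustrations, not the paper's actual prompts.

```python
# Hypothetical sketch of TaPA-style grounding: the planner only sees
# objects that the visual perception model detected in the scene.
def build_planner_prompt(instruction, detected_objects):
    """Combine the human instruction with the scene's object list."""
    object_list = ", ".join(sorted(set(detected_objects)))  # de-duplicate labels
    return (
        f"Objects in the scene: {object_list}\n"
        f"Instruction: {instruction}\n"
        "Generate a step-by-step plan using only the objects above."
    )

prompt = build_planner_prompt(
    "Make a cup of coffee",
    ["mug", "coffee machine", "counter", "mug"],  # duplicate detections collapse
)
```

The prompt text would then be passed to the fine-tuned planner model described below.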
Generating Grounded Plans
TaPA is designed to generate grounded plans without constraining task types or target objects. The researchers created a multimodal dataset consisting of visual scenes, instructions, and corresponding plans. They then fine-tuned a pre-trained LLaMA network to predict the action steps given the instruction and the object list in the scene. This trained network serves as the task planner for TaPA.
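To make the training setup concrete, one fine-tuning sample might look like the following. The exact field names and serialization format here are illustrative assumptions, not taken from the paper.

```python
# Hypothetical shape of one fine-tuning sample: the model learns to predict
# the plan steps from the instruction plus the scene's object list.
sample = {
    "objects": ["fridge", "egg", "pan", "stove"],
    "instruction": "Cook a fried egg.",
    "plan": [
        "Step 1: Walk to the fridge and take out the egg.",
        "Step 2: Place the pan on the stove.",
        "Step 3: Crack the egg into the pan and cook it.",
    ],
}

def to_training_text(sample):
    """Serialize a sample into a (prompt, target) text pair for fine-tuning."""
    prompt = (
        "Objects: " + ", ".join(sample["objects"]) + "\n"
        "Instruction: " + sample["instruction"] + "\n"
        "Plan:"
    )
    target = "\n".join(sample["plan"])
    return prompt, target
```

During training, the loss would be computed only on the target plan text, the standard recipe for instruction-tuning a causal LLM.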
The embodied agent collects RGB images from multiple viewpoints to provide sufficient coverage for multi-view object detection. By combining this scene information with the human instruction, TaPA generates executable actions step by step.
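The multi-view step can be sketched as a simple union of per-viewpoint detections, so the planner sees every object visible from anywhere in the room rather than from a single frame. This is a minimal sketch of the aggregation idea; the paper's actual detection pipeline is more involved.

```python
# Hypothetical sketch: merge object detections from several camera viewpoints
# into one scene-level object list for the planner.
def aggregate_detections(per_view_detections):
    """per_view_detections: one list of detected labels per viewpoint."""
    scene_objects = set()
    for labels in per_view_detections:
        scene_objects.update(labels)  # union across views
    return sorted(scene_objects)

views = [
    ["sofa", "table"],        # view 1
    ["table", "lamp"],        # view 2
    ["lamp", "television"],   # view 3
]
# aggregate_detections(views) -> ['lamp', 'sofa', 'table', 'television']
```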
Generating the Multimodal Dataset
Creating a large-scale multimodal dataset for training the planning agent is a challenge because no suitable dataset existed. To overcome this, the researchers used GPT-3.5, combining scene representations with carefully designed prompts to generate a large-scale multimodal dataset. This dataset was then used to fine-tune the task planner.
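A data-generation prompt of this kind could be assembled as below. The wording and parameters are hypothetical; the authors' actual prompts to GPT-3.5 may differ substantially.

```python
# Hypothetical sketch of a data-generation prompt: a scene representation
# (its object list) is combined with a design prompt asking GPT-3.5 for
# instruction/plan pairs.
def make_generation_prompt(scene_objects, num_pairs=3):
    """Build the text that would be sent to GPT-3.5 for one scene."""
    return (
        "You are a household robot task designer.\n"
        f"The scene contains: {', '.join(scene_objects)}.\n"
        f"Write {num_pairs} human instructions that can be completed with "
        "these objects, each followed by a numbered step-by-step plan."
    )

prompt = make_generation_prompt(["sink", "sponge", "plate"])
```

The returned instruction/plan pairs, paired with the originating scene, would form the multimodal training samples.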
Achieving Higher Success Rates
The researchers found that TaPA agents achieved higher success rates in generating action plans than other state-of-the-art LLMs and large multimodal models. TaPA also showed a better understanding of the input objects, producing a lower percentage of hallucination cases than competing models.
Complexity and Optimization
The complexity of the tasks in the multimodal dataset points to the need for new optimization methods. The researchers suggest that conventional benchmarks for instruction-following tasks may not be sufficient for more complex tasks that require longer sequences of implementation steps.
If you’re interested in learning more about this research, you can read the paper. And if you want to stay updated on the latest AI research news and projects, make sure to follow us on Twitter and join our ML SubReddit, Facebook Community, Discord Channel, and Email Newsletter.
Editor Notes
Robots with the ability to make decisions based on common sense have the potential to revolutionize various industries. This research by the Department of Automation and Beijing National Research Centre for Information Science and Technology presents a significant step forward in the development of embodied agents with common sense. By aligning language and vision models with visual perception, TaPA agents are able to generate grounded plans and improve the success rates of action plans. This research opens up new possibilities for the integration of robots into our daily lives, and we’re excited to see further advancements in this field.
For more AI news and insights, visit the GPT News Room.