Alter3 is the latest GPT-4-powered humanoid robot

Jun 24, 2024 | Technology


Researchers at the University of Tokyo and Alternative Machine have developed a humanoid robot system that can directly map natural language commands to robot actions. Named Alter3, the robot has been designed to take advantage of the vast knowledge contained in large language models (LLMs) such as GPT-4 to perform complicated tasks such as taking a selfie or pretending to be a ghost.

This is the latest in a growing body of research that brings together the power of foundation models and robotics systems. While such systems have yet to yield a scalable commercial product, they have propelled robotics research forward in recent years and show considerable promise.


How LLMs control robots

Alter3 uses GPT-4 as the backend model. The model receives a natural language instruction that either describes an action or a situation to which the robot must respond.

The LLM uses an “agentic framework” to plan a series of actions that the robot must take to achieve its goal. In the first stage, the model acts as a planner that must determine the steps required to perform the desired action.
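As a rough sketch of what this planning stage might look like in practice (the prompt wording, plan format, and helper function here are illustrative assumptions, not Alter3's actual code), the planner can be a single chat completion call that returns the instruction broken into numbered movement steps:

```python
# Illustrative sketch of the planning stage, assuming the OpenAI Python client.
# The system prompt and the line-per-step plan format are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PLANNER_SYSTEM_PROMPT = (
    "You control a humanoid robot. Given an instruction, "
    "break it into a short numbered list of body movements."
)

def plan_actions(instruction: str) -> list[str]:
    """Ask GPT-4 to decompose an instruction into movement steps."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PLANNER_SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
    )
    text = response.choices[0].message.content
    # Keep only non-empty lines; each line is one planned step.
    return [line.strip() for line in text.splitlines() if line.strip()]

steps = plan_actions("Take a selfie with your smartphone.")
```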

Alter3 uses different GPT-4 prompt formats to reason about instructions and map them to robot commands (source: GitHub)

Next, the action plan is passed to a coding agent that generates the commands the robot needs to perform each step. Since GPT-4 has not been trained on Alter3's programming commands, the researchers use its in-context learning ability to adapt its output to the robot's API. This means the prompt includes a list of commands and a set of examples showing how each command is used. The model then maps each step to one or more API commands, which are sent to the robot for execution.
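A minimal sketch of this second stage might look like the following; the command name (`set_axis`), the axis numbering, and the few-shot example are hypothetical stand-ins for the robot's real API, included only to illustrate the in-context learning pattern the researchers describe:

```python
# Illustrative sketch of the coding-agent stage. The command set and the
# few-shot example below are hypothetical; Alter3's real API differs.
from openai import OpenAI

client = OpenAI()

CODER_PROMPT = """You translate movement steps into robot commands.
Available command (hypothetical): set_axis(axis_id, value), with axis_id
in 0..42 and value in 0..255.

Example:
Step: raise the right arm
Commands:
set_axis(20, 200)
set_axis(21, 180)

Now translate the following step. Output only commands, one per line.
Step: {step}
Commands:"""

def step_to_commands(step: str) -> list[str]:
    """Map one planned step to robot API calls via in-context examples."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": CODER_PROMPT.format(step=step)}],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]

# In the full pipeline, these steps would come from the planner stage.
for step in ["raise the right arm", "tilt the head to the left"]:
    for command in step_to_commands(step):
        print(command)  # in practice, sent to the robot controller for execution
```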

“Before the LLM appeared, we had to control all the 43 axes in certain order to mimic a person’s pose or to pretend a behavior such as serving a tea or playing a chess,” the researchers write. “Thanks to LLM, we are now free from the iterative labors.”

Learn …


