Figure’s Humanoid Robot Engages in Real-Time Conversation with OpenAI’s Generative AI
Robotics developer Figure has showcased its first humanoid robot, Figure 01, engaging in a real-time conversation using generative AI technology from OpenAI. The partnership between Figure and OpenAI enables the robot to understand and respond to human interactions instantly, thanks to high-level visual and language intelligence.
In a video demonstration, Figure 01 interacts with Senior AI Engineer Corey Lynch, who assigns the robot various tasks in a makeshift kitchen setup. The tasks include identifying an apple, dishes, and cups. The robot correctly identifies the apple as the only food on the table and gathers trash into a basket while continuing to answer questions.
Figure 01’s Capabilities
According to Corey Lynch’s explanation on Twitter, Figure 01 can:
- Describe its visual experience
- Plan future actions
- Reflect on its memory
- Explain its reasoning verbally
Lynch further elaborated that the robot’s behavior is learned rather than remotely controlled. Images from the robot’s cameras and transcribed speech captured by onboard microphones are fed into a large multimodal model trained by OpenAI. This multimodal AI understands and generates different data types such as text and images.
The model processes the entire conversation history, including past images, to generate language responses spoken back to the human through text-to-speech. It also decides which learned behavior to execute on the robot based on a given command.
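The flow Lynch describes can be sketched in code: camera frames and transcribed speech accumulate in a conversation history, and a single multimodal model returns both a spoken reply and the name of a learned behavior to run. The names below (`ConversationHistory`, `multimodal_model`, `pick_up_apple`) are illustrative placeholders, not Figure's or OpenAI's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationHistory:
    """Running log of (image, transcript) turns fed to the model."""
    turns: list = field(default_factory=list)

    def add(self, image, transcript):
        self.turns.append({"image": image, "transcript": transcript})

def multimodal_model(history):
    """Stand-in for the OpenAI-trained multimodal model: maps the full
    conversation history to (reply_text, behavior_name). The real model
    reasons over pixels and language; this stub only keyword-matches."""
    latest = history.turns[-1]["transcript"].lower()
    if "apple" in latest:
        return "I see an apple on the table.", "pick_up_apple"
    return "How can I help?", "idle"

def step(history, image, transcript):
    """One interaction turn: log the new observation, query the model,
    then speak the reply (via text-to-speech) and dispatch the chosen
    learned behavior to the robot's low-level controllers."""
    history.add(image, transcript)
    reply, behavior = multimodal_model(history)
    return reply, behavior
```

The key design point from Lynch's description is that one model handles both jobs: generating language and selecting which pre-trained closed-loop behavior to execute.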
Positive Reception and Speculation
The debut of Figure 01 has sparked excitement on Twitter, with many people impressed by its capabilities:
Please tell me your team has watched every Terminator movie.
— Daniel Innovate (@danielinnov8) March 13, 2024
We gotta find John Connor as soon as possible
— Kaylard – e/acc (@KaylardAI) March 13, 2024
Sci-fi has become Sci-nonfi
Congrats to @adcock_brett, @sama, and their teams for creating the first convincing demo of life 2.0
— Justin Halford (@Justin_Halford_) March 13, 2024
Technical Details and Implications
Corey Lynch provided technical details regarding Figure 01's operation. The robot's behaviors are driven by neural-network visuomotor transformer policies that map pixels directly to actions. These networks process onboard images at 10 Hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200 Hz.
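Those two rates imply the policy emits a short chunk of action setpoints per camera frame: at 10 Hz vision and 200 Hz control, each frame covers 20 action steps. The sketch below illustrates that rate decoupling under those stated numbers; the policy itself is a placeholder, not Figure's actual network.

```python
IMAGE_HZ = 10     # onboard camera rate, per Lynch
ACTION_HZ = 200   # action rate: wrist poses + finger joint angles
DOF = 24          # degrees of freedom per action setpoint
STEPS_PER_IMAGE = ACTION_HZ // IMAGE_HZ  # 20 setpoints per frame

def visuomotor_policy(frame):
    """Stand-in for the transformer policy: maps the latest image to a
    chunk of 24-DOF setpoints executed until the next frame arrives.
    (Zeros here; the real network outputs learned motions.)"""
    return [[0.0] * DOF for _ in range(STEPS_PER_IMAGE)]

def control_loop(frames):
    """Run the decoupled loop: perception at 10 Hz, actuation at 200 Hz.
    Returns the total number of setpoints sent to the joint controllers."""
    actions_executed = 0
    for frame in frames:
        for setpoint in visuomotor_policy(frame):
            assert len(setpoint) == DOF  # each setpoint drives all 24 DOF
            actions_executed += 1        # streamed out at 200 Hz
    return actions_executed
```

One second of operation (10 frames) therefore yields 200 action steps, letting the hands move smoothly between comparatively slow visual updates.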
The debut of Figure 01 comes at a time when policymakers and global leaders are grappling with the integration of AI tools into mainstream society. While most discussions focus on large language models like OpenAI’s ChatGPT and Google’s Gemini, developers are also exploring ways to give AI physical humanoid robotic bodies.
Closing Thoughts: A Step Towards the Future
The demonstration of Figure 01’s conversational abilities marks a significant milestone in the field of robotics and AI. The combination of generative AI from OpenAI and Figure’s humanoid robot showcases the potential for advanced human-robot interaction.
While the debut of Figure 01 has prompted speculation and references to science fiction, it also highlights the progress made in merging AI and robotics. These advancements may contribute to a range of domains, from space exploration to everyday practical tasks.
Figure AI and OpenAI have not yet responded to Decrypt’s request for comment on this development.