1X’s Eve humanoid robot masters task chaining, nears autonomous work
OpenAI-backed robotics firm 1X is pressing ahead with its mission to supply physical labor through secure, intelligent androids.
Now, a new update shows its humanoid robot Eve completing autonomous tasks back-to-back.
According to the firm, the video marks the beginning of 1X’s work on an advanced AI system that chains simple tasks into complex actions through voice commands, enabling seamless multi-robot control and remote operation.
1X’s androids utilize Embodied Learning, a method that integrates AI software directly into their physical forms for advanced capabilities.
Earlier, the 1X robots showcased their ability to pick and manipulate simple objects. However, the team believes that its androids must master the ability to chain tasks together to become effective service robots.
Streamlined task integration
Researchers at 1X have developed an autonomous model for the company’s androids that merges many tasks into a single goal-conditioned neural network. However, when these multi-task models are small (fewer than 100 million parameters), adding data to fix one task often degrades performance on the others.
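1X has not published the architecture behind this model, but the idea of goal conditioning can be sketched in a few lines: one shared network consumes both an observation and a goal embedding, so a single set of weights covers many tasks. The layer sizes, dimensions, and interfaces below are illustrative assumptions, not 1X’s design.

```python
# Minimal sketch of a goal-conditioned policy (PyTorch). All dimensions and
# layer sizes are assumptions for illustration; 1X has not published its model.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=512, goal_dim=256, action_dim=20):
        super().__init__()
        # One shared trunk serves every task; the goal embedding selects the behavior.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
            nn.ReLU(),
        )
        self.action_head = nn.Linear(1024, action_dim)

    def forward(self, obs_features, goal_embedding):
        # Conditioning one network on different goals replaces training a
        # separate model per task; in small models this sharing is where
        # interference between tasks shows up.
        x = torch.cat([obs_features, goal_embedding], dim=-1)
        return self.action_head(self.trunk(x))
```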
According to the team, increasing the number of model parameters can mitigate this forgetting problem, but it also extends training time, delaying engineers’ ability to determine which demonstrations to gather to improve robot behavior.
To iterate quickly on data while still building a generalist robot that performs many tasks with a single neural network, the team had to decouple rapid improvement of individual task performance from the slower work of integrating multiple capabilities into one network.
“To accomplish this, we’ve built a voice-controlled natural language interface to chain short-horizon capabilities across multiple small models into longer ones. With humans directing the skill chaining, this allows us to accomplish the long-horizon behaviors,” said Eric Jang, vice president of AI at 1X Technologies, in a blog post.
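1X has not released the interface itself, but the pattern Jang describes can be sketched as a simple dispatcher: a spoken command selects one short-horizon policy, and a human strings commands together into a longer behavior. The skill names and policy callables below are hypothetical.

```python
# Illustrative sketch of chaining short-horizon skills behind a natural
# language interface. The skill names and policy callables are hypothetical;
# this is not 1X's internal API.
from typing import Callable, Dict, List

class SkillChainer:
    def __init__(self, skills: Dict[str, Callable[[], None]]):
        # Each entry maps a voice command to one short-horizon policy rollout.
        self.skills = skills

    def run(self, commands: List[str]) -> None:
        # A human operator supplies the sequence of high-level commands;
        # each small model only has to execute its own short-horizon segment.
        for command in commands:
            if command not in self.skills:
                raise ValueError(f"No skill registered for: {command!r}")
            self.skills[command]()

# Usage (with hypothetical policies): chain two simple skills into one behavior.
# chainer = SkillChainer({"pick up the cup": pick_policy,
#                         "place it on the shelf": place_policy})
# chainer.run(["pick up the cup", "place it on the shelf"])
```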
Navigating robot skill chains
Chaining multiple autonomous robot skills into a sequence is challenging because each subsequent skill must generalize to the slightly varied starting positions resulting from the previous skill.
According to 1X, this difficulty compounds with each successive skill: the second skill must handle the variations from the first, the third must adapt to the outcomes of the second, and so on. While humans can perform long-horizon tasks effortlessly, replicating this with robots requires addressing the complexity of these sequential variations.
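A toy calculation, not taken from 1X, makes the compounding concrete: if each skill ends with a small residual offset, the spread of starting poses that the next skill must tolerate grows with every link in the chain.

```python
# Toy illustration (not 1X's analysis) of compounding variation in a skill chain:
# each skill ends with a small residual offset, so later skills must cope with
# a wider spread of starting poses.
import random
import statistics

def run_skill(start_offset: float, residual_noise: float = 0.02) -> float:
    # The skill inherits the previous offset and adds its own residual error.
    return start_offset + random.gauss(0.0, residual_noise)

for n in (1, 2, 4, 8):
    end_offsets = []
    for _ in range(2000):
        offset = random.gauss(0.0, 0.01)  # variation in the initial scene
        for _ in range(n):
            offset = run_skill(offset)
        end_offsets.append(offset)
    spread = statistics.stdev(end_offsets)
    print(f"{n} chained skills -> starting-pose spread for the next skill: {spread:.3f}")
```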
“From the user perspective, the robot is capable of doing many natural language tasks and the actual number of models controlling the robot is abstracted away. This allows us to merge the single-task models into goal-conditioned models over time,” said Jang.
Single-task models offer a solid baseline for shadow mode evaluations, allowing the team to compare a new model’s predictions with an existing baseline during testing. Once the goal-conditioned model aligns well with the single-task model’s predictions, researchers can transition to a more powerful, unified model without altering the user workflow.
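1X has not described its evaluation code, but a shadow-mode check of this kind can be sketched as follows: the candidate goal-conditioned model sees the same observations as the single-task baseline actually driving the robot, and only its agreement with the baseline is logged. The model interfaces and the tolerance value here are assumptions.

```python
# Rough sketch of a shadow-mode comparison, assuming both policies map the same
# observation to an action tensor. Interfaces and the tolerance are assumptions.
import torch

@torch.no_grad()
def shadow_mode_agreement(baseline_policy, candidate_policy, observations, tol=0.05):
    """Fraction of steps where the candidate goal-conditioned model stays
    within `tol` of the single-task baseline actually controlling the robot."""
    agree = 0
    for obs in observations:
        baseline_action = baseline_policy(obs)    # executed on the robot
        candidate_action = candidate_policy(obs)  # logged only, never executed
        if torch.max(torch.abs(baseline_action - candidate_action)) <= tol:
            agree += 1
    return agree / len(observations)
```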
Using this high-level language interface to direct robots provides a novel user experience for data collection. “Instead of using VR to control a single robot, an operator can direct multiple robots with high-level language and let the low-level policies execute low-level actions to realize those high-level goals,” said Jang.
“Because high-level actions are sent infrequently, operators can even control robots remotely.”
Researchers highlight that the video showcases robots switching tasks based on human direction, indicating that the process is not fully autonomous. After creating a dataset of vision-to-natural language command pairs, the next logical step is to automate the prediction of high-level actions. This can be achieved using vision-language models such as GPT-4o, VILA, and Gemini Vision.
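As one hedged illustration of how such automation could look, the sketch below sends a camera frame to GPT-4o through OpenAI's chat API and asks it to choose the next skill from a fixed list. The prompt, skill list, and surrounding pipeline are assumptions, not 1X's published method.

```python
# Sketch of automating the high-level command with a vision-language model,
# here OpenAI's GPT-4o. The prompt, skill list, and image handling are
# assumptions for illustration; 1X has not described its exact pipeline.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_next_command(image_path: str, skills: list[str]) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Given the robot's camera view, choose the next skill "
                         f"from this list and reply with it verbatim: {skills}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```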