1X shows advances in voice control, chaining tasks for humanoid robots
For humanoid robots to be useful in household settings, they must master numerous tasks. 1X Technologies today released a video showing how it is applying artificial intelligence and teleoperation to train its robots.
In the video, the company demonstrated a user giving a verbal command to a group of robots, which then carried out multiple actions.
“This update showcases progress we’ve made toward longer autonomous behaviors,” said Erik Jang, vice president of AI at 1X Technologies. “We’ve previously shown that our robots were able to pick up and manipulate simple objects, but to have useful home robots, you have to chain tasks together smoothly.”
“In practice, the robot doesn’t always position itself right next to a table, so we need to be able to tell it to adjust its position and then manipulate the object,” he told The Robot Report. “In building out our repertoire of skills, we’re finding a lot of other skills — like getting closer or backing up — that humans can instruct the robots with natural language.”
1X builds single tasks toward a unified model
1X Technologies has been working toward a single neural network to handle a wide range of tasks, but it is starting with training individual models through teleoperation. This marks a change in how the company is approaching training and scaling of capabilities, Jang said.
“Before, we thought of a single model for thousands of tasks, but it’s hard to train for so many skills simultaneously,” he noted. “It’s important to push forward on multiple fronts, so we’ve added a few hundred individual capabilities. Our library of skills is mapped to simple language descriptions.”
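A skill library keyed by simple language descriptions can be pictured roughly as follows. This is a minimal illustrative sketch, not 1X's implementation: the `SkillLibrary` class, its methods, and the example skills are all hypothetical, and the exact-match lookup stands in for whatever language understanding the company actually uses.

```python
from typing import Callable, Dict

# Hypothetical: a skill is a callable that returns a short status string.
SkillFn = Callable[[], str]


class SkillLibrary:
    """Maps plain-language descriptions to individual skills (illustrative only)."""

    def __init__(self) -> None:
        self._skills: Dict[str, SkillFn] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so small phrasing differences still match.
        return " ".join(text.lower().split())

    def register(self, description: str, skill: SkillFn) -> None:
        self._skills[self._normalize(description)] = skill

    def run(self, command: str) -> str:
        key = self._normalize(command)
        if key not in self._skills:
            raise KeyError(f"no skill registered for: {command!r}")
        return self._skills[key]()


library = SkillLibrary()
library.register("get closer", lambda: "moving closer")
library.register("back up", lambda: "backing up")
result = library.run("  Back   UP ")
```

Each new capability becomes one more entry in the table, which is one way to read the idea of adding a few hundred skills individually before merging them into a single model.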
1X, which has offices in Sunnyvale, Calif., and Moss, Norway, still plans to work toward a single model for all tasks. It is using “shadow mode” evaluations to compare predictions to a baseline for testing. The company already has generic navigation and manipulation policies, said Jang.
“We can give the robot a goal — ‘Please go to this part of the room’ — and the same neural network can navigate to all parts of the room,” he said. “Tidying up a room involves four primitives: going anywhere in the room, adjusting for position, picking something up, and putting it down.”
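The four primitives Jang lists can be chained into a tidying routine along these lines. The sketch below is an assumption-laden toy: `RobotState`, the four functions, and `tidy` are hypothetical names, and the bodies only update a simulated state rather than calling any neural network or robot API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RobotState:
    """Toy stand-in for a robot's state (hypothetical, for illustration)."""
    location: str = "dock"
    aligned: bool = False
    holding: Optional[str] = None
    log: List[str] = field(default_factory=list)


def go_to(state: RobotState, place: str) -> None:
    # Navigating gets the robot near the target but not precisely aligned.
    state.location = place
    state.aligned = False
    state.log.append(f"go to {place}")


def adjust_position(state: RobotState) -> None:
    state.aligned = True
    state.log.append(f"adjust position at {state.location}")


def pick_up(state: RobotState, item: str) -> None:
    if not state.aligned:
        raise RuntimeError("adjust position before picking up")
    if state.holding is not None:
        raise RuntimeError("already holding an object")
    state.holding = item
    state.log.append(f"pick up {item}")


def put_down(state: RobotState) -> None:
    if not state.aligned:
        raise RuntimeError("adjust position before putting down")
    if state.holding is None:
        raise RuntimeError("nothing to put down")
    state.log.append(f"put down {state.holding}")
    state.holding = None


def tidy(state: RobotState, item: str, source: str, destination: str) -> RobotState:
    """Chain the four primitives: go, adjust, pick up, go, adjust, put down."""
    go_to(state, source)
    adjust_position(state)
    pick_up(state, item)
    go_to(state, destination)
    adjust_position(state)
    put_down(state)
    return state


final_state = tidy(RobotState(), "cup", "table", "shelf")
```

In this toy version, skipping `adjust_position` makes `pick_up` fail, which mirrors Jang's point that a robot rarely arrives perfectly positioned next to a table.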
1X plans to add skills such as opening doors, drawers, and bottles, and Jang acknowledged that it’s still early days for building them out.
“Autonomy is hard. If a robot has to go to a second task, it has to pick up the slack from the first one,” he said. “For example, if the first robot didn’t get to the right spot next to a table, then the second robot has to stick its arm out further to grab something, and the third task has to compensate even more. Errors tend to compound.”
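A rough way to see why errors compound is to treat each task in a chain as succeeding with a fixed probability, independently of the others. That independence assumption is a simplification introduced here for illustration and does not come from 1X; the function name and the numbers are likewise hypothetical.

```python
def chain_success(per_task_success: float, num_tasks: int) -> float:
    """Probability that every task in a chain succeeds, assuming independence."""
    if not 0.0 <= per_task_success <= 1.0:
        raise ValueError("per_task_success must be between 0 and 1")
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    return per_task_success ** num_tasks


# With a 90% success rate per task, a three-task chain succeeds about 73% of the time.
three_tasks = chain_success(0.9, 3)
ten_tasks = chain_success(0.9, 10)
```

Under this simple model, reliability drops quickly as chains get longer, which is consistent with Jang's description of later tasks having to compensate for earlier ones.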
Voice interface enables training, higher-level actions
“We’ve built a way for humans to instruct the robots on tasks so that if they make a mistake, the human can dictate what the command should be,” he added. “We use a human in the loop issuing natural language commands.”
In the video, 1X Technologies showed a person directing multiple robots to perform a sequence of actions with a simple voice command.
“We treat natural language commands as a new type of action, translating from low-level instructions to higher-level actions,” said Jang. “We’re working toward robots that can work autonomously for long periods of time. Cleaning things often involves interacting with different tools and appliances. To be useful, household robots should not be limited to pick-and-place operations.”
Remote and multi-robot control lead to scalability
1X Technologies has the same people who gather teleoperation data also train the robots on their skills.
“I’m super proud of the work they do,” said Jang. “We’ve closed the loop, and the teleoperators train everything themselves. In this ‘farm-to-table’ approach, they’ve built all the capabilities.”
By showing that users without computer science experience can train robots, 1X said it is removing a bottleneck to scaling.
“In the same way we have operators train low-level skills, we can have them train higher-level ones,” Jang added. “It’s now very clear to us that we can transition away from predicting robot actions at low levels to building agents that can operate at longer horizons. This opens up a lot of possibilities for connecting to advancements in LLMs and visual models.”
By enabling users to set high-level goals for multiple robots, 1X Technologies said it will allow for more efficient fleet management.
“Once we have controls in the language space, it’s not a huge leap to see robots working with Gemini Pro Vision or GPT 4.0 for longer-horizon behaviors,” Jang said.
Humanoids are fast approaching, says 1X
Over the past year, 1X has pivoted from purely commercial deployments with EVE to more diverse settings with NEO. The company raised $100 million in January. When will humanoids using unified AI models be ready for the domestic market?
“I want it to come as fast as possible,” replied Jang. “A lot of people think that general-purpose home or humanoid robots are far away, but they’re probably a lot closer than one thinks.”
Jang asserted that by designing its own actuators, 1X has made NEO safe around humans, a prerequisite for household use. The hardware's ability to compensate also gives the AI room for error, he said.
Still, humanoid robot developers have to do more than produce interesting videos, Jang said. They have to demonstrate capabilities in the real world and control costs on the path to commercialization.
“The onus is on us to get away from making videos to making something that people can see in person without hiding actual performance details,” he said. “Not everything with a torso and four limbs is a humanoid, and we’ve put a lot of thought about the force, torque, and strength of each. Not all robots are created equal.”
“There’s a sweet spot between overspeccing costs and underspeccing costs, which can hamper the ability to pursue AI or automation in general,” said Jang. “Many of the top humanoid companies are making different choices, and there’s a spectrum between millimeter-level precision on fingers and calibration with cameras to, on the other end, 3D-printed robots. It’s a healthy competition.”