Meta just stuck its AI somewhere you didn’t expect it — a pair of smart Ray-Bans
Smart glasses have arguably failed to take off, but the addition of artificial intelligence (AI) could be the key to developing a truly transformational wearable technology.
In the US and Canada, Ray-Ban Meta smart glasses have begun rolling out multimodal AI via software called the "Meta AI" virtual assistant. With multimodal AI, meaning generative AI that can process queries involving more than one medium (for example, both audio and imagery), the device can better respond to queries based on what a wearer is looking at.
“Say you’re traveling and trying to read a menu in French. Your smart glasses can use their built-in camera and Meta AI to translate the text for you, giving you the info you need without having to pull out your phone or stare at a screen,” Meta representatives explained April 23 in a statement.
The device first takes a photo of what the wearer is looking at; the AI then taps into cloud-based processing to answer a spoken query, such as "What type of plant am I looking at?", with the response read aloud.
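The flow described above can be sketched in a few lines of Python. This is purely illustrative: every function here is a hypothetical stand-in, not Meta's actual API, and the hard work (camera capture, the cloud-hosted model, text-to-speech) is stubbed out.

```python
# Illustrative sketch of the glasses' query flow. All names are
# hypothetical stand-ins; none of this is Meta's real interface.

def capture_photo() -> bytes:
    """Stand-in for the glasses' built-in camera capture."""
    return b"\x89PNG..."  # placeholder image bytes

def cloud_answer(image: bytes, question: str) -> str:
    """Stand-in for the cloud-hosted multimodal model."""
    # A real implementation would send the image and question
    # over the network and return the model's reply.
    return "That looks like a fiddle-leaf fig."

def speak(text: str) -> str:
    """Stand-in for text-to-speech playback through the speakers."""
    return f"[spoken] {text}"

def handle_query(question: str) -> str:
    photo = capture_photo()                  # 1. snap what the wearer sees
    answer = cloud_answer(photo, question)   # 2. cloud-based processing
    return speak(answer)                     # 3. answer delivered by speech

print(handle_query("What type of plant am I looking at?"))
```

The key architectural point is that the heavy inference happens in the cloud, not on the glasses themselves, which is why the device only needs to capture, upload and play back audio.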
Meta first explored integrating multimodal AI into the Ray-Ban Meta smart glasses in a limited release in December 2023.
Testing the AI functionality in the device, a reporter from The Verge found that it mostly responded correctly when asked to identify the model of a car, and it could describe a cat and its features from an image snapped by the camera. But the AI ran into trouble accurately identifying the species of the reporter's plants and struggled to correctly identify a groundhog in their neighbor's backyard.
Multimodal machinations
AI-powered virtual assistants are nothing new, with the likes of the Google Assistant, Amazon Alexa and Apple’s Siri all providing smart answers to queries in natural language. But the crux of the Meta AI in the Ray-Ban smart glasses is its multimodal functionality.
The ability to fuse and process data from multiple sensor modules — for example, cameras and microphones — means a multimodal AI can generate more accurate and sophisticated results than unimodal AI systems. Google's Gemini multimodal AI model, for example, can process a photo of some cookies and respond with a recipe.
Trained to identify patterns in different types of data inputs through multiple neural networks — collections of machine learning algorithms arranged to mimic the human brain — multimodal AIs can process input data from text, images, audio and more.
In smart glasses, this means an AI can make sense of the world the wearer is viewing by combining the sensors on the glasses with these neural networks. As a result, the system can answer more sophisticated queries and offer smarter contextual information.
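A toy example can show the basic idea of fusing modalities: each input type is run through its own encoder to produce a feature vector, and the vectors are then combined into one representation the model reasons over. The "encoders" below are trivial placeholders, not real neural networks, and concatenation is only the simplest of several fusion strategies.

```python
# Toy illustration of multimodal fusion. The encoders are trivial
# placeholders standing in for per-modality neural networks.

def encode_image(pixels):
    # Stand-in for an image encoder: reduce pixels to a feature vector.
    return [sum(pixels) / len(pixels), max(pixels)]

def encode_audio(samples):
    # Stand-in for an audio encoder: one feature, mean amplitude.
    return [sum(abs(s) for s in samples) / len(samples)]

def fuse(*embeddings):
    # Simplest fusion strategy: concatenate the per-modality vectors
    # into one joint representation.
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused

image_emb = encode_image([0.1, 0.5, 0.9])
audio_emb = encode_audio([-0.2, 0.4])
joint = fuse(image_emb, audio_emb)  # one vector spanning both modalities
print(joint)
```

In a real system the joint representation would feed a downstream model that answers the wearer's query; the point here is only that separate input streams end up in a single, shared representation.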
But in the case of the Ray-Ban Meta device, the AI has some distance to go before it matches the AI-processing capabilities of the latest smartphones. Those benefit from more powerful chipsets and onboard sensor fusion, in which data from multiple sensors is processed together: for example, scene recognition in camera apps that intelligently adjusts lighting and color balance, or smartwatches that combine thermometer and optical-sensor data to offer better feedback on a workout.
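One common textbook approach to the kind of sensor fusion mentioned above is an inverse-variance weighted average: two noisy readings of the same quantity are blended so that the less noisy sensor counts for more. This is a minimal sketch with invented numbers, not how any particular smartwatch actually works.

```python
# Minimal sensor-fusion sketch: inverse-variance weighted average of
# two noisy readings of the same quantity. All numbers are invented.

def fuse_readings(a: float, var_a: float, b: float, var_b: float) -> float:
    """Blend two estimates, weighting each by the inverse of its variance."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * a + w_b * b) / (w_a + w_b)

optical_hr = 142.0   # heart-rate estimate from an optical sensor (bpm)
thermal_hr = 138.0   # estimate from a second, noisier sensor (bpm)

# The optical sensor has lower variance, so it pulls the result toward it.
fused = fuse_readings(optical_hr, 4.0, thermal_hr, 9.0)
print(round(fused, 1))
```

The fused estimate lands between the two readings but closer to the more reliable one, which is the whole benefit of fusing sensors rather than trusting any single one.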