Generative AI

This $110 Million Startup Is Building A Nervous System For AI


Over the past few years, AI models have seen big leaps in text and image-based capabilities, but their creators have a loftier vision: “multimodal” AI interfaces that can see, hear and speak to humans. But to carry out impressive tasks like telling jokes and singing songs, models like OpenAI’s GPT-4o require a faster, more efficient type of network infrastructure, one provided by the lesser-known three-year-old startup LiveKit.

“If OpenAI is building the brain, LiveKit is building the nervous system to carry signals to and from that brain,” LiveKit CEO and cofounder Russ D’Sa said.

As recently as November 2023, D’Sa struggled to raise capital for his startup because investors thought these multimodal models were still at least five years away. That belief changed within just a few months as both Google and OpenAI demoed and released new AI models that can process and generate content across audio and visual formats. “All of a sudden I was getting pinged by the same investors trying to follow up with me and ask me how things are going with the round,” D’Sa told Forbes.

Today the company announced it has raised a $22.7 million Series A investment led by Altimeter Capital with participation from Redpoint Ventures. Also joining the round are angel investors from around the AI industry, including Google Chief Scientist Jeff Dean, tech investor Elad Gil and founders of prominent AI startups such as Perplexity CEO Aravind Srinivas, Pika CEO Demi Guo and ElevenLabs CEO Mati Staniszewski. With about $38 million in total funding, LiveKit is valued at $110 million, according to a source familiar with the round. Its tools are already used by some 20,000 developers at firms like OpenAI, Character AI, Spotify and Meta, and last year it posted an annual run rate of $3 million.

The interest stems from the fact that current internet infrastructure isn’t optimized to transport multimodal data in and out of AI models, D’Sa said. That’s in part because, under the conventional approach, every time a sender transmits a piece of information or a request, it first needs to get back a response acknowledging that the “packet” of data has been received before more can be sent. This is typically done to ensure that data doesn’t get lost during transmission, but each acknowledgment adds a round trip of lag. That lag is barely noticeable when all you’re worried about is text. But for high-bandwidth data like video and audio, there isn’t enough time to confirm each packet and still keep the stream running smoothly.
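To make that concrete, here is a toy Python sketch, not LiveKit’s code, of the acknowledge-before-sending-more pattern described above; the host, port, chunk size and one-byte acknowledgment are illustrative placeholders.

```python
# Toy illustration (not LiveKit code): the sender blocks until the receiver
# acknowledges each chunk, so every chunk costs a full network round trip.
import socket

def send_with_acks(data: bytes, host: str = "127.0.0.1", port: int = 9000) -> None:
    with socket.create_connection((host, port)) as conn:
        for start in range(0, len(data), 1024):
            chunk = data[start:start + 1024]
            conn.sendall(chunk)      # send one chunk of data
            ack = conn.recv(1)       # wait for the receiver's acknowledgment byte
            if not ack:
                raise ConnectionError("receiver closed the connection mid-stream")
```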

To tackle this issue, LiveKit uses a protocol called UDP that lets applications stream audio and video content without needing to confirm each packet (the downside: it increases the risk of data loss). The company’s pitch convinced Perplexity CEO Aravind Srinivas, who is also looking to add voice capabilities to his AI-powered search engine, to invest in the startup. “You can still build something yourself with the traditional architecture but this is something that truly scales to lots of users and scales across not just voice, but also consume images and videos at once,” Srinivas said, adding that he was impressed that OpenAI’s demo of its latest multimodal model, GPT-4o, was conducted on LiveKit’s network.
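For contrast, here is a minimal sketch of the UDP approach, again purely illustrative rather than LiveKit’s actual implementation: each audio frame goes out as a datagram with no acknowledgment expected, keeping latency low at the cost of occasional lost packets. The host, port and frame source are placeholders.

```python
# Illustrative sketch (not LiveKit's stack): fire-and-forget streaming over UDP.
# Frames are sent as datagrams without waiting for any acknowledgment.
import socket

def stream_frames(frames, host: str = "127.0.0.1", port: int = 5004) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
    try:
        for frame in frames:                  # e.g. 20 ms chunks of encoded audio
            sock.sendto(frame, (host, port))  # no acknowledgment is awaited
    finally:
        sock.close()
```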

D’Sa met his cofounder David Zhao at Y Combinator in 2007, where they were working on separate video streaming startups. The two parted ways, with stints at Twitter and Motorola respectively, before teaming up for the first time in 2012. After trying out several ideas, the duo eventually founded Evie Labs, a machine learning-based news recommendation app, which they sold to Medium in 2019 for $30 million. Amid the Covid-19 pandemic, the pair founded LiveKit in 2021 to provide tools that make it easy to add video and audio capabilities to interactive applications.

AI model builders aren’t LiveKit’s only customers. Its open source tools are also being used to power customer support calls, schedule appointments with patients at hospitals, drive autonomous tractors on farms and carry out a quarter of 911 dispatch calls, D’Sa said. LiveKit claims its suite of tools makes data transfer for these real-time audio and video applications faster and more efficient at scale.

According to D’Sa, as more companies aim to make voice and video interfaces look and sound more human, a network that can move data around quickly would make a meaningful difference in the capabilities of these systems and enable more flexible interactions with AI.

“Almost everyone is focused on the compute part of AI,” he said. “Almost nobody is focused on the network part of it, but it’s such a critical piece to power this future.”
