Exclusive: How generative AI is set to transform video conferencing
While virtual meetings have become mainstream, technology has failed to replicate the social experience of in-person interactions. At the same time, generative artificial intelligence (GenAI) technology has evolved significantly, offering a solution to many of the issues that have plagued hybrid conferencing to date.
GenAI is bound to make virtual meetings more productive and engaging, mimicking real-life experiences. But for this to become a reality, these features need to be available in real time, with minimal latency and at an affordable price point. That means some of these new AI functions must run on connected endpoints.
Fortunately, solutions providers are quickly integrating generative AI into leading video conferencing platforms and computers to make real-time optimisation, virtual enhancements and automated meeting management a reality. These developments are paving the way for the integration community to significantly enhance the hybrid and virtual meeting experience for their customers.
VIRTUAL REPLICATION
GenAI can significantly enhance the video, audio, and text experience of a virtual meeting. In a hybrid meeting with both in-person and remote participants, AI-powered intelligent video processing can let remote participants zoom in on speakers, replicating the experience of an in-person meeting instead of broadcasting a static shot of the entire room.
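As a minimal illustration of what that endpoint-side framing involves, the Python sketch below uses OpenCV's bundled face detector to crop a zoomed-in view from a room-camera frame. This is a sketch under simple assumptions, not a production implementation: real systems would pair audio-based active-speaker detection with more robust person-tracking models.

```python
# Minimal sketch: detect the largest face in a room-camera frame and
# return a zoomed-in crop, approximating speaker framing at the edge.
# Requires opencv-python; the detector choice is illustrative only.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def frame_speaker(frame, margin=0.5):
    """Crop the frame around the largest detected face, with headroom."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return frame  # no face found: fall back to the wide room shot
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face wins
    pad_w, pad_h = int(w * margin), int(h * margin)
    y0, y1 = max(0, y - pad_h), min(frame.shape[0], y + h + pad_h)
    x0, x1 = max(0, x - pad_w), min(frame.shape[1], x + w + pad_w)
    return frame[y0:y1, x0:x1]
```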
Neural Radiance Field (NeRF) or similar technologies can help create an engaging view of the remote participant's side – an immersive experience that dynamically changes the angle of view at each endpoint. AI can then compose a consistent gallery view, displaying all participants in a uniform size, posture or style. And if a whiteboard is present in the meeting room, AI can auto-detect it and convert the written text into an editable format; a personal copy could be created for notetaking as well.
GenAI can also assist each meeting participant – whether attending virtually or in person – with audio and text to maximise their productivity. Such an assistant can convert audio to text to create a meeting summary, capture action items as they are assigned to their owners, and even suggest relevant responses on the fly. For multilingual teams, language barriers can be mitigated by an assistant that delivers instantaneous audio translation.
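One way such an assistant could run entirely at the endpoint is sketched below, pairing the open-source Whisper model for local speech-to-text with a Hugging Face summarisation model. The specific models are assumptions chosen for illustration; the same pattern applies to whichever models a vendor ships on the device.

```python
# Sketch of an on-device meeting assistant: transcribe audio locally,
# then summarise the transcript. Assumes the open-source `whisper` and
# `transformers` packages; the model choices are illustrative only.
import whisper
from transformers import pipeline

stt = whisper.load_model("base")  # runs locally, no cloud round-trip
summariser = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarise_meeting(audio_path: str) -> str:
    # For multilingual teams, passing task="translate" to transcribe()
    # would yield an English transcript from non-English audio.
    transcript = stt.transcribe(audio_path)["text"]
    # Summarise the first chunk; a real assistant would chunk the full
    # transcript and merge the partial summaries.
    result = summariser(transcript[:3000], max_length=120, min_length=30)
    return result[0]["summary_text"]

print(summarise_meeting("meeting.wav"))
```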
Despite the virtually unlimited possibilities, GenAI as it exists today is constrained by the technology that enables it. To harness its true power and make these capabilities available by default, relying on existing cloud-based services is not enough.
SCALABLE FUTURES
For GenAI to reach its full potential, video conferencing systems should be able to perform GenAI processing at the endpoints themselves – either on the personal computer or the conferencing gateway device – without needing to reach back to the cloud.
One of the key aspects of conferencing systems is their ability to scale. When it comes to scalability, it is vital to identify the cases in which centralised processing is relevant and the ones that require edge processing.
There are three main cases in which processing at a central point is advantageous (a simple placement sketch follows the list):
• Time sharing – when a function requires only light processing that a central machine can handle at a fraction of its capacity – such as an alert when a participant enters the room or unmutes their microphone – that machine can serve all endpoints in separate time slots without noticeable impact.
• Resource sharing – when a function involves processing that is common to all endpoints, such as searching a shared database. In such cases, the shared processing can be applied once and reused for many or all endpoints.
• Information sharing – when the same information needs to be presented to all participants, for example a shared whiteboard with no per-participant annotations.
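To make the distinction concrete, the placement decision might be encoded like the hypothetical Python sketch below. The categories mirror the three cases above; the function names and flags are purely illustrative and not drawn from any particular conferencing product.

```python
# Hypothetical sketch: routing a conferencing function to central or
# edge processing based on the three cases described above.
from enum import Enum, auto

class Placement(Enum):
    CENTRAL = auto()  # one machine serves all endpoints
    EDGE = auto()     # each endpoint carries its own AI compute

def place(function: dict) -> Placement:
    time_shared = function.get("light_processing", False)        # e.g. entry alerts
    resource_shared = function.get("shared_computation", False)  # e.g. shared DB search
    info_shared = function.get("identical_output", False)        # e.g. shared whiteboard
    if time_shared or resource_shared or info_shared:
        return Placement.CENTRAL
    # Per-participant, latency-sensitive GenAI (speaker framing, live
    # translation, personal summaries) belongs at the edge.
    return Placement.EDGE

print(place({"identical_output": True}))  # Placement.CENTRAL
print(place({}))                          # Placement.EDGE
```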
Most of the capabilities described earlier do not fit these three cases. To build scalable video conferencing systems that make these functions available to all participants, the AI capabilities must be distributed downstream, equipping the different nodes with sufficient AI compute capacity.
This will result in multiple benefits, such as:
• Latency – In virtual conferences, instantaneous results are imperative for smooth interaction, whether it's real-time translation, content creation or video adjustment. Running generative AI on edge devices cuts latency, ensuring fluid discussion and a seamless, delay-free user experience.
• Expense – The cost of monthly subscriptions to cloud-based generative AI tools can be daunting for many organisations. With a multitude of tools catering to different needs, such as chat, search and image/video creation, costs can quickly add up to hundreds of dollars per user per month, straining budgets. By migrating generative AI to users' personal computers or to the conferencing device, users own the tools outright, without monthly subscriptions or long-term commitments – a more financially viable solution (a rough break-even sketch appears below).
• Bandwidth and Connectivity – Virtual conferences are often hampered by bandwidth shortages, especially when participants have limited internet connectivity, such as during travel or in remote locations. Edge-based generative AI can crop out irrelevant information locally, ensuring that only relevant and important data is transmitted and enabling uninterrupted, productive meetings.
• Environmental Impact – The impact of cloud-based AI processing should not be underestimated, given the significant energy consumption and emissions it generates. Researchers at Carnegie Mellon University and Hugging Face measured the carbon footprint of different machine learning tasks. Their findings show that AI tasks involving the generation of new content, such as text generation, summarisation, image captioning and image generation, stand out as the most energy intensive. The most demanding models, such as Stability AI's Stable Diffusion XL, produce nearly 1,600 grams of CO2 per 1,000 image generations – at roughly 400 grams of CO2 per mile for an average car, about the same environmental impact as driving four miles in a gas-powered car.
Edge devices offer a more sustainable option for generative AI, consuming less power, minimising cooling requirements and reducing the overall carbon footprint – a greener, more eco-friendly approach to AI conferencing.
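Returning to the expense point above, a back-of-the-envelope comparison shows how quickly owned edge hardware can overtake recurring cloud subscriptions. Every figure in this sketch is an illustrative assumption, not vendor pricing:

```python
# Back-of-the-envelope cost comparison: stacked cloud GenAI
# subscriptions versus a one-time edge-hardware premium.
# All figures below are illustrative assumptions.
MONTHLY_PER_USER = 200.0       # assumed subscriptions, $/user/month ("hundreds")
USERS = 50
EDGE_PREMIUM_PER_USER = 400.0  # assumed one-time AI-capable endpoint cost

subscription_yearly = MONTHLY_PER_USER * USERS * 12
edge_one_time = EDGE_PREMIUM_PER_USER * USERS
breakeven_months = edge_one_time / (MONTHLY_PER_USER * USERS)

print(f"Cloud subscriptions: ${subscription_yearly:,.0f} per year")
print(f"Edge hardware:       ${edge_one_time:,.0f} one-time")
print(f"Break-even after ~{breakeven_months:.1f} months")
```

Under these assumed figures, the hardware pays for itself in about two months; actual numbers will vary with organisation size and toolset.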
ADDING AI
In the not-so-distant future, AV integrators and designers will be able to install video conferencing systems that are ready for the GenAI era, providing the benefits of GenAI alongside the performance, reliability, and security advantages of edge processing.
These video conferencing systems of the future, processing AI directly on edge devices, require closed-loop designs that can handle locally parts of what is currently done in the cloud. Processing AI on devices such as laptops, conference room units and cameras will ensure meetings run smoothly and at an affordable cost, while keeping AI-generated content such as auto-summaries or dynamic presentations more secure.
Hailo offers AI processors that are purpose-designed to handle AI models efficiently and at an affordable price, suitable for a variety of edge devices. Today, the company is working with conferencing manufacturers to integrate AI processors into their hardware and make the video conferencing systems of the future a reality.
Avi Baum is Chief Technology Officer and Co-Founder of Hailo, an AI-focused, Israel-based chipmaker that has developed a specialised AI processor for enabling data centre-class performance on edge devices. Baum has over 17 years of experience in system engineering, signal processing, algorithms, and telecommunications and has focused on wireless communication technologies for the past 10 years.