Data Analytics

PixelsDB: An Open-Source Data Analytic System that Allows Users without SQL expertise to Explore Data Efficiently


https://arxiv.org/abs/2405.19784

Query-as-a-Service (QaaS), also called serverless query processing, is a method of running analytical queries on the cloud. Serverless query engines, like Amazon Athena and Google BigQuery, automate resource management and scaling, whereas traditional query engines require substantial manual administration. For users without extensive technical knowledge, this automation greatly simplifies the process: they can run queries without having to manage the underlying infrastructure.

Under the serverless model, users are charged according to their actual consumption, which can be measured by the quantity of data scanned or the number of processing units used. Because users only pay for what they use, this pay-as-you-go pricing model may be more economical for those with low-volume workloads.
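As a rough illustration of scan-based billing, the charge is simply the amount of data scanned multiplied by a unit price. The sketch below is minimal, and the function name and the price passed to it are hypothetical placeholders rather than the rates of any real service.

```python
BYTES_PER_TB = 10**12  # decimal terabyte, chosen here only for illustration


def scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Return the charge for one query under pay-per-scan pricing.

    price_per_tb is a hypothetical unit price, not a real provider's rate.
    """
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans 2 TB at a hypothetical price of 5.0 per TB
cost = scan_cost(2 * BYTES_PER_TB, price_per_tb=5.0)
```

A user who runs only a few such queries per day pays for those scans alone, which is why the model suits low-volume workloads.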

However, this paradigm has several drawbacks. Serverless query engines are designed for brief, bursty tasks rather than continuous, high-volume operations. For sustained workloads, they may therefore be less scalable and significantly more expensive than conventional massively parallel processing (MPP) engines that run on pre-provisioned virtual machine clusters.

Pixels-Turbo is a hybrid query engine created to overcome these limitations. It processes queries on an auto-scaled virtual machine cluster and additionally uses cloud functions to absorb abrupt workload spikes that the VM cluster cannot handle immediately. This approach combines the cost-effectiveness of virtual machine clusters for sustained workloads with the elasticity of serverless computing.
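The spill-over idea can be sketched in a few lines. This is a simplified illustration and not Pixels-Turbo's actual scheduler: the `VMCluster` class, the slot-based capacity model, and the `route_query` function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VMCluster:
    """Hypothetical model of a VM cluster with a fixed number of query slots."""
    capacity: int      # concurrent queries the provisioned VMs can run
    running: int = 0   # queries currently running on the cluster

    def has_free_slot(self) -> bool:
        return self.running < self.capacity


def route_query(cluster: VMCluster) -> str:
    """Pick a backend for the next query.

    Prefer the cheaper VM cluster; fall back to cloud functions only
    when the cluster has no free slot (a workload spike).
    """
    if cluster.has_free_slot():
        cluster.running += 1
        return "vm_cluster"
    return "cloud_function"


cluster = VMCluster(capacity=2)
backends = [route_query(cluster) for _ in range(3)]
# The first two queries fit on the cluster; the third spills over.
```

In a real system the cluster would also scale out in the background, so the cloud functions only bridge the gap until new VMs are ready.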

Pixels-Turbo also lets users enable or disable cloud function acceleration. Enabling it gives urgent queries faster execution at a higher cost. Even with these improvements, many users still find it difficult to translate sophisticated analytical requirements into effective SQL queries.

To help users who are not proficient in SQL or system administration, a team of researchers has introduced PixelsDB, an open-source data analytics system. PixelsDB provides a natural-language interface that helps users create and debug SQL queries. The language models behind this interface translate user input into executable SQL queries. Users can therefore work with the system and obtain the data insights they need without much technical knowledge.

Once generated, the queries are executed by a serverless query engine. PixelsDB offers several price tiers that correspond to how urgent a query is. The system natively supports these service levels through a dedicated architecture design and heterogeneous resource scheduling. By assigning cheaper resources to non-urgent queries, it can reduce overall cost without sacrificing performance for urgent ones.
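One way to picture urgency-based scheduling is a small decision function. The tier names, the backends, and the "wait" policy below are hypothetical; the article only states that cheaper resources go to non-urgent queries, so this is a sketch under those assumptions and not PixelsDB's implementation.

```python
from enum import Enum


class ServiceLevel(Enum):
    """Hypothetical tier names, used only for this illustration."""
    URGENT = "urgent"
    STANDARD = "standard"
    ECONOMY = "economy"


def choose_action(level: ServiceLevel, vm_has_capacity: bool) -> str:
    """Decide how to handle a query given its tier and cluster state.

    Assumed policy: every tier uses the cheap VM cluster when it has room;
    only urgent queries may pay for cloud functions; the rest wait.
    """
    if vm_has_capacity:
        return "vm_cluster"
    if level is ServiceLevel.URGENT:
        return "cloud_function"
    return "wait"


# When the cluster is busy, only the urgent query is accelerated.
busy = {level: choose_action(level, vm_has_capacity=False) for level in ServiceLevel}
```

Under this assumed policy, non-urgent queries trade latency for a lower price, which matches the cost-versus-urgency trade-off described above.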

According to the team, PixelsDB’s serverless query processing, natural-language interface, and customizable service levels and pricing will greatly improve the user experience of data analysis. In conclusion, PixelsDB seeks to make data analytics more efficient and accessible for non-technical users by removing technical obstacles and offering a more user-friendly interface for creating and executing queries.



Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.




