Lakehouse dam breaks after departure of long-time Teradata CTO • The Register


Data warehouse stalwart Teradata has shaken off its aversion to the lakehouse concept, embracing the idea of performing enterprise analytics on unstructured data – a situation it once argued against.

Founded in 1979, the company pioneered enterprise data warehousing in the decades through to the 2010s, but has since been overshadowed by so-called cloud-native data warehouse products, which promise greater flexibility and lower startup costs.

Teradata has now announced support for open table formats (OTFs) Apache Iceberg and Linux Foundation Delta Lake, embracing an industry trend towards performing analytics on data in-situ, rather than moving it into a single store for BI and other analysis.

Teradata claimed that AI adoption had consolidated data warehouses, analytics, and data science workloads into unified lakehouses. “OTF support further enhances Teradata’s lakehouse capabilities, providing a storage abstraction layer that’s designed to be flexible, cost-efficient, and easy-to-use,” it said in a corporate missive.

The lakehouse concept originates with Teradata rival Databricks, a machine learning and analytics company built around Apache Spark. Databricks launched the concept back in 2020 as a sort of hybrid approach, bringing better governance to the data lakes where organizations store messy data and allowing SQL-based analytics in-situ.

Until 18 months ago, Teradata eschewed the lakehouse concept. Speaking to The Register in late 2022, former CTO Stephen Brobst said that a data lake and data warehouse should be discrete concepts within a coherent data architecture, playing to the vendor’s historic strengths in query optimization and thousand-user concurrency.

“You need to have a unified architecture, but they are discrete things. There is a difference between the raw data, which is really data lake, and the data product, which is the enterprise data warehouse,” Brobst said.

Although Teradata launched its own data lake in August, in part by improving optimization for object stores such as AWS S3, Brobst said there was an important distinction between raw data and the data warehouse, with the latter optimizing query performance and controlling governance.

Teradata’s decision to execute a dramatic volte-face is perhaps related in some way to the departure of Brobst, who left the company he helped develop in January after more than 24 years.

Teradata claims its adoption of OTFs Delta Lake and Iceberg brings a “forward-looking dimension to Teradata VantageCloud Lake,” which it says is a “next-generation, cloud-native analytics and data platform for AI” set for public preview on both AWS and Azure in Q2 2024.

Never mind the fact that rival vendors have already made their positions on Delta Lake, Iceberg, and Hudi – another OTF – clear, in some cases nearly two years ago.

Apache Iceberg is an OTF designed for large-scale analytical workloads while supporting query engines including Spark, Trino, Flink, Presto, Hive, and Impala. It has spent the last couple of years gathering momentum after Snowflake, Google, and Cloudera announced their support in 2022. More specialist players are also in on the act, including Dremio, Starburst, and Tabular, which was founded by the team behind the Iceberg project when it was developed at Netflix.

Databricks is behind the Delta Lake table format, but says it is fully open source as it is managed by the Linux Foundation. Last year, SAP and Microsoft announced support for Delta, but both said they could address data in Iceberg and Hudi in time.

Last week, CRM company Salesforce also announced support for Apache Iceberg. In a statement to The Register, it said it was contributing to the open source project and worked with data warehouse and data lake partners Snowflake, Google BigQuery, AWS Redshift, Databricks, and Microsoft (Fabric). It would not confirm its approach to Delta Lake.

Across the OTFs, the goal is roughly the same: to bring the analytics engine of choice to the data, without going through the cost and effort of moving the data. Teradata's story has always focused on bringing data into one place and giving it structure, emphasizing optimized queries and high-performance concurrency. What that story means in light of its newfound support for OTFs and the data lakehouse remains unclear, and leaves a lot of unanswered questions. It has been offered the opportunity to respond. ®
