Collibra launches AI Governance, unveils GenAI capabilities


Collibra on Wednesday unveiled new tools aimed at enabling customers to better discover and govern AI models and applications.

Collibra AI Governance, which was introduced in preview in February 2024, is now generally available. In addition, the vendor introduced Collibra AI, a new set of generative AI capabilities, and Collibra Data Notebook, both of which are in the preview phase.

The vendor revealed each during Data Citizens ’24: The Data Intelligence Conference, a user conference hosted by Collibra in Orlando, Fla.

While AI Governance, Collibra AI and Data Notebook have their specific benefits, perhaps one of the most significant aspects of each is that they are available for use, according to Stewart Bond, an analyst at IDC.

Many vendors have introduced capabilities over the past year either featuring or related to generative AI. Few, however, have been made generally available, as AI Governance now is, and some have not even reached the public preview stage that Collibra AI and Data Notebook are now in.

“We have seen many applications of generative AI in data intelligence software emerge in the past year, but mostly in the R&D lab and in demos that may or may not have had some smoke and mirrors involved,” Bond said.

By making them available, Collibra and others such as Informatica and Exasol are providing customers with capabilities that have the potential to meaningfully improve their use of data, he continued.

“These capabilities will have an impact by improving the productivity of data stewards and providing opportunities for less technical resources to become more involved in data governance activities,” Bond said.

Based in New York City and Brussels, Collibra is a metadata management specialist whose Data Intelligence Platform provides a data catalog along with other tools designed to help customers discover and govern their data so that it can be acted on safely and securely.

In addition to a data catalog, the vendor provides features that enable users to automate data preparation, test data quality and put in place access controls that make data safe to use while also ensuring regulatory compliance.

In September, Collibra acquired Husprey, a data notebook specialist. Three months earlier, the vendor’s platform update targeted data quality, lineage and discovery.

Governing AI

Just as data was once the domain of a small team of experts, AI until recently has largely been the province of data scientists.

When data was strictly overseen by small teams, there was little need for data governance. Experts understood how data needed to be treated and what regulatory guidelines needed to be followed, and did so.

However, when self-service analytics emerged to extend data exploration and analysis to an audience of business users, enterprises needed to develop data governance frameworks so non-experts could confidently work with data while the larger organization was protected from misuse.

Now, something similar is happening with AI.

Generative AI tools that enable users to engage with models and applications using natural language rather than code are enabling a wider audience to work with AI than just data scientists. As a result, just as enterprises needed to safeguard data use with data governance, they now need to safeguard AI use with AI governance.

Collibra AI Governance was developed to provide full visibility and control over the use of AI models and applications, according to the vendor.

In addition, the tool aims to enable organizations to deliver quality data to AI models and applications so that the outputs from those models and applications can be trusted and help enterprises remain regulatory compliant as new AI regulations emerge.

Kevin Petrie, an analyst at BARC, noted that research shows that fewer than half of all companies have sufficient data quality and governance measures to support their AI and machine learning projects.

“That’s a big problem, given the risks that AI and especially GenAI raise in terms of accuracy, privacy, bias, explainability and intellectual property,” he said.

Collibra AI Governance, therefore, addresses a real need.

“Companies need to rapidly adapt their data governance programs to address these risks in the context of their AI and ML initiatives,” Petrie said.

The focus on data quality in Collibra AI Governance, meanwhile, is critical, according to Bond. While AI governance can refer strictly to controlling access to models and applications, if it doesn’t also address the data being used to train models and applications, it isn’t going far enough, he noted.

AI governance tools, therefore, need to mitigate risks related to poor data quality and misuse of sensitive data, as well as manage access to the finished products. They need to know where sensitive information is within the data used to train AI, and they need to understand what AI models and applications do with the data that feeds them.

“When leveraging data with models, the most recent, high-quality and relevant data needs to be leveraged to improve the relevancy and accuracy of what gets returned by the model,” Bond said. “This is where the need to connect data intelligence with model intelligence is part of AI Governance and can help improve the outcome while reducing risks.”

Other new capabilities

While AI Governance aims to help Collibra customers oversee the use of their AI, Collibra AI was designed to enable users to use machine learning and AI to automate data quality and data governance functions.

In addition to potentially expanding data management and analytics to a broader audience by enabling true natural language processing (NLP), one of the promises of generative AI is that it can make data experts more efficient.

Experts, like non-technical users, can take advantage of NLP to reduce the amount of code they need to write to carry out tasks. But more than that, they can automate workloads that previously had to be done manually, including some of the work needed to ensure data quality and put governance measures in place.

Specifically, Collibra AI is an AI engine that enables users to automatically generate data quality rules using natural language and automate the generation of asset details when data is added to their data catalog.
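The article does not show what a generated rule looks like. As a purely illustrative sketch, a plain-language request such as "customer emails must not be empty" might correspond to a SQL check like the one below. The table, column names, request text and helper function are all hypothetical, and none of this represents Collibra's actual output or API; Python's built-in sqlite3 module is used only to make the example runnable.

```python
import sqlite3

# Hypothetical plain-language request a data steward might type.
REQUEST = "Customer emails must not be empty."

# Hypothetical SQL rule matching that request: it counts rows that break the rule.
RULE_SQL = """
    SELECT COUNT(*)
    FROM customers
    WHERE email IS NULL OR TRIM(email) = ''
"""


def count_violations(connection, rule_sql):
    """Run a rule query and return the number of rows that violate it."""
    return connection.execute(rule_sql).fetchone()[0]


# In-memory sample data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ana@example.com"), (2, None), (3, "  "), (4, "li@example.com")],
)

violations = count_violations(conn, RULE_SQL)
print(f"{REQUEST} Violations found: {violations}")
```

Expressing the rule as a query that counts violating rows keeps the check easy for a more experienced user to review before it goes into production.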

The engine should help make Collibra’s platform easier for non-technical users to use, Petrie noted. However, their work will still need to be inspected by more seasoned users to ensure that models and the data used to inform them are accurate and properly governed.

“Collibra AI helps … address some user concerns about the learning curve required for their product,” Petrie said. “As with most AI use cases today, this still might require expert users to inspect the AI outputs before putting them into production. But it is an important move toward simplifying the product, improving data quality and democratizing usage.”

In addition, AI Governance and Collibra AI complement one another, he continued.

“Collibra is improving its governance capabilities with AI, for example, by auto-generating data quality rules,” Petrie said. “And Collibra is governing data to improve AI outputs, for example, by cataloging and observing AI models as well as the data that feeds them.”

Lastly, Collibra Data Notebook resulted from the vendor’s acquisition of Husprey and aims to provide customers with a simplified means of querying and sharing data using SQL. The new feature is integrated with Collibra’s data catalog, which reduces data isolation while adding data governance.

Simplified querying, meanwhile, has the potential to attract new users to Collibra and help the vendor compete with others such as Hex and SingleStore that also offer SQL notebooks, according to Bond.

SQL notebooks are similar to the Jupyter notebooks that data scientists use, typically with Python, to explore data and conduct advanced analysis. Python, however, is a general-purpose programming language, while SQL is a more accessible query language.

“Collibra Data Notebooks brings the same notebook concept to users that aren’t as comfortable with more advanced data languages but are comfortable using SQL,” Bond said.

Customer input was the impetus for developing the three new features, according to Laura Sellers, Collibra’s chief product officer.

AI Governance resulted from customers’ need to train AI on trusted data; Collibra AI was derived from the need to improve productivity; and Data Notebook came from a request from within the vendor’s user community.

“Since our founding, Collibra has tied innovation to input from our customers,” Sellers said.

Figure: The aspects of an enterprise AI governance framework.

Next steps

With AI Governance now generally available, one of the main focal points of Collibra’s roadmap will be adding new features to the tool and improving existing ones, according to Sellers.

“On deck, we will continue to innovate and advance AI Governance as this space evolves rapidly, as evidenced by the number of new regulations on the horizon surrounding how organizations deploy AI,” she said.

In addition, the vendor plans to work on user experience by adding more generative AI capabilities and providing new data products aimed at helping users derive value from their data, Sellers continued.

That focus on improving support for AI is appropriate, according to Petrie. But beyond internal product development, Collibra could benefit from integrating with dedicated AI and machine learning platforms from vendors such as DataRobot and Dataiku.

“This would further help data science teams train, deploy and optimize AI and ML models,” Petrie said.

Bond, meanwhile, suggested that Collibra expand its data management capabilities to include unstructured data.

Historically, business intelligence and data science have dealt largely with structured data such as transactions and financial records and not unstructured data such as text, images and audio files. Unstructured data, however, makes up an estimated 80% or more of all data, meaning that organizations are missing out on the potential insights such data could provide.

As AI becomes more prevalent, enterprises are making a more concerted attempt to operationalize unstructured data.

For example, vector embeddings that assign numerical representations to unstructured data to give it some form of structure have become popular. Beyond giving structure to unstructured data, vector embeddings enable similarity searches that help users discover data that can be fed into retrieval-augmented generation pipelines that feed AI models and applications.
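To make the idea of a similarity search concrete, here is a minimal sketch that ranks a few items by cosine similarity to a query vector. The file names and three-dimensional vectors are hand-made assumptions for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions, and production systems generally use a vector database rather than a Python dictionary.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings for three unstructured files (not from a real model).
DOCUMENTS = {
    "invoice_scan.pdf": [0.9, 0.1, 0.0],
    "support_call.mp3": [0.1, 0.8, 0.3],
    "contract_draft.docx": [0.8, 0.2, 0.1],
}


def top_matches(query_vector, documents, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# A hypothetical query embedding that points in the same direction as the "document-like" items.
results = top_matches([1.0, 0.0, 0.0], DOCUMENTS)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a retrieval-augmented generation pipeline, the top-ranked items would then be passed to the model as context alongside the user's question.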

“Many of the data intelligence software vendors, including Collibra, have been focused on structured data, but there is a need to capture intelligence about all kinds of data in the enterprise,” Bond said.

However, given the greater potential for private and other sensitive information to exist in unstructured data, such data needs the same governance as structured data, he continued.

“I would like to see Collibra and other data intelligence software vendors expand into capturing intelligence about unstructured data so that chief data officers can have visibility into the entire data estate,” Bond said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.


