A data standard to underpin lawyers’ use of generative AI
Much has been claimed about the impact of generative artificial intelligence technology on professional services. However, even though more lawyers are using generative AI for tasks such as drafting contracts or providing initial legal opinions, it is evident that the technology is not yet a panacea.
One ever clearer constraint is data. Generative AI needs a solid foundation of accurate and up-to-date information, with widely agreed legal definitions, if it is to produce reliable outputs for the profession.
“AI is all about data,” says Ryan O’Leary, a legal technology expert at research company IDC. “‘Garbage in, garbage out’ has never been truer. By standardising data across stakeholders . . . in theory, the [legal] industry could ensure that better data is being used to train [AI] models.”
If such a system became the norm, lawyers everywhere would be saved a lot of time, money and stress, experts say. But the barrier to creating such a system is the sheer initial effort required.
Despite law’s focus on precise definitions, the sector lacks a universal taxonomy — a scheme of classification of legal terms — even within the same jurisdiction.
Firms have their own taxonomies and data management systems, each with minor differences. And that can cause confusion when lawyers search for, and exchange, information electronically.
If, for example, one person is preparing to sue another person, three lawyers might each categorise the legal matter slightly differently.
“Lawyer one would say it is a negligence claim, and they’d be correct,” says Damien Riehl, a technology and intellectual property lawyer. “Lawyer two would say, well, it is, but it’s also a misrepresentation claim, and they would be correct. And lawyer three would say yes, it’s also that, but it’s also a defamation claim because [one party] is saying something false about [another person].”
Riehl is also co-leader of a not-for-profit organisation — Standards Advancement for the Legal Industry (Sali) — which hopes to tackle this problem.
It has developed a common data language and standard for organising, defining, and categorising (“tagging”) contracts, court rulings, patents, and other legal data that the industry churns out.
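As a rough illustration of what tagging against a shared vocabulary means in practice, the sketch below maps the differing labels that individual lawyers might use onto common concept identifiers. The labels, the `CLAIM-00x` identifiers and the `tag_matter` helper are all invented for this example. They are not real Sali identifiers or part of any Sali software.

```python
# Hypothetical shared vocabulary: local label -> shared concept ID.
# These IDs are invented for illustration; they are NOT real Sali identifiers.
SHARED_VOCABULARY = {
    "negligence": "CLAIM-001",
    "negligent conduct": "CLAIM-001",
    "misrepresentation": "CLAIM-002",
    "defamation": "CLAIM-003",
    "libel/slander": "CLAIM-003",
}


def tag_matter(local_labels):
    """Return the set of shared concept IDs for a list of local labels.

    Labels are matched case-insensitively; unknown labels are ignored here,
    although a real system would probably flag them for review.
    """
    tags = set()
    for label in local_labels:
        key = label.strip().lower()
        if key in SHARED_VOCABULARY:
            tags.add(SHARED_VOCABULARY[key])
    return tags


# Three lawyers describe the same matter in their own words.
lawyer_one = tag_matter(["Negligence"])
lawyer_two = tag_matter(["negligent conduct", "Misrepresentation"])
lawyer_three = tag_matter(["Negligence", "misrepresentation", "Libel/Slander"])
```

Because each description resolves to the same identifiers, a search for `CLAIM-001` would find the matter whichever lawyer filed it, and a matter can carry several tags at once.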
“Historically, legal tech vendors, law firms, and their clients have all been islands that have not been able to communicate with . . . each other effectively,” says Riehl. “Sali is a way to be able to connect those islands.”
The aim is to create the legal equivalent of an electronic healthcare patient record that any computer system can understand and exchange. And that, argue Sali supporters, will improve lawyers’ productivity and, therefore, benefit their clients.
Established in 2017, Sali comprises legal industry professionals from large law firms, in-house corporate legal teams, legal operations, big tech companies including Microsoft, and specialist legal software providers.
The standard is supported by industry groups such as the Corporate Legal Operations Consortium and International Legal Technology Association.
How long have you got?
Prominent legal tech suppliers are already starting to incorporate the standard into their products, the organisation says. But it remains unclear how long this will take.
For those that are using it, Sali reduces the ambiguity in data and terminology, which means lawyers spend less time “trying to understand what this actually means”, says Imran Aziz, legal tech product manager at Thomson Reuters, the legal data and media company.
If, instead, law firms and in-house teams develop their own legal taxonomy in isolation, it is “time consuming and prone to duplication and errors”, he says.
“A standard way of communication between technical solutions using standard terminology [such as Sali] helps simplify these interactions and reduces cost,” Aziz explains.
However, the legal sector is playing catch-up with other sectors, such as finance and healthcare (electronic medical records), that have their own data standards and taxonomies, notes Riehl.
And law firms and legal teams will find standardisation is a big task. Riehl estimates that, for a large law firm with five existing taxonomies scattered across 10 IT systems, switching to the Sali data standard can take about a year.
The two-step process involves a law firm or company first comparing its current legal taxonomy with Sali’s to spot any differences, and then updating its own taxonomy to match the Sali data specifications. This may require legal data to be tagged manually, although Sali provides free open-source software to automate the process.
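The two steps can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not Sali’s own software: the term lists, the document records and the `compare_taxonomies` and `retag_documents` helpers are hypothetical, and matching is done on simple case-insensitive equality, whereas a real migration would need far richer matching.

```python
def compare_taxonomies(firm_terms, standard_terms):
    """Step one: compare the firm's terms with the standard's terms.

    Returns (matched, unmatched): a dict mapping each firm term to the
    standard term it corresponds to, and a list of firm terms with no match.
    """
    standard_by_key = {t.strip().lower(): t for t in standard_terms}
    matched, unmatched = {}, []
    for term in firm_terms:
        key = term.strip().lower()
        if key in standard_by_key:
            matched[term] = standard_by_key[key]
        else:
            unmatched.append(term)
    return matched, unmatched


def retag_documents(documents, mapping):
    """Step two: update document tags to the standard terms.

    Returns (updated, needs_review). Documents whose tag has no mapping
    are left unchanged and listed for manual tagging.
    """
    updated, needs_review = [], []
    for doc in documents:
        if doc["tag"] in mapping:
            updated.append({**doc, "tag": mapping[doc["tag"]]})
        else:
            needs_review.append(doc)
    return updated, needs_review


# Hypothetical data, invented for this example.
firm_terms = ["employment dispute", "IP Licence", "Misc"]
standard_terms = ["Employment Dispute", "IP licence", "Negligence Claim"]
documents = [
    {"id": 1, "tag": "employment dispute"},
    {"id": 2, "tag": "Misc"},
]

mapping, unmatched = compare_taxonomies(firm_terms, standard_terms)
updated, needs_review = retag_documents(documents, mapping)
```

Here the firm’s “Misc” category has no counterpart in the standard list, so the document carrying it ends up in `needs_review`, which is the part of the job that may still have to be done by hand.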
Law firm Ogletree Deakins has been using Sali for about two years. Timothy Fox, director of practice innovation and analytics at the South Carolina firm, says it used Sali to create an internal legal taxonomy, based mainly on litigation data in its document management system. “It’s very easy to use,” he says.
Cataloguing data has helped Ogletree by enabling its lawyers to find documents faster, and to keep track of legal procedures and developments in court filings and legal motions. Fox explains: “We’ve now catalogued our [document management system] using a taxonomy that we didn’t have previously.”
An additional benefit for Ogletree and other law firms, Fox adds, would come if all big legal tech suppliers, such as Thomson Reuters, Bloomberg and LexisNexis, used the Sali standard in their products. It would also make it easier for law firms and company legal departments to switch legal IT suppliers.
He predicts: “If you want to use this [legal] data source, you’re going to need to be on the Sali standard . . . and that will be the turning point for it.”
Sali is a promising prospect because it is trying to be a global standard and has wide backing. But, despite steady growth and becoming a prominent legal data standard, Sali is far from universally used. Alternatives include the “Uniform Task-Based Management System” codes, developed by the American Bar Association for different types of legal services. While Sali’s taxonomy is considered extensive, it is not well known globally.
So, given that many legal firms and in-house legal departments will have their own data management systems and taxonomies, why should they undergo the time and expense of moving to a new system?
The need for evidence
It is unclear how many law firms and corporate legal departments are using Sali so far. Riehl declines to give an estimate, saying that the open-source, decentralised nature of its technology standard makes it hard to keep a tally of users.
Lisa Maxwell, general counsel at FE Fundinfo, a provider of information and technology for the investment management industry, says: “There’s a degree of resistance to change — in the sense that lots of corporates and in-house [legal] teams already have quite entrenched data systems or data management systems.”
If, however, legal industry leaders start to use Sali, and demonstrate the tangible benefits, they might provide the “impetus” for the rest to follow, she believes.
Making Sali easier to implement may help it become more common. Last year, US legal tech start-up 273 Ventures released an AI system to tag legal data automatically using the Sali standard.
Daniel Katz, 273 co-founder and a law professor at Illinois Tech’s Chicago-Kent College of Law, suggests the laborious task of manually tagging legal data to match the Sali standard has created a “bottleneck of adoption”. Big corporate legal departments can generate hundreds of thousands of lines of legal data, or more, each month, so manually tagging them is impractical and “way too expensive”.
AI could automate much of the creation of a new legal data standard. But, until the legal sector has an extensive — and extensively adopted — legal taxonomy in place, all the other potential benefits of generative AI for lawyers may be lost in translation.