GenAI might be the least-trustworthy software that exists. Yet IT is expected to trust it.
Trust. It’s the critical word when talking about artificial intelligence in just about all of its forms. Do end-users or executives trust what generative AI (genAI) says? Presumably they do, or they would never bother using it.
But is AI indeed trustworthy? (I’ll give you the CliffsNotes version now: No, it’s not.)
But before we even consider how trustworthy genAI results are, let’s start with how trustworthy the executives running the companies that create AI algorithms are. This is a serious issue, because if enterprise IT executives can’t trust the people who make AI products, how can they trust the products? How can you trust what AI tells your people, trust what it does with their queries, or be OK with whatever it does with all of your sensitive data?
Like it or not, when you use these vendors’ software, you’re saying you’re comfortable extending all of that trust.
Let’s start with OpenAI. These days, it might be better called “Open” AI. For a company that doesn’t share the sources it uses to train its models to call itself Open is to strain the definition of the word. But I digress.
A study published in the journal Artificial Intelligence and Law, covered by LiveScience, offered a detailed look into OpenAI’s claims that GPT-4 did amazingly well on the bar exam. Stunner: It was a lie. A statistical lie, but a lie nonetheless.
“Perhaps the most widely touted of GPT-4’s at-launch, zero-shot capabilities has been its reported 90th-percentile performance on the Uniform Bar Exam,” the publication reported. “Although GPT-4’s UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population.
“When examining only those who passed the exam — i.e. licensed or license-pending attorneys — GPT-4’s performance is estimated to drop to 48th percentile overall, and 15th percentile on essays.”
It also noted that GPT-4, by its very nature, cheated. “Although the UBE is a closed-book exam for humans, GPT-4’s huge training corpus largely distilled in its parameters means that it can effectively take the UBE open-book, indicating that UBE may not only be an inaccurate proxy for lawyerly competence but is also likely to provide an overly favorable estimate of GPT-4’s lawyerly capabilities relative to humans,” the publication said.
The study also concluded that GPT-4 did quite poorly in the written sections, which should come as no surprise to anyone who has asked ChatGPT almost anything.
“Half of the Uniform Bar Exam consists of writing essays and GPT-4 seems to have scored much lower on other exams involving writing, such as AP English Language and Composition (14th–44th percentile), AP English Literature and Composition (8th–22nd percentile) and GRE Writing (54th percentile). In each of these three exams, GPT-4 failed to achieve a higher percentile performance over GPT-3.5 and failed to achieve a percentile score anywhere near the 90th percentile,” the publication noted.
Then there’s the matter of OpenAI CEO Sam Altman, who last year was briefly the company’s former CEO. One of the board members who fired Altman has finally gone public and explained her rationale: Altman lied to the board — a lot.
“The board is a nonprofit board that was set up explicitly for the purpose of making sure that the company’s public good mission was primary, was coming first — over profits, investor interests, and other things,” OpenAI former board member Helen Toner said on “The TED AI Show” podcast, according to a CNBC story. “But for years, Sam had made it really difficult for the board to actually do that job by withholding information, misrepresenting things that were happening at the company, in some cases outright lying to the board.”
Toner said Altman gave the board “inaccurate information about the small number of formal safety processes that the company did have in place” on multiple occasions. “For any individual case, Sam could always come up with some kind of innocuous-sounding explanation of why it wasn’t a big deal, or misinterpreted, or whatever. But the end effect was that after years of this kind of thing, all four of us who fired him came to the conclusion that we just couldn’t believe things that Sam was telling us, and that’s just a completely unworkable place to be in as a board — especially a board that is supposed to be providing independent oversight over the company, not just helping the CEO to raise more money.”
Let’s put this into context. Since the first company hired its first CIO, IT execs and managers have struggled to trust vendors. It’s in their nature. So, a lack of trust regarding technology is nothing new. But AI — and specifically genAI in all of its forms — is being given capabilities and data access orders of magnitude more extensive than any software before it.
And we are being asked to grant this all-but-unlimited access to software that’s been trained on an extensive and secret list of sources — and what it does with the data it captures is also vague and/or secret.
What protects the enterprise from all of this? Vendor-delivered guardrails or, sometimes, guardrails crafted in-house by IT. And in the least surprising development ever, companies are now creating applications explicitly designed to circumvent those guardrails. (They work quite effectively at that task.)
Consider the “abliteration” approach, which is now available on Google Colab and in the LLM Course on GitHub, according to an insightful piece from Hugging Face.
Abliteration is a technique “that can uncensor any LLM without retraining. This technique effectively removes the model’s built-in refusal mechanism, allowing it to respond to all types of prompts. This refusal behavior is mediated by a specific direction in the model’s residual stream. If we prevent the model from representing this direction, it loses its ability to refuse requests. Conversely, adding this direction artificially can cause the model to refuse even harmless requests.”
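To make that mechanism concrete, here is a minimal, hypothetical sketch of directional ablation in Python with PyTorch. It is not the Hugging Face implementation; the difference-of-means estimate of the “refusal direction,” the toy activation tensors, and the function names are all assumptions for illustration. In the real technique, the direction is estimated from a model’s activations on contrasting prompt sets and then removed from the residual stream at inference time or baked into the weights.

```python
# Conceptual sketch of "abliteration" (directional ablation), NOT a real
# model pipeline: toy random tensors stand in for residual-stream activations.
import torch


def estimate_refusal_direction(harmful_acts: torch.Tensor,
                               harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between the two activation sets, unit-normalized."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()


def ablate_direction(residual: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove each activation's component along the refusal direction."""
    coeffs = residual @ direction                     # dot product per activation
    return residual - coeffs.unsqueeze(-1) * direction


if __name__ == "__main__":
    torch.manual_seed(0)
    d_model = 16
    harmful = torch.randn(8, d_model) + 0.5   # toy activations on refused prompts
    harmless = torch.randn(8, d_model)        # toy activations on accepted prompts

    refusal_dir = estimate_refusal_direction(harmful, harmless)
    resid = torch.randn(4, d_model)
    cleaned = ablate_direction(resid, refusal_dir)

    # After ablation, the component along the refusal direction is (near) zero.
    print((cleaned @ refusal_dir).abs().max())
```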
That means guardrails, as protection mechanisms, are relatively easy to override or circumvent. And that brings us back to the issue of trust in these companies and their leaders.
As more reports come out, that trust is getting harder and harder to deliver.