Artificial intelligence black boxes just got a little less mysterious
One of the weirder, more unnerving things about today’s leading artificial intelligence (AI) systems is that nobody — not even the people who build them — really knows how the systems work.
That’s because large language models, the type of AI systems that power ChatGPT and other popular chatbots, are not programmed line by line by human engineers, as conventional computer programs are.
Instead, these systems essentially learn on their own, by ingesting massive amounts of data and identifying patterns and relationships in language, then using that knowledge to predict the next words in a sequence.
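To make "predicting the next words" concrete, here is a deliberately tiny, hypothetical sketch in Python. It only counts which word tends to follow which in a toy corpus and uses those counts to guess the next word; real language models learn billions of parameters from vast amounts of text rather than simple counts, but the underlying objective, predicting what comes next from patterns in data, is the same.

```python
# Toy illustration of next-word prediction (not how a real LLM is built).
# Count how often each word follows each other word in a tiny corpus,
# then predict the most frequent continuation.

from collections import Counter, defaultdict

corpus = (
    "the model reads text . the model learns patterns . "
    "the model predicts the next word ."
).split()

# Record, for every word, how often each other word follows it.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in the toy corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))    # -> "model", the most common continuation here
```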
One consequence of building AI systems this way is that it’s difficult to reverse-engineer them or to fix problems by identifying specific bugs in the code. Right now, if a user types “Which American city has the best food?” and a chatbot responds with “Tokyo,” there’s no real way of understanding why the model made that error, or why the next person who asks may receive a different answer.
And when large language models do misbehave or go off the rails, nobody can really explain why. The inscrutability of large language models is a major reason some researchers fear that powerful AI systems could eventually become a threat to humanity.
After all, if we can’t understand what’s happening inside these models, how will we know if they can be used to create novel bioweapons, spread political propaganda or write malicious computer code for cyberattacks? If powerful AI systems start to disobey or deceive us, how can we stop them if we can’t understand what’s causing that behavior in the first place?
But this week, a team of researchers at the AI company Anthropic announced what they called a major breakthrough — one they hope will give us the ability to understand more about how AI language models actually work, and to possibly prevent them from becoming harmful. The team summarised its findings in a blog post called “Mapping the Mind of a Large Language Model.”
They looked inside one of Anthropic’s AI models — Claude 3 Sonnet, a version of the company’s Claude 3 language model — and used a technique known as “dictionary learning” to uncover patterns in how combinations of neurons, the mathematical units inside the AI model, were activated when Claude was prompted to talk about certain topics. They identified roughly 10 million of these patterns, which they call “features.” They found that one feature, for example, was active whenever Claude was asked to talk about San Francisco. Other features were active whenever topics like immunology or specific scientific terms, such as the chemical element lithium, were mentioned. And some features were linked to more abstract concepts, like deception or gender bias.
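The blog post does not include the researchers' code, but the general shape of dictionary learning can be sketched with an off-the-shelf library. The example below is a hypothetical illustration on random numbers: the activation matrix, the sizes and the number of features are all invented, and Anthropic's actual pipeline, which surfaced roughly 10 million features from a production model, operates at a vastly larger scale.

```python
# A rough sketch of dictionary learning on made-up "neuron activations".
# The idea: rewrite each activation vector as a sparse combination of a
# small set of learned directions, which serve as candidate "features".

import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Pretend each row is the activation of 64 neurons on one prompt
# (in the real study these would come from a layer of Claude 3 Sonnet).
activations = rng.normal(size=(200, 64))

dictionary = DictionaryLearning(
    n_components=16,              # number of candidate features to learn
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,          # higher alpha -> sparser feature codes
    random_state=0,
)
feature_codes = dictionary.fit_transform(activations)

# For a given prompt, only a handful of features should fire strongly.
first_prompt = feature_codes[0]
active = np.flatnonzero(np.abs(first_prompt) > 1e-6)
print("features active on the first prompt:", active)
```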
They also found that manually turning certain features on or off could change how the AI system behaved. For example, they discovered that if they forced a feature linked to the concept of sycophancy to activate more strongly, Claude would respond with flowery, over-the-top praise for the user, including in situations where flattery was inappropriate.
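The post does not spell out exactly how a feature is forced to activate, so the snippet below is only a hypothetical sketch of the general idea: clamp one coordinate of a sparse feature code to an unusually high value and rebuild the activation vector that the rest of the model would see. The `SYCOPHANCY_FEATURE` index, the dictionary and all the numbers are invented for illustration.

```python
# Toy illustration of "turning a feature up" (not Anthropic's method):
# set one entry of a sparse feature code far above its normal range and
# reconstruct the activation vector from the learned dictionary.

import numpy as np

rng = np.random.default_rng(1)

n_features, n_neurons = 16, 64
dictionary = rng.normal(size=(n_features, n_neurons))  # one row per feature
code = np.zeros(n_features)
code[[2, 7]] = [0.8, 0.3]        # features the prompt actually activated

SYCOPHANCY_FEATURE = 5           # hypothetical index of a "sycophancy" feature

def reconstruct(feature_code: np.ndarray) -> np.ndarray:
    """Map a sparse feature code back to a dense activation vector."""
    return feature_code @ dictionary

normal_activation = reconstruct(code)

steered_code = code.copy()
steered_code[SYCOPHANCY_FEATURE] = 10.0   # force the feature well above normal
steered_activation = reconstruct(steered_code)

print("change caused by steering:",
      np.linalg.norm(steered_activation - normal_activation))
```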
Chris Olah, who led the Anthropic interpretability research team, said these findings could allow AI firms to control their models more effectively.
CRACKING THE CODE
> The "black box" problem is the inability to fully understand an AI system's decision-making process
> Large language models, such as the ones that power ChatGPT, are not programmed line by line by human engineers
> So, when they misbehave or go off the rails, nobody can really explain why
> Researchers looked inside one of Anthropic’s AI models — Claude 3 Sonnet
> They used a technique known as “dictionary learning” to uncover patterns in how combinations of neurons were activated when Claude was prompted to talk about certain topics
> They found that one feature, for example, was active whenever Claude was asked to talk about San Francisco
> They also found that manually turning certain features on or off could change how the AI system behaved
> Researchers believe these findings could allow AI firms to control their models more effectively